Open-source software (OSS) is very often developed in a public, collaborative, and loosely coordinated manner. This has several implications for the quality of different OSS projects as well as for the level of support that different OSS communities provide to users of the software they produce. On the one hand, there are several high-quality and mature OSS projects which deliver stable and well-documented products. Such projects typically also foster a vibrant expert and user community which provides remarkable levels of support, both in answering user questions and in repairing reported defects (bugs) in the provided software. On the other hand, there is also a substantial number of OSS projects which are dysfunctional in one or more of the following ways:
- The development team behind the OSS project invests little time in its development, maintenance and support;
- The development of the project has been altogether discontinued due to lack of commitment or motivation;
- The documentation of the produced software is limited and/or of poor quality;
- The source code contains few or low-quality comments, which makes studying and maintaining it challenging;
- The community around the project is limited, and as such, questions asked by users receive late or no response, and identified defects either get repaired very slowly or are ignored altogether.
Consequently, developing new software systems by reusing existing open source components raises relevant challenges related to at least the following activities:
- searching for candidate components;
- evaluating a set of retrieved candidate components to find the most suitable one;
- understanding how to use the selected components;
- adapting the selected components to fit the specific requirements.
Eclipse CROSSMETER collects data from open-source repositories (code version management systems, issue trackers, continuous integration systems and discussion forums in natural language). This data is analyzed qualitatively and then stored in a knowledge base. The knowledge base is queried for specific answers when the programmer is confronted with (a) a design decision or (b) a code or design smell. CROSSMETER extends the Eclipse IDE for several languages, starting with at least Java, with new IDE interactions for code suggestions and problem detection.
To this end, the following tools are in scope:
- source code analysis tools to extract and store actionable knowledge from the source code of a collection of open-source projects;
- natural language analysis tools to extract quality metrics related to the communication channels and bug tracking systems of OSS projects by using Natural Language Processing and text mining techniques;
- system configuration analysis tools to gather and analyse system configuration artefacts and data to provide an integrated DevOps-level view of a considered open source project;
- workflow-based knowledge extractors to simplify the development of bespoke analysis and knowledge extraction tools by contributing a framework that will shield engineers from technological issues and allow them to concentrate on the core analysis tasks instead;
- cross-project relationship analysis tools to specify and manage in a homogeneous manner a wider range of open source project relationships, such as dependencies and conflicts. The outcomes of the different CROSSMETER analysis tools will contribute to the definition of a knowledge base that supports multidimensional classifications of projects and enables a number of applications, such as automated identification of complementary and competing projects, detection of project incompatibilities, and prediction of the future of given projects based on the evolution of other projects that had similar characteristics in the past.
- extensions for the Eclipse IDE that will allow developers to use the CROSSMETER knowledge base and analysis tools directly from the development environment. The IDE extensions will also include features for monitoring the activity of developers while they work on a given OSS project. The IDE will thus issue alerts or recommendations and collect user feedback, which will help developers to improve their productivity. Depending on the context, recommendations can include suggested code snippets, patterns, automatic fixes for coding issues, suggestions to use alternative APIs or components, etc.
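To make the idea of cross-project relationship analysis more concrete, the following is a minimal sketch, not the CROSSMETER implementation. It assumes a hypothetical in-memory store (`ProjectGraph`) of dependency and conflict relationships and shows how project incompatibilities could be derived from it. All class, method and project names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProjectGraph:
    """Hypothetical store of cross-project relationships (illustration only)."""

    depends_on: dict[str, set[str]] = field(default_factory=dict)
    conflicts: set[frozenset[str]] = field(default_factory=set)

    def add_dependency(self, project: str, dependency: str) -> None:
        self.depends_on.setdefault(project, set()).add(dependency)

    def add_conflict(self, a: str, b: str) -> None:
        # Conflicts are symmetric, so they are stored as unordered pairs.
        self.conflicts.add(frozenset((a, b)))

    def transitive_dependencies(self, project: str) -> set[str]:
        """Return every project reachable through dependency edges."""
        seen: set[str] = set()
        stack = list(self.depends_on.get(project, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.depends_on.get(current, ()))
        return seen

    def incompatibilities(self, project: str) -> set[frozenset[str]]:
        """Return conflicting pairs that both occur in the dependency closure."""
        closure = self.transitive_dependencies(project) | {project}
        return {pair for pair in self.conflicts if pair <= closure}


# Made-up example data: two libraries pull in conflicting logging components.
graph = ProjectGraph()
graph.add_dependency("app", "lib-a")
graph.add_dependency("app", "lib-b")
graph.add_dependency("lib-a", "logging-x")
graph.add_dependency("lib-b", "logging-y")
graph.add_conflict("logging-x", "logging-y")

found = graph.incompatibilities("app")
```

In this sketch, `found` contains the single pair `logging-x` / `logging-y`, because both are reachable from `app`. A real knowledge base would hold many more relationship types and would be populated by the analysis tools rather than by hand.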
Software engineers spend most of their time learning to understand the software they maintain or depend on (or will depend on). The goal of this learning process is to support decision-making. In this project, we focus on the increasing dependence on open-source software (OSS) over the last years and on the decisions related to depending on open-source software. Eclipse CROSSMETER will support efficient and effective decision-making regarding dependence on OSS projects and components thereof. This entails decisions both on the architecture level (deciding to select an OSS project) and on the code level (designing the use of the OSS project). In particular, CROSSMETER will provide techniques and tools for extracting knowledge from existing open source components, and will use such knowledge to properly select and reuse existing software to develop new systems. The activity of the developer will be continuously monitored in order to raise alerts related to the quality of the selected OSS projects and to give suggestions that can reduce the development effort and increase the quality of the final software products.
Figure 1 shows a high-level overview of the CROSSMETER approach. It comprises two major use cases and two minor user channels, which are implemented using two architectural stages: online and offline. The common use case features software engineers using their normal IDE, which is enhanced with decision-supporting information mined from OSS projects. The second use case is an advanced tool engineer developing bespoke analysis workflows which can make use of already available and mined data. Next to these two major IDE use cases, Figure 1 also features the release of the mined information via two online channels. The first is a normal analytics dashboard on the Web that discloses mined information to stakeholders other than the software engineers (such as project managers). The second online channel is the GitHub API, to which we will push information rather than pull it. OSS projects on GitHub can be tagged with useful information (e.g., the number of succeeded builds and tests reported by a continuous integration toolkit). CROSSMETER will publish the results of mining qualitative and quantitative information as GitHub project tags as well.
We describe the two major use cases here in some detail to clarify what CROSSMETER as a whole entails. We first explain the exceptional case of tool engineers extending the platform, and then the normal case of a software engineer using the CROSSMETER-enabled IDE:
- In step 1, the tool engineers of Use case II use a special (graphical) editor in their IDE to compose new workflows of data sources and computations. This functionality is commonly available in big data analytics suites; here we specialize it for typical OSS project analysis tasks. This leads to the installation of a new bespoke analysis into the set of existing mining and analysis tools (step 2).
- Mining tools will run incrementally in step 2, possibly on a remote server, to extract relevant information from a pre-configured set of projects and a list of projects configured by the software engineers of Use case I.
- The software engineers of Use case I have a wizard to configure CROSSMETER with a rich set of requirements (step 3), which includes not only registering a set of projects of interest but also expressing preferences regarding the algorithms and processes used to project the mined information into the IDE. This configuration is an important step to make meaningful assessment possible later, since it makes the context and preferences of the engineer explicit to the platform in terms of technological, quality, configuration, and licensing aspects.
- Finally, step 4 is when the acquired information is put into action, actively supporting the engineers via the IDE, managers via the website, and the open-source community via GitHub integration.
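The composition of data sources and computations described in step 1 can be illustrated with a small sketch. It is an assumption-laden illustration and not the CROSSMETER workflow framework: the `compose` helper, the `issue_source` data source and the record fields are all hypothetical, and the issue data is made up.

```python
from typing import Callable, Iterable

Record = dict
Step = Callable[[Iterable[Record]], Iterable[Record]]


def compose(source: Callable[[], Iterable[Record]], *steps: Step) -> Callable[[], list]:
    """Chain a data source and a sequence of computations into one workflow."""

    def run() -> list:
        records = source()
        for step in steps:
            records = step(records)
        return list(records)

    return run


def issue_source() -> list:
    # Hypothetical stand-in for a reader that pulls issues from a tracker.
    return [
        {"project": "demo", "state": "closed", "days_open": 3},
        {"project": "demo", "state": "open", "days_open": 40},
        {"project": "demo", "state": "closed", "days_open": 5},
    ]


def only_closed(records: Iterable[Record]) -> Iterable[Record]:
    # Keep only issues that have been resolved.
    return (r for r in records if r["state"] == "closed")


def mean_days_to_close(records: Iterable[Record]) -> list:
    # Aggregate the remaining issues into a single metric record.
    items = list(records)
    if not items:
        return []
    total = sum(r["days_open"] for r in items)
    return [{"project": items[0]["project"], "mean_days_to_close": total / len(items)}]


workflow = compose(issue_source, only_closed, mean_days_to_close)
result = workflow()
```

Here `result` is a single record reporting a mean of 4.0 days to close for the made-up project. A bespoke analysis installed in step 2 would be a workflow of this general shape, with real repository readers in place of the hard-coded source.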
Figure 1: CROSSMETER approach at a glance
- Code assist - propose relevant code snippets, ranked by relevance and quality and informed by the earlier configuration;
- Infer/Fix project setup - retrieve a list of ranked relevant reusable components, then set up relevant projects in the IDE and configure dependent projects to use them;
- Monitoring of development activities of the engineers, who will be notified of relevant facts pertaining to their current task context.
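As an illustration of ranking snippets "by relevance and quality" under a user configuration, the sketch below combines two scores with a configurable weight. This is only one possible scheme and an assumption on our part: the `Snippet` type, the score ranges and the weighted-sum formula are hypothetical, and the proposal does not specify how the ranking is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    code: str
    relevance: float  # hypothetical score in [0, 1]: match with the current task
    quality: float    # hypothetical score in [0, 1]: quality of the source project


def rank_snippets(snippets, relevance_weight=0.5):
    """Order snippets by a weighted sum of relevance and quality (best first)."""
    if not 0.0 <= relevance_weight <= 1.0:
        raise ValueError("relevance_weight must be between 0 and 1")
    quality_weight = 1.0 - relevance_weight

    def score(snippet):
        return relevance_weight * snippet.relevance + quality_weight * snippet.quality

    return sorted(snippets, key=score, reverse=True)


# Made-up candidates; the weight plays the role of a user preference
# captured during configuration.
candidates = [
    Snippet("snippet_a()", relevance=0.9, quality=0.2),
    Snippet("snippet_b()", relevance=0.6, quality=0.9),
    Snippet("snippet_c()", relevance=0.3, quality=0.5),
]

balanced = [s.code for s in rank_snippets(candidates, relevance_weight=0.5)]
relevance_only = [s.code for s in rank_snippets(candidates, relevance_weight=1.0)]
```

With equal weights, `snippet_b()` ranks first because its quality score outweighs the higher relevance of `snippet_a()`; with `relevance_weight=1.0` the order follows relevance alone.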
In short, CROSSMETER analyzes OSS projects offline, in the background, and employs the mined information to support engineers online, directly in their decision-making tasks, through otherwise unobtrusive IDE features which are highly configurable and extensible.
Eclipse Foundation Europe GmbH is one of the consortium members of the EU H2020 CROSSMINER project (https://www.crossminer.org/). Together with the Eclipse representatives in the project, we will investigate how to apply the CROSSMINER analysis techniques to the Eclipse projects and repositories. Moreover, hosting CROSSMETER as an Eclipse project will help to make it more sustainable, especially once the EU funding ends. In particular, we believe that establishing CROSSMETER as an Eclipse project as soon as possible will contribute to creating a community of (Eclipse) users, which in turn will benefit the adoption of the envisioned technologies and their novelties.
No legal issues. Nobody owns the trademark to the project name and all the different components are under a supported license.
M12: Eclipse CROSSMETER platform and methodology - initial version
Description: Interim versions of the dependency inference and analysis component, of the text representation system, and of the workflow modeling component have been delivered. Initial versions of the Eclipse-based IDE, of the Web-based dashboards and of the tool for the unsupervised classification of OSS projects have also been developed.
M18: Eclipse CROSSMETER platform and methodology - interim version
Description: Initial versions of the tools for analysing natural language sources, knowledge base, and the dependency inference and analysis component have been delivered. Interim versions of the developer activity monitoring, IDE integration services, system configuration analysis tools, and of the CROSSMETER platform have been delivered. Interim versions of the workflow development tool and of the engine for supporting parallel and distributed workflows have also been developed.
M24: Eclipse CROSSMETER platform and methodology - second interim version
Description: Final versions of the dependency inference and analysis component, the text representation system, and the tools for analysing natural language sources and for the unsupervised classification of OSS projects have been developed. Interim versions of all use-case demonstrators have been developed and the respective evaluation reports have been delivered.
M30: Eclipse CROSSMETER platform and methodology - second version
Description: The API analysis component, the tool for mining documentation and code snippets, the DevOps Dashboard, and the integration with GitHub have been developed. Final versions of the workflow development tools, the workflow execution engine, Eclipse-based IDE, Web-based dashboards, and of the knowledge base have been delivered.
Future work includes any facilities related to the scope of the project, including, but not limited to:
- Source code analysis
- Configuration analysis
- Natural language processing techniques applied to project communication channels