This proposal has been approved and the Eclipse DataEggs project has been created.
Visit the project page for the latest information and development.

Eclipse DataEggs

Friday, March 26, 2021 - 05:54 by Boris Baldassari
This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community. Please log in and add your feedback in the comments section.
Parent Project
Proposal State
Created
Background

This data-oriented project originates from the EU-funded Crossminer project. As the datasets grew in size and maturity, and as specific audiences and needs emerged across the community, we decided to create a new project dedicated solely to making this resource available, so that the service can continue for the Eclipse and research communities. The website presenting the datasets is already running, is continuously updated, and is available on the Scava download page.

Scope

Eclipse DataEggs provides open, anonymised, up-to-date and ready-to-use datasets related to the development of Eclipse projects. It includes the following types of data:

Currently, 21 projects have been analysed using this tool. More can be added at the request of individual projects.

Description

The datasets provided by this project can already be explored at https://download.eclipse.org/scava/ .

Privacy has been a major concern from the beginning; see our documentation for more details.

Why Here?

Although the analysis engine itself is (almost) forge-agnostic, the datasets provided in this project are exclusively related to the Eclipse forge.

Project Scheduling

The code is ready and builds already run weekly. Everything is deployed to https://download.eclipse.org/scava/projects/ on Sundays, at around 4 am.

Note that the builds run on our own server (http://ci4.castalia.camp:8080), since they are quite resource-intensive.
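
For consumers who want to fetch the datasets shortly after each refresh, the weekly schedule described above can be turned into a small helper. The following is only a sketch: the function name `next_deployment` and the constants are hypothetical, the proposal does not state a time zone (so naive datetimes are used), and "around 4 am" is treated as exactly 04:00.

```python
from datetime import datetime, timedelta

# Assumption: deployment happens on Sundays at about 04:00. The time zone is
# not stated in the proposal, so naive datetimes are used throughout.
DEPLOY_WEEKDAY = 6  # datetime.weekday(): Monday is 0, Sunday is 6
DEPLOY_HOUR = 4


def next_deployment(now: datetime) -> datetime:
    """Return the first Sunday 04:00 strictly after `now`."""
    # Number of days until the coming Sunday (0 if today is Sunday).
    days_ahead = (DEPLOY_WEEKDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=DEPLOY_HOUR, minute=0, second=0, microsecond=0
    )
    # If this Sunday's slot has already passed, move to the following week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

A client could call `next_deployment(datetime.now())` and wait a little past the returned time before downloading, since the actual deployment time is only approximate.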

Project Leads
Committers
Boris Baldassari (This committer does not have an Eclipse Account)
Mentors
Interested Parties

Eclipse Foundation.

Project developers and end-users.

Research Labs (see previous requests to access Eclipse forge datasets).

Initial Contribution

All code is already stored at the Eclipse Foundation, since it was written for Eclipse Scava. It was recently moved from the Eclipse git repositories to the new GitLab infrastructure and can be found at https://gitlab.eclipse.org/bbaldassari2kd/scava-datasets .

All code has been written by me (Boris Baldassari) under the usual ECA, and is licenced under the EPL v2.

Source Repository Type