This proposal has been approved and the Eclipse DataEggs project has been created.
Visit the project page for the latest information and development.

Eclipse DataEggs

Basics
This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community. Please log in and add your feedback in the comments section.
Parent Project: 
Eclipse Technology
Background: 

This data-oriented project originates from the EU-funded Crossminer project. As the Eclipse DataEggs datasets grew in size and maturity, and as specific audiences and needs emerged from the community, we decided to create a new project dedicated solely to making this resource available, so that the service can continue for the Eclipse and research communities. The website presenting the datasets is already live, continuously updated, and available on the Scava download page.

Scope: 

Eclipse DataEggs provides open, anonymised, up-to-date and ready-to-use datasets related to the development of Eclipse projects. It includes the following types of data:

  • Mailing lists (full mboxes and csv extracts) hosted at the Eclipse forge.
  • AERI exception stacktraces (not updated anymore, historical data only).
  • Development data from Eclipse projects.

Currently, 21 projects have been analysed with this tool. More can be added at a project's request.

Description: 

The datasets provided by this project can already be explored at https://download.eclipse.org/scava/ .

  • Mailing lists (full mboxes and csv extracts) hosted at the Eclipse forge with their documentation and examples.
  • AERI exception stacktraces (not updated anymore, historical data only), comprising two datasets: problems (see documentation) and incidents (see documentation).
  • Development data from Eclipse projects. Depending on data sources, the following information is provided:
    • SCM (git).
    • ITS (Bugzilla, GitHub issues, GitLab issues).
    • CI (Jenkins).
    • PMI checks.
    • Stack Overflow statistics.
    • Scancode analysis (executed on our server).
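
As an illustration of how the mailing-list mboxes listed above might be consumed, here is a minimal sketch using Python's standard `mailbox` module. The function name and the file name `example-list.mbox` are hypothetical and chosen for illustration only; the actual file names and header contents are described in the dataset documentation. Because the datasets are anonymised, the `From` values are assumed here to be opaque identifiers rather than real addresses.

```python
import mailbox
from collections import Counter


def count_messages_by_sender(mbox_path):
    """Return a Counter mapping each From header value to its message count.

    Hypothetical helper for illustration; not part of the DataEggs code base.
    """
    counts = Counter()
    # create=False makes a missing file raise an error instead of creating it.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # Assumption: each message carries a From header; fall back otherwise.
            sender = str(message.get("From", "(unknown)"))
            counts[sender] += 1
    finally:
        box.close()
    return counts


if __name__ == "__main__":
    # "example-list.mbox" is a placeholder for a downloaded mbox file.
    for sender, total in count_messages_by_sender("example-list.mbox").most_common(5):
        print(sender, total)
```

The same approach should extend to other headers (for example `Date` or `Subject`), and the csv extracts can be read with the standard `csv` module once their columns have been checked against the documentation.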

Privacy has been a major concern from the beginning; see our documentation for more details.

Why Here?: 

Although the analysis engine itself is (almost) forge-agnostic, the datasets provided in this project are exclusively related to the Eclipse forge.

Licenses: 
Eclipse Public License 2.0
Legal Issues: 

All code in the GitLab repository has been written by me, under the EPL v2. Project data is fetched from an Alambic instance (hosted on our server) and as such is not impacted by license constraints -- although Alambic itself is licensed under EPL, too.

Project Scheduling: 

Code is ready and builds are already running weekly. Everything is deployed to https://download.eclipse.org/scava/projects/ on Sundays, around 4 am.

Note that the builds run on our own server (http://ci4.castalia.camp:8080), since they are quite resource-intensive.

People
Project Leads: 
Boris Baldassari
Committers: 
Boris Baldassari
Mentors: 
Wayne Beaton
Interested Parties: 

Eclipse Foundation.

Project developers and end-users.

Research Labs (see previous requests to access Eclipse forge datasets).

Source Code
Initial Contribution: 

All code is already stored at the Eclipse Foundation since it was written for Eclipse Scava. It has been moved recently from Eclipse git repositories to the new GitLab infrastructure. It can be found at https://gitlab.eclipse.org/bbaldassari2kd/scava-datasets .

All code has been written by me (Boris Baldassari) under the usual ECA, and is licensed under the EPL v2.

Source Repository Type: 
GitLab
Source Repositories: 
https://gitlab.eclipse.org/bbaldassari2kd/scava-datasets
Incubating - Eclipse DataEggs

Tags

Technology Types
  • Tools
Build Technologies
  • Jenkins
