ChemClipse

Thursday, December 4, 2014 - 07:57 by Philip Wenig

Basics

This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community. Please log in and add your feedback in the comments section.

Project

Eclipse ChemClipse

Parent Project

Eclipse Technology

Proposal State

Created

Background

Software has become an important part of evaluating scientific data, and open source software plays a vital role because it supports collaborative work. Over the years, many software projects have appeared to address problems in specific fields, ranging from biology, geology, bioinformatics and physics to linguistics and epidemiology. Several applications address chemical problems, but only a few have taken on the challenge of handling data from analytical instruments. Among these instruments are systems for chromatography and mass spectrometry, the techniques most commonly used in analytical chemistry. The two are often used in combination, e.g. for forensics, quality control or medical research. A bottleneck for scientific discovery is the variety of instruments in use: each vendor offers its own software package and data format. That makes it hard, and in most cases impossible, to evaluate data sets independently of the vendor software. It also prevents researchers from gaining new insights from existing data records and from handling the measured data in a uniform way.

Scope

Chromatography and mass spectrometry are key technologies in almost every field of analytical chemistry, for example quality control and forensic science. ChemClipse addresses the management of data sets from chromatography and mass spectrometry systems. It does not offer any instrument control capabilities; instead, it provides supporting software to import the recorded raw data, optimize and run evaluations, and export the results in various formats. Owing to its flexible and modular approach, it adds new functionality to the hardware. It also offers a rich graphical user interface that aims to make data evaluation as intuitive as possible for scientists in these fields.

Description

ChemClipse helps users analyse data acquired from systems used in analytical chemistry. In particular, gas chromatography coupled with mass spectrometry (GC/MS) or with flame-ionization detectors (GC/FID) is used to identify and/or monitor chemical substances. This is an important task, e.g. in quality control. Groceries, for example, are under strict control: producers, traders and retailers try to ensure that they contain no harmful chemical substances. The presence or absence of such substances is determined by, among other methods, GC/MS and GC/FID. Evaluating the data sets recorded by the instruments nevertheless requires some experience. ChemClipse therefore supports chemists in evaluating analytical data sets and creating reports. It also offers a rich set of functions for editing the data sets as well as an easy-to-use GUI. ChemClipse is based on the Eclipse Application Platform. It currently uses Eclipse 4.5M3 in mixed mode and is built with Maven/Tycho 0.21.0. Its main functionality is as follows:

Converter (import and/or export of raw data sets)
Classifier (non-destructive methods to extract characteristic values)
Filter (destructive methods to optimize the data sets)
Peak detection (finding peaks – each peak corresponds to a chemical substance)
Chromatogram/Peak integration (calculation of the chromatogram/peak area)
Identification (identification of each peak mass spectrum)
Quantitation (use the data for calibration)
Reporting (report the results for further analytical steps)
Processing (automation of the data handling)
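
To make the peak detection and integration steps in the list above more concrete, here is a minimal sketch in plain Java. It is not ChemClipse code: the class `PeakSketch`, its method names and the sample data are hypothetical, peaks are found with a simple local-maximum test above a threshold, and the area is computed with the trapezoidal rule without any baseline correction. Real detectors and integrators are usually more elaborate.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: PeakSketch and its methods are hypothetical, not ChemClipse API. */
final class PeakSketch {

    /** Returns the indices of local maxima whose intensity exceeds the threshold. */
    static List<Integer> detectApexes(double[] intensities, double threshold) {
        List<Integer> apexes = new ArrayList<>();
        for (int i = 1; i < intensities.length - 1; i++) {
            boolean risesBefore = intensities[i] > intensities[i - 1];
            boolean doesNotRiseAfter = intensities[i] >= intensities[i + 1];
            if (intensities[i] > threshold && risesBefore && doesNotRiseAfter) {
                apexes.add(i);
            }
        }
        return apexes;
    }

    /** Integrates intensity over time from scan start to scan end (inclusive) with the trapezoidal rule. */
    static double area(double[] times, double[] intensities, int start, int end) {
        double area = 0.0;
        for (int i = start + 1; i <= end; i++) {
            double width = times[i] - times[i - 1];
            area += width * (intensities[i] + intensities[i - 1]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        // Made-up signal with two peaks; times and intensities are in arbitrary units.
        double[] times = {0, 1, 2, 3, 4, 5, 6, 7, 8};
        double[] signal = {0, 2, 4, 2, 0, 1, 5, 1, 0};
        System.out.println(detectApexes(signal, 3.0)); // prints [2, 6]
        System.out.println(area(times, signal, 0, 4)); // prints 8.0
    }
}
```

The peak boundaries passed to `area` are chosen by hand here; in practice they would come from the peak detection step.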

Thanks to this flexible approach, each function can be extended by plugins through the Eclipse extension point mechanism. This makes ChemClipse well suited for scientists, students and other interested people who want to write their own extensions. The data model is designed so that extension authors can concentrate on the methods their extension needs. The graphical user interface can likewise be extended with additional UI parts.
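
The idea behind the extension mechanism described above can be sketched in plain Java. Everything in this example is an assumption made for illustration: `SignalFilter`, `FilterRegistry`, `BaselineSubtraction` and the id `example.filter.baseline` are hypothetical names, and a simple map stands in for the Eclipse extension registry. In an actual Eclipse application, contributions are declared in a `plugin.xml` file and resolved by the platform.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: none of these types are ChemClipse or Eclipse API. */
final class ExtensionSketch {

    /** Hypothetical contract for a destructive filter: it modifies the signal in place. */
    interface SignalFilter {
        void apply(double[] intensities);
    }

    /** Stand-in for an extension registry: maps a contribution id to its implementation. */
    static final class FilterRegistry {
        private final Map<String, SignalFilter> filters = new LinkedHashMap<>();

        void register(String id, SignalFilter filter) {
            if (filters.putIfAbsent(id, filter) != null) {
                throw new IllegalArgumentException("duplicate id: " + id);
            }
        }

        SignalFilter lookup(String id) {
            SignalFilter filter = filters.get(id);
            if (filter == null) {
                throw new IllegalArgumentException("unknown id: " + id);
            }
            return filter;
        }
    }

    /** Example contribution: subtracts the minimum intensity as a crude baseline correction. */
    static final class BaselineSubtraction implements SignalFilter {
        @Override
        public void apply(double[] intensities) {
            if (intensities.length == 0) {
                return;
            }
            double min = intensities[0];
            for (double value : intensities) {
                min = Math.min(min, value);
            }
            for (int i = 0; i < intensities.length; i++) {
                intensities[i] -= min;
            }
        }
    }

    public static void main(String[] args) {
        FilterRegistry registry = new FilterRegistry();
        registry.register("example.filter.baseline", new BaselineSubtraction());
        double[] signal = {5.0, 7.0, 12.0, 6.0};
        registry.lookup("example.filter.baseline").apply(signal);
        System.out.println(Arrays.toString(signal)); // prints [0.0, 2.0, 7.0, 1.0]
    }
}
```

The host application only depends on the `SignalFilter` contract, so a new filter can be added without changing the code that looks it up and runs it.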

Licenses

Eclipse Public License 1.0

Legal Issues

In the author's opinion, there are no obvious legal issues with the code. The software has been designed and developed by the author. It includes and depends on open source libraries from outside the Eclipse ecosystem. It contains no code licensed under the GPL, LGPL or AGPL.

The following dependencies outside the Eclipse ecosystem are used:

Package | Version | License | Website | Use
--- | --- | --- | --- | ---
SWTChart | 0.9.0 | EPL v1.0 | http://www.swtchart.org | Charts
SWTXYGraph | 2.0.1 | EPL v1.0 | http://code.google.com/p/swt-xy-graph | Heatmap
OrientDB | 2.0.5 | Apache v2.0 | http://www.orientechnologies.com |
Commons Math | 3.3.0 | Apache v2.0 | http://commons.apache.org/math | Math
JNA | 4.0.0 | Apache v2.0 | https://github.com/twall/jna | OrientDB Dependency
Concurrent Linked HashMap | 1.4.0 | Apache v2.0 | https://code.google.com/p/concurrentlinkedhashmap | OrientDB Dependency
EJML | 0.24.0 | Apache v2.0 | https://code.google.com/p/efficient-java-matrix-library | PCA
NetCDF | 4.3.19 | MIT | http://www.unidata.ucar.edu/downloads/netcdf/netcdf-java-4 | NetCDF

Why Here?

The Eclipse Foundation is the right place for ChemClipse to collaborate because of the Science Working Group. The Eclipse Foundation in general and the Science Working Group in particular offer great opportunities to collaborate with other projects and to find new ways of evaluating data. Moreover, combining software from different scientific fields offers chances to make serendipitous discoveries.

Future Work

Planned work:

1. Re-implementation of the mzXML, mzML and mzData converters

2. Improved data handling for Triple-Quad and QTOF instruments

3. Improved data handling for high-resolution mass spectrometry systems

4. Improved data handling for flame-ionization detector (FID) data

5. Support for new detector types such as diode-array detectors (DAD)

6. Performance improvements

Project Scheduling

The initial contribution will be made in the first quarter of 2015, and the first release is planned for the end of the second quarter of 2015.