Triquetrum

Thursday, May 28, 2015 - 08:37 by Erwin De Ley

Basics

This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community. Please log in and add your feedback in the comments section.

Project

Eclipse Triquetrum

Parent Project

Eclipse Technology

Proposal State

Created

Background

We have been developing Passerelle at eclipselabs@Google (see https://code.google.com/a/eclipselabs.org/p/passerelle/) for many years, as a specialization of the Ptolemy II actor-oriented modeling and simulation framework. It uses Ptolemy as a workflow execution engine and offers a basic GEF-based graphical model editor. Passerelle has been applied in open-source tools for scientific workflows and has also been integrated in iSencia's commercial Passerelle EDM product.

Recently we've agreed with the Ptolemy team at UC Berkeley to collaborate more closely on an evolution of Ptolemy II towards the world of OSGi and RCP. This would also include a refactoring of the existing Passerelle code base, merging some parts into Ptolemy II and making sure that the RCP editor is no longer tied to Passerelle, but becomes generically usable on Ptolemy II.

Besides an RCP editor/workbench, there are also requirements & initial solution components for headless runtimes, ad-hoc task-based processing and integration with external resource managers and data analysis packages.

There are already several scientific workflow systems available, but many are specific to particular research domains. We believe that the combination of Eclipse/OSGi with Ptolemy's architecture for hierarchical and heterogeneous actor-based modeling delivers a solid platform for a wide range of workflow applications.

The Eclipse Science IWG is an ideal community for such work.

And as eclipselabs@Google is closing down, we believe the time is right to take the step to a "real" Eclipse project.

Scope

The project is structured on three lines of work :

1. A Ptolemy II RCP model editor and execution runtime, taking advantage of Ptolemy's features for heterogeneous and hierarchical models.

The runtime must be easy to integrate in different environments, ranging from a personal RCP workbench to large-scale distributed systems.

To that end we will deliver supporting APIs for local & remote executions, including support for debugging/breakpoints etc.

The platform and RCP editor must be extensible with domain-specific components and modules.

We will also deliver APIs to facilitate development of extensions, building on the features provided by Ptolemy and OSGi.

2. APIs and OSGi service impls for Task-based processing. This would be a "layer" that can be used independently of Ptolemy, e.g. by other workflow/orchestration/sequencing software or even ad-hoc systems, interactive UIs etc.

3. Supporting APIs and tools, e.g. integration adapters to all kinds of things like external software packages, resource managers, data sources etc.

"Vanilla" packages will be delivered that can be used for general Ptolemy modeling work.

Triquetrum will also deliver extensions, with a focus on scientific software. There is no a-priori limitation on target scientific domains, but the current interested organizations are big research institutions in materials research (synchrotrons), physics and engineering.

Description

Triquetrum delivers an open platform for managing and executing scientific workflows. The goal of Triquetrum is to support a wide range of use cases, ranging from automated processes based on predefined models, to replaying ad-hoc research workflows recorded from a user's actions in a scientific workbench UI. It will allow users to define and execute models ranging from personal pipelines with a few steps to massive models with thousands of elements.

Besides delivering a generic workflow environment, Triquetrum also delivers extensions with a focus on scientific software. There is no a-priori limitation on target scientific domains, but the current interested organizations are big research institutions in materials research (synchrotrons), physics and engineering.

The integration of a workflow system in a platform for scientific software can bring many benefits :

- The steps in scientific processes are made explicitly visible in the workflow models (instead of being hidden inside program code). Such models can serve as a means to present, discuss and share scientific processes in communities with different skill sets.
- It allows differentiating between roles within a common tool set : software engineers, internal scientists, visiting scientists etc.
- It promotes reuse of software assets and modular solution design.
- It provides technical services for automating complex processes in a scalable and maintainable way.
- It is a crucial tool for advanced analytics on gigantic datasets.
- It integrates execution tracing, provenance data, etc.

The implementation will be based on the Ptolemy II framework from UC Berkeley.

Licenses

Eclipse Public License 1.0

Legal Issues

This project builds on several existing open-source projects. The main ones are from Eclipse, Apache and Ptolemy II of UC Berkeley.

Ptolemy II comes with a very open (re)distribution model, based on a "BSD"-style copyright statement and no formal license.

Please see http://ptolemy.eecs.berkeley.edu/ptolemyII/ptIIfaq.htm#ptolemy%20II%20copyright for more info.

Why Here?

There are several reasons to join the Eclipse community with this project.

Technologically it will be integrating several existing Eclipse technologies like Equinox, RCP, Graphiti and EMF. So it's a natural fit to become part of the same community and deliver our results here as well.

On a more functional level, this project will be linked to the Eclipse Science IWG. Through the integration of Ptolemy II as Eclipse RCP plugins, this project will add a research domain to the Science IWG : system design, modeling, and simulation techniques for hierarchical, heterogeneous concurrent systems. But besides this "native" Ptolemy application domain, and thanks to its advanced and open actor-oriented software architecture, it has also already been integrated in several other domains, e.g. as a workflow or process engine for automating scientific workflows.

Triquetrum will also deliver specialized tools and reusable libraries for scientific workflows, which we hope will be a valuable contribution to the Science IWG.

Future Work

In the first year after the project start, we will spend a lot of work on migrating/reproducing many of the GUI features that Ptolemy II now offers in its Swing-based Vergil editor.

A second line of work will be to implement different storage strategies for execution traces and provenance info, based on the Task-based processing model.

This will be the basis for providing reproducible workflows. We will also collaborate with the Ptolemy team to define optimal ways to recover and continue from execution faults, to arrive at a sufficient level of fault-tolerance for long-running distributed workflows.

The long-term work will be oriented to build a software platform for "reproducible research".

Collaborations will be started with several existing Eclipse and Science projects.

At this moment we're already in contact with the project leads of the DAWNSci, ICE and the PTP projects :

DAWNSci delivers core APIs and reference impls to access scientific data sets in files and other sources. These would be integrated in science-oriented extension modules of Triquetrum.
DAWNSci + ICE will deliver requirements for integrating workflow software in a workbench for data exploration and visualization.
ICE + Sandia Analysis Workbench will deliver requirements for workflows for large-scale calculations.
PTP has extensive support for working with computing resources. Integrating with clusters/grids like SGE, SLURM... is a crucial part for large-scale scientific workflows.
Ptolemy is of course a primary collaborating project that already has a significant community. We expect interest from there as well.

Through these collaborations, and through our participation in the Eclipse Science IWG we will grow the community around Triquetrum.

Members of the Science IWG would be invited to evaluate use cases of Triquetrum in their domains, and/or to deliver requirements for future work.

We also intend to write about our work in community articles and to participate, when possible, in Eclipse conferences.

Project Scheduling

An initial contribution is planned for August 2015.

In the fall of 2015, the following will be added :

1. For the Ptolemy RCP editor :

- A minimal set of task-based actors as a show-case of the contents of the other lines of work.

2. For the Task-based processing :

- A basic implementation for in-memory processing.

3. Supporting services :

- An integration of that API with DAWNSci's Python analysis RPC.

- Some trivial implementations to connect to SOAP web-services.

4. A first build configuration, using Eclipse's Tycho-based build infrastructure

People

Project Leads

Erwin De Ley

Christopher Brooks

Committers

Erwin De Ley

Christopher Brooks (This committer does not have an Eclipse Account)

Interested Parties

Christopher Brooks, Prof. Edward Lee (UC Berkeley) - Ptolemy team
Matt Gerring (Diamond LS, UK) - DAWN
Jay Jay Billings (ORNL) - ICE
Sandia Analysis Workbench team
Scott Lewis - ECF

Source Code

Initial Contribution

For the Ptolemy RCP editor :

1. A minimal editor capable of drawing simple top-level models and running them, built on Graphiti and EMF.

2. An EMF-based model as a binding layer towards the underlying Ptolemy model elements.

3. Essential Ptolemy bundles as binary dependencies. These will at first be built from the Ptolemy II repository.

For the Task-based processing :

1. A domain API for processing arbitrary sequences of Tasks, with their parameters, lifecycle tracing and final results.
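
As an illustration only, a task with parameters, a lifecycle trace and final results could be modelled along the following lines. This is a minimal sketch and not the Passerelle or Triquetrum API: the names Task and TaskStatus, the four lifecycle states and the method names are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lifecycle states; the real domain API may define others.
enum TaskStatus { CREATED, STARTED, FINISHED, FAILED }

// Hypothetical domain object: one unit of work with its parameters,
// a trace of the lifecycle states it went through, and its final results.
final class Task {
    private final String type;
    private final Map<String, String> parameters;
    private final List<TaskStatus> trace = new ArrayList<>();
    private final Map<String, Object> results = new LinkedHashMap<>();

    Task(String type, Map<String, String> parameters) {
        this.type = type;
        this.parameters = new LinkedHashMap<>(parameters);
        trace.add(TaskStatus.CREATED);
    }

    String getType() { return type; }

    Map<String, String> getParameters() {
        return Collections.unmodifiableMap(parameters);
    }

    // The current status is the last entry of the lifecycle trace.
    TaskStatus getStatus() { return trace.get(trace.size() - 1); }

    List<TaskStatus> getTrace() { return Collections.unmodifiableList(trace); }

    Map<String, Object> getResults() {
        return Collections.unmodifiableMap(results);
    }

    void start() { transition(TaskStatus.CREATED, TaskStatus.STARTED); }

    void finish(Map<String, ?> finalResults) {
        transition(TaskStatus.STARTED, TaskStatus.FINISHED);
        results.putAll(finalResults);
    }

    void fail() { transition(TaskStatus.STARTED, TaskStatus.FAILED); }

    // Records a state change, rejecting transitions from an unexpected state.
    private void transition(TaskStatus expected, TaskStatus next) {
        if (getStatus() != expected) {
            throw new IllegalStateException(
                "Cannot go from " + getStatus() + " to " + next);
        }
        trace.add(next);
    }
}
```

In this sketch a sequence of tasks is simply a List of Task objects; persisting the trace for provenance purposes is left out.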

Supporting services :

1. A ProcessingService API with brokers and handlers etc for synchronous and asynchronous/buffered task processing

2. An Adapter API to execute Tasks that require external services or applications
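
To make the broker/handler idea more concrete, here is a rough sketch of how synchronous and asynchronous (buffered) processing could sit behind one entry point. It is not the actual ProcessingService or Adapter API: TaskHandler, ProcessingBroker and UpperCaseAdapter are hypothetical names, and the adapter is a trivial stand-in for a call to an external service or application.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical handler contract: a handler declares which task types it
// supports and turns task parameters into results.
interface TaskHandler {
    boolean canHandle(String taskType);
    Map<String, Object> handle(String taskType, Map<String, String> parameters);
}

// Hypothetical adapter: stands in for a handler that would delegate
// to an external service or application.
final class UpperCaseAdapter implements TaskHandler {
    @Override
    public boolean canHandle(String taskType) {
        return "uppercase".equals(taskType);
    }

    @Override
    public Map<String, Object> handle(String taskType, Map<String, String> parameters) {
        String text = parameters.get("text").toUpperCase(Locale.ROOT);
        return Collections.<String, Object>singletonMap("text", text);
    }
}

// Hypothetical broker: picks the first handler that accepts the task type.
final class ProcessingBroker {
    private final List<TaskHandler> handlers;
    private final ExecutorService executor;

    ProcessingBroker(List<TaskHandler> handlers, ExecutorService executor) {
        this.handlers = handlers;
        this.executor = executor;
    }

    // Synchronous processing: the caller blocks until the handler returns.
    Map<String, Object> process(String taskType, Map<String, String> parameters) {
        for (TaskHandler handler : handlers) {
            if (handler.canHandle(taskType)) {
                return handler.handle(taskType, parameters);
            }
        }
        throw new IllegalArgumentException("No handler for task type: " + taskType);
    }

    // Asynchronous processing: the executor's work queue acts as the buffer,
    // and the returned future completes with the results or the failure.
    CompletableFuture<Map<String, Object>> submit(String taskType, Map<String, String> parameters) {
        CompletableFuture<Map<String, Object>> future = new CompletableFuture<>();
        executor.execute(() -> {
            try {
                future.complete(process(taskType, parameters));
            } catch (RuntimeException e) {
                future.completeExceptionally(e);
            }
        });
        return future;
    }
}
```

In an OSGi setting the handlers would more likely be discovered as registered services than passed in a constructor; the constructor is used here only to keep the sketch self-contained.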

All the initial code will be rewritten from the existing Passerelle code. It will be copyrighted by the concerned committer(s) or their organizations. The license will be EPL.

The proposed elements in the last two topics (Task-based processing and supporting services) are based on what's available in Passerelle, but will need to be refactored and extracted from there.
For the first topic (the Ptolemy RCP editor), the principles have already been tried out, but the current Passerelle code base is not appropriate as an initial code drop, so this will take the most effort.

Source Repository Type

GitHub