This proposal has been approved and the Eclipse Hawk™ project has been created.
Visit the project page for the latest information and development.

Eclipse Hawk

Monday, March 12, 2018 - 13:14 by Antonio Garcia…
This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community.
Proposal State: Created
Background

As modelling has become more prevalent, the inherent scalability issues of the "simple" approach of storing a model in a single file have become more apparent. The two most common alternatives are breaking up the model into multiple "fragment" files, or storing it in a database. The first approach is simpler to implement than the second: with Eclipse technologies, fragmentation is well supported through cross-file containment, whereas a database requires the complex task of integrating a new persistence framework (such as Eclipse CDO). The first approach also interoperates well with existing version control systems (e.g. Git), which scale better as models are broken up into more fine-grained fragments.

However, this first approach has traditionally suffered from scalability issues when we try to run a query that might touch all fragments. If we are not careful, such global queries on fragmented file-based models may require loading the entire model into memory at once, which limits the maximum size of the model that we can handle.

Scope

Eclipse Hawk is intended to cover all aspects necessary to index, query and visualize collections of models more efficiently than is possible with standard file-based persistence. It does so through the use of various database stores, in a transparent manner that does not require replacing the existing persistence mechanism for those models.

Description

Eclipse Hawk is a heterogeneous model indexing framework: it indexes collections of models transparently and incrementally into a NoSQL database, which can be queried in a more efficient and convenient manner. You can mirror EMF, UML or Modelio models (among others) into a Neo4j or OrientDB graph, which you can query with their native languages, or (preferably) through the languages provided by Hawk. Hawk will watch over those models and update the graph whenever they change, in an incremental manner.
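The idea of incremental indexing can be illustrated with a minimal Python sketch. It is not Hawk's API or implementation: the `ToyModelIndex` class, its `sync` and `count` methods, and the representation of a fragment as a list of `(element_id, type_name)` pairs are all assumptions made for illustration, and a plain dictionary stands in for the graph database. The sketch re-indexes only the fragments whose content has changed, and answers a global query from the index without opening any fragment.

```python
import hashlib
from collections import defaultdict


class ToyModelIndex:
    """Toy stand-in for a model index; NOT Hawk's actual API."""

    def __init__(self):
        self._digests = {}                  # fragment path -> content digest
        self._by_type = defaultdict(set)    # type name -> {(path, element id)}

    def sync(self, fragments):
        """Re-index only new or changed fragments and return their paths.

        `fragments` maps a path to a list of (element_id, type_name) pairs,
        a hypothetical simplification of a parsed model file.
        """
        changed = []
        for path, elements in fragments.items():
            digest = hashlib.sha256(repr(sorted(elements)).encode()).hexdigest()
            if self._digests.get(path) == digest:
                continue                    # unchanged fragment: nothing to do
            self._remove(path)              # drop stale entries for this fragment
            for element_id, type_name in elements:
                self._by_type[type_name].add((path, element_id))
            self._digests[path] = digest
            changed.append(path)
        # Fragments that disappeared are removed from the index as well.
        for path in set(self._digests) - set(fragments):
            self._remove(path)
            del self._digests[path]
        return changed

    def _remove(self, path):
        for entries in self._by_type.values():
            entries.difference_update({e for e in entries if e[0] == path})

    def count(self, type_name):
        """Global query answered from the index, without opening any fragment."""
        return len(self._by_type.get(type_name, ()))


index = ToyModelIndex()
fragments = {
    "a.model": [("a1", "Class"), ("a2", "Package")],
    "b.model": [("b1", "Class")],
}
first = index.sync(fragments)                  # both fragments are indexed
fragments["b.model"].append(("b2", "Class"))
second = index.sync(fragments)                 # only b.model is re-indexed
```

After the second call, `index.count("Class")` is answered from the in-memory index alone, which is the property that a real index provides at a much larger scale with a persistent store.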

Current versions of Hawk integrate extended versions of the Epsilon Object Language and the Epsilon Pattern Language (part of the Eclipse Epsilon project). The advantage of using these languages is that you can reuse the same query across backends - it will work the same across Neo4j, OrientDB or Greycat.
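Why a backend-independent query language helps can be sketched in Python as follows. This is only an analogy under stated assumptions: Hawk's queries are written in the Epsilon languages, not Python, and the `GraphBackend` interface, the two toy backends and the `names_of` query are hypothetical names invented here. The point is that a query written against a common interface returns the same result whichever store sits behind it.

```python
from typing import Iterable, Protocol


class GraphBackend(Protocol):
    """Hypothetical minimal backend interface; not Hawk's real abstraction."""

    def nodes_of_type(self, type_name: str) -> Iterable[dict]: ...


class DictBackend:
    """Toy backend that keeps nodes grouped by type."""

    def __init__(self, nodes_by_type):
        self._nodes_by_type = nodes_by_type

    def nodes_of_type(self, type_name):
        return self._nodes_by_type.get(type_name, [])


class ListBackend:
    """Toy backend that keeps one flat list and filters on demand."""

    def __init__(self, nodes):
        self._nodes = nodes

    def nodes_of_type(self, type_name):
        return [n for n in self._nodes if n["type"] == type_name]


def names_of(backend: GraphBackend, type_name: str) -> list:
    """A query written once against the interface, not against a store."""
    return sorted(n["name"] for n in backend.nodes_of_type(type_name))


nodes = [
    {"type": "Class", "name": "Order"},
    {"type": "Class", "name": "Customer"},
    {"type": "Package", "name": "sales"},
]
dict_backend = DictBackend({
    "Class": [n for n in nodes if n["type"] == "Class"],
    "Package": [n for n in nodes if n["type"] == "Package"],
})
list_backend = ListBackend(nodes)

# The same query runs unchanged on both toy backends.
same = names_of(dict_backend, "Class") == names_of(list_backend, "Class")
```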

Hawk also includes tools that make it easier to use, such as exposing Hawk queries as EMF models, and a web service API for querying remote Hawk indexes over the network.

Why Here?

Hawk covers a gap in the Eclipse MDT project, which includes solutions for storing models into databases (e.g. CDO) but lacks a solution for simply making queries over existing file-based models faster. Hawk could also be used as an alternative to custom-developed indices in various Eclipse projects.

We would like to leverage the Eclipse community to reach more potential users and increase awareness of Hawk as a solution to a common problem in the modeling community, and to benefit from a more formal governance model for the long-term sustainability of the project. We would also benefit from the legal stewardship of the Eclipse Legal team, especially with regard to the integration of newer backends, which tend to have their own licensing challenges. We believe that this stewardship will make it simpler for companies to integrate the technology behind Hawk, as the legal issues will have been cleared in advance.

Future Work

Regarding functionality, our main areas of activity at the moment are:

  • Full history indexing: we want Hawk to index the full history of the models rather than only the latest version. To that end, we have recently added a Greycat backend. Greycat implements a many-worlds temporal graph, which should make it possible to keep this full history after further improvements to the current indexing components.
  • Revamped UI: the current UI is serviceable, but it is not component-based. It is not possible to use plug-ins to expand its capabilities, which hurts usability in places (e.g. the UI is the same whether we want to index a local folder or a remote SVN repository).
  • Clustering: Hawk supports the multi-master mode of OrientDB, but in general it cannot respond to queries while it is updating, and it cannot distribute indexing work between multiple machines. In the medium to long term, we want to improve the horizontal scalability of Hawk by adding coordination capabilities between multiple Hawk servers. Within a Hawk cluster, at least one node will always be available to answer queries, while the others may be indexing the latest version of the model.
  • Revamped storage: preferences for Hawk indexes are currently scattered over the Eclipse preferences store and multiple XML files. We intend to introduce a simple DSL for describing the configuration of a Hawk index, in a way that is friendly when using Hawk from inside and from outside the Eclipse IDE.
  • Further improvements in the usability, performance and stability of existing backends, model support plug-ins and model location plug-ins.

We intend to participate in academic and industrial events to promote participation in our project, running tutorials at relevant conferences and demonstrating Hawk at various industrial events throughout the world.

Project Scheduling

The initial contribution is already available, and builds are ready to be tested. For the general scheduling of the project, we intend to release a new version roughly every year, integrating the latest developments in our research. We have already released versions 1.0, 1.1 and 1.1.1 in this manner by using the GitHub "Releases" functionality. These versions have focused on stability and performance, especially of the less mature OrientDB backend.

Interested Parties
  • The SOFTEAM Group has expressed their interest in the project moving to Eclipse. SOFTEAM has had multiple successful experiences integrating Hawk. Within the MONDO EU project, SOFTEAM used Hawk within their Constellation collaboration server to provide querying and searching over their Modelio models. As part of the ITEA3 MEASURE project, SOFTEAM and Aston University jointly supervised an MSc project which added Hawk-as-a-service to the integrated MEASURE metrics platform.
  • Rolls-Royce plc has also expressed their interest in using Hawk for indexing their models.

 

Initial Contribution

The code for version 1.1.1 (latest stable release) and the 1.2 interim releases is currently hosted on GitHub, and it is fully functional:

https://github.com/mondo-project/mondo-hawk/

It includes a full implementation of the indexing framework, the query languages, and three backends based on Neo4j [1] 2.0.5, OrientDB [2] 2.2.30 and Greycat [3] V10. The query language is based on the latest Eclipse Epsilon 1.5 interim releases [4]. The Hawk server is based on Eclipse Jetty [5]; it uses Apache Thrift [6] for RPC-style communication and Apache Artemis [7] for server-to-client notifications. Hawk follows the Eclipse plugin-based architecture, and some of the plugins introduce additional dependencies:

  • The BPMN model support integrates the Eclipse BPMN metamodel [8].
  • The EMF model support integrates the Eclipse Modeling Framework.
  • The HTTP location support integrates Apache HTTPClient 4.3.6 [9].
  • The local folder location support and the server user database integrate the Apache-licensed MapDB database (needed for locations with a very large number of files) [10].
  • The UML model support is based on Eclipse MDT UML2.
  • The workspace location support is based on core Eclipse APIs.
  • The security layer of the RCP-based Hawk server depends on Apache Shiro 1.2.4 [11].

Internally, Hawk is implemented as a set of Eclipse plugin projects and it can be built through Tycho or (partly) through plain Maven. The Travis CI system is used to build interim releases over the master branch [12], which are made available as an Eclipse update site.

Copyright is mostly shared between the University of York (Konstantinos Barmpis and Dimitris Kolovos) and Aston University (Antonio Garcia-Dominguez), with some minor contributions from third parties (an MSc student - Orjuwan Al-wadaei - and several collaborators in a European research project - Gabor Szárnyas and Abel Gómez).

As for the community, multiple companies across Europe used Hawk during the MONDO EU FP7 project: SOFTEAM integrated Hawk into their Constellation collaborative modelling platform [13], and recently started to use it within their MEASURE metrics platform as well. UNINOVA from Portugal used it to speed up queries over building information models, and SOFT-MAINT used it to query large models produced by reverse-engineering code. We are also driving further use of Hawk within local companies in the UK. The backing of the Eclipse Foundation in legal matters would help attract new developers to the project, and would give companies peace of mind about integrating it into their solutions.

[1]: https://neo4j.com/

[2]: https://orientdb.com/

[3]: https://greycat.ai/

[4]: https://www.eclipse.org/epsilon/

[5]: https://www.eclipse.org/jetty/

[6]: https://thrift.apache.org/

[7]: https://activemq.apache.org/artemis/

[8]: https://www.eclipse.org/modeling/mdt/?project=bpmn2

[9]: https://hc.apache.org/httpcomponents-client-ga/index.html

[10]: http://www.mapdb.org/

[11]: http://shiro.apache.org/

[12]: https://travis-ci.org/mondo-project/mondo-hawk/

[13]: https://research.aston.ac.uk/portal/en/researchoutput/integration-of-a-graphbased-model-indexer-in-commercial-modelling-tools(8d7bbefd-b6e8-4417-a0d5-cb1fd52e5efd).html
