As modelling has become more prevalent, the inherent scalability issues in the "simple" approach of storing a model in a single file have become more apparent. The two most common ways of addressing them are to break up the model into multiple "fragment" files, or to store it in a database. The first approach is simpler to implement than the second: with Eclipse technologies, fragmentation is well supported through cross-file containment, whereas a database requires the complex task of integrating a new persistence framework (such as Eclipse CDO). The first approach also interoperates nicely with existing version control systems (e.g. Git), which scale better as models are broken up into more fine-grained fragments.
However, this first approach has traditionally suffered from scalability issues when running a query that might touch all fragments. If we are not careful, such global queries on fragmented file-based models may require loading the entire model into memory at once, which limits the maximum size of the model that we can handle.
Eclipse Hawk is intended to cover all the aspects needed to index, query and visualize collections of models more efficiently than is possible with standard file-based persistence. It does so by using various database stores, transparently and without requiring the existing persistence mechanism for those models to be replaced.
Eclipse Hawk is a heterogeneous model indexing framework: it indexes collections of models transparently and incrementally into a NoSQL database, which can be queried in a more efficient and convenient manner. You can mirror EMF, UML or Modelio models (among others) into a Neo4j or OrientDB graph, which you can query with their native languages, or (preferably) through the languages provided by Hawk. Hawk will watch over those models and update the graph whenever they change, in an incremental manner.
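The idea of incremental updating can be illustrated with a minimal sketch. It is not Hawk's implementation: the `FragmentIndex` class, its in-memory dictionaries and the file-level content digests are all assumptions made for illustration, standing in for a real graph database and for whatever change detection Hawk actually performs. The point is only that unchanged fragments are skipped when the index is brought up to date.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FragmentIndex:
    """Toy in-memory stand-in for a graph store (hypothetical, not Hawk's data model)."""
    digests: dict = field(default_factory=dict)   # path -> content digest
    elements: dict = field(default_factory=dict)  # path -> indexed element names

    def sync(self, files: dict) -> list:
        """Bring the index in line with `files` (path -> text).

        Returns the sorted list of paths that had to be (re)indexed.
        """
        reindexed = []
        for path, text in files.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.digests.get(path) != digest:
                # New or changed fragment: replace only its own entries.
                self.digests[path] = digest
                self.elements[path] = [
                    line.strip() for line in text.splitlines() if line.strip()
                ]
                reindexed.append(path)
        # Drop fragments that no longer exist in the watched location.
        for path in list(self.digests):
            if path not in files:
                del self.digests[path]
                del self.elements[path]
        return sorted(reindexed)
```

Calling `sync` twice with the same files reindexes nothing the second time, and changing one fragment reindexes only that fragment. A query then runs against `elements` without loading any fragment from disk.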
Current versions of Hawk integrate extended versions of the Epsilon Object Language and the Epsilon Pattern Language (part of the Eclipse Epsilon project). The advantage of using these languages is that you can reuse the same query across backends: it will work the same way on Neo4j, OrientDB or Greycat.
Hawk also includes tools that make it easier to work with: Hawk queries can be exposed as EMF models, and a web service API allows remote Hawk indexes to be queried over the network.
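The benefit of a backend-neutral query language can be sketched as follows. This is a conceptual illustration only: the `GraphBackend` interface, the two toy backends and the `count_abstract_classes` query are hypothetical names, and none of them corresponds to Hawk's or Epsilon's actual API. The query is written once against a small interface and gives the same answer regardless of how each backend stores its nodes.

```python
from typing import Iterable, Protocol


class GraphBackend(Protocol):
    """Hypothetical minimal interface that a query engine would target."""

    def nodes_of_type(self, type_name: str) -> Iterable[dict]: ...


class DictBackend:
    """Stores nodes grouped by type (stand-in for one database)."""

    def __init__(self, by_type: dict):
        self._by_type = by_type

    def nodes_of_type(self, type_name: str) -> Iterable[dict]:
        return list(self._by_type.get(type_name, []))


class ListBackend:
    """Stores all nodes in one flat list (stand-in for a different database)."""

    def __init__(self, nodes: list):
        self._nodes = nodes

    def nodes_of_type(self, type_name: str) -> Iterable[dict]:
        return [n for n in self._nodes if n["type"] == type_name]


def count_abstract_classes(backend: GraphBackend) -> int:
    """A 'query' written against the interface, not against a specific store."""
    return sum(1 for n in backend.nodes_of_type("Class") if n.get("abstract"))
```

Swapping `DictBackend` for `ListBackend` requires no change to `count_abstract_classes`, which is the property the shared query languages are meant to provide.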
Hawk covers a gap in the Eclipse MDT project, which includes solutions for storing models into databases (e.g. CDO) but lacks a solution for simply making queries over existing file-based models faster. Hawk could also be used as an alternative to custom-developed indices in various Eclipse projects.
We would like to leverage the Eclipse community to reach more potential users and increase awareness of Hawk as a solution to a common problem in the modeling community, and to benefit from a more formal governance model for the long-term sustainability of the project. We would also benefit from the legal stewardship of the Eclipse Legal team, especially with regard to the integration of newer backends, which tend to have their own unique licensing challenges. We believe that this stewardship will make it simpler for companies to integrate the technology behind Hawk, as the legal issues will have been cleared in advance.
The current version of Hawk is licensed under EPL 2.0 with the GPLv3 as Secondary License to allow compatibility with Neo4j, which is licensed under the GPLv3. We will not distribute the Neo4j backend from Eclipse; it will be made available from a separate website instead. The Eclipse distributions will include the EPL-friendly OrientDB- and Greycat-based backends, which are drop-in replacements for it. OrientDB and Greycat are licensed under the Apache License 2.0.
Previous versions of Hawk included an IFC building information model parser plug-in, which was based on Affero GPLv3 libraries from the OpenBIMserver project. We have extracted these components to a separate repository, where they will live outside of Eclipse, for users that adopted Hawk under the secondary GPLv3 license. If there is enough demand for IFC parsing in the EPL-based version, we may consider a reimplementation with a new parser under the EPL.
The initial contribution is already available, and builds are ready to be tested. For the general scheduling of the project, we intend to release a new version roughly every year, integrating the latest developments in our research. We have already released versions 1.0, 1.1 and 1.1.1 in this manner by using the GitHub "Releases" functionality. These versions have focused on stability and performance, especially of the less mature OrientDB backend.
Regarding functionality, our main areas of activity at the moment are:
- Full history indexing: we want Hawk to index the full history of the models rather than only the latest version. To that end, we have recently added a Greycat backend. Greycat implements a many-worlds temporal graph, which should make it possible to keep this full history after further improvements to the current indexing components.
- Revamped UI: the current UI is serviceable, but it is not component-based. It is not possible to use plug-ins to expand its capabilities, which hurts usability in places (e.g. the UI is the same whether we want to index a local folder or a remote SVN repository).
- Clustering: Hawk has support for the multi-master mode of OrientDB, but in general it cannot respond to queries while it is updating, and it cannot distribute indexing work between multiple machines. In the medium to long term, we want to expand the horizontal scalability of Hawk by adding coordination capabilities between multiple Hawk servers. Within a Hawk cluster, at least one node will always be available to answer queries, while the others may be indexing the latest version of the model.
- Revamped storage: preferences for Hawk indexes are currently scattered over the Eclipse preferences store and multiple XML files. We intend to introduce a simple DSL for describing the configuration of a Hawk index, in a way that is friendly when using Hawk from inside and from outside the Eclipse IDE.
- Further improvements in the usability, performance and stability of existing backends, model support plug-ins and model location plug-ins.
We intend to participate in academic and industrial events to promote participation in our project, running tutorials at relevant conferences and demonstrating Hawk at various industrial events throughout the world.