OMR | projects.eclipse.org

Monday, December 7, 2015 - 17:23 by Gary Liu

Basics

This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community. Please log in and add your feedback in the comments section.

Project

Eclipse OMR

Parent Project

Eclipse Technology

Proposal State

Created

Background

Building the runtime technology for a new language to match the capabilities of existing mature languages typically requires tremendous effort over decades and, in some cases, never happens because language adoption rates never justify the needed investment. But many of the technologies required are not substantially different from the technologies that have been created for existing languages. Each language has its quirks and peculiarities, but the fundamental technology is very similar in nature. What makes it extremely difficult to repurpose existing technology for a new language, however, is that the effort to create a new language runtime typically focuses almost entirely on the shortest path to becoming operational for one particular language. “Shortest path” typically means specializing the technology for that language, which impedes reuse for other languages. This process has been repeated many times for many different languages, resulting in several challenges that affect all communities to varying extents:

Opportunity cost: every community invests limited resources to independently implement and maintain code that is broadly similar in capability but expressed in different ways. How much more would we all accomplish without this wasted effort?
Long robustness ramp: different implementations tend to run into and fix similar kinds of bugs over their lifetimes. Early design flaws can become extremely restrictive and hard to fix as the community grows around a runtime implementation.
Slow capability adoption: hardware and operating system capabilities take much longer to become consistently available and, in the meantime, the developer community can be disadvantaged on some platforms.
Hampered productivity: frameworks for development, diagnostics, profiling, monitoring, management, deployment, testing, etc. require much more effort to build and maintain across many languages, or we build broadly similar (but different) tools for each language (see "Opportunity cost" above).
Barrier to entry: the more hardware platforms, operating systems, and tools become popular, the harder it becomes to make a new language fully capable. Not all language designers want to become experts in building these capabilities.
Slow forward progress: innovation in languages slows, and may even be foiled in some cases, because of significant runtime implementation costs.

One approach to improve this situation is to build other languages within an existing mature runtime environment like Microsoft’s Common Language Runtime or a Java Virtual Machine. For example, Scala, Groovy, JRuby, Nashorn, and many other language projects leverage the Java Virtual Machine (JVM) to run code written in other languages. None of these projects, however, has become the de facto implementation for its target language, in part because the JVM was designed primarily, and continues to make implementation trade-offs, to run Java code very efficiently, but not necessarily code written in other languages. Supporting other languages requires, and encourages, workarounds and complexity that would not be needed were it not for the fundamental design constraints (i.e., the “Java-ness”) of the JVM itself. For reasons like this, most languages have a native C or C++ runtime implementation that the majority of that language’s users consider the reference implementation. More significantly, however, the success of this kind of approach depends on migrating a community from one runtime implementation to another, across what can be a significant number of implementation differences that manifest for developers and users as varying forms of “my program doesn’t work the way it used to work”. To date, very few large language communities have succeeded with a migration of this scale.

A second approach could be to build new runtime components from scratch that are designed from the outset for reuse. But building even one runtime for one specific language is incredibly hard. Building such components to support any runtime, with no specific stakeholder (and therefore, conceptually, every stakeholder), is almost guaranteed to fail.

Neither of these two approaches seems like a sure bet, but the idea of leveraging a mature JVM’s core technology feels like the best direction. The JVM technology already exists and has proven itself for at least one mature language community. But bolting other languages on top of Java semantics has not yet been shown to be a broadly viable solution.

Instead, we propose to reorganize the runtime components of an existing commercial JVM implementation (the IBM Developer’s Kit for Java) to separate the parts that implement Java semantics from the parts that provide key runtime capabilities.

The OMR project will be formed around these latter, language-independent parts: a runtime technology platform consisting of core components that form a toolbox for building language runtimes. An ecosystem of developers working together to augment the capabilities of this platform, while collaborating with tool and framework developers, fosters industry-wide innovation in managed runtimes, in the languages they implement, and in the frameworks and tools that will accelerate our industry’s ability to build even more amazing things.

Scope

This project consists of core components that can be (re)used to build language runtimes, along with test cases that operationally document and maintain the semantics of those components. It is a set of functional, robust, language-independent components with direct component-level tests. At least initially, it will not include any components or tests that are implemented in language-specific ways, and it will not include any code that surfaces OMR component capabilities to any particular language except as sample code. Code and tests for language-specific capabilities probably belong in projects devoted to particular languages, but as the OMR project is consumed by more languages, it may make sense for some language-specific code to reside within the OMR project to accelerate problem discovery for OMR code contributions.

Alongside this project, we will be open sourcing our CRuby implementation that leverages the OMR technology, and we also have a CPython implementation that leverages some of the OMR technology. As we contribute the underlying OMR technology to the OMR project, we will also open source the CRuby implementation that leverages it and, eventually, the CPython implementation.

Description

The OMR project consists of a highly integrated set of open source C and C++ components that can be used to build robust language runtimes that will support many different hardware and operating system platforms. These components include but are not limited to: memory management, threading, platform port (abstraction) library, diagnostic file support, monitoring support, garbage collection, and native Just In Time compilation.
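To make the idea of a platform port (abstraction) library concrete, here is a minimal C sketch, assuming a design in which language-runtime code calls through a table of function pointers instead of calling operating system or C library functions directly. The names PortLibrary, port_init_std, and runtime_copy_string are hypothetical illustrations, not OMR's actual API, and only a single memory-allocation service is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical port-library interface (illustrative names, not OMR's API):
 * runtime code calls through these function pointers instead of calling
 * operating system or C library functions directly. */
typedef struct PortLibrary {
    void *(*mem_allocate)(struct PortLibrary *port, size_t size);
    void (*mem_free)(struct PortLibrary *port, void *ptr);
    size_t live_allocations; /* bookkeeping kept by this backend */
} PortLibrary;

/* One backend, built on the C standard library. A port to another
 * platform would supply different functions behind the same table. */
static void *std_mem_allocate(PortLibrary *port, size_t size) {
    void *ptr = malloc(size);
    if (ptr != NULL) {
        port->live_allocations += 1;
    }
    return ptr;
}

static void std_mem_free(PortLibrary *port, void *ptr) {
    if (ptr != NULL) {
        assert(port->live_allocations > 0); /* catch unbalanced frees */
        port->live_allocations -= 1;
        free(ptr);
    }
}

/* Fills in the table with the standard-library backend. */
void port_init_std(PortLibrary *port) {
    port->mem_allocate = std_mem_allocate;
    port->mem_free = std_mem_free;
    port->live_allocations = 0;
}

/* Language-runtime code that depends only on the PortLibrary interface. */
char *runtime_copy_string(PortLibrary *port, const char *src) {
    size_t len = strlen(src) + 1; /* include the terminating NUL */
    char *copy = port->mem_allocate(port, len);
    if (copy != NULL) {
        memcpy(copy, src, len);
    }
    return copy;
}
```

In this sketch, supporting a new platform means writing a different initializer that fills in the same table; runtime code such as runtime_copy_string stays unchanged.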

The long-term goal for the OMR project is to foster an open ecosystem of language runtime developers who collaborate and innovate with hardware platform designers, operating system developers, and tool and framework developers, and to provide a robust runtime technology platform so that language implementers can create more fully featured languages much more quickly and easily, enriching the options available to programmers.

Planned functionality:

Thread Library
Port Library
Garbage Collection
Diagnostic support
Just In Time Compiler
Tooling interfaces
Hardware exploitation (e.g., RDMA, GPU, SIMD)
Any technology implementing capabilities that can be reused in multiple languages, including source code translators, bytecode or AST interpreters, etc.

Licenses

Apache License, Version 2.0

Eclipse Public License 1.0

Legal Issues

Our unit/component test infrastructure requires the Google Test framework (1.7.0), which is distributed under the “New BSD” license. We have not yet sought a legal opinion, but our understanding is that the New BSD license is compatible with the EPL 1.0, although there are specific requirements we will need to meet. In particular, we would need to include the required notices and disclaimers for the Google Test framework as a small constituent part of this project.

The test infrastructure also requires the use of pugixml 1.5 which is distributed under the MIT License.

The port library in the first drop includes two files (auxv.c and auxv.h) from the Auxiliary Vector Library, which is another open source project contributed by IBM. It is distributed under the “New BSD” license.

The name "OMR" was previously owned in Canada (Application number 0566581 and Application number 1196558) and in the US, but records indicate that the owner has abandoned it or has not renewed it in the last 10 years.

Why Here?

The OMR project is an open, extensible runtime technology platform enabling any kind of language runtime, but OMR is not itself a runtime for any language. Aside from the general support and nurturing environment any open source foundation would provide, the Eclipse Foundation has particular expertise in establishing open communities around platforms. The success of the OMR project will hinge on dependent projects becoming comfortable consuming our technology through repeated, successful delivery of high-quality code. The collective experience of the Eclipse Foundation is by far our best chance for success, and we think the OMR project would make an excellent addition to the Eclipse Foundation community.

Future Work

The initial focus will be to move our existing code base into the open project and establish the base core components. We hope to engage with partners to extend the list of supported platforms, and to begin working with different language communities on adopting the OMR components in their language runtimes.

Project Scheduling

The initial contribution can be made available as early as January 2016, once we complete the review and approval process. Additional components will be open sourced on an approximately monthly cadence, as follows:

End Jan 2016

Thread Library with core utilities
Partial Port Library and data structures
Garbage Collection: Mark / Sweep collector initially

End Feb 2016

Initial OS X platform support

End Mar 2016

Parallel scavenger GC support, complete concurrent GC support

May/June 2016

Very large heap GC support

June 2016

Initial drop of the Just In Time compiler, with more code following throughout the rest of the year and into 2017
System core dump processing facilities for easier problem diagnosis

People

Project Leads

Mark Stoodley

Gary Liu

Committers

Angela Lin (This committer does not have an Eclipse Account)

Charlie Gracie

Mark Stoodley

Daryl Maier

John Duimovich (This committer does not have an Eclipse Account)

Gary Liu

Interested Parties

Considerable interest has been expressed at the public conferences where we have presented this technology.

Source Code

Initial Contribution

The initial contribution will include a set of core utilities, a low-level memory allocation library, and a thread library, along with an initial set of tests for these components and some examples of how to use them. More components will be contributed through 2016. Virtually all of the code is owned by IBM (exceptions are noted above under "Legal Issues"). This project will be the first time this code has been released in the open, so there is no community around it (yet).

Source Repository Type

GitHub