This proposal has been approved and the Jakarta Batch project has been created.
Visit the project page for the latest information and development.

Jakarta Batch

Monday, November 26, 2018 - 17:48 by Kevin Sutter
This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community.
Proposal State: Created
Background

This project is created as part of the process of transitioning Java EE 8 technologies to the Eclipse Foundation, as described in The Eclipse Enterprise for Java Project Top Level Project Charter. A unique aspect of this project's creation is that Java Batch was led by a non-Oracle organization, namely IBM.

The project aims to continue the standardization work of JSR-352, which was first finalized at version 1.0 in 2013 as a part of Java EE 7, and which is also designed for use in Java SE environments.

The problem domain targeted by Jakarta Batch can be described by this excerpt from the JSR-352 specification:

  Batch processing is a pervasive workload pattern, expressed by a distinct application organization and execution model. It is found across virtually every industry, applied to such tasks as statement generation, bank postings, risk evaluation, credit score calculation, inventory management, portfolio optimization, and on and on. Nearly any bulk processing task from any business sector is a candidate for batch processing.

  Batch processing is typified by bulk-oriented, non-interactive, background execution. Frequently long-running, it may be data or computationally intensive, execute sequentially or in parallel, and may be initiated through various invocation models, including ad hoc, scheduled, and on-demand.

  Batch applications have common requirements, including logging, checkpointing, and parallelization. Batch workloads have common requirements, especially operational control, which allow for initiation of, and interaction with, batch instances; such interactions include stop and restart.

Scope

Project Scope:  The Jakarta Batch project defines and maintains the Jakarta Batch specification and related artifacts.

Specification Scope:    Jakarta Batch describes a means for developing, executing and managing batch processes in Jakarta EE applications.

Description

The Jakarta Batch project describes the XML-based job specification language (JSL), Java programming model, and runtime environment for batch applications for the Java platform.

The specification ties together the Java API and the JSL (XML), allowing a job designer to compose a job in XML from Java application artifacts and conveniently parameterize them with values for an individual job. This structure promotes reuse of application artifacts across different jobs.
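
As a rough illustration of the parameterization idea, the following plain-Java sketch resolves substitution expressions of the form #{jobParameters['name']} inside an attribute value. The expression form is modeled on the JSL substitution syntax from JSR-352. The class name JobParameterSubstitution and its resolve method are hypothetical and not part of the Batch API, and the real substitution grammar covers more than this sketch handles (other property sources, for instance).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the Batch API. */
class JobParameterSubstitution {

    // Matches expressions of the form #{jobParameters['name']}.
    private static final Pattern JOB_PARAM =
            Pattern.compile("#\\{jobParameters\\['([^']+)'\\]\\}");

    /**
     * Replaces each job-parameter expression in attributeValue with the
     * matching value. Assumption of this sketch: unknown names become "".
     */
    static String resolve(String attributeValue, Map<String, String> jobParameters) {
        Matcher m = JOB_PARAM.matcher(attributeValue);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = jobParameters.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The same artifact definition, parameterized for one particular job.
        Map<String, String> params = Map.of("inputFile", "/data/march.csv");
        System.out.println(resolve("#{jobParameters['inputFile']}", params));
    }
}
```

With this kind of substitution, one step definition can be reused across jobs simply by supplying a different inputFile value each time a job is submitted.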

Some key features:

  • checkpoint / restart - The application read-process-write loop is performed under a global transaction, one "batch" or "chunk" of data at a time, with the batch implementation atomically storing a "checkpoint" at the end.  This checkpoint provides an index into the data stream, which allows you to restart a job after an earlier execution fails (or is stopped), so that the restarted job picks up where it left off (at the checkpointed value).
  • steps - Jobs can be composed of steps to allow reuse of step logic and definitions within multiple jobs, as well as to facilitate restart (at the step where the job left off).
  • XML configuration - Configuration is externalized from Java code into XML and parameterized through a variety of "job property" substitutions.  As one example, this allows database locking (for locks held for the duration of the chunk transaction) to be tuned without touching Java code.
  • partitions - The read-process-write loop can be broken up into multiple units running in parallel against different segments of the input data.
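
The checkpoint / restart behavior described above can be sketched in plain Java. This is an illustration only, under stated assumptions: ChunkCheckpointSketch, CheckpointStore, and the failAtIndex parameter are hypothetical names invented for the example, the in-memory store stands in for whatever persistent repository and transaction support a real batch implementation provides, and no Batch API is used.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration; none of these types belong to the Batch API. */
class ChunkCheckpointSketch {

    /** Hypothetical stand-in for the runtime's persistent checkpoint store. */
    static final class CheckpointStore {
        int nextIndex = 0; // index of the first record not yet committed
    }

    /**
     * Runs the read-process-write loop one chunk at a time, starting from the
     * stored checkpoint. If processing throws mid-chunk, the partial chunk is
     * discarded and the checkpoint keeps its previous value.
     * failAtIndex simulates a failure at that record; pass -1 for no failure.
     */
    static void run(List<String> input, int chunkSize, CheckpointStore store,
                    List<String> output, int failAtIndex) {
        while (store.nextIndex < input.size()) {
            int end = Math.min(store.nextIndex + chunkSize, input.size());
            List<String> buffer = new ArrayList<>();
            for (int i = store.nextIndex; i < end; i++) {
                if (i == failAtIndex) {
                    throw new IllegalStateException("simulated failure at record " + i);
                }
                buffer.add(input.get(i).toUpperCase()); // read + process one record
            }
            // A real runtime would commit these two updates atomically in one
            // transaction; here they are simply adjacent statements.
            output.addAll(buffer);  // write the chunk
            store.nextIndex = end;  // store the checkpoint
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d", "e");
        CheckpointStore store = new CheckpointStore();
        List<String> output = new ArrayList<>();
        try {
            run(input, 2, store, output, 3); // fails inside the second chunk
        } catch (IllegalStateException e) {
            System.out.println("failed; checkpoint=" + store.nextIndex);
        }
        run(input, 2, store, output, -1);    // restart from the checkpoint
        System.out.println(output);
    }
}
```

In this sketch the first execution commits only the first chunk before failing, so the restart resumes at the checkpointed index and no record is written twice.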

 

The specification leaves batch jobs free to be scheduled or orchestrated in any number of ways, and stops short of defining any APIs or constructs for scheduling or orchestrating multiple or repeated jobs.

Why Here?

The top level EE4J project was created consistent with the direction described in The Eclipse Enterprise for Java Project Top Level Project Charter.   This project is created under the top level EE4J project as one of the Java EE 8 technologies being transitioned to the Eclipse Foundation.

Project Scheduling

We intend to use community-driven project scheduling, and so naturally hope to get some consensus about the initial set of priorities from other participants.  

It may help to start with a few steps to build momentum for the project without yet seeking to expand the end user function. 

For example, one set of priorities would be:

  1. Revive JSR 352 spec issues list - Create ("reconstitute") a new issues list starting from the old one, which has been archived since Oracle sunset the java.net site, with community consensus on any issues we decide NOT to carry forward.
  2. Improve TCK automation - Develop the Maven layer around the existing TCK so that it can more easily be executed both in "SE mode" and in "EE / app server mode".  This will allow execution of the TCK against existing well-known JSR 352 implementations via a tight iterative loop, which in turn will give us the confidence to begin enhancing the TCK with new tests addressing some of the coverage gaps that were identified in the JSR 352 project.
  3. Package namespace migration? - If the general javax.* to jakarta.* namespace issue has been resolved by this point, and there is a clear first action, it would make sense to tackle this right away, before starting on new function.

The first 2-3 months after the project is launched seems like a reasonable timeframe in which to complete, or show significant progress on, a set of priorities like this.

Future Work

Once we've dealt with the above three things, which don't add any new function for end user developers, we could take the reconstituted issues list and use a combination of polling and discussion to prioritize some big ticket / bullet point items for a VNext.

Along with the bigger ticket items there will be quite a few minor additions that we can use to get the ball rolling and show progress.   E.g. "I have a well-defined listener lifecycle, but I just need one more method at point XYZ"...  Everyone will understand it, and it'll force us to produce 1.1-SNAPSHOT or whatever, and force us to build out the new pipeline enough to be able to ship/release anything.

Initial Contribution

The contribution will consist of:

  1. A spec document
  2. An API module
  3. A  TCK module

IBM has the IP rights necessary to allow us to contribute these immediately.
