This proposal has been approved and the AsciiDoc Language project has been created.
Visit the project page for the latest information and development.

AsciiDoc Language

Friday, January 10, 2020 - 14:59 by Lisa Ruff
This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community.
Proposal State
Created
Background

AsciiDoc has gained traction as a preferred choice for technical writing because it’s expressive, author-friendly, and tool agnostic. The AsciiDoc community has asserted that a specification for AsciiDoc is needed to solidify the ecosystem’s current foundation. We anticipate a specification will also provide pathways for new capabilities that adapt the language to the ever-changing technology landscape. The goal of this project is to produce that specification and its artifacts.

Scope

The AsciiDoc Language project defines and maintains the AsciiDoc Language Specification and Technology Compatibility Kit (TCK), its artifacts, and the corresponding language and API documentation. The AsciiDoc Language Specification describes the syntax and grammar, Abstract Semantic Graph (ASG), Document Object Model (DOM), referencing system, and APIs for processing, converting, and extending the language. The TCK is used to verify and certify that an AsciiDoc processor implementation is compatible with this specification.

Specifically, the project scope includes the:

  • AsciiDoc language syntax and grammar (e.g., EBNF)
    • doctype structure and objects
  • ASG: namely the encoded form for use in the TCK (e.g., JSON)
  • TCK: Technology Compatibility Kit for the AsciiDoc language
  • DOM API: in-memory semantic representation of the encoded information
  • Processor API (load, convert)
    • Converter API
  • Extension API
    • Extended syntax processors (e.g., custom block or macro)
    • Resolvers (e.g., path and attribute resolvers, ID generator)
    • Parse events and lifecycle interceptors (e.g., input processor, output processor, tree processor)
    • Integration adapters: syntax highlighter, STEM, bibliography, docinfo
  • Expected converter behaviors (e.g., toc, ID generation, icon type, safe mode)
  • Internal and external referencing system: (e.g., xrefs, includes, images)
  • Reference converter and output format (e.g., HTML w/ reference stylesheet, DocBook)
  • Built-in attributes and reserved attribute namespaces
  • AsciiDoc media type (MIME) and .adoc file extension
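
As a rough illustration of how a TCK might use a JSON-encoded ASG, the following Python sketch compares the ASG emitted by a processor against an expected one. The node fields (`name`, `blocks`, `text`, `line`) and the function `asg_matches` are assumptions made for this example only; the actual ASG schema is left to the specification.

```python
import json


def asg_matches(expected, actual):
    """Return True when `actual` contains every key/value the expected ASG specifies."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and asg_matches(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, list)
            and len(expected) == len(actual)
            and all(asg_matches(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual


# Hypothetical expected ASG for a one-paragraph document, stored as JSON.
EXPECTED_JSON = '{"name": "document", "blocks": [{"name": "paragraph", "text": "Hello"}]}'
expected = json.loads(EXPECTED_JSON)

# Output of a hypothetical processor under test; extra keys such as "line" are tolerated.
actual = {"name": "document", "blocks": [{"name": "paragraph", "text": "Hello", "line": 1}]}
passed = asg_matches(expected, actual)
```

Tolerating extra keys is one possible design choice: it would let implementations attach additional information without failing the compatibility check.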

The project also provides the:

  • AsciiDoc language documentation for writers
  • AsciiDoc API documentation
Description

AsciiDoc is a comprehensive, semantic markup language for producing a variety of presentation-rich output formats from content encoded in a concise, human-readable, plain text format. It also includes a set of APIs for transforming the encoded content, extending the syntax/grammar and processor lifecycle, and integrating with tools and publishing platforms. Teams and individuals use AsciiDoc to write product documentation, technical specifications, architectural guides, scientific and analytical reports, academic courses and training materials, books, and other technical communication.

The AsciiDoc Language isn’t coupled to the output format it produces. Software that implements the AsciiDoc Language Specification can parse and comprehend AsciiDoc and convert the parsed document structure to one or more output formats, such as HTML, PDF, EPUB, man page, and DocBook. The ability to produce multiple output formats allows AsciiDoc to be used in static site generators, IDEs, git tools and services, CI/CD systems, and other software.

AsciiDoc bridges the gap between ease of writing and the rigorous requirements of technical authoring and publishing.

Why Here?

AsciiDoc is used across a spectrum of industries and communities, many of which are associated with or members of the Eclipse Foundation. Being co-located with so many groups that are invested in AsciiDoc will provide a neutral and diverse forum for collaborating on and improving the language, its software, and related initiatives. Additionally, the Eclipse Foundation’s values of open source, transparency, and vendor neutrality are of the utmost importance to AsciiDoc and its community.

Project Scheduling

The initial contributions are expected to be ready in Q2 2020. Once the initial contributions are accepted and the project infrastructure and team process are established, the plan is to iterate on the specification and TCK in coordination with the compatible implementation project(s). The goal of the first stable version of the specification is to match the AsciiDoc Language as described by Asciidoctor 2.0.x as closely as possible, minimizing syntax and structure impacts on active AsciiDoc documents without propagating deprecations.

Future Work

Future functionality and activities will be driven by community feedback and requirements. Proposed specification advancements could include:

  • defining syntax patterns for common, stable content models (e.g., tabbed blocks)
  • exploring accessibility functionality
  • improving integration with compatible tooling
  • adapting to the latest output format specifications and related web browser and output standards
  • providing additional doctypes to accommodate the needs of other types of technical writing
  • implementing language server protocol support for AsciiDoc
Interested Parties

Projects

  • Asciidoctor
  • git

Companies and Organizations

  • OpenDevise
  • Couchbase
  • Neo4j
  • Pivotal
  • CloudBees
  • SUSE
  • vogella
  • Salesforce
  • Red Hat
Initial Contribution

The Asciidoctor project will provide the following initial contributions:

  • The AsciiDoc language user documentation with syntax examples. (CC BY 3)
  • Documentation build, configuration, and assets.
  • Scenarios from the test suite. (MIT)

In preparation for this project, the documentation and its build, configuration, and assets are being decoupled from the Asciidoctor implementation and scrubbed of implementation references. The documentation build and configuration depends on Antora (MPL-2.0).

Here's my take.

I use AsciiDoc to document:

  • Software applications
  • Software release notes
  • Software APIs
  • Protocols
  • File formats

This is my bias.

Missing semantics

Graeme Smecher wrote on the mailing list:

A friendly mark-up anchored to an industrial-strength document
model is AsciiDoc's killer feature for me. I'm interested in
reducing the impedance mismatches between AsciiDoc and DocBook.

Having this impedance reduced is also my principal ambition.

I want AsciiDoc to offer as much semantic markup as possible while
remaining as lightweight as possible (otherwise I'd just write
DocBook directly).

Considering this, here's the list of DocBook tags for which equivalent
markup is missing from AsciiDoc (as far as I know) for my use cases:

I get that for many inline elements, you can use hash symbols
with a custom class:

Pass an [.type]#std::string# object to the [.func]#setName# function.

Is this the intention? If so, it's still not specified and is left up to
the writer. I suggest formalizing this with a syntax other than the
class attribute, for example:

Pass an [:type]#std::string# object to the [:function]#setName#
function.
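
As a rough illustration of how such a formalized syntax could map onto DocBook, here is a minimal Python sketch. It assumes the proposed `[:name]#text#` form and assumes that the name is used directly as the DocBook element name (`type` and `function` are existing DocBook elements); the function `to_docbook` is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Matches the proposed form [:name]#text# (an assumption, not existing AsciiDoc syntax).
SEMANTIC_SPAN = re.compile(r"\[:([a-z]+)\]#(.+?)#")


def to_docbook(line):
    """Replace each [:name]#text# span with a DocBook-style <name>text</name> element."""
    def replace(match):
        tag, text = match.group(1), escape(match.group(2))
        return f"<{tag}>{text}</{tag}>"

    # Only the marked-up spans are converted; surrounding text is left untouched.
    return SEMANTIC_SPAN.sub(replace, line)


converted = to_docbook(
    "Pass an [:type]#std::string# object to the [:function]#setName# function."
)
```

For the sample sentence, `converted` holds the same text with `<type>std::string</type>` and `<function>setName</function>` in place of the marked-up spans.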

Syntax improvements

Here are a few syntax improvement suggestions, in order of importance
for me.

List item continuation

As a tech writer, what I use the most outside paragraphs are lists:
unordered, ordered, and description.

Those lists often contain items which can get rather complex.
I've always had a hard time dealing with list item continuation in
AsciiDoc. I find the + syntax annoying at best. Sure, you can use open
blocks, but you can't nest them:

* This is a list item.
+
It continues here.
+
And here.

* Sure you can use an open block:
+
--
Like this!

But then what if you want to nest another list here?

* You can.
+
But you can't use an open block because the beginning and end delimiters
are the same.
--
+
First level continued.

Also, I find the "unnesting" syntax, where the number of empty lines above
the following + on a single line indicates how many levels to go back,
very confusing:

* Level 1.
** Level 2.
+
Some more level 2 content.
+
*** Level 3.
+
Level 3 continued.

+
Level 2 continued.
*** Other level 3.
+
Other level 3 continued.

+
Level 1 continued.

* Other level 1.

Is this readable to you?

This issue (for me at least) extends to the syntax for nesting lists, where
you use more *, more ., or more : depending on the list type when not
using open blocks:

* Level 1.
** Level 2.
... Level 3.
** Other level 2.
Question:::
Answer.
Subpoint::::
Subpoint content.
+
Subpoint continued.
Other subpoint::::
Other subpoint content.
Other question:::
Other answer.

** Yet another level 2.

I know AsciiDoc does not usually rely on indentation, but what I'm
suggesting is to make an exception here, at least optionally, to nest lists
and to continue list items, just like Markdown does.
There might be limitations, but I see none so far.

Here are the two previous examples reformatted to use indentation to
nest and continue items:

* Level 1.
  * Level 2.

    Some more level 2 content.

    * Level 3.

      Level 3 continued.

    Level 2 continued.

    * Other level 3.

      Other level 3 continued.

  Level 1 continued.

* Other level 1.

* Level 1.
  * Level 2.
    . Level 3.
  * Other Level 2.

    Question::
      Answer.

      Subpoint::
        Subpoint content.

        Subpoint continued.

      Other subpoint::
        Other subpoint content.

    Other question::
      Other answer.

  * Yet another level 2.
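
To show that indentation alone can carry the nesting and continuation information, here is a minimal Python sketch. It assumes that a nested item or continuation paragraph is indented at least to the content column of its parent item, and it only knows the `* ` and `. ` markers; the function `classify` and its output format are hypothetical.

```python
MARKERS = ("* ", ". ")  # unordered and ordered list markers (assumed subset)


def classify(lines):
    """Return (kind, depth, text) for every non-blank line, using indentation only."""
    stack = []  # content columns of the list items that are currently open
    result = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        indent = len(line) - len(line.lstrip(" "))
        # Close every item whose content starts to the right of this line.
        while stack and indent < stack[-1]:
            stack.pop()
        marker = next((m for m in MARKERS if text.startswith(m)), None)
        if marker:
            stack.append(indent + len(marker))
            result.append(("item", len(stack), text[len(marker):]))
        else:
            # Depth 0 would mean a paragraph outside of any list.
            result.append(("continuation", len(stack), text))
    return result


sample = [
    "* Level 1.",
    "  * Level 2.",
    "",
    "    Some more level 2 content.",
    "",
    "    * Level 3.",
    "",
    "      Level 3 continued.",
    "",
    "    Level 2 continued.",
    "",
    "  Level 1 continued.",
    "",
    "* Other level 1.",
]
structure = classify(sample)
```

Each entry of `structure` names the kind of line, its nesting depth, and its text, with no `+` lines or counted blank lines needed to attach a paragraph to the right level.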

Nested open blocks

As mentioned above, you can't nest open blocks.

The suggested solution in the GitHub issue is to use ~~~~
to delimit open blocks, adding more ~ to nest them:

~~~~
An open block.

~~~~~
A nested open block.
~~~~~

Continued open block.
~~~~

While this at least provides a solution, why not use a dedicated
closing delimiter instead?

Here's an example, reusing the -- delimiter we know to begin an open block:

--
An open block.

--
A nested open block.
/-

Continued open block.
/-

There might be forms that are more visually appealing. For example,
using < to open and > to close on single lines:

<
An open block.

<
A nested open block.
>

Continued open block.
>

In fact, why not use this strategy for any nestable block?
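
The appeal of distinct opening and closing delimiters is that nesting can be tracked with a plain stack. Here is a minimal Python sketch of that idea, assuming the `--` and `/-` delimiters from the example above; the function `parse_blocks` and the dictionary-based tree are hypothetical.

```python
def parse_blocks(lines, open_delim="--", close_delim="/-"):
    """Build a nested tree from lines that use distinct open and close delimiters."""
    root = {"type": "root", "children": []}
    stack = [root]  # innermost open block is always stack[-1]
    for lineno, line in enumerate(lines, start=1):
        if line == open_delim:
            block = {"type": "open", "children": []}
            stack[-1]["children"].append(block)
            stack.append(block)
        elif line == close_delim:
            if len(stack) == 1:
                raise ValueError(f"line {lineno}: closing delimiter without an open block")
            stack.pop()
        elif line.strip():
            # Non-blank content line: attach it to the innermost open block.
            stack[-1]["children"].append(line)
    if len(stack) > 1:
        raise ValueError("unclosed open block at end of input")
    return root


tree = parse_blocks([
    "--",
    "An open block.",
    "",
    "--",
    "A nested open block.",
    "/-",
    "",
    "Continued open block.",
    "/-",
])
```

Passing `open_delim="<"` and `close_delim=">"` handles the second form the same way, and the same stack would work for any other nestable block.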

Block title

According to "Title":

A block title is defined on a line above the element. The line must
begin with a dot (.) and be followed immediately by the title text.

Example:

.Using `printf()` with a signed integer.
====
[source,c]
----
printf("here's an integer: %d\n", 23);
----
====

Sometimes the block title can be long, especially for example titles.

I therefore suggest a way to continue the title on the following
line(s). For example, using a single leading space on each
continuation line:

.Using `printf()` with a signed integer, an unsigned integer,
 and a C{nbsp}string.
====
[source,c]
----
printf("here's a bunch of stuff: %d, %u, `%s`\n", 23, 77U, "hello");
----
====
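
As a rough illustration of the suggested rule, the following Python sketch reads a block title and joins any continuation lines that begin with exactly one space. The function `read_block_title` is hypothetical, and the check that excludes `. ` and `..` after the leading dot is a simplifying assumption to avoid list markers and delimiters.

```python
def read_block_title(lines):
    """Return (title, lines_consumed), or (None, 0) if no block title starts at lines[0]."""
    if not lines:
        return None, 0
    first = lines[0]
    # A title starts with a dot immediately followed by the title text.
    if len(first) < 2 or first[0] != "." or first[1] in " .":
        return None, 0
    parts = [first[1:]]
    consumed = 1
    for line in lines[1:]:
        # Proposed rule: exactly one leading space continues the title.
        if line.startswith(" ") and not line.startswith("  ") and line.strip():
            parts.append(line[1:])
            consumed += 1
        else:
            break
    return " ".join(parts), consumed


title, consumed = read_block_title([
    ".Using `printf()` with a signed integer, an unsigned integer,",
    " and a C{nbsp}string.",
    "====",
])
```

Here `title` is the full joined title and `consumed` is 2, so a parser would resume at the `====` delimiter.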

Dedicated non-breaking space and hyphen shorthands

I often need non-breaking spaces. I use them between:

  • Numbers and units
  • Project names and versions
  • Titles and first names
  • Days and months
  • Months and years

and more.

You can use {nbsp} to write a non-breaking space and
&#8209; to write a non-breaking hyphen.

I suggest built-in shorthands for both. LaTeX, for example, uses
~ for a non-breaking space.

Macros

AsciiDoc (Python) has macros and attributes while Asciidoctor has
extensions (Ruby/Java/JavaScript) and attributes.

Should the AsciiDoc specification include an official macro language?

What I mean by macro is a template of AsciiDoc content with variable
placeholders. The expanded macro can become block content or inline
content.

Here's a fictitious example:

:func: https://myproject.org/docs/v{#1}/{#2}.html[`{#2}()`]

See the {func 3.2 replaceText} and {func 3.4 substitute} functions for
more details.

Here's another example for block content:

:rlq:
[quote, René Lévesque, Défaite du Oui au référendum de 1980]
{#1}

You can't always get what you want, as Jagger sang.

{rlq
"Si j'ai bien compris, vous êtes en train de me dire{nbsp}:
à la prochaine fois."}

But those who dare to fail miserably can achieve greatly.
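
To make the placeholder idea concrete, here is a minimal Python sketch of inline macro expansion. It assumes the fictitious `{name arg1 arg2}` invocation form and `{#n}` placeholders from the first example, handles only whitespace-separated arguments (not the quoted block argument of the second example), and uses the hypothetical function `expand`.

```python
import re

PLACEHOLDER = re.compile(r"\{#(\d+)\}")
# {name arg1 arg2 ...}: a name followed by at least one whitespace-separated argument.
INVOCATION = re.compile(r"\{(\w+)((?:\s+[^\s{}]+)+)\}")


def expand(text, macros):
    """Expand every {name args...} invocation using the templates in `macros`."""
    def call(match):
        name, args = match.group(1), match.group(2).split()
        template = macros.get(name)
        if template is None:
            return match.group(0)  # unknown macro: leave the text untouched

        def fill(placeholder):
            index = int(placeholder.group(1))
            # Keep the placeholder as-is when no matching argument was given.
            return args[index - 1] if 0 < index <= len(args) else placeholder.group(0)

        return PLACEHOLDER.sub(fill, template)

    return INVOCATION.sub(call, text)


macros = {"func": "https://myproject.org/docs/v{#1}/{#2}.html[`{#2}()`]"}
expanded = expand("See the {func 3.2 replaceText} function.", macros)
```

Plain attribute references such as `{nbsp}` carry no arguments, so this sketch leaves them for regular attribute substitution.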

In reply to Philippe Proulx

To add to this: I thought I was commenting on the specification proposal here, but now I understand these are supposed to be project proposal comments.

So I might post this comment again at the appropriate location when the specification draft takes form.

In reply to Philippe Proulx

Thank you for taking the time to share this input. Indeed, these points are best suited for the AsciiDoc specification list once this proposal is approved and the mailing list is up and running.

I do want to emphasize that the focus of this spec is not on creating a new language with new syntax, but rather to standardize and evolve (within reason) the existing syntax. We can and should address matters of semantics, but we're not aiming to fundamentally alter the syntax, such as changing the fences for delimited blocks (aside from the open block issue). An AsciiDoc document written today should still continue to work with the standard processor. Just something to keep in mind when we discuss enhancements to the syntax.

I'm the current maintainer of the AsciiDoc IntelliJ plugin, and I'm taking the perspective of an IDE plugin developer for this comment.

Please reply and let me know if you second any of these ideas for the scope of the proposals, or if you consider them part of the existing proposal.

From an IDE perspective, I'd like to see the following elements included in the scope:

  1. Retrieve meta-information at runtime for auto-complete of macros, preprocessor directives, and attributes.
  2. Mapping output to the source input, to allow the user to trace back an output in the preview to a source line.
  3. Extensions should provide an abstraction to the file system and references to allow advanced reference/content systems like Antora.
  4. Extensions should be ordered. They should be able to hook into the processing and delegate to a previous implementation. This way an extension could add an attribute to an existing macro, or convert an attribute's content before delegating it to the original implementation.
  5. For spell checking an .adoc file, meta-information should be available to find out what parts of a macro contain information that should be spell checked.

ad 1: The meta-information at runtime should include built-in and active extensions for a list of available macros and attributes. Each macro and attribute should provide a textual self-description for a (human) writer. Each macro (extension or built-in) should provide a list of supported attributes. Each attribute should provide a sample and default value and possibly also a type so that the IDE can trigger auto-complete for file names, IDs, etc. 
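
As a rough illustration of what such runtime meta-information could look like to an IDE, here is a minimal Python sketch. All class and method names (`AttributeInfo`, `MacroInfo`, `MacroCatalog`) are hypothetical, and the `issue` macro is an invented extension used only for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttributeInfo:
    name: str
    description: str            # self-description for a human writer
    value_type: str = "string"  # e.g. "string", "path", "id"; drives IDE completion
    default: Optional[str] = None
    sample: Optional[str] = None


@dataclass
class MacroInfo:
    name: str
    description: str
    builtin: bool = True
    attributes: List[AttributeInfo] = field(default_factory=list)


class MacroCatalog:
    """Holds built-in and extension macros so an IDE can query them at runtime."""

    def __init__(self):
        self._macros = {}

    def register(self, info):
        self._macros[info.name] = info

    def complete(self, prefix):
        """Names of all known macros starting with `prefix`, for auto-complete."""
        return sorted(name for name in self._macros if name.startswith(prefix))

    def attributes_of(self, name):
        return [attr.name for attr in self._macros[name].attributes]


catalog = MacroCatalog()
catalog.register(MacroInfo("image", "Embeds an image.", attributes=[
    AttributeInfo("alt", "Alternative text.", sample="Company logo"),
    AttributeInfo("width", "Display width.", sample="300"),
]))
# Hypothetical extension macro, registered the same way as a built-in one.
catalog.register(MacroInfo("issue", "Links to an issue tracker entry.", builtin=False))
suggestions = catalog.complete("i")
```

An IDE could call `complete()` while the writer types a macro name and `attributes_of()` once the macro is known.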

ad 2: The Asciidoctor HTML output already implements it by adding additional data attributes to some HTML tags, but doesn't attach it to all tags. It is currently based on line level. Future implementations could provide line and column information. Sourcemaps would be a method of implementation, but might depend on the output.
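
A minimal Python sketch of the line-and-column idea follows. It assumes each parsed node carries its source location and that a converter writes it into `data-*` attributes; the attribute names and the `Paragraph` class are hypothetical and do not describe the existing Asciidoctor output.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Paragraph:
    text: str
    file: str
    line: int
    column: int = 1


def to_html(node):
    """Render a paragraph and record where it came from in data attributes."""
    return (
        f'<p data-source-file="{escape(node.file)}" '
        f'data-source-line="{node.line}" data-source-column="{node.column}">'
        f"{escape(node.text)}</p>"
    )


rendered = to_html(Paragraph("Hello & welcome", "index.adoc", 12))
```

A preview pane could read these attributes from the clicked element and jump to the matching position in the editor.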

ad 3: I assume this is covered by either "Internal and external referencing system" or the Extension API "(path) resolvers"; I just want to make sure either of them can be used for this.

ad 4: When retrofitting some Antora style behavior to Asciidoctor Ruby I used "prepend" to monkey-patch some of the necessary functionality. With a mechanism as described above this would not have been necessary.
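
The ordering-and-delegation mechanism can be sketched as a chain of handlers in which each newly registered extension receives the previous implementation. This is a minimal Python illustration under that assumption; `MacroRegistry` and both handler functions are hypothetical and do not reflect the Asciidoctor extension API.

```python
class MacroRegistry:
    """Keeps one handler chain per macro; later registrations wrap earlier ones."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        # handler(target, attrs, previous) -> str, where `previous` is the
        # handler registered before this one (or None for the first).
        previous = self._handlers.get(name)
        self._handlers[name] = lambda target, attrs: handler(target, attrs, previous)

    def process(self, name, target, attrs):
        return self._handlers[name](target, attrs)


def builtin_image(target, attrs, previous):
    alt = attrs.get("alt", "")
    return f'<img src="{target}" alt="{alt}">'


def default_alt(target, attrs, previous):
    # Extension: supply a default alt text, then delegate to the earlier handler.
    attrs = {"alt": target.rsplit("/", 1)[-1], **attrs}
    return previous(target, attrs)


registry = MacroRegistry()
registry.register("image", builtin_image)  # registered first
registry.register("image", default_alt)    # wraps the built-in handler
html_default = registry.process("image", "img/logo.png", {})
html_explicit = registry.process("image", "img/logo.png", {"alt": "Logo"})
```

Because the extension delegates instead of replacing the built-in behavior, no monkey-patching of the original implementation is needed.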

In reply to Alexander Schwartz

Retrieve meta-information at runtime for auto-complete of macros, preprocessor-directives and attributes 

Yes, I consider this part of the AST / DOM. Unlike the existing AsciiDoc processors, a standard processor should be able to capture and make available all the information about the parsed document. We'll need to work out how all that is stored, but it needs to be in there somewhere.

Mapping output to the source input, to allow the user to trace back an output in the preview to a source line

Yes, source-level information will be available for each parsed node, and perhaps even lower-level than that.

Extensions should provide an abstraction to the file system and references to allow advanced reference/content systems like Antora.

Great idea. That will come into play when we get into extensions. (We're not sure yet whether extensions will be in the main language spec or a supplemental spec).

Extensions should be ordered. They should be able to hook into the processing and delegate to a previous implementation. This way an extension could add an attribute to an existing macro, or convert an attribute's content before delegating it to the original implementation.

This is definitely a detail that the extensions part of the spec (or supplemental spec) will need to address. There are two concerns here: one is about extension hierarchy and one is about extension ordering relative to one another. Asciidoctor gives us some direction here, though we need to address where it leaves ambiguity.

For spell checking an adoc-file meta-information should be available to find out what parts of a macro contain information that should be spell checked 

In general, just having source information for each node will help a lot here. I do like the idea that nodes self identify as having content that should be considered / processed by a spell checker...or perhaps something more high level. Certainly a great idea to consider.

All in all, the key point to keep in mind is that a central goal is to parse the language fully. When AsciiDoc began, it was a streaming processor that offered no access to a parsed document. Asciidoctor introduced a document model and parsed down to the block level. The standard language will require that mapping to be complete down to the lowest reasonable level, certainly inline nodes and maybe even characters.

In reply to Alexander Schwartz

I'd like to add one more ask from an IDE perspective. I do IDEs professionally, but I haven't done an IDE for markup languages, so this is medium-level confidence.

My understanding is that today the `include` directive works by raw textual inclusion, much like C's #include. That's a bit problematic from an IDE perspective, as it precludes building an AST for an isolated file. The ability to parse a document without accessing its context makes life much easier for IDE writers, as it allows parsing documents in parallel, in isolation, and with a trivial cache invalidation rule.

A possible solution here is to make `include` more structured and require it to work at the DOM level. That is, the processor conceptually *first* parses both files in isolation and builds a syntax/document tree for each, and then inserts the subdocument's tree into the main document.
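
As a rough illustration of the tree-level approach, here is a minimal Python sketch. Each file is parsed on its own into a flat list of nodes, with the include directive kept as a placeholder node, and a second step splices the included trees in. The node dictionaries and the functions `parse` and `resolve` are hypothetical, and the "parser" only knows one-line paragraphs.

```python
def parse(text):
    """Parse one file in isolation; include directives become placeholder nodes."""
    nodes = []
    for line in text.splitlines():
        if line.startswith("include::") and line.endswith("[]"):
            nodes.append({"type": "include", "target": line[len("include::"):-2]})
        elif line.strip():
            nodes.append({"type": "paragraph", "text": line})
    return nodes


def resolve(name, files, cache=None, stack=()):
    """Splice the trees of included files into the tree of `name`."""
    cache = {} if cache is None else cache
    if name in stack:
        raise ValueError(f"cyclic include of {name}")
    if name not in cache:
        # Each file is parsed once, without looking at the including document.
        cache[name] = parse(files[name])
    resolved = []
    for node in cache[name]:
        if node["type"] == "include":
            resolved.extend(resolve(node["target"], files, cache, stack + (name,)))
        else:
            resolved.append(node)
    return resolved


files = {
    "main.adoc": "Intro.\ninclude::part.adoc[]\nOutro.",
    "part.adoc": "Included paragraph.",
}
document = resolve("main.adoc", files)
texts = [node["text"] for node in document]
```

Since `parse` never looks outside its own file, the per-file trees can be built in parallel and a cached tree only needs to be invalidated when that one file changes.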