AsciiDoc Language

Friday, January 10, 2020 - 14:59 by Lisa Ruff

Basics

This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community. Please log in and add your feedback in the comments section.

Project

AsciiDoc Language

Parent Project

Eclipse Technology

Proposal State

Created

Background

AsciiDoc has gained traction as a preferred choice for technical writing because it’s expressive, author-friendly, and tool agnostic. The AsciiDoc community has asserted that a specification for AsciiDoc is needed to solidify the ecosystem’s current foundation. We anticipate a specification will also provide pathways for new capabilities that adapt the language to the ever-changing technology landscape. The goal of this project is to produce that specification and its artifacts.

Scope

The AsciiDoc Language project defines and maintains the AsciiDoc Language Specification and Technology Compatibility Kit (TCK), its artifacts, and the corresponding language and API documentation. The AsciiDoc Language Specification describes the syntax and grammar, Abstract Semantic Graph (ASG), Document Object Model (DOM), referencing system, and APIs for processing, converting, and extending the language. The TCK is used to verify and certify that an AsciiDoc processor implementation is compatible with this specification.

Specifically, the project scope includes the:

AsciiDoc language syntax and grammar (e.g., EBNF)
- doctype structure and objects
ASG: namely the encoded form for use in the TCK (e.g., JSON)
TCK: Technology Compatibility Kit for the AsciiDoc language
DOM API: in memory semantic representation of the encoded information
Processor API (load, convert)
- Converter API
Extension API
- Extended syntax processors (e.g., custom block or macro)
- Resolvers (e.g., path and attribute resolvers, ID generator)
- Parse events and lifecycle interceptors (e.g., input processor, output processor, tree processor)
- Integration adapters: syntax highlighter, STEM, bibliography, docinfo
Expected converter behaviors (e.g., toc, ID generation, icon type, safe mode)
Internal and external referencing system: (e.g., xrefs, includes, images)
Reference converter and output format (e.g., HTML w/ reference stylesheet, DocBook)
Built-in attributes and reserved attribute namespaces
AsciiDoc media type (MIME) and .adoc file extension

The project also provides the:

AsciiDoc language documentation for writers
AsciiDoc API documentation

Description

AsciiDoc is a comprehensive, semantic markup language for producing a variety of presentation-rich output formats from content encoded in a concise, human-readable, plain text format. It also includes a set of APIs for transforming the encoded content, extending the syntax/grammar and processor lifecycle, and integrating with tools and publishing platforms. Teams and individuals use AsciiDoc to write product documentation, technical specifications, architectural guides, scientific and analytical reports, academic courses and training materials, books, and other technical communication.

The AsciiDoc Language isn’t coupled to the output format it produces. Software that implements the AsciiDoc Language Specification can parse and comprehend AsciiDoc and convert the parsed document structure to one or more output formats, such as HTML, PDF, EPUB, man page, DocBook. The ability to produce multiple output formats allows AsciiDoc to be used in static site generators, IDEs, git tools and services, CI/CD systems, and other software.

AsciiDoc bridges the gap between ease of writing and the rigorous requirements of technical authoring and publishing.

Licenses

Eclipse Public License 2.0

Legal Issues

The .adoc extension was once associated with a now obsolete file format. Otherwise, we don’t know of any legal issues at this time.

Why Here?

AsciiDoc is used across a spectrum of industries and communities, many of which are associated with or members of the Eclipse Foundation. Being co-located with so many groups that are invested in AsciiDoc will provide a neutral and diverse forum for collaborating on and improving the language, its software, and related initiatives. Additionally, the Eclipse Foundation’s values of open source, transparency, and vendor neutrality are of the utmost importance to AsciiDoc and its community.

Future Work

Future functionality and activities will be driven by community feedback and their requirements. Proposed specification advancements could include:

defining syntax patterns for common, stable content models (e.g., tabbed blocks)
exploring accessibility functionality
improving integration with compatible tooling
adapting to the latest output format specifications and related web browser and output standards
providing additional doctypes to accommodate the needs of other types of technical writing
implementing language server protocol support for AsciiDoc

Project Scheduling

The initial contributions are expected to be ready in Q2 2020. Once the initial contributions are accepted and the project infrastructure and team process established, the plan is to iterate on the specification and TCK in coordination with the compatible implementation project(s). The goal of the first, stable version of the specification is to match the AsciiDoc Language as described by Asciidoctor 2.0.x as best as possible to minimize syntax and structure impacts on active AsciiDoc documents, but not propagate deprecations.

People

Project Leads

Committers

Andres Almiray (This committer does not have an Eclipse Account)

Interested Parties

Projects

Asciidoctor
git

Companies and Organizations

OpenDevise
Couchbase
Neo4j
Pivotal
CloudBees
SUSE
vogella
Salesforce
Red Hat

Source Code

Initial Contribution

The Asciidoctor project will provide the following initial contributions:

The AsciiDoc language user documentation with syntax examples. (CC BY 3)
Documentation build, configuration, and assets.
Scenarios from the test suite. (MIT)

In preparation for this project, the documentation and its build, configuration, and assets are being decoupled from the Asciidoctor implementation and scrubbed of implementation references. The documentation build and configuration depends on Antora (MPL-2.0).

Source Repository Type

GitHub

Here's my take.

Submitted by Philippe Proulx on Mon, 04/27/2020 - 22:32

Here's my take.

I use AsciiDoc to document:

Software applications
Software release notes
Software APIs
Protocols
File formats

This is my bias.

Missing semantics

Graeme Smecher wrote on the mailing list:

A friendly mark-up anchored to an industrial-strength document
model is AsciiDoc's killer feature for me. I'm interested in reducing
the impedance mismatches between AsciiDoc and DocBook.

Having this impedance reduced is also my principal ambition.

I want AsciiDoc to offer as much semantic markup as possible while
remaining as lightweight as possible (otherwise I'd just write
DocBook directly).

Considering this, here's the list of DocBook tags for which equivalent
markup is missing from AsciiDoc (as far as I know) for my use cases:

General:
- abbrev
- acronym
- date
- firstterm
- replaceable
- see
- seealso
- termdef
- wordasword
Revision:
Procedure:
Task:
Manual page reference:
Software (general):
- command
- database
- envar
- errorcode
- errorname
- errortext
- errortype
- filename
- markup
- menuchoice
- msg
- msgaud
- msgentry
- msgexplan
- msginfo
- msglevel
- msgmain
- msgorig
- msgrel
- msgset
- msgsub
- msgtext
- optional
- package
- prompt
- property
- screenshot
- synopsis
- systemitem
- uri
- userinput
GUI (software):
Command description (software):
- arg
- cmdsynopsis
- command
- option
- sbr
Software programming:

I get that for many inline elements, you can use hash symbols
with a custom class:

Pass an [.type]#std::string# object to the [.func]#setName# function.

Is this the intention? If so, it's still not specified and up to the
writer. I suggest formalizing this, using a syntax other than the
class attribute, for example:

Pass an [:type]#std::string# object to the [:function]#setName#
function.

Syntax improvements

Here are a few syntax improvement suggestions, in order of importance
for me.

List item continuation

As a tech writer, what I use the most outside paragraphs are lists:
unordered, ordered, and description.

Those lists often contain items which can get rather complex.
I've always had a hard time dealing with list item continuation in
AsciiDoc. I find the + syntax is annoying at best. Sure you can use
open blocks, but you can't nest them:

* This is a list item.
+
It continues here.
+
And here.

* Sure you can use an open block:
+
--
Like this!

But then what if you want to nest another list here?

* You can.
+
But you can't use an open block because the beginning and end delimiters
are the same.
--
+
First level continued.

Also, I find the "unnesting" syntax, where the number of newlines above the
following + on a single line indicates how many levels to
go back, very confusing:

* Level 1.
** Level 2.
+
Some more level 2 content.
+
*** Level 3.
+
Level 3 continued.

+
Level 2 continued.
*** Other level 3.
+
Other level 3 continued.

+
Level 1 continued.

* Other level 1.

Is this readable to you?

This issue (for me at least) includes the syntax to nest lists, where
you use more *, more ., or more :
depending on the list type when not using open blocks:

* Level 1.
** Level 2.
... Level 3.
** Other level 2.
Question:::
Answer.
Subpoint::::
Subpoint content.
+
Subpoint continued.
Other subpoint::::
Other subpoint content.
Other question:::
Other answer.

** Yet another level 2.

I know AsciiDoc does not usually rely on indentation, but what I'm
suggesting is to make an exception here, at least optionally, to nest lists and
to continue list items, just like Markdown does.
There might be limitations, but I see none so far.

Here are the two previous examples reformatted to use indentation to
nest and continue items:

* Level 1.
  * Level 2.

    Some more level 2 content.

    * Level 3.

      Level 3 continued.

    Level 2 continued.

    * Other level 3.

      Other level 3 continued.

  Level 1 continued.

* Other level 1.

* Level 1.
  * Level 2.
    . Level 3.
  * Other Level 2.

    Question::
      Answer.

      Subpoint::
        Subpoint content.

        Subpoint continued.

      Other subpoint::
        Other subpoint content.

Nested open blocks

As mentioned above, you can't nest open blocks.

The suggested solution in the GitHub issue is to use ~~~~
to delimit open blocks, adding more ~ to nest them:

~~~~
An open block.

~~~~~
A nested open block.
~~~~~

Continued open block.
~~~~

While this at least provides a solution, why not use a dedicated
closing delimiter instead?

Here's an example, reusing the -- delimiter
we know to begin an open block:

--
An open block.

--
A nested open block.
/-

Continued open block.
/-

There might be forms that are more visually appealing. For example,
using < to open and > to close on
single lines:

<
An open block.

<
A nested open block.
>

Continued open block.
>

In fact, why not use this strategy for any nestable block?
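
To illustrate why distinct opening and closing delimiters make nesting unambiguous, here's a minimal Python sketch of a parser for the fictitious < and > form above. The function name parse_blocks and the nested-list representation are assumptions made for this illustration; they aren't taken from any AsciiDoc processor.

```python
def parse_blocks(text):
    """Parse lines into a nested list; '<' opens a block and '>' closes it.

    Hypothetical sketch: plain lines become strings, blocks become lists.
    """
    root = []
    stack = [root]  # the innermost open block is always last
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped == "<":
            block = []
            stack[-1].append(block)
            stack.append(block)
        elif stripped == ">":
            if len(stack) == 1:
                raise ValueError(f"line {number}: '>' without a matching '<'")
            stack.pop()
        elif stripped:
            stack[-1].append(stripped)
    if len(stack) > 1:
        raise ValueError("unclosed block at end of input")
    return root


sample = """<
An open block.

<
A nested open block.
>

Continued open block.
>"""

tree = parse_blocks(sample)
print(tree)
```

Because the parser always knows whether a delimiter opens or closes a block, a single stack is enough and the nesting depth never has to be encoded in the delimiter length.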

Block title

According to Title:

A block title is defined on a line above the element. The line must
begin with a dot (.) and be followed immediately by the title text.

Example:

.Using `printf()` with a signed integer.
====
[source,c]
----
printf("here's an integer: %d\n", 23);
----
====

Sometimes the block title can be long, especially for example titles.

I therefore suggest having a way to continue the title on the
following line(s). For example, using a single leading space
on the following lines:

.Using `printf()` with a signed integer, an unsigned integer,
 and a C{nbsp}string.
====
[source,c]
----
printf("here's a bunch of stuff: %d, %u, `%s`\n", 23, 77U, "hello");
----
====

Dedicated non-breaking space and hyphen shorthands

I often need non-breaking spaces. I use them between:

Numbers and units
Project names and versions
Titles and first names
Days and months
Months and years

and more.

You can use {nbsp} to write a non-breaking space and
&#8209; to write a non-breaking hyphen.

I suggest having built-in shorthands for both of them. LaTeX uses
~ for a non-breaking space.

Macros

AsciiDoc (Python) has macros and attributes while Asciidoctor has
extensions (Ruby/Java/JavaScript) and attributes.

Should the AsciiDoc specification include an official macro language?

What I mean by macro is a template of AsciiDoc content with variable
placeholders. The expanded macro can become block content or inline
content.

Here's a fictitious example:

:func: https://myproject.org/docs/v{#1}/{#2}.html[`{#2}()`]

See the {func 3.2 replaceText} and {func 3.4 substitute} functions for
more details.

Here's another example for block content:

:rlq:
[quote, René Lévesque, Défaite du Oui au référendum de 1980]
{#1}

You can't always get what you want, as Jagger sang.

{rlq
"Si j'ai bien compris, vous êtes en train de me dire{nbsp}:
à la prochaine fois."}

But those who dare to fail miserably can achieve greatly.
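
To make the idea concrete, here's a rough Python sketch of how inline macro expansion with positional placeholders could work. Everything in it is an assumption for illustration: the MACROS table, the expand function, and the rule that a call has the form {name arg1 arg2} with space-separated arguments. It only covers the inline case, not the multi-line block example.

```python
import re

# Hypothetical macro table: name -> template with {#N} positional placeholders.
MACROS = {
    "func": "https://myproject.org/docs/v{#1}/{#2}.html[`{#2}()`]",
}

# A call is a name followed by one or more space-separated arguments.
CALL = re.compile(r"\{(\w+)((?: [^\s{}]+)+)\}")
PLACEHOLDER = re.compile(r"\{#(\d+)\}")


def expand(text, macros=MACROS):
    """Replace each {name arg1 arg2 ...} call with its expanded template."""

    def replace_call(match):
        name, args = match.group(1), match.group(2).split()
        template = macros.get(name)
        if template is None:
            return match.group(0)  # unknown macro: leave the text untouched

        def fill(placeholder):
            index = int(placeholder.group(1)) - 1
            if not 0 <= index < len(args):
                return placeholder.group(0)  # no such argument: keep as-is
            return args[index]

        return PLACEHOLDER.sub(fill, template)

    return CALL.sub(replace_call, text)


print(expand("See the {func 3.2 replaceText} function."))
```

Plain attribute references such as {nbsp} carry no arguments, so this sketch leaves them alone for the regular attribute substitution to handle.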

Re: Here's my take.

Submitted by Philippe Proulx on Thu, 04/30/2020 - 16:45

To add to this: I thought I was commenting on the specification proposal here, but now I understand those are supposed to be project proposal comments.

So I might post this comment again at the appropriate location when the specification draft takes form.

Re: Re: Here's my take.

Submitted by Dan Allen on Thu, 05/28/2020 - 18:16

Thank you for taking the time to share this input. Indeed, these points are best suited for the AsciiDoc specification list once this proposal is approved and the mailing list is up and running.

I do want to emphasize that the focus of this spec is not on creating a new language with new syntax, but rather to standardize and evolve (within reason) the existing syntax. We can and should address matters of semantics, but we're not aiming to fundamentally alter the syntax, such as changing the fences for delimited blocks (aside from the open block issue). An AsciiDoc document written today should still continue to work with the standard processor. Just something to keep in mind when we discuss enhancements to the syntax.

Scope ideas from an IDE perspective

Submitted by Alexander Schwartz on Thu, 04/30/2020 - 17:46

I'm the current maintainer of the AsciiDoc IntelliJ plugin, and I'm taking the perspective of an IDE-plugin developer for this comment.

Please reply and let me know if you second any of these ideas for the scope of the proposals, or if you consider them part of the existing proposal.

From an IDE perspective I'd like to see the following elements as part of the scope:

1. Retrieve meta-information at runtime for auto-complete of macros, preprocessor-directives and attributes
2. Mapping output to the source input, to allow the user to trace back an output in the preview to a source line
3. Extensions should provide an abstraction to the file system and references to allow advanced reference/content systems like Antora.
4. Extensions should be ordered. They should be able to hook into the processing and delegate to a previous implementation. This way an extension could add an attribute to an existing macro, or convert an attribute's content before delegating it to the original implementation.
5. For spell checking an adoc file, meta-information should be available to find out what parts of a macro contain information that should be spell checked

ad 1: The meta-information at runtime should include built-in and active extensions for a list of available macros and attributes. Each macro and attribute should provide a textual self-description for a (human) writer. Each macro (extension or built-in) should provide a list of supported attributes. Each attribute should provide a sample and default value and possibly also a type so that the IDE can trigger auto-complete for file names, IDs, etc.

ad 2: The Asciidoctor HTML output already implements it by adding additional data attributes to some HTML tags, but doesn't attach it to all tags. It is currently based on line level. Future implementations could provide line and column information. Sourcemaps would be a method of implementation, but might depend on the output.

ad 3: I assume this is covered by either "Internal and external referencing system" or the Extension API "(path) resolvers", I just want to make sure either of them can be used for this.

ad 4: When retrofitting some Antora style behavior to Asciidoctor Ruby I used "prepend" to monkey-patch some of the necessary functionality. With a mechanism as described above this would not have been necessary.
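
As a rough illustration of the ordering and delegation idea in point 4, here's a minimal Python sketch. The MacroRegistry class, the handler signature (target, attrs, previous), and the image example are all hypothetical; they don't correspond to the Asciidoctor extension API.

```python
class MacroRegistry:
    """Hypothetical registry where later handlers wrap earlier ones."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        """Register handler(target, attrs, previous); previous may be None."""
        previous = self._handlers.get(name)
        self._handlers[name] = lambda target, attrs: handler(target, attrs, previous)

    def process(self, name, target, attrs):
        return self._handlers[name](target, attrs)


def builtin_image(target, attrs, previous):
    # Stand-in for a built-in macro implementation.
    alt = attrs.get("alt", "")
    return f'<img src="{target}" alt="{alt}">'


def default_alt(target, attrs, previous):
    # Add an attribute, then delegate to the implementation registered before.
    attrs = {"alt": target.rsplit("/", 1)[-1], **attrs}
    return previous(target, attrs)


registry = MacroRegistry()
registry.register("image", builtin_image)
registry.register("image", default_alt)

print(registry.process("image", "img/logo.png", {}))
```

With an explicit chain like this, the second extension can adjust the attributes without monkey-patching the first one.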

Re: Scope ideas from an IDE perspective

Submitted by Dan Allen on Thu, 05/28/2020 - 18:30

Retrieve meta-information at runtime for auto-complete of macros, preprocessor-directives and attributes

Yes, I consider this part of the AST / DOM. Unlike the existing AsciiDoc processors, a standard processor should be able to capture and make available all the information about the parsed document. We'll need to work out how all that is stored, but it needs to be in there somewhere.

Mapping output to the source input, to allow the user to trace back an output in the preview to a source line

Yes, source-level information will be available for each parsed node, and perhaps even lower-level than that.

Extensions should provide an abstraction to the file system and references to allow advanced reference/content systems like Antora.

Great idea. That will come into play when we get into extensions. (We're not sure yet whether extensions will be in the main language spec or a supplemental spec).

Extensions should be ordered. They should be able to hook into the processing and delegate to a previous implementation. This way an extension could add an attribute to an existing macro, or convert an attribute's content before delegating it to the original implementation.

This is definitely a detail that the extensions part of the spec (or supplemental spec) will need to address. There are two concerns here...one is about extension hierarchy and one is about extension ordering relative to one another. Asciidoctor gives us some direction here, though we need to address where it leaves ambiguity.

For spell checking an adoc file, meta-information should be available to find out what parts of a macro contain information that should be spell checked

In general, just having source information for each node will help a lot here. I do like the idea that nodes self identify as having content that should be considered / processed by a spell checker...or perhaps something more high level. Certainly a great idea to consider.

All in all, the key point to keep in mind is that one of the key goals is to parse the language fully. When AsciiDoc began, it was a streaming processor that offered no access to a parsed document. Asciidoctor introduced a document model and parsed down to the block level. The standard language will require that mapping to be complete down to the lowest reasonable level, certainly inline nodes and maybe even characters.
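
To show what source information on every node might look like, here's a minimal Python sketch. The Node class and the parse function are assumptions made for illustration, and the sketch only understands paragraphs and *strong* spans; it isn't a proposal for the actual model.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str
    line: int      # 1-based source line where the node starts
    column: int    # 1-based source column where the node starts
    children: list = field(default_factory=list)


STRONG = re.compile(r"\*([^*\n]+)\*")


def parse(source):
    """Split source into paragraph nodes, each with located inline children.

    Hypothetical sketch: handles only paragraphs and *strong* spans.
    """
    nodes = []
    current = None
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            current = None  # a blank line ends the current paragraph
            continue
        if current is None:
            current = Node("paragraph", "", number, 1)
            nodes.append(current)
        current.text = f"{current.text}\n{line}" if current.text else line
        for match in STRONG.finditer(line):
            current.children.append(
                Node("strong", match.group(1), number, match.start() + 1)
            )
    return nodes


doc = parse("First *paragraph* here.\n\nSecond one,\nwith *two* lines.")
for node in doc:
    print(node.kind, node.line, [(c.text, c.line, c.column) for c in node.children])
```

An IDE preview could use the line and column on each node to jump from rendered output back to the source, and a spell checker could use them to report positions.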

Re: Scope ideas from an IDE perspective

Submitted by Aleksey Kladov on Thu, 11/19/2020 - 15:02

I'd like to add one more ask from an IDE perspective. I do IDEs professionally, but I haven't done an IDE for markup languages, so this is medium-level confidence.

My understanding is that today the `include` directive works by raw textual inclusion, much like C's include. That's a bit problematic from an IDE perspective, as it precludes building an AST for an isolated file. The ability to parse a document without accessing its context makes life much easier for IDE writers, as it allows parsing documents in parallel, in isolation, and with a trivial cache invalidation rule.

A possible solution here is to make `include` more structured and require it to work on a DOM level. That is, the processor conceptually *first* parses both files in isolation and builds a syntax/document tree for each, and then it inserts the subdocument's tree into the main doc.
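
Here's a minimal Python sketch of that two-phase idea. The functions parse_document and resolve_includes, the dict-based node shape, and the in-memory files table are all assumptions for illustration; the sketch only recognizes paragraphs and an include line with empty brackets.

```python
def parse_document(text):
    """Parse one file in isolation into a list of nodes (no file access)."""
    nodes = []
    for line in text.splitlines():
        if line.startswith("include::") and line.endswith("[]"):
            nodes.append({"type": "include", "target": line[len("include::"):-2]})
        elif line.strip():
            nodes.append({"type": "paragraph", "text": line})
    return nodes


def resolve_includes(name, trees, seen=()):
    """Splice each included document's tree in place of its include node."""
    if name in seen:
        raise ValueError(f"include cycle at {name}")
    resolved = []
    for node in trees[name]:
        if node["type"] == "include":
            resolved.extend(resolve_includes(node["target"], trees, seen + (name,)))
        else:
            resolved.append(node)
    return resolved


files = {
    "main.adoc": "Intro.\ninclude::part.adoc[]\nOutro.",
    "part.adoc": "Included paragraph.",
}
# Each file is parsed on its own, so this step could run in parallel
# and be cached per file.
trees = {name: parse_document(text) for name, text in files.items()}
print([n["text"] for n in resolve_includes("main.adoc", trees)])
```

Since parsing never looks outside the file, a change to part.adoc only invalidates that file's tree; the splice step is then redone for the documents that include it.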

One request

Submitted by Phil Beauvoir on Fri, 07/03/2020 - 03:56

Just please don't rename it to "Eclipse AsciiDocium".

If I can choose only one thing... or two..

Submitted by Ioannis Stavrakakis on Wed, 11/10/2021 - 05:18

I would go for the procedure and the task suggestions.

Those two alone will standardize at least 70% of my content issues.