Marco Ciampa edited this page Sep 6, 2019 · 1 revision

This page is the design documentation for internationalization (i18n) support in Asciidoctor.

Purpose

To make it possible to maintain and publish translations of an AsciiDoc document in a way similar to what the po4a tool currently does.

Proposed design

An AsciiDoc document consists of structured content that blends prose (paragraphs, titles, labels, etc.) and objects (images, listings, etc.). Asciidoctor should provide a mechanism for substituting translations of the prose without impacting or duplicating the content structure (“Don’t Repeat Yourself”). It’s also important to provide a way to track which parts of a translation need to be updated when the source document changes.

The most widely adopted translation solution in open source is the gettext system. With gettext, the translation is not a full standalone document but a collection of passages extracted from the source and stored in a po file. By using this system, we can leverage existing tooling such as Zanata, Transifex, Pootle, or the gettext command-line tools. These tools also provide a means of tracking which parts of a translation need updating (with some help from the tool that generates the translation file).
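To make the po file idea concrete, here is a minimal Python sketch (Python is used only for illustration; Asciidoctor itself is written in Ruby) that writes already-extracted passages as gettext entries. The names `po_escape` and `to_po` are hypothetical, and the passages are assumed to have been pulled from the document beforehand, which a real tool would do through the Asciidoctor API. Each entry carries a `#:` reference comment pointing back at the source location, and identical passages share one entry, so a sentence repeated in the source is translated only once.

```python
def po_escape(text):
    """Escape backslashes, quotes, tabs, and newlines for a PO string literal."""
    return (text.replace("\\", "\\\\")   # backslashes first, so later escapes survive
                .replace('"', '\\"')
                .replace("\t", "\\t")
                .replace("\n", "\\n"))


def to_po(passages, language):
    """Render (reference, text) pairs as a PO file with empty msgstr values.

    Identical passages are merged into one entry with several references,
    because msgid values must be unique within a PO file.
    """
    references = {}  # text -> list of source references, in first-seen order
    for reference, text in passages:
        references.setdefault(text, []).append(reference)

    # Header entry: empty msgid whose msgstr holds the file metadata.
    lines = [
        'msgid ""',
        'msgstr ""',
        f'"Language: {language}\\n"',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        "",
    ]
    for text, refs in references.items():
        lines.append("#: " + " ".join(refs))
        lines.append(f'msgid "{po_escape(text)}"')
        lines.append('msgstr ""')  # left empty for the translator
        lines.append("")
    return "\n".join(lines)


# Hypothetical passages; the inline markup (*install*) is kept in the msgid.
passages = [
    ("guide.adoc:3", "Run the *install* command."),
    ("guide.adoc:9", 'Type "yes" to confirm.'),
    ("guide.adoc:15", "Run the *install* command."),
]
print(to_po(passages, "it"))
```

In a real workflow, `msgmerge` from the gettext tools would then merge a freshly generated file into the existing translation and mark entries whose source text changed as fuzzy, which is the kind of update tracking described above.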

The translated prose will contain any inline AsciiDoc markup needed to retain the meaning and semantic styling of the source content. Extracting and substituting translations for individual inline elements would likely be too tedious, both to implement and for the translators working with the result.

A possible strategy for performing the translation is as follows:

  • Load the AsciiDoc content into a tree structure (AST)

  • Extract all paragraph blocks and write the content to a po file for the target language

  • Translate the paragraph content in the po file, handling all .po format options, especially fuzzy and comment entries

  • Load the AsciiDoc content into a tree structure again

  • Use a TreeProcessor to swap in the translation for each paragraph read from the po file

  • Convert the document to the target format (e.g., HTML5)
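The strategy above can be sketched end to end in Python. In Asciidoctor the swap would be a Ruby TreeProcessor extension working on the real AST; here, purely as an assumption for illustration, the AST is modeled with a hypothetical `Block` class, and `parse_po`, `extract_paragraphs`, and `translate_tree` are hypothetical names rather than any existing API. The po reader keeps only entries that are translated and not marked fuzzy, so missing or outdated translations fall back to the source text, and non-prose blocks such as listings are never touched.

```python
import re
from dataclasses import dataclass, field


# Hypothetical stand-in for the Asciidoctor AST; not the real Ruby API.
@dataclass
class Block:
    context: str          # e.g. "document", "section", "paragraph", "listing"
    source: str = ""      # raw AsciiDoc text (meaningful for prose blocks)
    blocks: list = field(default_factory=list)

    def iter_blocks(self, context):
        """Yield this block and its descendants with the given context, depth-first."""
        if self.context == context:
            yield self
        for child in self.blocks:
            yield from child.iter_blocks(context)


_ESCAPES = {"n": "\n", "t": "\t"}


def _unquote(token):
    """Strip the surrounding quotes and undo PO backslash escapes."""
    body = token.strip()[1:-1]
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), body)


def parse_po(po_text):
    """Return {msgid: msgstr} for entries that are translated and not fuzzy.

    Assumes entries are separated by blank lines; msgctxt and plural forms
    are not handled because prose paragraphs do not need them.
    """
    catalog = {}
    for chunk in re.split(r"\n\s*\n", po_text.strip()):
        fields = {"msgid": "", "msgstr": ""}
        fuzzy = False
        key = None
        for line in chunk.splitlines():
            line = line.strip()
            if line.startswith("#,"):
                flags = [flag.strip() for flag in line[2:].split(",")]
                fuzzy = fuzzy or "fuzzy" in flags
            elif line.startswith("#"):
                continue  # reference, translator, and obsolete (#~) comments
            elif line.startswith("msgid "):
                key = "msgid"
                fields[key] = _unquote(line[len("msgid "):])
            elif line.startswith("msgstr "):
                key = "msgstr"
                fields[key] = _unquote(line[len("msgstr "):])
            elif line.startswith('"') and key:
                fields[key] += _unquote(line)  # continuation of a multi-line string
        # Untranslated and fuzzy entries are dropped, so the source text is kept.
        if fields["msgid"] and fields["msgstr"] and not fuzzy:
            catalog[fields["msgid"]] = fields["msgstr"]
    return catalog


def extract_paragraphs(doc):
    """Extraction step: paragraph sources in document order, without duplicates."""
    return list(dict.fromkeys(p.source for p in doc.iter_blocks("paragraph")))


def translate_tree(doc, catalog):
    """Swap step: replace each paragraph's source with its translation, if any."""
    swapped = 0
    for paragraph in doc.iter_blocks("paragraph"):
        translation = catalog.get(paragraph.source)
        if translation is not None:
            paragraph.source = translation
            swapped += 1
    return swapped


# Example: one translated entry, one fuzzy entry, and a listing left alone.
doc = Block("document", blocks=[
    Block("paragraph", "Install the *asciidoctor* gem."),
    Block("listing", "gem install asciidoctor"),
    Block("section", blocks=[Block("paragraph", "Run it on a file.")]),
])
PO_SAMPLE = """
msgid ""
msgstr "Language: it\\n"

#: guide.adoc:1
msgid "Install the *asciidoctor* gem."
msgstr "Installa la gem *asciidoctor*."

#, fuzzy
msgid "Run it on a file."
msgstr "Eseguilo su un file."
"""
swapped = translate_tree(doc, parse_po(PO_SAMPLE))
```

Falling back to the source text for fuzzy entries is only one possible policy; a build could instead fail or flag the paragraph so that outdated translations are noticed before publishing.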

A similar procedure could be done for block titles.

Generating the po files should probably be implemented as a separate tool that uses the Asciidoctor API. The translation substitution can then be performed by activating the translation TreeProcessor extension and any related extensions. Since both parts build on the Asciidoctor API and are used together, it’s probably best to put this code into a dedicated translation tool project named asciidoctor-translate or similar.

The proposed strategy is still somewhat limited by the Asciidoctor parser. For instance, content inside table cells is harder to access and extract because it isn’t represented as cleanly in the AST. As that representation improves over time, the range of content that can be swapped in the AST will grow accordingly.
