Wendell Piez edited this page Nov 9, 2022 · 11 revisions

Invisible XML (iXML) proves to be a capable way to parse a text-based format in an XSLT environment, producing a structure that can be processed further.

A LMNL Syntax Grammar

Grammar for a reduced LMNL "sawtooth" syntax, in iXML:

       LMNL: (tag, text?)*, tag.
       text: char+. 
      -char: ~["[";"{";"\"]; "\["; "\{"; "\\". { \ as escape character is also escaped so we can represent '\[' }
       -tag: (start | end | empty).
      start: -"[", gi?, ws?, annotation*, ws?, -"}". 
        end: -"{", gi?, ws?, annotation*, ws?, -"]". 
      empty: -"[", gi?, ws?, annotation*, ws?, -"]".
        @gi: name, ("#", cc+)?.
      @name: ic, cc*.
         ic: [L].
         cc: ic; ["0"-"9"]; "."; "_"; "-"; ":".
 annotation: -"[", name?, -"}", -text?, ae. 
        -ae: -"{]". 
        -ws: (" "|#9|#d|#a)+.           { SPACE TAB CR LF }

See https://johnlumley.github.io/jwiXML.xhtml for an iXML workspace.
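
To make the shape of the grammar more concrete outside an iXML processor, here is a minimal Python sketch of a scanner that approximates it. It is not an iXML implementation and not the method used on this page. The names (`tokenize`, `TAG_RE`, the `(kind, name)` token form) are assumptions made for illustration, names are restricted to ASCII as a simplification of `ic: [L]`, and annotations are recognized but discarded.

```python
import re

# Assumption: ASCII-only names, a simplification of the grammar's ic: [L].
NAME = r"[A-Za-z][A-Za-z0-9._:-]*"
GI = rf"{NAME}(?:#[A-Za-z0-9._:-]+)?"
# Text: any character except [ { \, or one of the escapes \[ \{ \\.
TEXT = r"(?:[^\[{\\]|\\[\[{\\])+"
WS = r"[ \t\r\n]+"
# Abbreviated annotation syntax only: [name}value{]
ANNOTATION = rf"\[(?:{NAME})?\}}(?:{TEXT})?\{{\]"

# A tag is an opening bracket, optional gi, optional annotations, closing bracket.
TAG_RE = re.compile(
    rf"(?P<open>[\[{{])(?P<gi>{GI})?(?:{WS})?"
    rf"(?P<annotations>(?:{ANNOTATION})*)(?:{WS})?(?P<close>[\]}}])"
)
TEXT_RE = re.compile(TEXT)

# The pair of delimiters decides the kind of tag.
KIND = {("[", "}"): "start", ("{", "]"): "end", ("[", "]"): "empty"}


def tokenize(source):
    """Split sawtooth syntax into (kind, name) pairs; annotations are dropped."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TAG_RE.match(source, pos)
        if m and (m["open"], m["close"]) in KIND:
            tokens.append((KIND[(m["open"], m["close"])], m["gi"] or ""))
        else:
            m = TEXT_RE.match(source, pos)
            if not m:
                raise ValueError(f"unexpected input at offset {pos}")
            tokens.append(("text", m.group()))
        pos = m.end()
    # Mirrors LMNL: (tag, text?)*, tag. The input starts and ends with a tag.
    if not tokens or tokens[0][0] == "text" or tokens[-1][0] == "text":
        raise ValueError("input must start and end with a tag")
    return tokens
```

For example, `tokenize("[a}one [b}two{a] three{b]")` yields a start tag for `a`, text, a start tag for `b`, text, an end tag for `a`, text, and an end tag for `b`: the overlap is preserved in the token order and left for a later stage to interpret.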

to do: stress test for top level ambiguities, etc.

TBD - structured annotations, character references, PIs and comments ...

Restrictions

The text to be parsed must start and end with tags (start, end or empty) or a parse error is returned.

Several issues must be caught at the next level, by examining the result tree (see below): this grammar produces only the rough inputs for deriving a LMNL model from the marked-up input text.

Limitations

The grammar emits a format that can be cast into a range model, but it does not capture all of LMNL. In particular:

  • no support for structured annotations, only flat 'values' as annotations
  • only abbreviated annotation syntax is supported
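  • name characters are limited: per the grammar above, a name must start with a letter (ic, Unicode category L) and may continue with letters, digits, ".", "_", "-" or ":"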
  • item objects, processing instructions, comments, LMNL declaration and namespaces are not supported
  • ambiguities related to tag ordering are prevented by forbidding tag-only overlap (that is, range A's end tag appearing directly after, rather than before, range B's start tag, with no text between them)

To be supported (in the LMNL model):

  • overlapping ranges
  • arbitrary range (type) names including declarative names
  • empty ranges
  • anonymous ranges and annotations
  • 'self' overlap ("sibling rivalry")

(semi) LMNL tagging in operation

LMNL syntax well-formedness

The input follows the grammar and delivers a parse.

LMNL properly or coherently tagged - having 'tag integrity' or 'coherence'

  • is well-formed
  • all tagging lines up, with no mismatched or missing tags
    • with no end tags before the first text content or start tags after the end
  • adjoining tagging is given in the order end, empty, start
    • this prevents tag-only overlap from intruding on a simple processing model (where ranges may be ordered but not tags)
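
The tag-integrity rules listed above can be sketched as a check over a flat token sequence. This is a minimal illustration under stated assumptions, not the checker used on this page: the function name `check_coherence` and the token form, a list of `(kind, name)` pairs with kind one of `start`, `end`, `empty` or `text`, are hypothetical. Ranges are matched by name with a counter rather than a stack, since overlap is permitted.

```python
from collections import Counter

# Required order for adjoining tags: end, then empty, then start.
ORDER = {"end": 0, "empty": 1, "start": 2}


def check_coherence(tokens):
    """Return a list of tag-integrity errors for (kind, name) tokens."""
    errors = []
    open_ranges = Counter()  # number of currently open ranges per name
    text_at = [i for i, (kind, _) in enumerate(tokens) if kind == "text"]
    first_text = text_at[0] if text_at else len(tokens)
    last_text = text_at[-1] if text_at else -1
    previous_rank = None  # rank of the previous tag in the current run of tags

    for i, (kind, name) in enumerate(tokens):
        if kind == "text":
            previous_rank = None  # text ends a run of adjoining tags
            continue
        rank = ORDER[kind]
        if previous_rank is not None and rank < previous_rank:
            errors.append(f"token {i}: {kind} tag out of order (end, empty, start)")
        previous_rank = rank

        if kind == "start":
            if i > last_text:
                errors.append(f"token {i}: start tag after the last text content")
            open_ranges[name] += 1
        elif kind == "end":
            if i < first_text:
                errors.append(f"token {i}: end tag before the first text content")
            if open_ranges[name] == 0:
                errors.append(f"token {i}: end tag '{name}' has no open range")
            else:
                open_ranges[name] -= 1

    for name, count in open_ranges.items():
        if count:
            errors.append(f"range '{name}': {count} missing end tag(s)")
    return errors
```

An empty result means the sequence is coherent in the sense described here. Because the function collects every problem instead of stopping at the first, its output could also drive an error display.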

Note that LMNL syntax that is well-formed but not properly tagged can still be rendered for display (to show errors).

A LMNL syntax transpiler could similarly produce a well-formed LMNL AST annotated with error notices as well as information produced by scanning. When error-free, this AST would also be suitable as input to the full LMNL range model.

Valid LMNL

LMNL is valid when it conforms to a schema, that is, to rules such as:

  • which tags (range names or range type names) are permitted
  • which annotations are recognized for which tags
  • cardinality constraints for ranges and their annotations
  • nesting/overlap constraints - what is and is not permitted to overlap
  • datatype restrictions (lexical and semantic) over annotations
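
As a rough illustration of how rules of these kinds might be checked, here is a small Python sketch. Everything in it is hypothetical: no LMNL schema language is defined on this page, and the `SCHEMA` layout, the range dictionaries (`name`, `start`, `end`, `annotations`) and the `validate` function are invented for the example.

```python
from collections import Counter

# Hypothetical schema: every key and structure here is invented for illustration.
SCHEMA = {
    "ranges": {
        # permitted range names, their recognized annotations (with a datatype
        # check per annotation) and an optional cardinality limit
        "line": {"annotations": {"n": str.isdigit}, "max": None},
        "s": {"annotations": {}, "max": None},
        "title": {"annotations": {}, "max": 1},
    },
    # pairs of range names that must not overlap each other
    "no_overlap": {frozenset({"line", "title"})},
}


def overlaps(a, b):
    """True when two ranges intersect but neither contains the other."""
    first, second = sorted((a, b), key=lambda r: (r["start"], -r["end"]))
    return first["start"] < second["start"] < first["end"] < second["end"]


def validate(ranges, schema=SCHEMA):
    """Return a list of validity errors for ranges given as dictionaries."""
    errors = []
    counts = Counter(r["name"] for r in ranges)
    for r in ranges:
        rule = schema["ranges"].get(r["name"])
        if rule is None:
            errors.append(f"range '{r['name']}' is not permitted")
            continue
        for key, value in r.get("annotations", {}).items():
            check = rule["annotations"].get(key)
            if check is None:
                errors.append(f"annotation '{key}' not recognized on '{r['name']}'")
            elif not check(value):
                errors.append(f"annotation '{key}' on '{r['name']}' has invalid value")
    for name, rule in schema["ranges"].items():
        if rule["max"] is not None and counts[name] > rule["max"]:
            errors.append(f"too many '{name}' ranges")
    for i, a in enumerate(ranges):
        for b in ranges[i + 1:]:
            pair = frozenset({a["name"], b["name"]})
            if pair in schema["no_overlap"] and overlaps(a, b):
                errors.append(f"'{a['name']}' may not overlap '{b['name']}'")
    return errors
```

The sketch covers permitted names, recognized annotations, a simple cardinality limit, an overlap constraint and a lexical datatype check. Nesting constraints and semantic datatype rules are left out.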