Proposal: explicit syntax for custom tags #240

Open
matklad opened this issue Aug 20, 2023 · 9 comments

@matklad
Contributor

matklad commented Aug 20, 2023

This proposal is a synthesis of #239 and #146. It is organized into TL;DR, What? and Why? sections, of which Why? is the most important.

TL;DR

Change djot such that the following input:

Shortcut: :kbd[Ctrl+C]

::: details
Copies text
:::

produces the following HTML:

<p>
    Shortcut: <kbd>Ctrl+C</kbd>
</p>

<details>
    <p>Copies text</p>
</details>

What?

Specifically:

  1. Change the Ast for div and span to be
 interface Div extends HasAttributes {
   tag: "div";
+  tag_name: string;
   children: Block[];
 }

 interface Span extends HasAttributes {
   tag: "span";
+  tag_name: string;
   children: Inline[];
 }
  2. Change the parsing rule for ::: spam to use "spam" for tag_name, rather than a class.

  3. Change the parsing rules for bare ::: and [] to set tag_name to "div" and "span",
    respectively.

  4. Add new concrete syntax :tag-name[], that is, :(\S+)\[ where $1, an arbitrary sequence of non-whitespace symbols, is a tag_name, and the rest is the usual span syntax. This concrete syntax produces a Span AST with the corresponding tag_name set.

  5. Change the default HTML renderer to use tag_name when rendering span and div elements.

The most invasive change here is 4, as it adds a bit of new syntax to djot and directly enlarges the surface area.
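
The renderer change (using tag_name when rendering div and span) could look roughly like the sketch below. It assumes a drastically simplified AST without attributes; the type names, `renderNode`, `renderAll`, and `escapeHtml` are hypothetical and are not the djot.js API.

```typescript
// Hypothetical, simplified AST for illustration; not the actual djot.js types.
interface Str { tag: "str"; text: string; }
interface Span { tag: "span"; tag_name: string; children: Inline[]; }
type Inline = Str | Span;
interface Para { tag: "para"; children: Inline[]; }
interface Div { tag: "div"; tag_name: string; children: Block[]; }
type Block = Para | Div;

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderAll(nodes: ReadonlyArray<Block | Inline>): string {
  return nodes.map((n) => renderNode(n)).join("");
}

function renderNode(node: Block | Inline): string {
  switch (node.tag) {
    case "str":
      return escapeHtml(node.text);
    case "para":
      return `<p>${renderAll(node.children)}</p>`;
    case "span":
    case "div":
      // tag_name is emitted as-is here; a real renderer would likely
      // validate or sanitize it first (assumption, not part of the proposal).
      return `<${node.tag_name}>${renderAll(node.children)}</${node.tag_name}>`;
  }
}
```

With this shape, a bare ::: or [] would carry tag_name "div" or "span" and render exactly as today, so the renderer needs no special case for the default.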

Why?

This single solution fixes several "problems" in the current version of djot, some big and some small. I list them roughly in order of priority:

Problem: users need a lightweight approach for producing custom HTML interspersed with normal djot.

Today, djot provides a ``` =HTML syntax to embed raw HTML (or any other format). The problem here is that it's all-or-nothing: everything inside =HTML needs to be HTML. You can't use that to wrap a part of a djot document into a custom tag:

This is Djot!

``` =HTML
<details>
    This *isn't* Djot :sob:
</details>
```

This is solvable by using a custom filter/renderer, but that's a significant step up in complexity, and might not be available to the user (e.g., forum software using Djot for comments could allow raw HTML (with sanitization), but won't allow custom filters). In a more ad-hoc way, it's possible to split the raw block in two:

This is Djot!

``` =HTML
<details>
```

Ok,  this *is* Djot :weary:

``` =HTML
</details>
```

but that's not quite as pretty as some might want!

With the proposed solution, the above can be written simply as

::: details
This *is still* Djot :smile:
:::

Naturally, =HTML doesn't go away: that's still the right tool for raw HTML, but we now gain a way to add HTML-Djot sandwiches.

Note that while I say HTML, this feature applies to any roughly XML-shaped output format. For example, a docbook renderer could use that to emit arbitrary docbook elements, and a LaTeX renderer could emit a

\begin{environment}

\end{environment}

pair.
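
For the LaTeX case, a minimal sketch of that mapping might be the helper below. `renderDivAsLatexEnvironment` is a hypothetical name, and the sketch assumes the body has already been rendered to LaTeX and that tag_name is a valid environment name (no validation is done).

```typescript
// Hypothetical helper: wraps an already-rendered LaTeX body in an
// environment named after the div's tag_name. No validation of the name.
function renderDivAsLatexEnvironment(tagName: string, renderedBody: string): string {
  return `\\begin{${tagName}}\n${renderedBody}\n\\end{${tagName}}`;
}
```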

Problem: extensibility properties of Djot are not obvious and need better explanation.

The core feature of Djot is that its syntax is fixed, but it is still extensible because the syntax is flexible enough to encode arbitrary attributed trees which could be interpreted specially by the renderers. This is a somewhat subtle and non-obvious point, and may not be immediately clear to new users.

With this proposal, Djot gains an explicit first-class syntax for custom elements. We can clearly document that ::: plugin and :plugin[] is how one extends Djot. In terms of expressive power, this is exactly equivalent to []{.plugin} of course, but is easier to explain and search for.

Overloading .class syntax to mean custom tags/elements is harder to teach.

Problem: it's impossible to express arbitrary HTML in a Djot filter.

Djot has two programmatic extensibility mechanisms:

  • filters transform Djot AST to another Djot AST
  • renderers transform Djot AST to the target format, such as HTML

Filters are generally nicer: they are target-format-independent and composable (you can chain several filters together, because input and output have the same type). However, you can't use a filter to emit an HTML node not already used by a renderer, unless you resort to raw half-nodes, which is ugly and output-format specific.

With this proposal, filters gain the full power of HTML, while keeping a nice, well-typed tree structure. Fewer things need to be custom renderers, more things can be filters.
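
As an illustration of a filter that stays AST-to-AST, here is a sketch that promotes a class such as .kbd to a tag_name. It assumes a simplified inline AST where classes are a plain string array; `InlineFilter` and `promoteClassToTagName` are hypothetical names, not part of djot.js.

```typescript
// Hypothetical, simplified AST for illustration; not the actual djot.js types.
interface Str { tag: "str"; text: string; }
interface Span { tag: "span"; tag_name: string; classes: string[]; children: Inline[]; }
type Inline = Str | Span;

// A filter maps an AST node to an AST node, so filters of this shape compose.
type InlineFilter = (node: Inline) => Inline;

// Builds a filter that turns the first promotable class on a span into
// its tag_name and removes that class from the class list.
function promoteClassToTagName(promotable: readonly string[]): InlineFilter {
  const filter: InlineFilter = (node) => {
    if (node.tag === "str") {
      return node;
    }
    const children = node.children.map((child) => filter(child));
    const hit = node.classes.filter((c) => promotable.indexOf(c) !== -1)[0];
    if (hit === undefined) {
      return { ...node, children };
    }
    return {
      ...node,
      tag_name: hit,
      classes: node.classes.filter((c) => c !== hit),
      children,
    };
  };
  return filter;
}
```

Because the output is still a typed tree, a later filter or any renderer can consume it without knowing that the tag name started life as a class.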

Problem: the ::: spam syntax is not orthogonal

In today's Djot, the following two are equivalent:

::: spam
:::

{.spam}
:::
:::

In the following example, both classes are on equal footing semantically, although syntactically one feels like it should be the primary:

{.spam}
::: eggs
:::

The proposal makes the syntaxes orthogonal by adding a new dimension. ::: spam is no longer a class, it is a tag name.

Problem: when reading custom elements existing "introducer last" syntax requires the reader to backtrack.

Consider a custom element in today's djot: [Ctrl+C]{.kbd}. Here, the + would be interpreted specially by the renderer as a notation for shortcuts. However, if you read this left-to-right, you need to look ahead to {.kbd} to get the context for interpreting the +.

In the proposal, this looks like :kbd[Ctrl+C] --- the introducer keyword, kbd, is leading, so a one-pass left-to-right visual scan tells you everything.

Problem: smarter editors and IDEs need to know context to provide helpful suggestions.

Let's say you added a custom citation element to Djot, which looks like [foo, p. 15]{.cite}. A smart editor should be able to auto-complete foo from your references library, but, if you are typing this left-to-right, by the time you get to [foo] the IDE doesn't yet know that it's going to be a cite.

With the proposal, as soon as you've typed :c, the IDE can suggest auto-completing that to :cite[] and then show completion list for actual citations.
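
A rough sketch of the first half of that flow (tag-name completion) is below. `completeTagName` and the list of known tags are hypothetical, and the sketch assumes the introducer colon must sit at the start of the text or after whitespace, which the proposal does not specify.

```typescript
// Hypothetical completion helper. Assumes ":" introduces a tag name only at
// the start of the text or after whitespace (an assumption of this sketch).
function completeTagName(textBeforeCursor: string, knownTags: readonly string[]): string[] {
  const m = /(?:^|\s):([^\s\[\]:]*)$/.exec(textBeforeCursor);
  if (m === null) {
    return [];
  }
  const prefix = m[1] ?? "";
  return knownTags
    .filter((t) => t.indexOf(prefix) === 0)
    .map((t) => `:${t}[]`);
}
```

With the "introducer last" form [foo]{.cite}, no equivalent check is possible at the point where the user is typing foo, which is the asymmetry described above.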

@Omikhleia

Omikhleia commented Aug 20, 2023

Djot went beyond Markdown while keeping its legacy:

{.myblock}
:::
This is _Djot_{.underline}

 - Apples. 
 - Oranges.

> Quote
:::

This proposal opens Pandora's box:

{.myblock}
:::: div
::: p
Are we saying this would be the new em:[u:[Djot{.dubious}]]?
:::
::: ul
li:[Apples.]
li:[Oranges.]
:::
::: blockquote
Quote
:::
::::

And I wonder what a LaTeX renderer (say, or SILE) would then do. Would it have to support div, p, ul etc. environments and commands, or magically recognize a subset of the HTML (and why not, DocBook or any random schema) tag set to map them to appropriate commands?

I am afraid the problems supposedly solved might be worse. Or did I miss something?

@matklad
Contributor Author

matklad commented Aug 20, 2023

And I wonder what a LaTeX renderer (say, or SILE) would then do.

There isn't any special handling of tags. For example, the SILE renderer would do exactly what the SILE XML Flavor would do, namely, interpret the document as

\begin[class=myblock]{div}
\begin{p}
Are we saying this would be the new \em{\u{\span[class=dubious]{Djot}}}?
\end{p}
\end{div}

This might, or might not produce a valid SILE document, depending on which custom SILE commands the user has defined.

Stated positively, the user gains access to all their pre-existing custom SILE commands without having to define custom Djot renderers or filters. So, if the user has

\define[command=red]{\color[color=red]{\process}}

defined, they can use

Making things red is a :red[silly] way to emphasise text.

in their djot.

@Omikhleia

Omikhleia commented Aug 20, 2023

Well I am afraid I have to disagree on everything then...

  • This makes Djot just a replacement for random XML with a different syntax... One can invent a Djot/Markdown-inspired syntax for arbitrary XML, sure, but it's no longer the same thing.
  • This is fragmenting the portability of input files (so to keep this "div" example, one has to provide an implementation of it for LaTeX, SILE, and any other target renderer he might consider using at some point?) and the possibility for conversion tools to operate gracefully.

(I don't think it's the place to discuss the SILE examples, but the SIL language should be completely avoidable, and the user shouldn't need custom commands to do this kind of thing. Styles are a better paradigm with a nicer separation of concerns.)

@Omikhleia

Omikhleia commented Aug 20, 2023

Still, an additional comment though:

The user gains access to all their pre-existing custom SILE commands without having to define custom Djot renderers or filters.

Should the user really want to do this, with markdown.sile they indeed don't need to define custom Djot renderers or filters. The following works:

``` =sile
% Or defined in Lua with =sile-lua, or implemented elsewhere in a class, package, wrapper document, your call.
\define[command=red]{\color[color=red]{\process}}
```
Making things red is a [silly]{custom-style="red"} way to emphasize text.

And all other things being equal, it works identically whether the input is a Markdown file, a Djot file, or a Pandoc JSON AST [1].
(Not that I really recommend this, but it's already available [2], though again I would recommend using styles rather than direct commands.)

Footnotes

  1. For the rare cases (now) where some syntax extension not covered by the native implementation would be needed. Tables in some other format than the "pipe tables" in (extended-)Markdown, for instance.

  2. EDIT: And, before someone asks, I favored the custom-style key trick over a class attribute there indeed: influenced, for what it's worth, by what Pandoc does with Word docx conversion -- so that defining a "red" character style in a Word reference document and converting to docx with Pandoc should indeed then also work as intended.

@jgm
Owner

jgm commented Aug 20, 2023

This is an interesting and well thought-out proposal. It does go in a somewhat different direction than I'd originally had in mind, but I see its good points.

The original conception was that if you wanted to do something like <details>, you'd simply write

::: details
## This is the summary.

And here's the rest.
:::

and then make use of a filter that replaces this with AST nodes including the raw HTML <details>, <summary>, etc. It's true that the filter needs to be format-specific -- though in pandoc at least, filters can conditionalize on the output format (I forget whether we built that into djot.js).
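
A sketch of what such a format-specific filter could look like follows. It assumes a heavily simplified block AST (text stored directly on blocks, classes as a string array) and only handles one level; `expandDetails` and `rawHtml` are hypothetical names, not djot.js or pandoc API.

```typescript
// Hypothetical, simplified AST for illustration; not the actual djot.js types.
interface Heading { tag: "heading"; text: string; }
interface Para { tag: "para"; text: string; }
interface RawBlock { tag: "raw_block"; format: string; text: string; }
interface Div { tag: "div"; classes: string[]; children: Block[]; }
type Block = Heading | Para | RawBlock | Div;

function rawHtml(text: string): RawBlock {
  return { tag: "raw_block", format: "html", text };
}

// Replaces a div with class "details" by raw HTML open/close nodes around
// its children; a leading heading becomes the <summary>.
function expandDetails(block: Block): Block[] {
  if (block.tag !== "div" || block.classes.indexOf("details") === -1) {
    return [block];
  }
  const first = block.children[0];
  if (first !== undefined && first.tag === "heading") {
    const summary: Para = { tag: "para", text: first.text };
    return [
      rawHtml("<details>"),
      rawHtml("<summary>"),
      summary,
      rawHtml("</summary>"),
      ...block.children.slice(1),
      rawHtml("</details>"),
    ];
  }
  return [rawHtml("<details>"), ...block.children, rawHtml("</details>")];
}
```

The raw_block nodes are what make this filter HTML-specific: another output format would need its own variant, or a check on the target format.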

This proposal would allow you to do

:::: details
::: summary
This is the summary
:::

And here's the rest
::::

which is a bit more verbose and relies more on English keywords, but it would work out of the box without filters.

The proposed change would be breaking for existing djot documents that used

::: classname
...
:::

but maybe that is okay as the language is still in an experimental phase.

The proposed change would make the djot AST less compatible with the pandoc AST (which doesn't have a notion of "tag name"), and this would make pandoc interoperability less smooth.

In general I don't like to rely on English language keywords. Perhaps one could work around that, though, by introducing the concept of a "tag dictionary" that allows you to define your own aliases for tag names?
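
One way to picture such a tag dictionary is a plain alias lookup applied before rendering. The sketch below is only an illustration: `TagDictionary`, `resolveTagName`, and the sample alias are hypothetical, and how a dictionary would be declared in a document is left open.

```typescript
// Hypothetical "tag dictionary": maps author-chosen aliases to the tag
// names a renderer understands. Unknown names pass through unchanged.
type TagDictionary = Record<string, string>;

function resolveTagName(name: string, dictionary: TagDictionary): string {
  // hasOwnProperty avoids picking up inherited keys such as "toString".
  const canonical = Object.prototype.hasOwnProperty.call(dictionary, name)
    ? dictionary[name]
    : undefined;
  return canonical ?? name;
}

// Illustrative alias only; not taken from the thread.
const exampleDictionary: TagDictionary = { "подробности": "details" };
```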

If we did implement the prefix :defn[] style notation, it would be good to impose some restrictions on the characters allowed in tag names and also a length restriction, to keep parsing fast.
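
To make the cost argument concrete, here is a sketch of a bounded matcher for the :name[ opener. The ASCII alphabet and the limit of 32 characters are assumptions chosen for illustration (the thread fixes neither), and `matchTagOpener` is a hypothetical name.

```typescript
// Assumed limit for illustration; the thread does not specify one.
const MAX_TAG_NAME_LENGTH = 32;

// Tries to read ":name[" starting at `pos`. Returns the tag name and the
// index just past "[", or null if there is no opener at this position.
// The scan inspects at most MAX_TAG_NAME_LENGTH + 1 characters after ":".
function matchTagOpener(
  input: string,
  pos: number
): { tagName: string; end: number } | null {
  if (input.charAt(pos) !== ":") {
    return null;
  }
  let i = pos + 1;
  while (
    i < input.length &&
    i - pos - 1 < MAX_TAG_NAME_LENGTH &&
    /[A-Za-z0-9_-]/.test(input.charAt(i)) // assumed ASCII-only alphabet
  ) {
    i++;
  }
  if (i === pos + 1 || input.charAt(i) !== "[") {
    return null;
  }
  return { tagName: input.slice(pos + 1, i), end: i + 1 };
}
```

Because the name must be followed directly by "[", a symbol such as :smile: is rejected by this matcher. A Unicode-aware character class could be substituted if non-ASCII tag names are wanted.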

You are right that allowing a special name for spans restores symmetry with what we now have for divs. However, there's also a question of symmetry with verbatim containers (code spans and code blocks). For example, in LaTeX you might want

``` tikz
arrow(whatever) -> node(thing)
```

to produce a tikz environment instead of verbatim. But doing this automatically conflicts with the role we've given to this position for specifying the "language." There's also a question whether code spans should have something similar? :kbd`*3b*`

As for syntax, I fear that the tag name in :tag[...] looks a bit too much like symbol syntax (just missing the final :). Of course, we could remove that problem by just using a symbol for this purpose: :tag:[...], but this might not be ideal. Another option could be !tag[...] which is reminiscent of the image syntax.

@matklad
Contributor Author

matklad commented Aug 20, 2023

@Omikhleia

This is fragmenting the portability of input files

Yeah, that's the big thing here! One can view Djot as either:

  1. a relatively self-contained markup language for documents with a closed set of syntactic constructs
  2. or an open-ended constructor for domain-specific formats

This proposal pushes us more towards the second interpretation (but note that they are not mutually exclusive --- some people may use djot as 1, and some might use it as 2).

As you've rightfully noticed, everything expressible with this proposal is already possible with custom attributes and classes; the "custom tags" thing basically just formalizes this pattern.

And that nicely segues into @jgm's first point! Even under this proposal I would expect people to write

::: details
## This is the summary.

And here's the rest.
:::

and handle this as a filter by default. The "raw html" mode I think is needed solely as an escape hatch.

However, under the new proposal it's syntactically apparent that ::: details is some custom element. In the status quo with using "magical" classes, it's less clear whether that's indeed a custom element, or just a pure-style .class.

That's probably what I like aesthetically most here --- that we clearly separate the "semantics" attribute from the style ones (including adding the invariant that there's at most one custom tag, but many classes).

relies more on English keywords

I was under the impression that we already don't restrict class names and such to be English, but apparently that's not the case. It feels a bit strange that the following is parsed differently

x{.foo} x{.бар}

I would say if we are fine with class names being English, we should be fine with tag names being English also (but it might be a good idea to include some quoted syntax then just in case, e.g. ::: "бар" to be analogous to {class="бар"}).

but maybe that is okay as the language is still in an experimental phase.

FWIW, this is something that worries me quite a bit. The page https://djot.net doesn't say that Djot is in an experimental phase, and makes it look like it's quite finished. Ideally, we'd be clearer in communicating our stability promise.

As for syntax, I fear that the tag name in :tag[...] looks a bit too much symbol syntax (just missing the final :)

Yeah, I think syntactically the salient bits are that:

  • there's a dedicated place for a single name, which is different from potentially repeated class names.
  • the name goes before the element

As for particular syntax, !tag[ definitely works!

@jgm
Owner

jgm commented Aug 20, 2023

I don't think there was any intention to exclude non-English class names! If we do, it seems like a bug. The attribute grammar in attributes.ts does say that keywords need to be ASCII, but not classes or identifiers.

@bpj

bpj commented Aug 21, 2023

See also #197 and #192 where I proposed another use for ::: tag, namely to provide "hints" for the parser.

I'm thus all for storing these "tags" specially in the AST. What worries me is that this proposal seems very HTML-centric for such a "central" syntax feature. I think it is important that djot is output-format agnostic, not favoring any one output format. While I do not yet use djot for real (the lack of a syntax for metadata, and for other data in the spirit of #192, that is interoperable with Pandoc is the main showstopper for me), I really like most of the syntax features where djot differs from or adds to Markdown, but my typical target format is PDF via LaTeX. If this means "tags" are stored separately in the AST and can be used for anything by parsers, filters and renderers, I'm all for it. If this means that "tags" become unusable unless you target HTML/XML, or even that djot gets tied to those formats, I'm actually worried!

@matklad
Contributor Author

matklad commented Nov 15, 2023

As a data point, someone laments the inability to create HTML/djot sandwiches without writing custom filters:

https://lobste.rs/s/wrksua/data_oriented_blogging#c_pzqjot
