
Proposal: Punctuation Character for Code Block's language specifier #281

Open
RonaldZielaznicki opened this issue Mar 21, 2024 · 6 comments


@RonaldZielaznicki

RonaldZielaznicki commented Mar 21, 2024

Proposal

Use a punctuation character to mark the language specifier of a fenced code block.

I use ? below, but I'd be happy with any character.

Why?

The current syntax introduces ambiguity and breaks djot's design goals, namely goals 3 and 7 (in spirit).

Besides restoring those goals, it would make parsing easier. As things stand, a parser cannot tell whether a line beginning with ``` starts a paragraph or a code fence until it reaches the second word of the line.
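To make the parsing point concrete, here is a minimal Python sketch of the opener check, assuming the rule restated later in this thread: three or more backticks, optionally followed by a single-word language specifier, and nothing else. The regex and the name `classify_current` are illustrative assumptions, not djot.js code, and forbidding backticks inside the word is an extra assumption.

```python
import re

# Assumed opener rule (illustration only, not djot.js code): three or more
# backticks, optional spaces, at most one word, then optional trailing spaces.
# The word may not contain a backtick (also an assumption).
FENCE_OPENER = re.compile(r"^`{3,}[ \t]*(?:[^\s`]+)?[ \t]*$")


def classify_current(line: str) -> str:
    """Classify a line that starts with ``` under the assumed current rule."""
    # Whether this matches depends on what follows the first word:
    # a second word makes the line ordinary paragraph text.
    return "code" if FENCE_OPENER.match(line) else "paragraph"


# The same text, hard-wrapped at different points, changes block type.
for line in ["``` I'm a paragraph", "``` I'm", "```djot", "```"]:
    print(f"{line!r:24} -> {classify_current(line)}")
```

Under this sketch, wrapping "``` I'm a paragraph" after "I'm" flips the first line from paragraph to code fence, which is exactly the hard-wrapping hazard the proposal targets.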

Goals 3 and 7: ambiguity and friendliness to hard wrapping

Paragraph

Here is a paragraph into which I'd like to introduce a line break.

``` I'm a paragraph
```
<p><code>I'm a paragraph</code></p>
Code Block

Add a line break at the wrong spot, and we no longer have a paragraph.

``` I'm
a paragraph
```
<pre><code class="language-I'm">a paragraph</code></pre>

What it'd look like

With a specifier character, we know from the start of the line whether it opens a fenced code block or a paragraph.

With Specifier Character

```? I'm not a paragraph
```
<pre><code>I'm not a paragraph</code></pre>

Without Specifier Character

``` I'm
a paragraph```
<p><code>I'm a paragraph</code></p>

With Language

```?djot I'm not a paragraph
```
<pre><code class="language-djot">I'm not a paragraph</code></pre>

Optional Spacing

``` ?djot I'm not a paragraph
```
<pre><code class="language-djot">I'm not a paragraph</code></pre>

Raw Fencing

```
I'm not a paragraph
```
<pre><code>I'm not a paragraph</code></pre>
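One possible reading of the examples above can be sketched in Python. The grammar here is inferred from the examples, not taken from any spec: three or more backticks, optional spaces, then either end of line, or ? followed by an optional language word and optional code content on the same line. `Opener` and `parse_proposed` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Opener(NamedTuple):
    lang: Optional[str]  # language written right after '?', or None
    content: str         # text after the language on the opener line ('' if none)


# Assumed grammar for the proposal (inferred from the examples, not a spec):
# 3+ backticks, optional spaces, then either end of line, or '?' followed by
# an optional language word and optional code content on the same line.
PROPOSED = re.compile(
    r"^`{3,}[ \t]*(?:\?(?P<lang>[^\s`]*)(?:[ \t]+(?P<content>.*))?)?[ \t]*$"
)


def parse_proposed(line: str) -> Optional[Opener]:
    """Return an Opener if `line` opens a code block, or None for a paragraph."""
    m = PROPOSED.match(line)
    if m is None:
        return None  # e.g. "``` I'm": no '?', so this is ordinary paragraph text
    lang = m.group("lang") or None  # a bare '?' means "code block, no language"
    content = (m.group("content") or "").rstrip()
    return Opener(lang, content)
```

With this reading, the decision is made right after the backticks and optional spaces: a ? or the end of the line means code block, anything else means paragraph. Hard-wrapping a paragraph after its first word therefore cannot turn it into a code block.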

Additional Benefits

Multiple words would be allowed on the opening line of a code fence, as suggested in #214.

Additional Thoughts

The character doesn't need to be ?; it's just one I thought would be easy to use in this context. = would have been my first choice, but it's already used for raw HTML blocks.

Alternatives

Use tilde instead of backtick

Having punctuation characters pull double duty is what creates the ambiguity. Backticks are used for both verbatim text and code blocks. Tilde is used for subscript, but, unlike with verbatim, a run of ~ characters has no special meaning for subscripts. I'd lean more heavily toward this alternative myself, but backticks for code fences are well understood by users, and they're the ones who would have to adapt.

Block Attributes

I'd love to use a block attribute to set the language specifier, but that resolves neither the ambiguity nor the unfriendliness toward hard wrapping.

Related or Similar Issues

#41
#214

@jgm
Owner

jgm commented Mar 21, 2024

Actually, the current djot.js parser does allow ~~~ for code blocks, even though this isn't mentioned in the syntax description. So requiring that is a tempting solution to the ambiguity problem.

On the other hand, using ``` has a pleasing conceptual simplicity; it's not too different from """ for multiline strings and " for inline strings in some languages.

Curious to hear other comments on this.

@RonaldZielaznicki
Author

RonaldZielaznicki commented Mar 21, 2024

Actually, the current djot.js parser does allow ~~~ for code blocks, even though this isn't mentioned in the syntax description. So requiring that is a tempting solution to the ambiguity problem.

Yup! It's why I pushed it as an alternative. Glad we're aligned there.

On the other hand, using ``` has a pleasing conceptual simplicity; it's not too different from """ for multiline strings and " for inline strings in some languages.

It is pretty intuitive at this point to reach for ```, isn't it? I didn't even know about ~~~ as a fence until I glanced at the JS implementation to see how it handled the language specifier, and then tried it in GitHub's Markdown preview.

Curious to hear other comments on this.

Same. The other issues linked above are already filled with plenty of insights, but I don't think any of them touches on this specific issue.

@RonaldZielaznicki
Author

RonaldZielaznicki commented Mar 21, 2024

Ah, one more comment on:

Actually, the current djot.js parser does allow ~~~ for code blocks, even though this isn't mentioned in the syntax description. So requiring that is a tempting solution to the ambiguity problem.

Requiring ~~~ as the code block fence doesn't completely solve the ambiguity unless a blank line followed by ~~~ always starts a code block. Even then, a specifier character would help, because hard breaks still make it ambiguous whether the first word is a language specifier.

Without Hard Break

~~~ I'm a paragraph
~~~

leads to

<p>~~~ I’m a paragraph~~~</p>

With Hard Break

While

~~~ I'm
a paragraph
~~~

becomes

<pre><code class="language-I'm">a paragraph
</code></pre>

@Omikhleia

Given the rule:

A code block starts with a line of three or more consecutive backticks, optionally followed by a language specifier, but nothing else.

Then

``` Some things
```

should perhaps either:

  • be a syntax error, or
  • be read as a code block with the language specifier "Some things"

(depending on whether restrictions are imposed on language specifiers).
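The two options can be sketched side by side in Python. This is an illustration only: `fence_language`, `FenceSyntaxError`, and the `strict` flag are hypothetical, and "word" is assumed to mean a run of non-whitespace characters.

```python
import re
from typing import Optional

FENCE_LINE = re.compile(r"^`{3,}(?P<rest>.*)$")


class FenceSyntaxError(ValueError):
    """Hypothetical error for option 1: extra words after the specifier."""


def fence_language(line: str, strict: bool) -> Optional[str]:
    """Language specifier of a line starting with 3+ backticks ('' if absent).

    Returns None when the line does not start with a backtick fence.
    strict=True  -> option 1: more than one word is a syntax error.
    strict=False -> option 2: everything after the fence is the specifier.
    """
    m = FENCE_LINE.match(line)
    if m is None:
        return None
    rest = m.group("rest").strip()
    if strict and len(rest.split()) > 1:
        raise FenceSyntaxError(f"unexpected text after specifier: {rest!r}")
    return rest
```

Under either option, a line such as ```Verbatim text``` is claimed by the fence rule (an error under option 1, an odd specifier under option 2), which is the conflict with inline verbatim raised in reply.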

@RonaldZielaznicki
Author

@Omikhleia As it stands, calling it a syntax error might be difficult. Verbatim text is defined as follows:

Verbatim content begins with a string of consecutive backtick characters (`) and ends with an equal-lengthed string of consecutive backtick characters.

Material between the backticks is treated as verbatim text (backslash escapes don’t work there).

If the content starts or ends with a backtick character, a single space is removed between the opening or closing backticks and the content.

If the text to be parsed as inline ends before a closing backtick string is encountered, the verbatim text extends to the end.

This is verbatim text:

`Verbatim text`

but so is this:

```Verbatim text```
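The quoted verbatim rules are precise enough to sketch in Python. This is one reading of them, not djot.js's implementation: `parse_verbatim` is a hypothetical name, and the space-stripping rule is interpreted as removing one space on each side whose content begins or ends with a backtick. It shows why a leading ``` cannot by itself signal a code block: the same characters also open an inline verbatim span.

```python
import re

BACKTICKS = re.compile(r"`+")


def parse_verbatim(text: str, start: int = 0) -> tuple[str, int]:
    """Parse a verbatim span opening at text[start]; return (content, end index)."""
    opener = BACKTICKS.match(text, start)
    if opener is None:
        raise ValueError("verbatim must begin with a backtick")
    width = len(opener.group())
    # Close only on a run of exactly the same length; other runs are content.
    for run in BACKTICKS.finditer(text, opener.end()):
        if len(run.group()) == width:
            content, end = text[opener.end():run.start()], run.end()
            break
    else:
        # No closing run: the verbatim text extends to the end of the input.
        content, end = text[opener.end():], len(text)
    # If the content starts/ends with a backtick, drop one separating space.
    if content.startswith(" `"):
        content = content[1:]
    if content.endswith("` "):
        content = content[:-1]
    return content, end


print(parse_verbatim("```Verbatim text```"))  # ('Verbatim text', 19)
```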

@RonaldZielaznicki
Author

RonaldZielaznicki commented Mar 22, 2024

After writing that last comment, I think I'm slowly pushing myself over towards using tilde instead of backticks.

So, a code block would look like:

~~~
I am not a paragraph and I have no language specifier
~~~

or

~~~ I am not a paragraph and I have a language specifier
~~~

(Everything after ~~~ up to the end of the line becomes the language specifier: "I am not a paragraph and I have a language specifier" in this case.)

This has a number of advantages:

  • We'd get rid of the ambiguity between paragraphs and code blocks.
  • We can't accidentally turn a paragraph into a code block.
  • We don't add new punctuation syntax.
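As a rough check on these advantages, here is a Python sketch of the tilde rule from this comment. The closing-fence rule (a line of at least as many tildes and nothing else) and the names `CodeBlock` and `parse_tilde_block` are assumptions for illustration, not djot.js behavior.

```python
import re
from typing import List, NamedTuple, Optional


class CodeBlock(NamedTuple):
    lang: str        # everything after the opening tildes, stripped ('' if none)
    body: List[str]


OPENER = re.compile(r"^(~{3,})(.*)$")


def parse_tilde_block(lines: List[str]) -> Optional[CodeBlock]:
    """Parse a code block starting at lines[0] under the proposed tilde rule."""
    if not lines:
        return None
    m = OPENER.match(lines[0])
    if m is None:
        return None
    fence, lang = m.group(1), m.group(2).strip()
    body = []
    for line in lines[1:]:
        # Assumption: the block closes on a line of at least as many tildes.
        closer = line.strip()
        if len(closer) >= len(fence) and set(closer) == {"~"}:
            break
        body.append(line)
    return CodeBlock(lang, body)
```

Both wrappings of "~~~ I'm a paragraph" now yield a code block; only the specifier and body differ, so hard wrapping can no longer change the block type.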

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants