Is a backslashed space still whitespace? #245

Open
faelys opened this issue Sep 25, 2023 · 5 comments

Comments

@faelys

faelys commented Sep 25, 2023

Hello,

Sorry to bother you again. As might be obvious by now, I'm implementing a new djot parser and trying to match existing behavior. Here is something that surprises me as a user (and is somewhat difficult to fit into my parser architecture, but that's my problem):

This _is an escaped space:\ _ and there is no emphasis.

This _is an actual non-breaking space (U+00A0): _ and there is an emphasis.

As a user, I would have expected a backslash-escaped space (\ ) and U+00A0 to be interchangeable, and neither to be considered whitespace as far as syntax goes.

Am I in a minority here? Is it worth a specification update?

@Omikhleia

Omikhleia commented Sep 25, 2023

Interestingly, currently in the online playground, attributes on the escaped non-breaking space change the behavior:

This _is an escaped space without attributes:\ _ and there is no emphasis.

This _is an escaped space with attributes:\ {.fixed}_ and there is now an emphasis.

This might be an inconsistency in the current parser, or a specification issue about how inlines nest?

@faelys
Author

faelys commented Sep 25, 2023

Having looked closely at the current parser, my understanding of the current behavior is that emphasis and similar marks look for preceding or subsequent whitespace in the raw source text, not in the AST or any semantic representation. So here, adding attributes makes the character before _ a closing brace, which is not whitespace.

I guess specifying a rule about raw source whitespace is as legitimate as a rule about semantic whitespace, but I think even as a basic user I would like to be told which one it is (just as I think it was useful to spell out that only ASCII whitespace counts, not the whole Unicode class).
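To make the raw-source reading concrete, here is a minimal sketch in Python of the check as described above, applied to the three closing delimiters from this thread. It is not taken from any djot implementation: the function name `preceded_by_raw_whitespace` is hypothetical, and the exact set in `ASCII_WHITESPACE` (space, tab, CR, LF) is an assumption about what "ASCII whitespace" covers.

```python
# Assumption: "ASCII whitespace" means space, tab, carriage return, line feed.
ASCII_WHITESPACE = " \t\r\n"


def preceded_by_raw_whitespace(source: str, index: int) -> bool:
    """Report whether the raw character just before source[index] is ASCII whitespace.

    Only the source text is inspected: a backslash-escaped space still ends
    in a literal space, so it counts as whitespace here.
    """
    return index > 0 and source[index - 1] in ASCII_WHITESPACE


# The three candidate closers discussed in this thread.
escaped = r"This _is an escaped space:\ _"
nbsp = "This _is an actual non-breaking space (U+00A0):\u00a0_"
with_attributes = r"This _is an escaped space with attributes:\ {.fixed}_"

for text in (escaped, nbsp, with_attributes):
    closer = text.rindex("_")  # position of the would-be closing delimiter
    print(repr(text[closer - 1]), preceded_by_raw_whitespace(text, closer))
```

Under these assumptions only the first case reports whitespace before the closer: in the second the preceding character is U+00A0, and in the third it is the closing brace of the attributes.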

@jgm
Owner

jgm commented Sep 25, 2023

> look for preceding or subsequent whitespace in the raw source text

Correct.

Do you want to make a targeted suggestion about where this should be reflected in the documentation?

@faelys
Author

faelys commented Sep 25, 2023

> Do you want to make a targeted suggestion about where this should be reflected in the documentation?

My specification-reading skill is a bit weird, so you might want other opinions, but as a user I think I would be satisfied with the following additions:

A _ or * can open emphasis only if it is not directly followed by whitespace **in the source text**. It can close emphasis only if it is not directly preceded by whitespace **in the source text**, and only if there are some characters besides the delimiter character between the opener and the closer.

The emphases mark the additions; I don't think any emphasis would be needed in the documentation itself. However, these would be the first occurrences of the words "source text", and I haven't found any established vocabulary to distinguish between source text, semantic interpretation, and "formatted output".

As a parser-writer I would also welcome an update to the example box below that paragraph, showing that _\ can open emphasis and \ _ cannot close it, but I don't know at which point that makes too many examples.
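For fellow parser-writers, the proposed wording can be sketched as two predicates over the raw source. This is only an illustration of the sentence above, not the reference parser's logic: `can_open` and `can_close` are hypothetical names, the whitespace set is an assumption, and real delimiter matching involves more than these local checks.

```python
# Assumption: "whitespace" means ASCII space, tab, carriage return, line feed.
ASCII_WHITESPACE = " \t\r\n"


def can_open(source: str, i: int) -> bool:
    """A delimiter at source[i] may open emphasis only if the next raw
    character exists and is not whitespace."""
    if i + 1 >= len(source):
        return False
    return source[i + 1] not in ASCII_WHITESPACE


def can_close(source: str, opener: int, i: int) -> bool:
    """A delimiter at source[i] may close emphasis opened at source[opener]
    only if the previous raw character is not whitespace and something other
    than the delimiter character lies between opener and closer."""
    if i == 0 or source[i - 1] in ASCII_WHITESPACE:
        return False
    between = source[opener + 1:i]
    return any(ch != source[i] for ch in between)


# The two cases suggested for the example box.
opens = r"_\ text_"   # backslash directly after the opener
closes = r"_text\ _"  # raw space directly before the closer

print(can_open(opens, 0))                      # next raw character is a backslash
print(can_close(closes, 0, len(closes) - 1))   # previous raw character is a space
```

With these definitions the first call reports that `_\ ` can open and the second that `\ _` cannot close, which matches the behavior described in this thread.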

@bpj

bpj commented Sep 25, 2023

As long as there is no standard way to insert characters by reference (e.g. a symbol looking like a Unicode codepoint in U+XXXX format) this is not good. A \ + U+0020 should probably be equivalent to a U+00A0 everywhere (except inside attributes).
