Create <whitespace> and <newline>, (, ), etc nodes rather than throwing out tokens #16

bcoe · 2020-12-24T18:13:02Z

So that there's no lost information in the parse tree, we should create <whitespace> and <newline> tokens for spacing characters that fall outside of <text> nodes.

The text was updated successfully, but these errors were encountered:

wesleytodd · 2020-12-28T21:07:09Z

I realized after submitting that review that the pr was merged, so just to keep the context organized I figured I would re-post it here to see if we need to follow up:

I might be wrong here, but I think we need a <blankline> token which is <blankline> ::= <newline>, <newline>+ as the separators between the <summary>, <body>, & <footer>. If I read this correctly as it, you would have a valid summary/body as (only one LF):

fix: summary line
this is my body

Am I missing something here? The implementation looks right, but I think it is just miss-represented in the grammar spec.

#26 (review)

bcoe · 2020-12-29T18:42:58Z

Am I missing something here? The implementation looks right, but I think it is just miss-represented in the grammar spec.

@wesleytodd this was intentional on my part, I see no reason that:

fix: summary line
this is my body

shouldn't parse correctly, since it doesn't cause any trouble for the grammar.

bcoe · 2020-12-29T18:44:27Z

@byCedric @wesleytodd having been playing around a bit with a tree visitor, it jumps out at me that we'll probably also want to add (, ), and any other tokens I might be missing? to the tree.

This allows us to do something like visit(summary, () => true, (node) => { if (node.value) { text += node.value} }), to put a portion of the tree back together into raw text.

wesleytodd · 2020-12-29T21:03:02Z

this was intentional on my part

Not opposed to this, but it is a change (might be breaking even?) from the current spec. What does the current parser do in this case?

This allows us to do something like visit(summary, () => true, (node) => { if (node.value) { text += node.value} }), to put a portion of the tree back together into raw text.

This was my main concern with not having these. If you want to modify and stringify, these markers are very helpful to maintain exact formatting.

byCedric · 2020-12-29T22:51:25Z

@byCedric @wesleytodd having been playing around a bit with a tree visitor, it jumps out at me that we'll probably also want to add (, ), and any other tokens I might be missing? to the tree.

Yeah, it feels a bit odd that we remove those scopes. In this change I had to mitigate it by doing some stuff differently.

One thing we could add is something similar to nlcst's punctuation. Not sure if we have more than the scope parenthesis, but if we do we might want to consider that.

bcoe · 2020-12-30T00:40:08Z

Not opposed to this, but it is a change (might be breaking even?) from the current spec.

I think I've made a few breaking changes as we transcribe the grammar. I'm trying to channel all the things I've seen colleagues accidentally get wrong, and then reach out to me about.

Making the grammar as forgiving as possible, but then having a linter on top of it, I'd say is the ideal. We can have a linter indicate that it's recommended to have \n\n between the summary and body, while still parsing appropriately.

Yeah, it feels a bit odd that we remove those scopes. In this change I had to mitigate it by doing some stuff differently.

I'm happy with a Punctuation node, if it's only parenthesis we don't have in the tree right now, perhaps we could even just call it a Parenthesis node?

bcoe added the enhancement New feature or request label Dec 24, 2020

This was referenced Dec 25, 2020

feat: add greedy whitespace tokenizer #24

Merged

feat: add greedy newline tokenizer #25

Merged

bcoe changed the title ~~Create <whitespace> and <newline> nodes rather than throwing out tokens~~ Create <whitespace> and <newline>, (, ), etc nodes rather than throwing out tokens Dec 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create <whitespace> and <newline>, (, ), etc nodes rather than throwing out tokens #16

Create <whitespace> and <newline>, (, ), etc nodes rather than throwing out tokens #16

bcoe commented Dec 24, 2020

wesleytodd commented Dec 28, 2020

bcoe commented Dec 29, 2020

bcoe commented Dec 29, 2020

wesleytodd commented Dec 29, 2020

byCedric commented Dec 29, 2020

bcoe commented Dec 30, 2020

Create <whitespace> and <newline>, (, ), etc nodes rather than throwing out tokens #16

Create <whitespace> and <newline>, (, ), etc nodes rather than throwing out tokens #16

Comments

bcoe commented Dec 24, 2020

wesleytodd commented Dec 28, 2020

bcoe commented Dec 29, 2020

bcoe commented Dec 29, 2020

wesleytodd commented Dec 29, 2020

byCedric commented Dec 29, 2020

bcoe commented Dec 30, 2020