More structured AST #95

Open
lewis6991 opened this issue Feb 2, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@lewis6991
Member

In https://github.com/MDeiml/tree-sitter-markdown sections are represented structurally in the AST. This allows things like https://github.com/nvim-treesitter/nvim-treesitter-context to leverage this structure to provide contexts.

Proposal

Make the current column_heading or h1 node the beginning of a block, and nest everything after it under that block until the next column_heading or h1.
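To make the proposal concrete, here is a minimal Python sketch of the nesting it describes. It is not how tree-sitter-vimdoc works. Every line after an h1 or column_heading becomes a child of that heading until the next heading of either kind. It assumes simplified rules: an h1 is a line of three or more `=` followed by the heading text, and a column_heading is a non-indented line ending in `~`. `Section` and `nest_sections` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Simplified, assumed heading rules (not the grammar's real definitions).
H1_SEPARATOR = re.compile(r"^={3,}\s*$")      # e.g. "=========" before an h1
COLUMN_HEADING = re.compile(r"^\S.*~\s*$")   # e.g. "Options ~"


@dataclass
class Section:
    heading: str                  # heading text ("" for lines before any heading)
    kind: str                     # "preamble", "h1" or "column_heading"
    body: list[str] = field(default_factory=list)


def nest_sections(lines: list[str]) -> list[Section]:
    """Group a flat list of lines into sections that start at each heading."""
    sections = [Section(heading="", kind="preamble")]
    i = 0
    while i < len(lines):
        line = lines[i]
        if H1_SEPARATOR.match(line) and i + 1 < len(lines):
            # The separator and the following line together form the h1.
            sections.append(Section(heading=lines[i + 1], kind="h1"))
            i += 2
            continue
        if COLUMN_HEADING.match(line):
            sections.append(Section(heading=line, kind="column_heading"))
        else:
            # Everything else nests under the most recent heading.
            sections[-1].body.append(line)
        i += 1
    return sections
```

This sketch ignores codeblocks entirely, so a code line that happens to end in `~` would be misread as a heading. Telling the two apart is exactly where a real grammar has to know where each codeblock ends.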

@justinmk justinmk added the enhancement New feature or request label Feb 2, 2023
@justinmk
Member

justinmk commented Feb 2, 2023

Would love to do this (and spent a lot of time trying to make it work), but I failed. The problem, AFAIR, is that codeblock termination can happen on any line.

In https://github.com/MDeiml/tree-sitter-markdown sections are represented structurally in the AST

tree-sitter-markdown has a custom scanner.c. Thus far tree-sitter-vimdoc has avoided a custom scanner, which helped a lot with development velocity. Of course, the door is open to exploring that now that things are mostly working.

Ideally tree-sitter itself would introduce a feature that makes things easier for grammars instead of requiring a custom scanner. For example, tree-sitter/tree-sitter#160 would provide EOF to the grammar instead of making grammars do insane backflips to deal with it.

@clason
Member

clason commented Feb 2, 2023

Would things change if we tighten the requirements to always have a terminating < for codeblocks?

But it should be noted that tree-sitter-markdown also tried and failed, and in the end had to switch to a two-pass strategy: one parser parses only the block structure, and a second parser does inline parsing of each individual block. (This works, but has obvious performance implications.)

@justinmk
Member

justinmk commented Feb 2, 2023

tighten the requirements to always have a terminating < for codeblocks?

Instead of "always", maybe only if the next block is a h1 or column_heading?

So this would be allowed:

foo >
  code
bar >
  code
<

but this would not be allowed:

foo >
  code

=========
h1

This wouldn't result in a perfect AST but might be good enough.
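The proposed rule can be stated as a simple lint-style check. The following is a minimal Python sketch under assumed, simplified vimdoc rules, and it is not part of the grammar. A codeblock opens on a line ending in a lone `>`, its body is indented or blank lines, and it ends either explicitly at a line starting with `<` or implicitly at the first non-indented line. Only an implicit end caused by an h1 separator or column_heading is flagged. `unterminated_before_heading` is a hypothetical name.

```python
import re

# Simplified, assumed vimdoc rules (not the grammar's real definitions).
CODEBLOCK_START = re.compile(r"(^|\s)>$")     # "foo >" or a bare ">"
H1_SEPARATOR = re.compile(r"^={3,}\s*$")
COLUMN_HEADING = re.compile(r"^\S.*~\s*$")


def is_heading(line: str) -> bool:
    return bool(H1_SEPARATOR.match(line) or COLUMN_HEADING.match(line))


def unterminated_before_heading(lines: list[str]) -> list[int]:
    """Return 0-based indices of codeblock openers that are closed
    implicitly by an h1 or column_heading instead of by "<"."""
    violations = []
    i = 0
    while i < len(lines):
        if not CODEBLOCK_START.search(lines[i].rstrip()):
            i += 1
            continue
        start = i
        i += 1
        # Skip the codeblock body: blank or indented lines.
        while i < len(lines) and (lines[i] == "" or lines[i][0] in " \t"):
            i += 1
        if i == len(lines):
            break  # block runs to EOF; nothing follows it
        if lines[i].startswith("<"):
            i += 1  # explicit terminator: always fine
        elif is_heading(lines[i]):
            violations.append(start)
        # Otherwise ordinary text ended the block implicitly, which stays
        # allowed; i is not advanced so that line can open another block.
    return violations
```

Applied to the two snippets above, the first (`foo >` ... `<`) yields no violations and the second (codeblock followed by `=========`) flags the opener on line 0. Ordinary implicit termination, as with `bar >` in the first snippet, is deliberately left alone, matching the "only if the next block is a h1 or column_heading" variant rather than "always".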


3 participants