New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
external scanner: support mark_begin() instead of advance(skip=true) #2985
Comments
From #2315, I'm not the only one affected insofar as that "skip" parameter and its description is misleading. Having read that and looked at the code to confirm that "skip" doesn't really equate to "skip this character", but rather "exclude this and all preceding characters from the token", the current API is less limited than it appears to be. Having said that, I think the best way to fix that confusion would be to add a function to the API that complements the "mark_end" function and eliminate that misleading notion of "skipping" individual characters. This is how it is:
Keeping that boolean argument only for compatibility, this is how it could be:
Moreover, that API would make it possible to check the lookahead codepoint before deciding whether to call mark_begin() on the codepoint just advanced over, which I would like to be able to do. |
Yes, I agree 100% @michael-db. This would be a better API, and ideally, we should not have a I'm not sure when and how we should roll out this change, but maybe we should do it before 1.0. |
Actually, it may be possible to do this in a backward-compatible manner. Here's what I think we could do.
/cc @amaanq - just wanted to get your thoughts on this, since I think it would be a good pre-1.0 change. @michael-db are you interested in working on this? |
Thank you for looking into this, @maxbrunsfeld. I had read only a little of the code - not enough to understand what would be involved in changing the API in a safe way. I was researching Tree Sitter in anticipation of using it. That might be a few weeks away, and then I'll have some learning to do, e.g., I haven't worked with WASM before. However, I'd be happy to contribute to a clever and useful tool if I could. How soon would you want a pull request? If it suited, I could take this on provisionally and warn you if I'm unlikely to be ready to start on it within, say, a month. |
Problem
I'm developing a small language where it is not known whether whitespace should be skipped or form part of a token until what is following it is consumed: if it is trailing whitespace on a line then it should be skipped, but in other cases it can be part of a token with internal whitespace. Now I can imagine that this is very unusual and I might even be the only one ever to ask this, but before I decide whether to use tree-sitter, could it be a problem?
I'm provisionally assuming that I will work around this by putting a token for insignificant separator-only whitespace tokens in the grammar (though I don't really want to) and so the fix for this bug report is just to mention that in the docs https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners
It would be nice if the "skip" function was separated from the "advance" mechanism (advance, advance, advance, then decide to skip what's been consumed), but of course I don't really expect the API to be changed for this.
Excuse me if I've missed or misunderstood something.
Steps to reproduce
None, doc issue/question.
Expected behavior
Support skip_consumed_chars(). No, OK, doc update then.
Tree-sitter version (tree-sitter --version)
0.20.9
Operating system/version
Linux
The text was updated successfully, but these errors were encountered: