Few missing pieces of documentation from the C API #2946

Open
cdacamar opened this issue Feb 7, 2024 · 1 comment

cdacamar commented Feb 7, 2024

I have been using the tree-sitter C API and, while it is great and pretty easy to use, it seems to lack notes on some APIs where they would be very helpful. Here are a few I've noticed:

TSPoint

typedef struct {
  uint32_t row;
  uint32_t column;
} TSPoint;

There doesn't appear to be anything in api.h that describes these values beyond what you might intuit from their names. Some points worth clarifying:

  • Is column a UTF-8, UTF-16, etc. code point-based column, or is it a byte offset within that particular row?
  • Are row and column 0-indexed or 1-indexed? 1-indexed values are useful for presenting directly to the user, whereas 0-indexed values are more natural to program against.
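A minimal sketch of the convention in question, assuming zero-based rows and a column counted in bytes from the start of the row, with '\n' as the only line break. That is how tree-sitter's language bindings describe points, but it is exactly what this issue asks api.h to spell out, so treat it as an assumption. `Point` and `point_for_byte` are hypothetical names; `Point` just mirrors TSPoint's two fields so the snippet compiles without the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of TSPoint so this compiles without tree-sitter. */
typedef struct {
  uint32_t row;
  uint32_t column;
} Point;

/* Assumption: rows are zero-based, and column is the number of bytes
   (not code points) between the start of the row and `byte`.
   Only '\n' is treated as a line break. */
static Point point_for_byte(const char *text, uint32_t length, uint32_t byte) {
  assert(byte <= length);
  Point p = {0, 0};
  for (uint32_t i = 0; i < byte; i++) {
    if (text[i] == '\n') {
      p.row++;        /* a newline starts the next row */
      p.column = 0;
    } else {
      p.column++;     /* every other byte advances the column by one */
    }
  }
  return p;
}
```

Under this convention, byte 4 of "ab\ncd" is row 1, column 1, and a character that follows "é" (two bytes in UTF-8) sits at column 2, not 1.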

TSInputEdit

typedef struct {
  uint32_t start_byte;
  uint32_t old_end_byte;
  uint32_t new_end_byte;
  TSPoint start_point;
  TSPoint old_end_point;
  TSPoint new_end_point;
} TSInputEdit;

It is not quite self-explanatory which byte positions these fields refer to. Here's what I gathered from using it:

  • start_byte is the offset where the edit begins.
  • old_end_byte is the end offset of the edited range in the old text, not the end of the entire buffer. Don't use the latter, or you'll end up reparsing everything.
  • new_end_byte, like old_end_byte, is local to that specific edit: it is the end offset of the replacement text in the new buffer.
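The reading above can be captured in a small helper that fills in all six fields for a single replacement. This is a sketch under the same assumptions as before (zero-based rows, byte-based columns); `Point`, `InputEdit`, `advance_point`, and `make_input_edit` are hypothetical stand-ins that mirror TSPoint and TSInputEdit field-for-field so it compiles on its own. In real code the result would be a TSInputEdit passed to ts_tree_edit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of TSPoint and TSInputEdit (same field names). */
typedef struct {
  uint32_t row;
  uint32_t column;
} Point;

typedef struct {
  uint32_t start_byte;
  uint32_t old_end_byte;
  uint32_t new_end_byte;
  Point start_point;
  Point old_end_point;
  Point new_end_point;
} InputEdit;

/* Advances `p` over `len` bytes of `text`; '\n' starts a new row,
   any other byte moves the column forward by one. */
static Point advance_point(Point p, const char *text, uint32_t len) {
  for (uint32_t i = 0; i < len; i++) {
    if (text[i] == '\n') {
      p.row++;
      p.column = 0;
    } else {
      p.column++;
    }
  }
  return p;
}

/* Describes replacing `removed_len` bytes of `old_text` at `start` with
   `inserted_len` bytes of `inserted`. The end offsets are local to this
   one edit; neither is the end of the whole buffer. */
static InputEdit make_input_edit(const char *old_text, uint32_t old_len,
                                 uint32_t start, uint32_t removed_len,
                                 const char *inserted, uint32_t inserted_len) {
  assert(start + removed_len <= old_len);
  InputEdit edit;
  edit.start_byte = start;
  edit.old_end_byte = start + removed_len;   /* end of the replaced span */
  edit.new_end_byte = start + inserted_len;  /* end of the inserted text */
  edit.start_point = advance_point((Point){0, 0}, old_text, start);
  /* old_end_point walks the removed bytes of the OLD text... */
  edit.old_end_point =
      advance_point(edit.start_point, old_text + start, removed_len);
  /* ...while new_end_point walks the inserted text. */
  edit.new_end_point = advance_point(edit.start_point, inserted, inserted_len);
  return edit;
}
```

For example, replacing the `c` in "ab\ncd" with "XY" gives start_byte 3, old_end_byte 4, new_end_byte 5, and points (1, 0), (1, 1), (1, 2).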

TSParser

Can I use the same parser for different texts? For example, is it valid to do something like:

TSParser *parser = ts_parser_new();
/* set up the parser for a language */
TSTree *t1 = ts_parser_parse(parser, NULL, input1);
TSTree *t2 = ts_parser_parse(parser, NULL, input2);

That is, can one parser parse two entirely unrelated buffers, as long as both calls happen on the same thread?

Per-frame efficiency

Guidance on how to pattern-match trees efficiently on a per-frame basis would be extremely useful, e.g. preallocating all the objects before the frame, running ts_query_cursor_exec over a minimal subtree selected with something like ts_node_descendant_for_byte_range, and constraining the query with ts_query_cursor_set_byte_range.

Additionally, what is the most efficient way to order captures so that, at each byte offset, the widest captured range comes first?
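One reading of "widest range first per byte offset" is a sort by start byte ascending, breaking ties by end byte descending, so an enclosing capture precedes the captures nested inside it. A minimal sketch with qsort follows; `CapturedRange` and `sort_captures` are hypothetical, with ranges assumed to be copied out of each capture's node (for instance via ts_node_start_byte and ts_node_end_byte). This is a generic post-processing step, not a claim about the order tree-sitter's query cursor already produces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of one capture's byte range. */
typedef struct {
  uint32_t start_byte;
  uint32_t end_byte;
  uint32_t capture_index;
} CapturedRange;

/* Earlier start first; for equal starts, the wider range (larger end)
   first; remaining ties are broken by capture index so the result does
   not depend on qsort's (unstable) handling of equal elements. */
static int compare_captures(const void *a, const void *b) {
  const CapturedRange *x = a;
  const CapturedRange *y = b;
  if (x->start_byte != y->start_byte)
    return x->start_byte < y->start_byte ? -1 : 1;
  if (x->end_byte != y->end_byte)
    return x->end_byte > y->end_byte ? -1 : 1;
  if (x->capture_index != y->capture_index)
    return x->capture_index < y->capture_index ? -1 : 1;
  return 0;
}

static void sort_captures(CapturedRange *captures, size_t count) {
  qsort(captures, count, sizeof captures[0], compare_captures);
}
```

Since the array is plain data, it can be allocated once and reused every frame, in the spirit of the preallocation suggested above; sorting costs O(n log n) in the number of captures.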

cdacamar commented Feb 8, 2024

Since I'm here, one more question:

Batch edits?

What is the best way to handle batch edits? For example, can I call ts_tree_edit repeatedly and then, once the batch is complete, reparse and get back a valid tree? I haven't tried this, but I see no reason it wouldn't work.
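If the edits in a batch are all described against the buffer as it was before the batch, one way to keep their offsets valid is to apply them from the end of the buffer backwards: an edit only shifts bytes after it, so positions earlier in the document are unaffected. The sketch below shows that ordering on a plain text buffer; `ByteEdit` and `apply_batch` are hypothetical, and it assumes, as the question does, that ts_tree_edit can be called once per edit (in this same order) before a single reparse.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement, with offsets measured in the buffer as it was
   BEFORE the batch started. Edits in one batch must not overlap. */
typedef struct {
  uint32_t start;
  uint32_t removed_len;
  const char *inserted;
} ByteEdit;

static int by_start_descending(const void *a, const void *b) {
  const ByteEdit *x = a;
  const ByteEdit *y = b;
  if (x->start == y->start) return 0;
  return x->start > y->start ? -1 : 1;
}

/* Applies `edits` in place to `buf` (NUL-terminated, capacity `cap`).
   Working backwards means each edit only moves bytes after it, so the
   original offsets of the edits still to come stay valid. With tree-sitter,
   each iteration is where the matching edit would go to ts_tree_edit().
   Returns -1 if an edit would overflow `buf`; edits already applied
   are kept in that case. */
static int apply_batch(char *buf, size_t cap, ByteEdit *edits, size_t count) {
  qsort(edits, count, sizeof edits[0], by_start_descending);
  size_t len = strlen(buf);
  for (size_t i = 0; i < count; i++) {
    const ByteEdit *e = &edits[i];
    size_t ins_len = strlen(e->inserted);
    size_t tail = (size_t)e->start + e->removed_len;
    assert(tail <= len);
    if (len - e->removed_len + ins_len + 1 > cap) return -1;
    /* Shift the tail (including the NUL terminator), then copy the insert. */
    memmove(buf + e->start + ins_len, buf + tail, len - tail + 1);
    memcpy(buf + e->start, e->inserted, ins_len);
    len = len - e->removed_len + ins_len;
  }
  return 0;
}
```

For example, replacing "abc" (offset 0) with "X" and "ghi" (offset 8) with "YYYY" in "abc def ghi" yields "X def YYYY", even though the edits are listed in document order.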
