I have been using the tree-sitter C API and, while it is great and pretty easy to use, it seems to lack some notes on APIs that would be very helpful. Here are a few I've noticed:
TSPoint
There doesn't appear to be anything in api.h that describes these values beyond what you might intuit from the names. Some things worth clarifying:
Is column measured in code points (UTF-8, UTF-16, etc.), or is it a byte offset within that particular row?
Are column and row 0-indexed or 1-indexed? 1-indexed is useful for presenting directly to the user, whereas 0-indexed is more natural to program against.
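To show why the distinction matters, here is a minimal sketch of the conversion a caller would need if column turns out to be a 0-indexed byte offset and the UI wants a 1-indexed code point column. That interpretation is an assumption for the sake of the example, and `display_column` is a hypothetical helper, not part of tree-sitter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of tree-sitter.
 * Converts a 0-indexed byte column within one line of UTF-8 text into a
 * 1-indexed code point column suitable for display.
 * Assumes `line` is valid UTF-8 and byte_column <= line_len. */
static uint32_t display_column(const char *line, size_t line_len,
                               uint32_t byte_column) {
    assert(byte_column <= line_len);
    uint32_t codepoints = 0;
    for (uint32_t i = 0; i < byte_column; i++) {
        /* Count every byte that is not a UTF-8 continuation byte (10xxxxxx). */
        if (((unsigned char)line[i] & 0xC0) != 0x80) {
            codepoints++;
        }
    }
    return codepoints + 1; /* shift from 0-indexed to 1-indexed */
}
```

If column were already code point based or 1-indexed, this conversion would be wrong, which is why a note in api.h would help.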
TSInputEdit
It is not quite self-explanatory which byte positions these fields refer to. Here's what I gathered from using it:
start_byte indicates the byte offset where the edit happened.
old_end_byte is the end byte of that specific edit before the change, not the end byte of the entire buffer. Don't use the latter or you'll end up reparsing everything.
new_end_byte, like old_end_byte, is scoped to that specific edit: it is the end byte of the edited range after the change.
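The interpretation above can be sketched as follows. `ByteEdit` is a hypothetical stand-in for the three byte fields of TSInputEdit so that the example compiles without the tree-sitter headers, and `byte_edit_for_replace` is likewise an illustrative name. This reflects my reading from usage, not documented behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the byte fields of TSInputEdit. The real struct
 * also carries start_point, old_end_point and new_end_point. */
typedef struct {
    uint32_t start_byte;
    uint32_t old_end_byte;
    uint32_t new_end_byte;
} ByteEdit;

/* Describes replacing `removed_len` bytes at `offset` with `inserted_len`
 * new bytes. All three fields are local to the edit; none of them is the
 * length of the whole buffer. */
static ByteEdit byte_edit_for_replace(uint32_t offset, uint32_t removed_len,
                                      uint32_t inserted_len) {
    /* Guard against uint32_t overflow in the additions below. */
    assert(offset <= UINT32_MAX - removed_len);
    assert(offset <= UINT32_MAX - inserted_len);
    ByteEdit edit;
    edit.start_byte = offset;
    edit.old_end_byte = offset + removed_len;   /* end of the replaced range */
    edit.new_end_byte = offset + inserted_len;  /* end of the inserted range */
    return edit;
}
```

Under this reading, a pure insertion has old_end_byte equal to start_byte, and a pure deletion has new_end_byte equal to start_byte.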
TSParser
Can I use the same parser for different texts? e.g. is it valid to do something like:
TSParser *parser = ts_parser_new();
/* setup parser for language */
TSTree *t1 = ts_parser_parse(parser, nullptr, input1);
TSTree *t2 = ts_parser_parse(parser, nullptr, input2);
To parse two entirely different buffers as long as they happen on the same thread?
Per-frame efficiency
Guidance on how to pattern-match trees efficiently on a per-frame basis would be extremely useful. For example: preallocate all the objects before the frame, run ts_query_cursor_exec over a minimal subtree selected through something like ts_node_descendant_for_byte_range, and constrain the query via ts_query_cursor_set_byte_range.
Additionally, what is the most efficient way of ordering captures by widest range captured per byte-offset?
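To make the ordering I mean concrete, here is a rough sketch: sort by start byte, and among captures that start at the same byte, put the widest first. `CaptureRange` is a hypothetical struct that a caller would fill from each captured node's start and end bytes. A plain qsort like this is probably not the most efficient option, which is the point of the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one capture; not a tree-sitter type. */
typedef struct {
    uint32_t start_byte;
    uint32_t end_byte;
    uint32_t capture_index;
} CaptureRange;

/* Orders by start_byte ascending, then by end_byte descending, so that the
 * widest capture at a given byte offset comes first. */
static int compare_widest_first(const void *a, const void *b) {
    const CaptureRange *x = a;
    const CaptureRange *y = b;
    if (x->start_byte != y->start_byte) {
        return x->start_byte < y->start_byte ? -1 : 1;
    }
    if (x->end_byte != y->end_byte) {
        return x->end_byte > y->end_byte ? -1 : 1;
    }
    return 0;
}

static void sort_captures(CaptureRange *ranges, size_t count) {
    qsort(ranges, count, sizeof *ranges, compare_widest_first);
}
```

The array could be preallocated once and refilled each frame, in line with the preallocation idea above.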
What is the best way to handle batch edits? For example, can I call ts_tree_edit repeatedly and then, once the batch is complete, reparse and get back a valid tree? I haven't tried this, but I see no reason it can't work this way.