Few missing pieces of documentation from the C API #2946

cdacamar · 2024-02-07T04:18:45Z

I have been using the tree-sitter C API and, while it is great and pretty easy to use, it seems to lack some notes on APIs that would be very helpful. Here are a few I've noticed:

`TSPoint`

typedef struct {
  uint32_t row;
  uint32_t column;
} TSPoint;

There doesn't appear to be anything in api.h which describes these values other than what you might intuit from the names. Some clarifying things could be:

Is column counted in code points (of UTF-8, UTF-16, etc.), or is it a byte offset within that particular row?
Are row and column 0-indexed or 1-indexed? 1-indexed values can be presented directly to the user, whereas 0-indexed values are more natural to program against.
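To make the question concrete, here is a minimal sketch of the convention I would guess at: zero-based rows, and columns counted in bytes from the start of the line. That convention is an assumption until api.h states it. `Point` is a local stand-in with the same layout as `TSPoint`, and `point_for_byte` is a hypothetical helper, not part of the tree-sitter API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in with the same layout as TSPoint, so the sketch
   compiles without tree_sitter/api.h. */
typedef struct {
  uint32_t row;
  uint32_t column;
} Point;

/* Hypothetical helper: convert a byte offset in a UTF-8 buffer into a
   point. ASSUMPTION: rows and columns are zero-based, and the column is
   the number of bytes (not code points) since the last newline. */
static Point point_for_byte(const char *text, size_t len, size_t byte_offset) {
  assert(byte_offset <= len);
  Point p = {0, 0};
  for (size_t i = 0; i < byte_offset; i++) {
    if (text[i] == '\n') {
      p.row++;       /* a newline starts the next row */
      p.column = 0;  /* and resets the byte column */
    } else {
      p.column++;    /* every other byte advances the column by one */
    }
  }
  return p;
}
```

Under this convention a two-byte UTF-8 character advances the column by 2, and a user-facing line number would be `row + 1`.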

`TSInputEdit`

typedef struct {
  uint32_t start_byte;
  uint32_t old_end_byte;
  uint32_t new_end_byte;
  TSPoint start_point;
  TSPoint old_end_point;
  TSPoint new_end_point;
} TSInputEdit;

It is not quite self-explanatory what byte positions we're talking about here. Here's what I gathered from using it:

start_byte is the byte offset where the edit begins.
old_end_byte is the end byte of the edited range before the edit, not the end byte of the entire buffer. Don't use the latter or you'll end up reparsing everything.
new_end_byte, like old_end_byte, is the end byte of that same edited range, but measured after the edit.
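The reading above can be sketched as follows. This is only an illustration of my understanding, not the library's documented behaviour: `Point` and `InputEdit` are local stand-ins mirroring `TSPoint` and `TSInputEdit`, `describe_edit` is a hypothetical helper, and points are assumed to be zero-based with byte columns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring TSPoint and TSInputEdit. */
typedef struct { uint32_t row; uint32_t column; } Point;

typedef struct {
  uint32_t start_byte;
  uint32_t old_end_byte;
  uint32_t new_end_byte;
  Point start_point;
  Point old_end_point;
  Point new_end_point;
} InputEdit;

/* Advance a point over `len` bytes of `text` (zero-based, byte columns). */
static Point advance(Point p, const char *text, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (text[i] == '\n') { p.row++; p.column = 0; }
    else { p.column++; }
  }
  return p;
}

/* Hypothetical helper: describe replacing the `removed_len` bytes of
   `old_text` starting at `start` with `inserted_len` bytes of `inserted`. */
static InputEdit describe_edit(const char *old_text, size_t start,
                               size_t removed_len, const char *inserted,
                               size_t inserted_len) {
  assert(old_text != NULL);
  InputEdit e;
  Point origin = {0, 0};
  e.start_byte = (uint32_t)start;
  /* End of the replaced range in the OLD text, not the end of the buffer. */
  e.old_end_byte = (uint32_t)(start + removed_len);
  /* End of the inserted range in the NEW text. */
  e.new_end_byte = (uint32_t)(start + inserted_len);
  e.start_point = advance(origin, old_text, start);
  e.old_end_point = advance(e.start_point, old_text + start, removed_len);
  e.new_end_point = advance(e.start_point, inserted, inserted_len);
  return e;
}
```

For a pure insertion `removed_len` is 0, so `old_end_byte` equals `start_byte`; for a pure deletion `inserted_len` is 0, so `new_end_byte` equals `start_byte`.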

`TSParser`

Can I use the same parser for different texts? e.g. is it valid to do something like:

TSParser* parser = ts_parser_new();
/* setup parser for language */
TSTree* t1 = ts_parser_parse(parser, nullptr, input1);
TSTree* t2 = ts_parser_parse(parser, nullptr, input2);

That is, can one parser be used to parse two entirely different buffers, as long as both parses happen on the same thread?

Per-frame efficiency

Guidance on how to pattern-match trees efficiently on a per-frame basis would be extremely useful, e.g. preallocate all the objects before the frame, run ts_query_cursor_exec over a minimal subtree selected through something like ts_node_descendant_for_byte_range, and constrain the query via ts_query_cursor_set_byte_range.

Additionally, what is the most efficient way of ordering captures by widest range captured per byte-offset?
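For reference, the ordering I have in mind looks like the sketch below: ascending start byte, and for captures starting at the same byte, widest range first. It is a plain `qsort` over a caller-built array and makes no claim to being the most efficient option. `CaptureRange` is a hypothetical record that would be filled from `ts_node_start_byte` and `ts_node_end_byte` for each capture.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one entry per capture, copied out of the query
   cursor results by the caller. */
typedef struct {
  uint32_t start_byte;
  uint32_t end_byte;
  uint32_t capture_index;
} CaptureRange;

/* Order by start byte ascending; at equal starts, larger end byte
   (the wider range) first; capture index breaks remaining ties so the
   result is deterministic even though qsort is not stable. */
static int compare_widest_first(const void *a, const void *b) {
  const CaptureRange *x = a;
  const CaptureRange *y = b;
  if (x->start_byte != y->start_byte)
    return x->start_byte < y->start_byte ? -1 : 1;
  if (x->end_byte != y->end_byte)
    return x->end_byte > y->end_byte ? -1 : 1;
  if (x->capture_index != y->capture_index)
    return x->capture_index < y->capture_index ? -1 : 1;
  return 0;
}

static void sort_captures(CaptureRange *ranges, size_t count) {
  assert(ranges != NULL || count == 0);
  if (count > 1) {
    qsort(ranges, count, sizeof *ranges, compare_widest_first);
  }
}
```

The array can be allocated once and reused across frames, which fits the preallocation idea above.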


cdacamar · 2024-02-08T20:15:53Z

Since I'm here,

Batch edits?

What is the best way to handle batch edits? For example, can I call ts_tree_edit repeatedly and then, once the batch is complete, parse once and get back a valid tree? I haven't tried this, but I see no reason it can't work this way.

ObserverOfTime added question documentation labels Apr 12, 2024
