Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor tree-sitter related functionality #330

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Slotos
Copy link

@Slotos Slotos commented Nov 16, 2023

nvim-treesitter is transitioning to being just a parsers manager, with plugins being encouraged to move to independent initialization and implementation.

This commit extracts nvim-treesitter plugin dependencies, copying those too big to reimplement. Additionally, tree-sitter engine is reworked to be an async library. When buffer's filetype is set, a parser is initialized, a throttled reparser is attached to parser's on_bytes event, and schedule_wrapped match extractor is attached to parser tree change event. The latter extracts matches and stores them in a per-buffer dictionary.

All requests to the treesitter backend now fetch data from the aforementioned dictionary. Extra precautions are taken in get_matches request, which now returns no matches if the buffer is known to have changed with a parser that haven't caught up yet.


I removed old tests execution from neovim CI config, as new tests execute them already, and for some reason old tests require 0.5s delays before matchup matching becoms available. Same tests running with new framework need 10ms to 50ms.

@andymass
Copy link
Owner

andymass commented Nov 16, 2023

Thanks, this looking great. Will take me a bit to review. Have a couple questions to start

  1. Can you link to any nvim-treesitter discussions that document what's changing?
  2. Is this pattern now deprecated
require'nvim-treesitter.configs'.setup {
   matchup = {}

I still see it exampled on the nvim-treesitter readme.

  1. Will this lead to misses? That is, the user won't see highlighting since the parser wasn't ready when the matching request was made?
  2. Will there be an overall performance improvement? I have a couple issues related to performance I'm not able to resolve well.

@Slotos
Copy link
Author

Slotos commented Nov 16, 2023

nvim-treesitter/nvim-treesitter#4767 describes the plan for ditching modules framework. There’s no official deprecation notice just yet (I believe it’s waiting for neovim 0.10 release).

This implementation does lead to intermittent misses on buffer changes. For up to 300ms after last change the parser can be out of sync. It hasn’t been a problem for me yet, but independent falsification would definitely help.

Generally, asynchronicity in access to matches would solve this (e.g. reschedule matches processing for the next parser ready state). I’m not confident I can implement that cleanly, however - vimscript knowledge is not my strong suite.

This PR should lead to responsiveness improvement due to match lookup getting disconnected from tree parsing.
That being said, I can describe a case where resource consumption would actually increase - a prolonged monotonous buffer update. In the end, I reasoned that that’s an unlikely scenario that could be tackled by detecting a paste setting if need be.

@Slotos
Copy link
Author

Slotos commented Nov 17, 2023

It's possible to skip on_bytes dance by declaring that vim-matchup requires treesitter highlighting. In that case, re-parsing is handled by highlighter, and vim-matchup can simply react to tree changes.

It can result in a simpler code with no need to consider performance in presence of other plugins that could be running re-parsing routines.

@andymass
Copy link
Owner

declaring that vim-matchup requires treesitter highlighting

This seems reasonable, as match-up already has deferred mode (which could be migrated to lua on_bytes but one thing at a time).

What's the mechanism to tell nvim that we require treesitter highlighting?

@Slotos
Copy link
Author

Slotos commented Nov 18, 2023

What's the mechanism to tell nvim that we require treesitter highlighting?

It would rather be a user requirement. Basically, “to enable tree-sitter subsystem, enable tree-sitter highlighting on a buffer with vim.treesitter.start”.

nvim-treesitter is transitioning to being just a parsers manager,
with plugins being encouraged to move to independent initialization and
implementation.

This commit extracts nvim-treesitter plugin dependencies, copying those
too big to reimplement. Additionally, tree-sitter engine is reworked to
be an async library. When buffer's filetype is set, a parser is
initialized, a throttled reparser is attached to parser's on_bytes
event, and schedule_wrapped match extractor is attached to parser tree
change event. The latter extracts matches and stores them in a per-buffer
dictionary.

All requests to the treesitter backend now fetch data from the
aforementioned dictionary. Extra precautions are taken in get_matches
request, which now returns no matches if the buffer is known to have
changed with a parser that haven't caught up yet.
@Slotos Slotos force-pushed the extract_nvim_ts_deprecations branch from ac2c32f to 34d3362 Compare November 25, 2023 19:10
@andymass
Copy link
Owner

andymass commented Nov 25, 2023

It would rather be a user requirement. Basically, “to enable tree-sitter subsystem, enable tree-sitter highlighting on a buffer with vim.treesitter.start”.

Could you provide some instructions or a link to some to do that? I still think it's better to do things automatically for now but we'll want to move toward nvim convention more as time goes on so want to have an idea of the migration path.

BTW I'm not sure why the vim workflow is failing, can probably just ignore that for now.

@Slotos
Copy link
Author

Slotos commented Nov 25, 2023

I'll describe the whole thing to "reset" the context (if only for my sake).

Current implementation

Tree-sitter module doesn't rely on tree-sitter highlighting. When vim-matchup initialises for the buffer, it calls register_callbacks, which in turn does the following:

  • loads the parser for the file
  • enables syntax if additional_vim_regex_highlighting is desired
  • attaches callbacks to the parser
    • on_bytes re-parses buffer in a throttled manner with initially immediate execution (e.g. for paste actions)
    • on_changedtree re-runs queries and updates matches - this reacts to any changes, regardless of parse source
    • on_detach cleans up resources dedicated to the other two

Downsides

There's one, and it boils down to lack of a central convention on handling parse updates in neovim, or my ignorance of its existence.

Specifically, if multiple plugins implement this behaviour, the timers will interleave, reducing the benefit of throttling parse calls. Simply typing in insert mode can then result in a high CPU usage in that scenario.

Rely on vim.treesitter.start() alternative

vim.treesitter.start() initialises highlighter that handles tree re-parsing and is maintained by neovim core team. nvim-treesitter's highlighting module uses it too, so anyone who's using master branch with highlight enabled, will receive the benefit without extra configuration.

If we declare that one needs to "start tree-sitter" (which, admittedly, is not quite honest), then this plugin can ditch re-parsing routines and simply handle to on_changedtree. The following diff achieves this:

diff --git a/lua/treesitter-matchup/internal.lua b/lua/treesitter-matchup/internal.lua
index f38b2bb..887b2ac 100644
--- a/lua/treesitter-matchup/internal.lua
+++ b/lua/treesitter-matchup/internal.lua
@@ -173,26 +173,18 @@ M.register_callbacks = function(bufnr)
     store_buf_matches(bufnr, ltree, scopes, nodes, symbols)
   end)
 
-  local handle_updates, update_timer = throttle.throttle_leading(function()
-    vim.fn['matchup#perf#tic']('ts.reparse')
-    ltree:parse(true)
-    vim.fn['matchup#perf#toc']('ts.reparse', 'done')
-  end, 300)
-
   local detach = vim.schedule_wrap(function(buf)
-    update_timer:stop()
-    update_timer:close()
     clear_buf_matches(buf)
   end)
 
   ltree:register_cbs({
     on_detach = detach,
-    on_bytes = handle_updates,
     on_changedtree = get_and_store_matches,
   }, true)
 
-  ltree:parse(true)
-  get_and_store_matches()
+  if vim.treesitter.highlighter.active[bufnr] then
+    get_and_store_matches()
+  end
 
   api.nvim_buf_set_var(bufnr, 'matchup_treesitter_initialized', 1)
 

Note: It's a quickly bodged up example diff, probably broken. I'm currently implementing this approach properly to see how it behaves in real life.

Neovim nightly issue

The diff works, but only on neovim stable. Nightly doesn't work due to highlighter using parse call without passing a parameter, which skips injections re-parsing, consequently keeping the tree in an invalid state. I'm not sure whether this is a bug.

Test command - :lua vim.treesitter.start(); vim.print(vim.treesitter.get_parser():is_valid()). It returns true on stable, false on nightly.

I've submitted a patch to neovim to handle this case - neovim/neovim#26220

Combined approach

This plugin could default to vim.treesitter.start() requirement, with an opt-in on_bytes handler. That would work without re-parsing downsides in the general case, and would allow people to opt-in when they want to, or even implement their own re-parsing logic with vim-matchup following suite.

In order to make things "just work", we could call vim.treesitter.start() ourselves during the attach phase with an opt-out configuration flag.

@clason
Copy link

clason commented Nov 25, 2023

we could call vim.treesitter.start() ourselves during the attach

No. This starts treesitter highlighting. Each plugin is expected to handle its own logic independently (calling parse on a parsed tree is a no-op, so there's no point and many drawbacks in centralizing it). In particular, it is an express goal for features to be independent (so you can use, say, textobjects without having to give up regex highlighting).

I recommend looking at https://github.com/nvim-treesitter/nvim-treesitter-context for an example of a standalone treesitter plugin.

I'm not sure whether this is a bug.

It is not; it's a necessary performance optimization.

@andymass
Copy link
Owner

I recommend looking at https://github.com/nvim-treesitter/nvim-treesitter-context for an example of a standalone treesitter plugin.

Thanks @clason for giving guidance here!

Sorry if this is a really basic question, it is not quite clear to me yet. Specifically, I don't understand how nvim-treesitter-context can guarantee that the tree is in such a state as to be queried, it does not contain any parse() command, although your statement "Each plugin is expected to handle its own logic independently (calling parse on a parsed tree is a no-op," makes me think all such plugins ought to.

Per @Slotos's question (neovim/neovim#26220), it seems that neovim is moving to more incremental update parsing, which is surely a good thing. However per the commentary it seems even the highlighter "module" cannot be relied upon to have fully parsed the tree before invocation of dependent plugins like match-up or nvim-treesitter-context.

@clason
Copy link

clason commented Nov 25, 2023

The point here is that every "user" is responsible to doing whatever it needs to. The highlighter does not need to have a fully parsed state, as things outside the viewport do not need to be highlighted. Similar for context; other plugins may have different needs -- and why should people that don't use them pay for that?

TL;DR: "parse if you need to, and only parse what you need".

(And highlighting is just one functionality among others; it should not be considered the "main treesitter module" that others have to hook into. We're trying to get away from this initial monolithic design, which has not aged well.)

@Slotos
Copy link
Author

Slotos commented Nov 26, 2023

I gather that for vim-matchup we’re looking at the full tree in the general case.

We might be able to use previous matches to somewhat limit the range of update. E.g. by finding the range of the largest known match containing the changes.

I worry that at that point that’s getting indistinguishable from languagetree’s own parsing with injections, however. I see that there’s tracking of changed lines present in its code, but I haven’t traced it far enough to have a complete mental model yet.

@clason
Copy link

clason commented Nov 26, 2023

I gather that for vim-matchup we’re looking at the full tree in the general case.

I'd be very surprised if that were the case. You know where the cursor is, and you'd never search for matching braces in injected languages (unless you were already inside it).

@clason
Copy link

clason commented Nov 26, 2023

Let's take a step back and consider the actual issue -- if there even is one; this seems to me a very premature (and likely unwarranted) concern.

Also, simply vendoring nvim-treesitter is the wrong way to do it -- much of this code we're getting rid of because it's obsolete. The main point of the rewrites is instead to rely on the functionality built into Neovim core (and possibly some convenience functions on top that are fine to steal).

Lastly, it's important to point out that there's no real hurry here:

  1. the current master branch will not go away (although it might be renamed), so you can always require people use that until you're ready;
  2. and that won't happen until after the Neovim 0.10 release at the earliest, so you should target Neovim HEAD in your rewrite.

@andymass
Copy link
Owner

andymass commented Nov 26, 2023

Good points @clason, let's not over-optimize for where we imagine nvim-treesitter is headed until we need it.

On the other hand, I feel @Slotos 's work here is independently very important for a number of reasons (the title of this PR is inaccurate in my opinion):

  1. It appears to be simpler and better organized overall. I don't see wholesale copying of nvim-treesitter functions in this PR (although we may have done such in the past).
  2. There have been a number of open issues to this plugin related to (lazy) loading which are really hard to reason about in the current nvim-treesitter module framework.
  3. There are many issues to this plugin related to performance which I'm hoping some of these changes help to address. In particular, the async query match retrieval seems promising and in line with some other modern plugins.

There are a few issues and nits in this PR to address but on whole it looks great. I'll take another pass later and do a full review with some suggested edits. Few high level comments:

  1. We should omit the deprecation warning until nvim-treesitter has confirmed its plans.
  2. Let's make sure to support the treesitter related options in .setup{} call like other config options are. (setup{treesitter={config={}}} is probably fine, exactly equivalent to setting g:matchup_treesitter_config).
  3. (most important) Need to settle the aforementioned performance considerations- do we need to call .parse at all or can we omit it like nivm-treesitter-context? Can we use incremental parsing? Does this PR address Freeze in large C file with treesitter #302 in particular? How do we measure it?

@andymass andymass changed the title Extract nvim-treesitter deprecations Refactor tree-sitter related functionality Nov 26, 2023
@clason
Copy link

clason commented Nov 26, 2023

Fair points, but my general recommendation still stands: forget about nvim-treesitter and rely on Neovim core functionality as much as possible. (Any code from nvim-treesitter that is not a simple convenience wrapper is suspect; I've removed all of that for a reason.) Going forward, nvim-treesitter will just be a parser installer and curated query collection; you can take that as confirmation.

Refactoring now (before nvim-treesitter 1.0) is possible, of course, but be aware that treesitter is still considered experimental even in Neovim, and a lot still changes -- breaking changes included. Nevertheless, I'd recommend targeting HEAD with this refactor, as a lot of performance improvements were made for 0.10 (specifically related to C injections, which is a well-known issue -- in fact, the "full parse" was removed exactly for that reason!) These issues are fundamental treesitter issues, though, and need to be solved by Neovim. (This is one of the things required for "graduating" treesitter support.)

Regarding parse(): calling get_parser will call parse() internally (note the lack of arguments), which parses the root tree (only). Just try working with that; if you find that you're missing context inside injected languages, call parse(range) with the minimal range needed. (Parsing is always incremental; that's what tree-sitter is all about!)

Regarding caching: queries are cached internally by Neovim (as opposed to nvim-treesitter; another reason to avoid that framework!), but if you do non-trivial things with the results, it would make sense to cache those via a memoize function as in https://github.com/nvim-treesitter/nvim-treesitter-context/blob/bf4d15ee4e96ff5201f16a4ed14443670662eb90/lua/treesitter-context/cache.lua#L3

(There are similar ones in nvim-treesitter 1.0, and in fact in Neovim: vim.func._memoize, which is private though.)

@Slotos
Copy link
Author

Slotos commented Nov 26, 2023

I've been thinking about the problem the cached query results is trying to solve, and I'm wondering about how to benchmark it.

The goal itself is similar to that of nvim-treesitter-context's - as the cursor moves, we need to fetch related regions as fast as possible, preferably imperceptibly so.

Now, the question then becomes "does lua cache perceptibly outperform a tree-sitter querying mechanism". Tree-sitter operates on trees, and I presume it is well optimised for querying on said trees, whereas local lua cache right now is a linear loop. If local lua cache doesn't show an order of magnitude speedup with average lookup times <1ms, using buftick cached individual queries is just ok and likely scales better into larger trees.


Separately from that, I've been checking out performance on Neovim's own src/nvim/shada.c, and I discovered two fatal flaws in my current code:

  1. :parse(true) does not necessarily validate the tree on the first call. Or rather, it never validates it on the first call in every attempt of mine. For fast reproduction, use something like NVIM_APPNAME=nvim-test nvim src/nvim/shada.c and =vim.treesitter.get_parser():parse(true) with a followup vim.treesitter.get_parser():is_valid().

    :parse({0, maxline_num}) -> self:_add_injections -> self:add_child -> self:invalidate
    It's a surprising behaviour.

    Even if going by code alone, :parse() with an argument intent seems to be to go through the provided range, discover and add injections, and parse them in a descending loop. At the end of this process, I'd expect the parser own :is_valid(true) validity to be based on self:_parse_regions() outcome.

    Yet, due to invalidation that occurs when adding children, even if every leaf child is valid, the parser needs a second parse call to validate itself.

  2. on_changedtree is invoked hundreds of times during parsing. With devastating effect on performance, but oh so warm in the winter.


I intent to iterate a few more times on this code, looking for a good combination of performance and simplicity.
I took a look at nvim-treesitter-context for inspiration, but it doesn't handle buffer updates at all. In absence of an active highlighter, it gets stuck on initial buffer parse state. Duplicating any enum typedef in src/nvim/shada.c is enough to highlight the behaviour.

@Slotos
Copy link
Author

Slotos commented Nov 26, 2023

Fair points, but my general recommendation still stands: forget about nvim-treesitter and rely on Neovim core functionality as much as possible.

That's the goal. Code copies are probably masking the overall trend, but I specifically started this work to eliminate all and any reliance on nvim-treesitter. I simply cannot do it in one fell swoop, so I chose to iterate by freezing some parts, copying nvim-treesitter implementations.

@andymass
Copy link
Owner

andymass commented Feb 2, 2024

Hey @Slotos, any further thoughts on this? Do you have any more ideas how to efficiently handle updates?

@Slotos
Copy link
Author

Slotos commented Feb 2, 2024

On treesitter side, parsing a line as a region should give us all the local information that's interesting. Range parsing gets propagated to children trees, and will yield all the data that's relevant to the line. Highlighter is an example of this working, and it handles nested trees just ok.

All of that should be mostly confined to lua/treesitter-matchup/internal.lua.

I apologise for the silence, holidays season turned into a viral melting pot, followed by a set of tasks at work that consume all my attention.

@clason
Copy link

clason commented Feb 2, 2024

You might want to look at https://github.com/nvim-treesitter/nvim-treesitter/blob/main/lua/nvim-treesitter/locals.lua for another "self-contained" example for the nvim-treesitter 1.0 era. Feel free to just steal any code from the old ts_utils.lua module that you need and isn't in Neovim (0.10) already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants