(feat): Increase speed of org-roam-node-find #2341
This causes some regressions, particularly when using filter functions over the nodes.
Please elaborate on what the problem is.
I was using this advice function before: https://github.com/Konubinix/Devel/blob/30d0184db0a61f46ca35e102302d707fef964a8c/elfiles/config/after-loads/KONIX_AL-org-roam.el#L770-L787 I was also using filter functions such as those described in https://org-roam.discourse.group/t/filter-org-roam-node-find-insert-using-tags-and-folders/1907/6 The problem there was that caching `org-roam-node-read--completions` meant the filter functions just wouldn't work. Then I tried this patch, and it had the same problem: caching meant that filter functions wouldn't work. In the earlier case I was able to resolve it by using `let` to bypass the cache within the scope of those functions, but I don't understand how to achieve a similar solution with this patch. If it caches the last result, doesn't the much simpler advice function do the same thing? I can't say with certainty, because I don't understand your modifications very thoroughly; I just thought it would be prudent to let you know it causes this problem.
The cache from https://github.com/Konubinix/Devel/blob/30d0184db0a61f46ca35e102302d707fef964a8c/elfiles/config/after-loads/KONIX_AL-org-roam.el#L770-L787 works on a different level: it caches the whole evaluation of `org-roam-node-read--completions`, while this PR doesn't cache the sorting of the entries. Also, this PR works with filter functions, at least for me.
I just checked again! It's working. I'm so sorry, I don't know what was going wrong previously; I feel so embarrassed. My filter functions are working just fine! I honestly don't know why they weren't working before. Thanks for this PR!
I added caching for sort-fn. This makes it viable to add something like the following to one's config:

```elisp
(setq iko/org-roam-node-find-precache-timer
      (run-with-idle-timer
       6 t
       (defun iko/org-roam-node-find-precache ()
         (org-roam-node-read--completions
          org-roam-node--candidate-list-prev-filter-fn
          (or org-roam-node--candidate-list-prev-sort-fn
              (when org-roam-node-default-sort
                (intern (concat "org-roam-node-read-sort-by-"
                                (symbol-name org-roam-node-default-sort)))))))))
```

This code rebuilds the cache, if needed, after 6 seconds of inactivity.
Hi, I want to ask a question if you don't mind: will the timer rebuild the sort of the existing cache, or will it rebuild the cache every x seconds even if no change in the db has occurred? I use vertico, and it automatically handles sorting for me based on history. So if I don't need this sort rebuilt, will the code work equivalently to before if I don't give any other input? Thanks.
It will only rebuild the cache if a change in the db has occurred and Emacs has been idle for 6 seconds.
Another trick I use is the following: I memoize
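The memoization trick mentioned above could look roughly like the wrapper below. This is a generic sketch only: the specific function being memoized is cut off in the comment, and all names here are hypothetical, not org-roam's.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical generic memoization wrapper (requires lexical binding so
;; the closure captures TABLE). The function actually memoized in the
;; comment above is not shown there, so none of these names are real.
(defun my/memoize (fn)
  "Return a memoized version of FN, keyed on FN's argument list."
  (let ((table (make-hash-table :test #'equal)))
    (lambda (&rest args)
      (let ((cached (gethash args table 'miss)))
        (if (eq cached 'miss)
            ;; `puthash' returns the stored value, so this is the result.
            (puthash args (apply fn args) table)
          cached)))))
```

Usage would be e.g. `(setq my/slow-fn (my/memoize #'my/slow-fn))`, after which repeated calls with `equal` arguments return the cached result.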
Can I use this hashtable in addition to this patch? I don't really understand this sort-fn thing. Can I update to the new patch, and if I don't set the timer, will it work equivalently to before? Is the sort fn there so that the cache can be reset during idle time rather than on every save? Is it plausible to do a delta update to the cache rather than a full reset on change?
- Yes.
- It's a needed optimization.
- No, this doesn't seem possible to me.
The database query only takes a fraction of the total time. The actual culprit is org-roam-node-read--to-candidate. |
I profiled multiple rebuilds of the cache with the memoized version of
The current implementation can be improved; I spent some time condensing the code. Currently I am trying a trio of caches: the formatting cache keeps things responsive while the other two caches rebuild after a db change.
Please remove redundancies; the actual code required is very small.
I think whether the user uses the advice or not should be optional. All of these solutions are intrusive, some more than others. Thus, the user should be confronted with the choice of enabling the advice or not (especially the ones caching the database). In many ways, these optimizations can live in a separate file (or even a module) and only be enabled (one at a time) if so desired. Also, org-roam uses dynamic scoping, so it is possible to create a file/module containing redefinitions of org-roam functions that can get fully tested before they replace the originals.

By the way, I ran the Emacs profiler: very little time is actually spent in the query. Almost all the time is spent building the data structures (including the reformatting of the query results, i.e. the `cl-loop` in `org-roam-node-list`).
Based on your suggestions, I prepared the final version. Three layers of cache are defined; for most scenarios the basic cache over the formatting would suffice, and it is the only one enabled by default. Best. https://gist.github.com/akashpal-21/5991e99949af5feb0f75729374ce3d66
I'm using sort functions, so it is relevant for me that the cache works with them. I don't think code that silently uses the previous cache (so that the sort function is not applied) can be called a final version. Also, due to the `(when (>= org-roam-node-cache-level …))` blocks, the code only works for the value of `org-roam-node-cache-level` that was set when the file was loaded; switching the value afterwards has no effect.
Yes, so as to not define things unless they are used. Why define a bunch of variables and functions if they are not used? Coming to the sort fn: if a custom sort fn is used, it invalidates the logic of caching the final candidates, so it is useless to cache and then apply a custom sort fn in that last step. I do not want to define functions and variables unless they are used; please suggest, if you can, how to define them dynamically like that.
The last step is the least expensive; it is not worth the bother to cache for the filter fns and then add a new mechanism for sorting that list. A custom sort fn fully invalidates the economics of caching the last step. Cache level 1 already provides 98% of the speedup. Should we increase complexity to save 0.01 seconds?
Please try this function. It might obviate the need to cache the database. The cost of processing the list using `append` (in the original code) meant that the code was O(n²). With `mapcan` (as in this version) the processing is O(n). If you have a lot of nodes, this will be a huge improvement. SQLite (even with 3 joins) is probably O(n log n) in the number of nodes processed. https://gist.github.com/dmgerman/53e266502e4d53294d1e8be46af93b4f
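To illustrate the complexity argument above, here is a minimal sketch (not org-roam's actual query code; the function names are invented for the demo) comparing the two accumulation strategies:

```elisp
;; Illustrative only: `demo/expand-append' mimics accumulating results
;; with `append', which copies the whole accumulator on every iteration,
;; giving O(n^2) total work. `demo/expand-mapcan' splices each freshly
;; consed sublist onto the result exactly once via `nconc', giving O(n).
(defun demo/expand-append (items)
  "Duplicate each element of ITEMS, accumulating with `append' (O(n^2))."
  (let ((acc nil))
    (dolist (x items acc)
      (setq acc (append acc (list x x))))))

(defun demo/expand-mapcan (items)
  "Duplicate each element of ITEMS using `mapcan' (O(n))."
  (mapcan (lambda (x) (list x x)) items))

;; Both produce the same list; only the asymptotic cost differs:
;; (demo/expand-append '(1 2 3)) ;; => (1 1 2 2 3 3)
;; (demo/expand-mapcan '(1 2 3)) ;; => (1 1 2 2 3 3)
```

Note that `mapcan` is destructive toward the lists the lambda returns, which is safe here because each sublist is freshly consed.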
I saw your post the other day on the Discourse; truly, this is a better fix than trying to brute-force a cache over the nodes list for efficiency gains. But I couldn't replicate the case of `org-roam-db-query` deviating from the full function by a noticeable margin. In any case, increasing the efficiency of the function in generating the initial struct is a good step. I will post my results on changing the function definition, for a 20000-node generated test sample, later on your Discourse post.
Unfortunately I cannot spot any benchmark difference. Can you share yours? I have a simulated list of 20000 nodes with no interconnections. Do you see any noticeable difference in your benchmark runs?
Mmm, see this: I get a huge difference in garbage collection. Can you run the following test (adapted to point to your org-roam database)? It would show how fast the DB is: https://gist.github.com/dmgerman/f8679f7a11bbdd748f648dd3f30151c5 This is how long it takes on my computer, with 1000 nodes:
it is basically instantaneous
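A rough sketch of such a timing test (this is not the gist's exact code, and the query is an assumed minimal one) can use the built-in `benchmark-run` macro, which reports elapsed time alongside garbage-collection cost:

```elisp
;; Sketch: time the raw database query, assuming org-roam is loaded and
;; configured. The query here is a placeholder; adapt it to your schema.
(require 'org-roam)

(benchmark-run 10
  (org-roam-db-query [:select [id title] :from nodes]))
;; `benchmark-run' returns (ELAPSED-SECONDS GC-RUNS GC-SECONDS), so the
;; second and third values make GC pressure directly visible.
```

Comparing the GC columns between the `append` and `mapcan` versions is what exposes the difference described above, even when wall-clock time looks similar.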
@vherrmann Fixed the case where changing the variable without reloading the file didn't work; a simple reload of the mode suffices, and it was trivial to solve. This also allows the user to simply pluck out what they need and put it in their init file. The gist is simply for documentation, since this PR doesn't clarify anything; otherwise the user is left to first patch it and then manually comb through the changes, comparing with the original code, to see what is even being done. Even custom sort fn users need not fear: the difference between cache level 2 and 3 is in milliseconds, so there is no need to enable cache level 3. For node counts under 3000, the basic cache is more than enough. https://gist.github.com/akashpal-21/5991e99949af5feb0f75729374ce3d66

@dmgerman I will continue our conversation on the Discourse. But for the time being, these functions are a good band-aid over the problem.
Why would the cache of
@vherrmann You're correct.

My intention in not defining those functions and variables was not efficiency; I felt that what is not being used should not be exposed to the user. It also allowed me to slice the sections so that people could easily plug what they need into their init; otherwise I had to follow the established convention of defining all variables first, then all functions in order of their dependencies, and so on. But it backfired, in that I found edebug to be non-operational during testing. So I gave up and refactored it to be simpler and more in line with the orthodox way; thanks for your suggestion. I have defined the sort cache in the requisite way so as not to destroy the cache: the sort can be safely applied over the existing final candidate list, and this way we make the cache better. I have also defined the requisite variable. Thanks for your cooperation.
Increase speed of `org-roam-node-find` by caching the last results. The validity of the cache is ensured by comparing the modification time of the database file with the last modification time of the cache. This commit only increases speed if you haven't modified any node since the last time you used `org-roam-node-find`.
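A minimal sketch of the mtime-based invalidation described in the commit message might look like the following; the variable and function names are hypothetical, not the PR's actual ones, and the real patch caches inside `org-roam-node-read--completions` rather than wrapping it like this.

```elisp
;; Hypothetical sketch of mtime-based cache invalidation. Assumes
;; org-roam is loaded, so `org-roam-db-location' points at the db file.
(require 'org-roam)

(defvar my/node-cache nil
  "Cached completion candidates, or nil if never built.")
(defvar my/node-cache-time nil
  "Database file mtime at the moment the cache was last built.")

(defun my/db-mtime ()
  "Return the modification time of the org-roam database file."
  (file-attribute-modification-time
   (file-attributes org-roam-db-location)))

(defun my/cached-completions (build-fn)
  "Return cached candidates, calling BUILD-FN only if the db changed.
BUILD-FN is a thunk producing the fresh candidate list."
  (let ((mtime (my/db-mtime)))
    (unless (and my/node-cache
                 (equal mtime my/node-cache-time))
      (setq my/node-cache (funcall build-fn)
            my/node-cache-time mtime))
    my/node-cache))
```

Because any save that touches the db updates the file's mtime, a stale cache is detected with a single `file-attributes` call instead of re-querying the database.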