
perf: switch conditionally to ripgrep or fd #78

Open
wants to merge 1 commit into
base: main

ja0nz

@ja0nz ja0nz commented Sep 12, 2022

resolves #69

@l3kn
Owner

l3kn commented Dec 3, 2022

Thanks for this!

At first I couldn't see the benefit, because on my org-fc-directories
(which still contain thousands of files) the difference in runtime was negligible.
When I tried it on my whole home directory, fd was much faster.

I think the greatest improvement is limiting the search to files actually containing flashcards.
With the -exec flag of find, we might be able to avoid the extra xargs
and I assume fd has something similar.
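
A minimal sketch of that idea, assuming a find, xargs and grep that support `-print0`, `-0` and `-exec ... {} +` (GNU and BSD versions do). The temporary directory and file names are made up for illustration and are not part of org-fc.

```shell
#!/bin/sh
# Hypothetical demo directory; the file names are illustrative only.
dir=$(mktemp -d)
printf ':PROPERTIES:\n:FC_CREATED: 2022-12-03T00:00:00Z\n:END:\n' > "$dir/cards.org"
printf 'No flashcards here\n' > "$dir/notes.org"

# With an extra process: find prints the names, xargs batches them for grep.
with_xargs=$(find -L "$dir" -name '*.org' -type f -print0 \
  | xargs -0 grep -l '^:FC_CREATED:')

# Without xargs: "{} +" makes find itself pass the names to grep in batches.
with_exec=$(find -L "$dir" -name '*.org' -type f \
  -exec grep -l '^:FC_CREATED:' {} +)

echo "$with_xargs"
echo "$with_exec"
rm -rf "$dir"
```

Both variants should list only the file that contains a flashcard; the second one just saves the pipe and the xargs process.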

The commands will need some work though, so their results are exactly the same.
(For example, the current rg command also finds org archive files).

I'll try to rewrite them and then do some end-to-end indexer benchmarks.

@l3kn
Owner

l3kn commented Dec 3, 2022

After a lot of experimentation, these three commands
seem to behave exactly the same regarding hidden files, hidden directories
and upper/lower case '.org' extensions.

  • "find -L %s -name \".*\" -prune -o -name \"[^.]*.org\" -type f -exec grep -l --null \"^:FC_CREATED:\" {} \\+"
  • "rg ^:FC_CREATED: -L -l --null -g '[^.]*.org' %s"
  • "fd --type f -s -e org -g '[^.]*.org' -L %s --exec-batch grep -l --null \"^:FC_CREATED:\" {}"
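
One way to check this kind of equivalence is to run the commands on a small test tree and compare their sorted output. The sketch below is only an illustration: the tree layout and file names are invented, it assumes GNU find and grep, and it compares against rg only if rg happens to be installed.

```shell
#!/bin/sh
# Hypothetical test tree; every file name below is made up for illustration.
dir=$(mktemp -d)
card=':PROPERTIES:\n:FC_CREATED: 2022-12-03T00:00:00Z\n:END:\n'
mkdir "$dir/.hidden_dir"
for f in cards.org .hidden.org .hidden_dir/cards.org UPPER.ORG cards.org_archive; do
  printf '%b' "$card" > "$dir/$f"
done
printf 'no flashcards\n' > "$dir/notes.org"

# Turn NUL-separated output into sorted lines so results can be compared.
normalize() { tr '\0' '\n' | sort; }

find_out=$(find -L "$dir" -name ".*" -prune -o -name "[^.]*.org" -type f \
  -exec grep -l --null "^:FC_CREATED:" {} + | normalize)
echo "find: $find_out"

# Only compare against rg if it is available on this machine.
if command -v rg >/dev/null 2>&1; then
  rg_out=$(rg '^:FC_CREATED:' -L -l --null -g '[^.]*.org' "$dir" | normalize)
  if [ "$rg_out" = "$find_out" ]; then
    echo "rg matches find"
  else
    echo "rg differs: $rg_out"
  fi
fi
rm -rf "$dir"
```

With this tree, the find command should report only `cards.org`: the hidden file, the hidden directory, the upper-case extension and the archive file are all skipped.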

As expected, rg is significantly faster than find.
I have benchmarked these on my home directory and on the subdirectory I use for org files (times listed in that order):

  • find: 2.53s, 0.18s
  • rg: 0.36s, 0.095s
  • fd: 0.84s, 0.24s
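
To put these numbers in relation, here is a small calculation of the speedup over find, assuming the first value in each pair is the home-directory run and the second the org-subdirectory run, as the sentence above suggests. A value below 1 means the tool was slower than find.

```shell
#!/bin/sh
# Speedup relative to find, computed from the timings quoted above.
# Output columns: tool, home-directory speedup, org-subdirectory speedup.
speedups=$(LC_ALL=C awk 'BEGIN {
  find_home = 2.53; find_org = 0.18
  printf "rg %.2f %.2f\n", find_home / 0.36, find_org / 0.095
  printf "fd %.2f %.2f\n", find_home / 0.84, find_org / 0.24
}')
echo "$speedups"
# prints:
# rg 7.03 1.89
# fd 3.01 0.75
```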

What's odd is that fd is slower than find on the org subdirectory (0.24s vs. 0.18s).
The -e org flag is redundant given the glob pattern that follows it,
but it appears to reduce the runtime.

I think it's safe to assume that anyone who has installed fd would be fine installing rg as well,
so until we can find an fd command that is faster than find in all cases,
I'll only add rg as an alternative.
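
As a rough sketch of what "rg as an alternative" could look like at the shell level (this is not the org-fc implementation, and `fc_files` is a hypothetical helper name), the search can prefer rg when it is installed and fall back to find plus grep otherwise:

```shell
#!/bin/sh
# Hypothetical helper: print the NUL-separated names of files under "$1"
# that contain flashcards, using rg if installed and find + grep otherwise.
fc_files() {
  if command -v rg >/dev/null 2>&1; then
    rg '^:FC_CREATED:' -L -l --null -g '[^.]*.org' "$1"
  else
    find -L "$1" -name ".*" -prune -o -name "[^.]*.org" -type f \
      -exec grep -l --null "^:FC_CREATED:" {} +
  fi
}
```

Calling `fc_files ~/org | tr '\0' '\n'` would then list the matching files one per line, whichever tool ends up being used.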


Successfully merging this pull request may close these issues.

Option for use ripgrep instead of find in org-fc-awk--find
2 participants