Explore using a native third-party search tool such as ripgrep or Silver Searcher #19983

Closed
d-akara opened this issue Feb 5, 2017 · 50 comments
Labels
feature-request Request for new features or functionality search Search widget and operation issues

@d-akara

d-akara commented Feb 5, 2017

I am very impressed with the performance of the new parallel search; however, there is an opportunity to take search speed to the absolute limit by optionally allowing a user to configure ripgrep as the search provider.

Even with the new search speed, ripgrep is still an order of magnitude faster. Ripgrep is actually an order of magnitude faster than pretty much anything.

http://blog.burntsushi.net/ripgrep/

@roblourens
Member

I was going to look at The Silver Searcher this month; I had skipped ripgrep because it doesn't support multiline search. I know ripgrep can be faster though. Do you think there's enough difference between different search tools that it would be worth allowing people to hook up arbitrary ones to vscode?

@roblourens roblourens added feature-request Request for new features or functionality search Search widget and operation issues labels Feb 5, 2017
@d-akara
Author

d-akara commented Feb 6, 2017

Yes, I was just thinking that maybe there should be an API that allows extension authors to hook up whatever they want. However, I'm not sure there is anything that interesting beyond ripgrep and Silver Searcher. When I think of awesome search tools, those are what come to mind.

So, I just did some comparisons on my project. Searching for a particular term, I get these results:

If we can get back to 2 sec performance again, then there is less difference between vscode and ag, but ripgrep is still significantly faster. However, as you said, you do lose multiline searches. If vscode implements multiline searching, then I would be happy with doing multiline with vscode's builtin search and using ripgrep for most of my other searching as that would be most common.

@roblourens roblourens added this to the February 2017 milestone Feb 7, 2017
@roblourens roblourens changed the title from "Offer option to configure ripgrep as vscode file search provider" to "Explore using a native third-party search tool such as ripgrep or Silver Searcher" Feb 7, 2017
@roblourens
Member

Ripgrep would be perfect, but I really want multiline search support. The problem with Silver Searcher right now is that it can't handle ignoring ** patterns, which is problematic for supporting gitignore files and our ignore glob patterns. ggreer/the_silver_searcher#530

The ripgrep blog post you posted lists other tools, but I eliminated them on various other grounds, like lacking Windows support or perf that apparently breaks down.

There are also some specialist tools I've found, like ICgrep or Hyperscan, that focus on advanced unicode or regex features.

Considering all this, we should either

  • use ripgrep, but never support multiline search
  • use ag, but get extra results from ag, and filter them with our glob magic afterwards
  • add an extension API, so people can use whatever tool they like

Still leaning towards ag even though working around the ** issue would be very annoying.

@d-akara
Author

d-akara commented Feb 7, 2017

My thoughts are something like this in order of preference:

  1. use ripgrep. However, with these conditions
    1. We are able to restore the original search speed of 1.8.1
    2. There are plans to support multiline searching in the future for vscode builtin search
  2. use Silver Searcher: If both conditions above for ripgrep are not true, then for me it seems Silver Searcher would be best choice.
  3. Extension API: If for whatever reason we can't make a confident decision for 1 or 2

@roblourens
Member

For 1., if we were using ripgrep, it would replace vscode search entirely, so it would be much faster than 1.8.1, and there would be no multiline search.

@d-akara
Author

d-akara commented Feb 7, 2017

Ahh, I didn't realize you were considering it as a replacement. So you would then bundle ripgrep or Silver Searcher as part of vscode?
If the internal search wouldn't be supported any longer, I suppose I would then prefer Silver Searcher.

@roblourens
Member

Yeah that's the idea, to have it drive the search viewlet behind the scenes. Possibly could also be involved in driving quick open.

@BurntSushi

(ripgrep author here.) What do you folks use multiline search for? I've long considered it something I'd be unlikely to add support for, but I've been known to bend if there's strong demand for it. Alternatively, maybe there's a compromise that can be reached.

Note that ** isn't the only thing that ag doesn't support in gitignore files. ripgrep's support for gitignore matching is pretty dang close to 100% and remains fast. e.g., If you have lots of gitignore files or a single giant one, then ag slows down quite a bit compared to ripgrep.

Are there other things you folks care about? What about Unicode support? Support for searching UTF-16 (planned, not actually available yet)?

@BurntSushi

I'm also in the process of moving a lot of code in ripgrep out into distinct Rust libraries, which would give you a lot more control over how search operates. But, you'd need to build out a C FFI for it, which wouldn't be especially hard, but it wouldn't be something someone could bang out in a day either.

@d-akara
Author

d-akara commented Feb 19, 2017

@BurntSushi here are some sample use cases of multiline search.

The most common is simply that code statements don't always exist as single lines.
someFunctionCall( arg1, arg2 )
can be written like this:

someFunctionCall(arg1,
arg2)

  1. I often am interested in terms that appear near each other or in the same file. Questions like...
    1. Which classes make use of x
    2. Where do we query for type X using join with Y. Likely these terms will be near each other, but not on same line
  2. Where are empty try catch blocks where exceptions were not handled.

@BurntSushi

@dakaraphi If ripgrep asked you to use two distinct regexes, would that suffice? Or do you want to use one regex?

@d-akara
Author

d-akara commented Feb 19, 2017

@BurntSushi If I follow what you imply, then that would only help answer if 2 different terms exist in the same file.
However, a sample regex might look like this where I want to find something that is near:
termA(.|\n){0,200}termB
or
termA(.*\n.*){0,3}termB

or, for example, searching for an XML tag with a given id:
<extension(.|\n)*?id="A"
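To make the distinction concrete, here is a minimal sketch that uses Python's `re` module (not ripgrep, ag, or VS Code's engine) to run the three patterns above against small sample strings. The sample texts and the helper names `matches_whole_text` and `matches_line_by_line` are made up for illustration. The second helper only approximates what a strictly line-oriented searcher sees.

```python
import re

# Hypothetical sample inputs: the terms of interest sit on different lines.
SAMPLE = "call(termA,\n    other,\n    termB)\n"
XML = '<extension\n    point="x"\n    id="A">\n</extension>\n'

# The three patterns quoted in the comment above, unchanged.
NEAR_BY_CHARS = re.compile(r"termA(.|\n){0,200}termB")
NEAR_BY_LINES = re.compile(r"termA(.*\n.*){0,3}termB")
XML_TAG_BY_ID = re.compile(r'<extension(.|\n)*?id="A"')


def matches_whole_text(pattern: re.Pattern, text: str) -> bool:
    """Search the entire buffer, so a match may span line breaks."""
    return pattern.search(text) is not None


def matches_line_by_line(pattern: re.Pattern, text: str) -> bool:
    """Search each line separately, as a line-oriented tool would."""
    return any(pattern.search(line) for line in text.splitlines())


if __name__ == "__main__":
    for name, pattern, text in [
        ("near by chars", NEAR_BY_CHARS, SAMPLE),
        ("near by lines", NEAR_BY_LINES, SAMPLE),
        ("xml tag by id", XML_TAG_BY_ID, XML),
    ]:
        print(name,
              "| whole text:", matches_whole_text(pattern, text),
              "| per line:", matches_line_by_line(pattern, text))
```

With these samples, every pattern matches when the whole buffer is searched and none matches when each line is searched on its own, which is the gap being discussed.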

@roblourens
Member

I'll do a writeup for this investigation next week, but for vscode's purposes, we're interested in multiline search, UTF-16 support, and also I like a search that returns results in sorted order by path, which ripgrep doesn't do right now.

@BurntSushi

I don't think any search tool with parallelism returns results in sorted order. ripgrep does have the --sort-files option which I think will do what you want, but it disables parallelism.

@roblourens
Member

Silver Searcher does actually - I see why it could be a perf hit to order the results though.

@BurntSushi

BurntSushi commented Feb 20, 2017 via email

@roblourens
Member

You're right - looking at it more closely, SS tends to be closer to being in order, and often is in order when I run it in my vscode workspace, but not always.

@roblourens roblourens modified the milestones: Backlog, February 2017 Feb 22, 2017
@roblourens
Member

And I'm glad to hear that UTF-16 support is on the roadmap. Any idea in what timeframe you'd expect to look at it?

@BurntSushi

BurntSushi commented Feb 24, 2017

Any idea in what timeframe you'd expect to look at it?

No, sorry. "Within the next year" is probably the best I can do. Hopefully sooner.

@d-akara
Author

d-akara commented Feb 24, 2017

Given there isn't an ideal match around features you wish to provide out of the box and uncertainty around the timing of the availability of those features, does it make sense to reevaluate the option of simply providing an API that extension authors can use to integrate external search providers?

It would probably also be useful if the API provided the ability to invoke VS Code's builtin search as a fallback: if the extension detects a type of regular expression that might not be supported in the external tool, it can then pass it on to VS Code.

@roblourens
Member

@BurntSushi I could also fall back to VS Code's builtin search for UTF-16 files, but don't want to duplicate the file tree walking work that ripgrep does. An easy compromise would be if ripgrep prints a message each time it encounters a UTF-16 (or binary) file. I imagine this hidden behind an option but it would also be useful for CLI users who are missing matches because they don't realize a file is an unsupported encoding.
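As a rough illustration of the kind of check that could drive such a message, the sketch below classifies a file from its first bytes. This is an assumption-laden example in Python, not how ripgrep or VS Code detects encodings. The function names are hypothetical, UTF-16 is recognized only through a byte order mark, and "binary" is approximated by the presence of a NUL byte.

```python
import codecs

# Byte order marks for little-endian and big-endian UTF-16.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_like_utf16(head: bytes) -> bool:
    """True if the leading bytes start with a UTF-16 byte order mark.

    Caveat: a UTF-32 LE BOM also begins with the UTF-16 LE BOM bytes,
    and UTF-16 files without a BOM are not detected at all.
    """
    return head.startswith(UTF16_BOMS)


def looks_binary(head: bytes) -> bool:
    """Crude heuristic (an assumption): any NUL byte means binary."""
    return b"\x00" in head


def classify(head: bytes) -> str:
    """Label the leading bytes of a file as 'utf-16', 'binary' or 'text'."""
    if looks_like_utf16(head):
        return "utf-16"
    if looks_binary(head):
        return "binary"
    return "text"


if __name__ == "__main__":
    print(classify("hello".encode("utf-16")))
    print(classify(b"\x7fELF\x00\x01"))
    print(classify(b"plain ascii"))
```

The UTF-16 check has to run before the NUL-byte check, because UTF-16 encoded ASCII text is full of NUL bytes and would otherwise be reported as binary.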

@dakaraphi Thought about it, but it seems like overkill. Creating an extension API is a lot of work and there are probably only a few search providers in the world that anyone would want to use. I want to focus on the out of box experience.

@d-akara
Author

d-akara commented Feb 25, 2017

Using ripgrep as primary and VS Code builtin as fallback seems like a good solution.

I would actually be very happy with that compromise. Especially if VS Code could implement multiline search, then multiline regular expressions could just be passed on to VS Code. VS Code's builtin is fast enough that it wouldn't be a bad solution given that multiline searching will be less common.
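The routing described here could look something like the following sketch. It is purely illustrative: `needs_multiline` and `choose_engine` are hypothetical names, the engine labels are plain strings rather than real VS Code or ripgrep APIs, and the detection is a simple heuristic that only looks for a literal newline or an unescaped `\n` / `\r` escape in the pattern. It would miss other constructs that can match a line break, such as `\s` or a negated character class.

```python
def needs_multiline(pattern: str) -> bool:
    """Heuristic: does the pattern explicitly mention a line break?"""
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\n":
            return True  # a literal newline typed into the pattern
        if ch == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] in "nr":
                return True  # the escape sequence \n or \r
            i += 2  # skip the escaped character, e.g. \\ or \d
            continue
        i += 1
    return False


def choose_engine(pattern: str) -> str:
    """Send multiline patterns to the builtin search, the rest to ripgrep."""
    return "builtin" if needs_multiline(pattern) else "ripgrep"


if __name__ == "__main__":
    for p in [r"termA(.|\n){0,200}termB", r"foo\d+", r"C:\\new"]:
        print(repr(p), "->", choose_engine(p))
```

Skipping two characters after a backslash keeps an escaped backslash followed by `n` (as in `C:\\new`) from being mistaken for a newline escape.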

@d-akara
Author

d-akara commented Mar 17, 2017

Speed is awesome, but I will have to strongly disagree here. Not being able to do something at all is a fairly significant disadvantage. I certainly can understand there is no desire to support another engine that you have to implement yourself.

I would suggest actually considering a secondary external engine, such as Silver Searcher or Platinum Searcher, as a fallback. If Platinum Searcher is easier to integrate, it doesn't have to be the fastest, but feature completeness could then be provided without having to support your own implementation.

@BurntSushi

@dakaraphi I don't think the platinum searcher has any functionality that ripgrep doesn't have at this point. It doesn't appear to have multiline search and its regex engine is FSM based like ripgrep's. The only real choice available to you if you want multiline search and PCRE is the silver searcher.

@d-akara
Author

d-akara commented Mar 17, 2017

@BurntSushi ahh ok thanks. Yes, the most important feature for me would be multiline. That's a big one. I do use it somewhat often. PCRE would be nice, but I could live without it.

@BurntSushi

@dakaraphi I've thought about multiline search for a long time. You folks aren't the only ones who have requested it. I re-opened the issue on ripgrep's tracker and left some thoughts: BurntSushi/ripgrep#176 (comment)

@d-akara
Author

d-akara commented Mar 17, 2017

@roblourens @BurntSushi Thanks for making this happen and bringing it to VS Code in such a short time! It truly is a pleasure to use.

@lnicola
Contributor

lnicola commented Mar 17, 2017

Does any of this apply to searches in a file that's being edited? That is, the "Find" command, not "Find in Files".

@roblourens
Member

No

@d-akara
Author

d-akara commented Mar 17, 2017

So I have given some additional thought to the feature gap around things like lookarounds.
Typically, additional regex features are about further constraining the results in some way.

I think some of the feature gap would be mitigated if the search results could be easily sent to a new editor. Then you could further search the results using the more feature-rich in-editor regex engine. Potentially it would also be useful to have a way to send results directly to an editor and have a much higher result limit cap.

There is already a request for this for other use cases. So this would just be an additional benefit.
See #17920

@octref
Contributor

octref commented Mar 22, 2017

Was looking for how I can use ag to search faster in VSCode and found this issue.
Tried it out and search takes milliseconds in my fairly large web project. Huge improvement on my workflow.
Thanks @roblourens and @BurntSushi!

@Ethan-VisualVocal

@dakaraphi This is a feature of Sublime Text that I miss in VSCode -- ST just automatically dumps everything into a special, searchable "Find Results" tab that also doesn't auto-clear between searches.

(Relying on this feels a bit like a crutch, like maybe I could have gotten ideal results if I'd composed my original search filters + regex better, but I end up using it often anyway because I want to keep my brain on the original task at hand.)

@d-akara
Author

d-akara commented Mar 22, 2017

@Ethan-VisualVocal having the ability to dump the results to a document tab opens up some very useful possibilities; however, I currently strongly prefer VSCode's default implementation for finding and navigating code. I find it much better at browsing the files from the results.
I just want the option of being able to capture the results in a document, as there are times when it is very useful, and not just as a potential workaround here for the feature gap in ripgrep's regular expression support.

@ThunderEX

Can we just use git grep to replace the original "find in files" feature?
git grep is just a built-in command of git. Compared to ripgrep, it:

  • has similarly powerful features to ripgrep
  • is built into git, cross-platform of course, and requires no additional code/binary unless the user doesn't have git
  • supports gitignore, by nature!
  • is a little slower than ripgrep but still faster than the current search feature
  • is mature, and should support unicode, but that is untested.

@sophiajt

@ThunderEX - git grep doesn't work for non-git directories, unless it's some option I haven't seen.

@ThunderEX

@jonathandturner you can check config grep.fallbackToNoIndex

@roblourens
Member

I didn't realize that git grep works on non-git dirs. But we're now shipping with ripgrep for the March release so I'm closing this issue.

@d-akara
Author

d-akara commented Mar 27, 2017

git grep doesn't have multiline. However ripgrep is now investigating adding that feature.
That will be a greater win.
BurntSushi/ripgrep#176

@BurntSushi

I don't think it supports UTF-16 either.

@chrmarti chrmarti removed their assignment May 17, 2017
@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 18, 2017