In search, a symbol for "any non-numeric character" #53

Open
JeffreyBenjaminBrown opened this issue Jun 19, 2017 · 5 comments

Comments

@JeffreyBenjaminBrown
Member

If I'm searching for "sum", I'll probably search for "*sum*". Otherwise I'll miss "(sum" or "sum." or other punctuation-adjoined instances of the word. However, by using "*" I expand the search to include "sumo" and "assume" and other things I'm not looking for.

I googled for a while and still don't know whether it's possible.

@joshsh
Member

joshsh commented Jun 20, 2017

Probably not possible using Lucene syntax. In regular expressions, that would be a character set like [^A-Za-z], but I don't think Lucene supports regex. We could add support at the filter level if the use cases are compelling enough to justify breaking from Lucene.
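
To make the regex idea concrete, here is a minimal Python sketch of matching a word only when no letter sits directly next to it. It uses lookarounds built on the same [A-Za-z] character set so that the start and end of the text also count as boundaries. The function name `mentions_word` and the sample notes are made up for illustration; none of this is SmSn or Lucene code.

```python
import re


def mentions_word(text: str, word: str) -> bool:
    """Return True if `word` occurs with no letter directly before or after it."""
    # (?<![A-Za-z]) and (?![A-Za-z]) play the role of the character set
    # [^A-Za-z], but they also succeed at the very start or end of the text.
    pattern = r"(?<![A-Za-z])" + re.escape(word) + r"(?![A-Za-z])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical note texts, chosen to mirror the examples in this thread.
notes = ["(sum of parts", "the sum.", "sumo wrestling", "I assume so"]
hits = [n for n in notes if mentions_word(n, "sum")]
print(hits)
```

The punctuation-adjoined instances are kept, while "sumo" and "assume" are rejected because a letter touches the match on one side.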

@JeffreyBenjaminBrown
Member Author

"support at the filter level"? Does that mean rewriting Lucene?

@joshsh
Member

joshsh commented Jun 20, 2017

No, it means defining a new, SmSn-specific query syntax, and mapping expressions in that syntax to Lucene syntax (then filtering on the results). Since Lucene syntax is so well-known, the pros (more expressive queries) would have to be pretty significant to outweigh the cons (a syntax everyone has to learn from scratch, and an implementation we have to develop, test, and maintain).
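
A rough sketch of what that two-step approach could look like, in Python for brevity. Everything here is an assumption for illustration: `translate`, `fake_lucene`, and `search_whole_word` are hypothetical names, and `fake_lucene` only imitates a "*term*" wildcard query with a substring test instead of calling a real index.

```python
import re
from typing import List, Tuple


def translate(term: str) -> Tuple[str, re.Pattern]:
    """Map a whole-word request to (broad Lucene wildcard query, strict post-filter)."""
    lucene_query = f"*{term}*"  # over-matches on purpose; the filter narrows it
    post_filter = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])",
        re.IGNORECASE,
    )
    return lucene_query, post_filter


def fake_lucene(query: str, notes: List[str]) -> List[str]:
    """Stand-in for a real Lucene call: treats *term* as a substring match."""
    needle = query.strip("*").lower()
    return [n for n in notes if needle in n.lower()]


def search_whole_word(term: str, notes: List[str]) -> List[str]:
    """Run the broad query, then keep only results that pass the filter."""
    query, post_filter = translate(term)
    candidates = fake_lucene(query, notes)
    return [n for n in candidates if post_filter.search(n)]


# Hypothetical note texts.
notes = ["remind me.", "some time", "(me)", "home"]
print(search_whole_word("me", notes))
```

The index still does the heavy lifting with a query it already understands; only the filter step is new code to develop, test, and maintain.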

@JeffreyBenjaminBrown
Member Author

I'm not (yet) suggesting any radical changes to Lucene, just a symbol for non-alphanumeric characters.

But while we're on the subject ...

I have nodes that look like this: practical & alone. {easy money}. The trailing period indicates that "practical & alone" is a complete thought. Then the bracketed expression provides an example of the content of the category. It would be valuable to me if I could search in only the first sentence, or search on the whole note but then rank according to the length of the first sentence rather than the whole note.
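
As a sketch of what I mean, in Python and under two loud assumptions: the first sentence ends at the first period, and a shorter first sentence should rank higher. The names `first_sentence` and `search` are made up; this is not how SmSn ranks results today.

```python
from typing import List


def first_sentence(note: str) -> str:
    """Text before the first period (assumption: that period ends the first sentence)."""
    head, _sep, _rest = note.partition(".")
    return head.strip()


def search(notes: List[str], term: str, first_sentence_only: bool = False) -> List[str]:
    """Substring search, ranked by first-sentence length instead of whole-note length."""
    term = term.lower()
    if first_sentence_only:
        matches = [n for n in notes if term in first_sentence(n).lower()]
    else:
        matches = [n for n in notes if term in n.lower()]
    # Assumption: shorter first sentences come first.
    return sorted(matches, key=lambda n: len(first_sentence(n)))


# Hypothetical notes in the "complete thought. {example}" style.
notes = [
    "practical & alone. {easy money}.",
    "easy. {practical jokes and much more}.",
    "alone",
]
print(search(notes, "practical", first_sentence_only=True))
print(search(notes, "practical"))
```

With `first_sentence_only=True` the second note drops out, because "practical" only appears inside its braces.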

@JeffreyBenjaminBrown
Member Author

You suggested you want more use cases. Consider (I did this recently) searching for notes about yourself by searching for (probably among other words) the word "me". If you want to find it when it's adjacent to punctuation, you'll have to surround it with "*" symbols. But doing that finds every word with "me" in it, which is a humongous number of words.
