Feature: hyphen-proof search #1823

Open
erlefloch opened this issue Apr 12, 2024 · 3 comments

erlefloch commented Apr 12, 2024

If I have a data file named "trials-wheat-2022", for example, it won't be found if I run the query "wheat" in the FAIRDOM search box.
It would be nice to have hyphen-proof search, so that this kind of hyphenated title appears in the query results.

stuzart commented Apr 12, 2024

Yeah, this is an ongoing issue we're hoping to look at soon. We used to use a tokenizer that stripped out punctuation and hyphens, but then we had complaints that search queries containing hyphens or other symbols weren't found, so we switched to a different tokenizer. So we need to look at using multiple tokenizers, if that's possible. It's a symptom of the Solr configuration rather than anything in SEEK.
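
To make the trade-off concrete, here is a rough Python sketch of the two tokenizing behaviours described above. It is only an illustration, not Solr's actual implementation: `whitespace_tokenize`, `punctuation_tokenize` and `matches` are hypothetical helpers, and real Solr analyzers apply further filters.

```python
import re

def whitespace_tokenize(text):
    # Split on whitespace only, so a hyphenated name stays a single token.
    return text.lower().split()

def punctuation_tokenize(text):
    # Split on any run of non-alphanumeric characters, which drops hyphens.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def matches(query, title, tokenize):
    # Simplified matching: every query token must be one of the title tokens.
    title_tokens = set(tokenize(title))
    return all(tok in title_tokens for tok in tokenize(query))

title = "trials-wheat-2022"

print(whitespace_tokenize(title))   # ['trials-wheat-2022']
print(punctuation_tokenize(title))  # ['trials', 'wheat', '2022']

print(matches("wheat", title, whitespace_tokenize))   # False
print(matches("wheat", title, punctuation_tokenize))  # True
```

With whitespace-only splitting the whole title is one token, so "wheat" alone never matches it. Splitting on punctuation fixes that, but the hyphens and other symbols are no longer part of any token.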

stuzart added the search label Apr 12, 2024

stuzart commented Apr 12, 2024

... if you want to build your own Solr container, I think you can just switch this and the line just below from WhitespaceTokenizer to StandardTokenizer.

https://github.com/FAIRdom/solr-seek-docker/blob/master/conf/schema.xml#L64
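
As a rough sketch of that switch, the Python below rewrites the tokenizer class in a schema fragment. The fragment is hypothetical and only assumed to resemble the field type in `conf/schema.xml`; the real file may differ, so check it before changing anything. `solr.WhitespaceTokenizerFactory` and `solr.StandardTokenizerFactory` are the Solr factory class names; `switch_tokenizer` is a made-up helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the real conf/schema.xml in solr-seek-docker may differ.
SCHEMA_FRAGMENT = """
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""

def switch_tokenizer(schema_xml,
                     old="solr.WhitespaceTokenizerFactory",
                     new="solr.StandardTokenizerFactory"):
    # Parse the fragment and swap the class on every matching <tokenizer>.
    root = ET.fromstring(schema_xml)
    changed = 0
    for tokenizer in root.iter("tokenizer"):
        if tokenizer.get("class") == old:
            tokenizer.set("class", new)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed

updated, count = switch_tokenizer(SCHEMA_FRAGMENT)
print(count)    # number of tokenizer elements that were switched
print(updated)
```

Changing the index-time tokenizer generally only affects content indexed afterwards, so existing content would probably need reindexing once the rebuilt container is running.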

@erlefloch

Thanks for the information! :)
