Stop words in hyphenated compounds #121

Open
FrankKooij opened this issue Sep 8, 2020 · 2 comments

@FrankKooij

When two words are used together to yield a new meaning, a compound is formed. Compound words can be written in three ways: as open compounds (spelled as two words, e.g., ice cream), closed compounds (joined to form a single word, e.g., doorknob), or hyphenated compounds (two words joined by a hyphen, e.g., long-term). Sometimes, more than two words can form a compound (e.g., mother-in-law). (source: https://www.grammarly.com/blog/open-and-closed-compound-words)

If a word in a hyphenated compound is a stop word, elasticlunr ignores it when searching, which makes the search less specific. It returns results not only for the compound itself, but also for the compound with all stop words removed. In one of the examples above, using the default stop words, elasticlunr searches for mother and law, since in is a stop word. The list of results may therefore be much longer than expected for mother-in-law. A search for would-be, however, returns no results at all, since both parts of this compound are stop words.
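
The behaviour described above can be reproduced with a minimal sketch that does not call elasticlunr at all. It assumes a tokenizer that splits on whitespace and hyphens and a small, hypothetical stop-word subset. The names `STOP_WORDS`, `tokenize` and `searchableTerms` are made up for illustration and are not elasticlunr APIs.

```javascript
// Illustrative stand-ins only; these are NOT elasticlunr internals.
// Assumption: a tiny subset of stop words, enough to cover the examples in this issue.
const STOP_WORDS = new Set(["in", "would", "be", "is", "the"]);

// Assumption: tokens are split on whitespace and hyphens, then lower-cased.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[\s-]+/)
    .filter((token) => token.length > 0);
}

// Drop every token that is a stop word; what remains is what gets searched.
function searchableTerms(query) {
  return tokenize(query).filter((token) => !STOP_WORDS.has(token));
}

console.log(searchableTerms("mother-in-law")); // ["mother", "law"]
console.log(searchableTerms("would-be"));      // []
console.log(searchableTerms("long-term"));     // ["long", "term"]
```

Under these assumptions, mother-in-law degrades to two independent terms and would-be leaves nothing to search for, which matches the two symptoms reported here.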

@blackholeearth

blackholeearth commented Nov 3, 2023

elasticlunr.clearStopWords();

This removes all stop words.

Since TF-IDF already lowers the score of the most common words, you don't need to worry about them.
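
To illustrate why common words carry little weight once stop-word filtering is off, here is a rough sketch of the textbook inverse document frequency, log(N / df). This is an assumption for illustration: the exact formula elasticlunr uses may differ, and the `idf` helper and the sample documents are made up.

```javascript
// Textbook IDF: log(total documents / documents containing the term).
// Assumption: elasticlunr's own weighting may use a different variant.
function idf(term, docs) {
  const df = docs.filter((tokens) => tokens.includes(term)).length;
  return df === 0 ? 0 : Math.log(docs.length / df);
}

// Hypothetical, already tokenized documents.
const docs = [
  ["mother", "in", "law"],
  ["law", "in", "practice"],
  ["ice", "cream", "in", "summer"],
];

console.log(idf("in", docs));     // 0, because "in" occurs in every document
console.log(idf("law", docs));    // log(3 / 2), about 0.405
console.log(idf("mother", docs)); // log(3 / 1), about 1.099
```

In this toy corpus a word such as in contributes nothing to the score, while rarer words dominate. Note that this only affects term weights; it does not by itself distinguish an exact compound from the same words scattered through a text.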

@FrankKooij
Author

I would like to see the score of compound word matches raised. If you are searching for mother-in-law, the score of an exact compound match should be higher than that of "mother is in the house studying law" (all parts of the compound are present, but not as a compound) or "if you ask the lawyer mother, law comes first" (the same, but with the stop words removed).
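
One possible workaround, sketched here purely as an assumption, is to re-rank results after the search and boost documents that contain the query as a literal compound. `boostExactCompound` is a hypothetical helper, not an elasticlunr feature. The sketch assumes results shaped like `{ ref, score }` and a separate lookup from `ref` to the original text; the scores below are made up.

```javascript
// Hypothetical post-processing step; NOT part of elasticlunr.
// Multiplies the score of every result whose text contains the query verbatim.
function boostExactCompound(query, results, docsById, boost = 2) {
  const needle = query.toLowerCase();
  return results
    .map((result) => {
      const text = docsById[result.ref].toLowerCase();
      const score = text.includes(needle) ? result.score * boost : result.score;
      return { ref: result.ref, score };
    })
    .sort((a, b) => b.score - a.score); // highest score first
}

// Example texts taken from this issue, plus one exact compound match.
const docsById = {
  a: "Mother is in the house studying law.",
  b: "She visited her mother-in-law yesterday.",
  c: "If you ask the lawyer mother, law comes first.",
};

// Made-up scores standing in for whatever the index returned.
const results = [
  { ref: "a", score: 0.5 },
  { ref: "b", score: 0.4 },
  { ref: "c", score: 0.45 },
];

const ranked = boostExactCompound("mother-in-law", results, docsById);
console.log(ranked.map((r) => r.ref)); // ["b", "a", "c"]
```

The substring check is deliberately naive: it would also match a longer compound such as grandmother-in-law, and it requires keeping the original texts available next to the index.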
