Too much memory consumed on queries matching many synonym groups #71

Open
Mykezero opened this issue Jan 3, 2017 · 3 comments
Contributor

Mykezero commented Jan 3, 2017

Hello! I've noticed a problem with the plugin similar to the one described in issue #38.

If a query is broad, matching many synonym groups, the plugin expands too many synonyms. This consumes Java heap memory rapidly and eventually results in an out-of-memory exception. Even when the query succeeds, the request takes a long time to process (request time > 5 seconds).

The linked issue suggests using the synonyms.bag=true flag, which seems to keep memory usage down, but are there any downsides to using that flag?

I've tested this under Solr 6.0.0 with the hon-lucene-synonyms-5.0.5.jar file. Here are the query and synonyms that seem to trigger the problem.

Query

http://localhost:8983/solr/test/select?q="bobcat pup tortoise bunny angle toad"&defType=synonym_edismax&synonyms=true&debugQuery=true
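
For anyone reproducing this from a script, here is a minimal sketch that builds the same request with the quotes and spaces URL-encoded, and optionally adds the synonyms.bag flag mentioned above. The helper name `build_select_url` is hypothetical; the host, core name and parameters are copied from the URL above.

```python
from urllib.parse import urlencode

# Base URL taken from the query above; adjust host/core for your setup.
BASE_URL = "http://localhost:8983/solr/test/select"


def build_select_url(query, bag=False, base=BASE_URL):
    """Build the select URL for the synonym_edismax parser (hypothetical helper)."""
    params = {
        "q": query,
        "defType": "synonym_edismax",
        "synonyms": "true",
        "debugQuery": "true",
    }
    if bag:
        # Flag suggested in the linked issue to keep memory usage down.
        params["synonyms.bag"] = "true"
    # urlencode escapes the double quotes and turns spaces into '+'.
    return base + "?" + urlencode(params)


phrase = '"bobcat pup tortoise bunny angle toad"'
url_default = build_select_url(phrase)
url_bag = build_select_url(phrase, bag=True)
print(url_default)
print(url_bag)
```

Requesting both URLs against the same index makes it easy to compare heap usage and request time with and without the flag.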

Synonyms.txt

# Cats
bobcat, cheetah, cougar, jaguar, kitten, kitty, leopard, lion, lynx, mouser, ocelot, panther, puma, puss, pussy, tabby, tiger, tom, tomcat, grimalkin, malkin

# Dogs
pup, puppy, bitch, cur, doggy, hound, mongrel, mutt, pooch, stray, tyke, bowwow, fido, flea bag, man's best friend, tail-wagger

# Turtles
tortoise, chelonian, cooter, leatherback, loggerhead, slowpoke, snapper, terrapin, testudinal

# Rabbits
bunny, hare, rodent, buck, capon, chinchilla, coney, cony, cottontail, cuniculus, doe, lagomorph, lapin

# Fish
angle, bait, bob, cast, chum, extract, extricate, find, net, produce, seine, trawl, troll, bait the hook, cast one's hook, cast one's net, go fishing, haul out, pull out

# Frogs
toad, bullfrog, caecilian, croaker, polliwog
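
To get a feel for the scale, here is a rough back-of-the-envelope sketch. It assumes (this is my assumption, not something confirmed in this thread) that without synonyms.bag the plugin can end up with one phrase variant per combination of synonyms across the matched groups. Under that assumption the upper bound is the product of the group sizes. The function names are hypothetical.

```python
from math import prod

# Synonym groups copied from the Synonyms.txt above.
SYNONYMS_TXT = """
# Cats
bobcat, cheetah, cougar, jaguar, kitten, kitty, leopard, lion, lynx, mouser, ocelot, panther, puma, puss, pussy, tabby, tiger, tom, tomcat, grimalkin, malkin
# Dogs
pup, puppy, bitch, cur, doggy, hound, mongrel, mutt, pooch, stray, tyke, bowwow, fido, flea bag, man's best friend, tail-wagger
# Turtles
tortoise, chelonian, cooter, leatherback, loggerhead, slowpoke, snapper, terrapin, testudinal
# Rabbits
bunny, hare, rodent, buck, capon, chinchilla, coney, cony, cottontail, cuniculus, doe, lagomorph, lapin
# Fish
angle, bait, bob, cast, chum, extract, extricate, find, net, produce, seine, trawl, troll, bait the hook, cast one's hook, cast one's net, go fishing, haul out, pull out
# Frogs
toad, bullfrog, caecilian, croaker, polliwog
"""


def parse_groups(text):
    """Return one list of terms per non-comment, non-empty line."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        groups.append([term.strip() for term in line.split(",") if term.strip()])
    return groups


def combination_bound(query_terms, groups):
    """Size of the matching group for each term, and the product of those sizes.

    A term that appears in no group counts as 1 (it has no alternatives).
    """
    sizes = [next((len(g) for g in groups if term in g), 1) for term in query_terms]
    return sizes, prod(sizes)


groups = parse_groups(SYNONYMS_TXT)
sizes, bound = combination_bound("bobcat pup tortoise bunny angle toad".split(), groups)
print(sizes)  # [21, 16, 9, 13, 19, 5]
print(bound)  # 3734640
```

If that assumption holds, the six-term query above could fan out to roughly 3.7 million phrase variants, which would be consistent with the heap pressure I'm seeing.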
Contributor Author

Mykezero commented Jan 5, 2017

I've tried using the synonyms.bag flag, but it doesn't return results with the same precision as without it, and I really need the extra precision. Single-term queries seem to work fine, but queries with two or three terms lose precision when the flag is on.

@yogeshk-ezdi

I also have the same problem. If a query is broad, matching many synonym groups, the plugin starts expanding too many synonyms, which causes memory and performance problems.

@softwaredoug
Collaborator

I actually implemented the synonyms.bag flag. But I just saw @yogeshk-ezdi's comment and wanted to point out that I tend to use this plugin instead of hon-lucene-synonyms these days. I've had better luck being more precise with synonym expansion, which has helped me control memory usage. More at this blog
