Querying takes a long time (18 sec to 3 min) intermittently #90

Open
shubhamjoshi2130 opened this issue Feb 21, 2022 · 0 comments
Querying takes a long time (18 sec to 3 min) intermittently

I am using pymagnitude in one of my projects to load and use GoogleNews-vectors-negative300.bin.

I converted GoogleNews-vectors-negative300.bin to a .magnitude file and load that file using Magnitude(). I use pymagnitude to generate embeddings of words and then train an ANN model on those embeddings.

On my local machine (details below) I face no issue, but in the testing environment queries are intermittently slow.

Environments:-

(local):-
Mac, 32 GB RAM, Docker with CentOS: very fast, well under a second per query

(Testing Environment):-
CentOS, 16 GB RAM: intermittent slowness, taking 18 sec to 3 min to query some words, and the process times out.

** I am using a mount to keep my mmap files, and I have made sure that it is not getting wiped out.

Here are the findings for a few words on the testing environment and on local:

Word, Time on Testing Environment
li��n , 0.82 min
ph���m, 0.4 min
al, 1.3
On local, the time for the above keys is very small, less than a second.

On further investigation and profiling of execution time, we observed that most of the time is spent when an OOV (out-of-vocabulary) token is found and the _db_query_similar_keys_vector function is invoked.
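For anyone trying to reproduce the profiling, here is a minimal sketch of how the per-word timings could be split into in-vocabulary and OOV lookups. It assumes only that the vectors object supports `word in vectors` and `vectors.query(word)`, as a Magnitude instance does. `time_queries` and `FakeVectors` are hypothetical names introduced for illustration; `FakeVectors` is a stand-in so the snippet runs without a .magnitude file, and it is not how we actually profiled.

```python
import time
from collections import defaultdict


def time_queries(vectors, words):
    """Time vectors.query(word) for each word, grouped by vocabulary membership."""
    timings = defaultdict(list)
    for word in words:
        # Check membership before querying so the slow OOV path is labelled.
        kind = "in_vocab" if word in vectors else "oov"
        start = time.perf_counter()
        vectors.query(word)
        timings[kind].append((word, time.perf_counter() - start))
    return dict(timings)


class FakeVectors:
    """Hypothetical stand-in for a Magnitude object (assumption, not the real class)."""

    def __init__(self, vocab):
        self._vocab = set(vocab)

    def __contains__(self, word):
        return word in self._vocab

    def query(self, word):
        # The real library returns a vector; a zero vector is enough for timing.
        return [0.0] * 300


timings = time_queries(FakeVectors({"king"}), ["king", "al"])
for kind, rows in timings.items():
    for word, seconds in rows:
        print(f"{kind:8s} {word!r}: {seconds:.6f} s")
```

With a real file, passing `Magnitude("path/to/file.magnitude")` in place of `FakeVectors(...)` should show whether the slow words all fall in the `oov` group.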

Sample Queries which are taking more time:-

SELECT
magnitude.*
FROM
magnitude_subword,
magnitude
WHERE
char_ngrams MATCH "\uf000al" OR "al" OR "l" OR "\uf000"
AND magnitude.rowid = magnitude_subword.rowid
ORDER BY
(
(
LENGTH(offsets(magnitude_subword)) - LENGTH(
REPLACE(offsets(magnitude_subword), ' ', '')
)
) + 1
) DESC,
magnitude.key LIKE 'a%'
AND LENGTH(magnitude.key) <= 4 DESC,
magnitude.key LIKE '%';

-- Took 3.8 min to execute

SELECT
magnitude.*
FROM
magnitude_subword,
magnitude
WHERE
char_ngrams MATCH "\uf000ch" OR "ch" OR "h" OR "n" OR "ng" OR "ng\uf000"
AND magnitude.rowid = magnitude_subword.rowid
ORDER BY
(
(
LENGTH(offsets(magnitude_subword)) - LENGTH(
REPLACE(offsets(magnitude_subword), ' ', '')
)
) + 1
) DESC,
magnitude.key LIKE 'a%'
AND LENGTH(magnitude.key) <= 4 DESC,
magnitude.key LIKE '%';
-- Took 2 min to execute
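
To make the first ORDER BY term in both queries easier to read: it counts the space-separated integers in the string returned by offsets(), which SQLite FTS3/FTS4 emits as four integers per matched term instance. Below is a small sketch of the same arithmetic in Python. The function name and the sample offsets string are made up for illustration and do not come from the .magnitude file.

```python
def count_offset_integers(offsets: str) -> int:
    """Mirror LENGTH(s) - LENGTH(REPLACE(s, ' ', '')) + 1 from the query."""
    # Number of spaces plus one equals the number of space-separated integers.
    return len(offsets) - len(offsets.replace(" ", "")) + 1


# Illustrative offsets() output for two term matches (four integers each).
example_offsets = "0 0 0 2 0 1 3 2"
integer_count = count_offset_integers(example_offsets)
match_count = integer_count // 4  # four integers per matched term instance

print(integer_count, match_count)
```

Since this value is computed for every row that matches any of the n-grams, a very short n-gram such as "l" or "n" probably matches a large share of the keys, and all of those rows have to be scored and sorted.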
