
How to search not only from first word? #10

Open

eiva opened this issue Nov 2, 2019 · 10 comments

@eiva

eiva commented Nov 2, 2019

Hello, is there any way to provide autosuggestions based not only on the first word?

words = {
    'acura zdx': {},
    'acura abc': {},
    'bmw test': {},
    'bmw coupe': {},
}
synonyms = {}
autocomplete = AutoComplete(words=words, synonyms=synonyms)
print(autocomplete.search(word='test', size=2))

This prints an empty list, so it is not searching by the second word...

@seperman (Owner)

seperman commented Nov 5, 2019

Hi @eiva
What I recommend is to add these variations of the words to the words dictionary: acura rlx and test acura rlx. We already have code that does all this automatically, but it is not part of fast-autocomplete yet.

So for example your words dictionary is going to be:

{
  "acura rlx": [
    {
      "model": "rlx",
      "make": "acura"
    },
    "Acura RLX",
    3132
  ],
  "test acura rlx": [
    {
      "model": "rlx",
      "make": "acura"
    },
    "Acura RLX",
    3132
  ],
  "rlx": [
    {
      "model": "rlx",
      "make": "acura"
    },
    "Acura RLX",
    3132
  ],
  "acura": [
    {
      "make": "acura"
    },
    "Acura",
    130123
  ],
  ...
}

And then you can use the factory function as described in: https://github.com/wearefair/fast-autocomplete#sorting

You will see a similar example on that page.
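
A rough sketch of this workaround (expand_words is a hypothetical helper; the plain {} values follow eiva's original example rather than the factory format above):

from itertools import permutations
from fast_autocomplete import AutoComplete

# Hypothetical helper: index every word ordering of each phrase so a
# search can start at any word, not just the first.
def expand_words(phrases):
    expanded = {}
    for phrase in phrases:
        for combo in permutations(phrase.split()):
            expanded[' '.join(combo)] = {}
    return expanded

words = expand_words(['acura zdx', 'acura abc', 'bmw test', 'bmw coupe'])
autocomplete = AutoComplete(words=words)
print(autocomplete.search(word='test', size=2))  # now matches 'test bmw'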

@jayaddison (Contributor)

A small +1 on this issue, to agree that built-in support for this would be a useful extra (even if optional / disabled by default).

@jayaddison (Contributor)

(Afterthought: if it's challenging to implement within the library for algorithmic and/or performance reasons, perhaps including a usage example like the one above in the repo itself would be a reasonable alternative.)

@seperman (Owner)

seperman commented Jul 1, 2020

Hi @jayaddison
I have not had a chance to implement it, but basically the solution is called GADDAG: https://en.wikipedia.org/wiki/GADDAG
I agree with you that it is easier to leave some examples in the readme for now! I will try to add some soon. If you have a chance to open a PR and add some examples, that would be great too until GADDAG is implemented.
Thanks!
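
For reference, the core GADDAG trick is to index, for each letter position, the reversed prefix plus the remaining suffix, so matching can start anywhere in the word. A minimal sketch of the key generation (illustrative only, not part of this library):

def gaddag_keys(word, sep='+'):
    # One key per position: the reversed prefix ending there, a
    # separator, then the rest of the word. A trie over these keys
    # lets a lookup start from any letter and extend both ways.
    for i in range(1, len(word) + 1):
        yield word[:i][::-1] + sep + word[i:]

print(list(gaddag_keys('test')))
# ['t+est', 'et+st', 'set+t', 'tset+']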

@tomerav0

tomerav0 commented Sep 28, 2020

I "sovled" it by kind of brute force.
I took each phrase I want to use and created all the combination for that phrase.
I used another array to map each combination to the original value.
For each combination I attached UUID that later I parse at the results.

original_word = row["name"]
          #words[original_word] = [{}, original_word, row["count"]]
          parts = original_word.split()
      
          if len(parts) > 1 and len(parts) <= self._combo_words_limit:   
            for words_combo in itertools.permutations(parts, len(parts)):
                pharse, id = self.buildPharse(words_combo)
                pharses_map.append({id : original_word})
                words.append({pharse : [{}, original_word, row["count"]]})
          else:
                pharse, id = self.buildPharse(original_word)
                pharses_map.append({id : original_word})
                words.append({pharse : [{}, original_word, row["count"]]})
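
A self-contained sketch of the same approach (build_index, COMBO_WORDS_LIMIT, and the row format are stand-ins for tomerav0's surrounding class; the word values follow the factory format from the earlier comment, and the UUID suffix mirrors the "parse it out of the results" step):

import itertools
import uuid

COMBO_WORDS_LIMIT = 4  # permutations grow factorially, so cap phrase length

def build_index(rows):
    # Expand each name into every word ordering, and key each variant
    # back to the original value through a UUID suffix that can be
    # parsed out of search results later.
    words = {}
    phrases_map = {}
    for row in rows:
        original_word = row['name']
        parts = original_word.split()
        if 1 < len(parts) <= COMBO_WORDS_LIMIT:
            combos = itertools.permutations(parts)
        else:
            combos = [parts]
        for combo in combos:
            phrase_id = uuid.uuid4().hex[:8]
            phrase = '{} {}'.format(' '.join(combo), phrase_id)
            phrases_map[phrase_id] = original_word
            words[phrase] = [{}, original_word, row['count']]
    return words, phrases_map

words, phrases_map = build_index([{'name': 'bmw test coupe', 'count': 5}])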

@seperman (Owner)

seperman commented Sep 29, 2020

@tomerav0 Yeah that works! In fact I have used something similar to your solution before too.

@Ronserruya

I "sovled" it by kind of brute force.
I took each phrase I want to use and created all the combination for that phrase.
I used another array to map each combination to the original value.
For each combination I attached UUID that later I parse at the results.

original_word = row["name"]
          #words[original_word] = [{}, original_word, row["count"]]
          parts = original_word.split()
      
          if len(parts) > 1 and len(parts) <= self._combo_words_limit:   
            for words_combo in itertools.permutations(parts, len(parts)):
                pharse, id = self.buildPharse(words_combo)
                pharses_map.append({id : original_word})
                words.append({pharse : [{}, original_word, row["count"]]})
          else:
                pharse, id = self.buildPharse(original_word)
                pharses_map.append({id : original_word})
                words.append({pharse : [{}, original_word, row["count"]]})

Would you mind posting a more complete working example? Maybe a gist?

@tomerav0

I made it work but eventually gave up on this because it's extremely RAM-consuming and not feasible in any sense of cost. I'm talking about 60 GB of RAM needed just to load the model.
This lib is good for small datasets, but if you go big it's a mess.

@seperman (Owner)

seperman commented Oct 28, 2020 via email

@angrygoats

angrygoats commented Nov 12, 2020

Related to @tomerav0's memory issue:

@seperman I think there is a solution here you could pursue with Cython. Cython supports extension types. These extension types ("cdef classes") operate using C structs, which should have far less overhead.

I glanced through the code, and the AutoComplete class has some very memory-intensive operations in it. These could be cleaned up with Cython as well.
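
As a minimal illustration (this one is Cython, not plain Python, and TrieNode is a made-up name rather than fast-autocomplete's actual node class), declaring the attributes on a cdef class removes the per-instance __dict__:

# node.pyx -- compile with cythonize
cdef class TrieNode:
    cdef public str word
    cdef public dict children
    cdef public bint is_terminal

    def __cinit__(self, str word=''):
        self.word = word
        self.children = {}
        self.is_terminal = False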

@tomerav0 if the data isn't proprietary (or if you could scrub it) it would be useful to have in order to profile the code before making any changes.

@seperman once you have some test data you can profile using:

python -m cProfile -o memory_profile.profile <test_script>.py <any_args_here>

and then you can inspect it via:

python -m pstats memory_profile.profile

and then check calls using:

sort time
stats 10

to show the top 10 biggest call sites (note that cProfile profiles time rather than memory). For memory specifically, you could also look into memory_profiler. I haven't had a need for it, but it seems popular.
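
A minimal memory_profiler sketch (assumes pip install memory-profiler; the tiny words dict is only illustrative):

from memory_profiler import profile
from fast_autocomplete import AutoComplete

@profile  # prints line-by-line memory usage when the function runs
def load_model():
    words = {'acura rlx': {}, 'bmw coupe': {}}
    return AutoComplete(words=words)

if __name__ == '__main__':
    load_model()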
