Use of Set for dictionaries #16

missinglink · 2019-05-13T09:23:49Z

Initially, I used a js object and hasOwnProperty to do the hashmap lookups and then later used Set and has().

It would be nice to standardize this. I'm not familiar with the relative performance of Set vs. Object, but if Set is faster or the same then I think we should use it.

I think one benefit of Set is that Object can possibly have issues with numeric keys?

cc/ @Joxit thoughts?
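As a minimal sketch of the numeric-key concern raised above (the keys here are made up for illustration and are not taken from the project's dictionaries): a plain Object stores every key as a string, so `1` and `"1"` collide, while a Set keeps them distinct.

```javascript
// Hypothetical sample keys, chosen only to illustrate the behaviour.
const obj = {};
obj[1] = true;     // stored under the string key "1"
obj['01'] = true;  // a different string key, not treated as numeric

const set = new Set([1, '01']);

// Object keys are always strings; integer-like keys are listed first.
const objectKeys = Object.keys(obj);

// The number 1 and the string "1" are the same key on an Object...
const objHasStringOne = obj.hasOwnProperty('1');

// ...but they are different members of a Set.
const setHasStringOne = set.has('1');
const setHasNumberOne = set.has(1);

console.log({ objectKeys, objHasStringOne, setHasStringOne, setHasNumberOne });
```

If all dictionary entries are strings, as seems to be the case here, the collision does not arise in practice; the sketch only shows why Set is the less surprising choice.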


missinglink · 2019-05-13T09:57:58Z

I did some quick performance testing and it seems that Set performs better and the performance is more linear as the size of the dictionary increases:

https://jsperf.com/set-vs-object-as-sets/15

Joxit · 2019-05-13T12:29:34Z

I also did some quick perf testing (Node v8.16.0) and found that:

Memory footprint: Set uses less memory than Object (for localities, Set = 59.9MB and Object = 61MB)
ops/sec: obj[value] is faster than set.has(value), which is faster than obj.hasOwnProperty(value) 🤔

That's strange, in your test obj[value] seems to be the slowest.
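For anyone wanting to reproduce this kind of comparison, a rough sketch of a lookup micro-benchmark might look like the following. It is not the script used for the numbers above; the dictionary size, token names, and the `time` helper are all assumptions, and results will vary by Node version and machine.

```javascript
// Hypothetical dictionary of string tokens; size and contents are illustrative only.
const SIZE = 100000;
const tokens = Array.from({ length: SIZE }, (_, i) => 'token' + i);

const set = new Set(tokens);
const obj = {};
for (const t of tokens) obj[t] = true;

// Run `lookup` once per token; return elapsed milliseconds and the hit count.
function time(lookup) {
  const start = process.hrtime();
  let hits = 0;
  for (const t of tokens) {
    if (lookup(t)) hits++;
  }
  const [sec, nano] = process.hrtime(start);
  return { ms: sec * 1e3 + nano / 1e6, hits };
}

const results = {
  'set.has(value)': time((v) => set.has(v)),
  'obj[value]': time((v) => obj[v]),
  'obj.hasOwnProperty(value)': time((v) => obj.hasOwnProperty(v)),
};

console.log(results);
```

A single pass like this is noisy; a benchmarking harness with warm-up and repeated runs would give more trustworthy numbers.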

missinglink · 2019-05-13T12:52:35Z

I think it's because those benchmarks (which I copied from someone else) also have a value check ( ^ 0 or !!) which means they are doing two operations.

The problem with something like obj[key] is that a truthiness check on it fails for keys whose stored value is falsy, such as 0 (although in this case they are all strings so it probably doesn't matter).
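A small sketch of that pitfall, using a made-up dictionary (the tokens and values are hypothetical): a key that exists but maps to a falsy value looks missing under a truthiness check, and a key that was never added can look present because it is inherited from `Object.prototype`.

```javascript
// Hypothetical dictionary that maps tokens to values, one of them falsy.
const obj = { rue: 1, street: 0 };
const set = new Set(['rue', 'street']);

// 'street' exists, but its value 0 is falsy, so a truthiness check misses it.
const truthyCheck = Boolean(obj['street']);
const ownCheck = obj.hasOwnProperty('street');
const setCheck = set.has('street');

// 'constructor' was never added, but plain objects inherit it from Object.prototype.
const inheritedCheck = Boolean(obj['constructor']);
const setInheritedCheck = set.has('constructor');

console.log({ truthyCheck, ownCheck, setCheck, inheritedCheck, setInheritedCheck });
```

Both `hasOwnProperty` and `Set.prototype.has` answer the membership question directly, which is why they avoid the problem.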

It looks like they are pretty similar, so let's just go all-in on Set? I like that it doesn't coerce the keys to strings or do the other weird things that Object does.

missinglink · 2019-05-13T12:55:15Z

For the prefix checks, I will write a little in-memory FST structure which will make those much faster. In the meantime they can just use iterators, which will be slow.
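A minimal sketch of what an iterator-based prefix check over a Set could look like (the `hasPrefix` helper and the sample tokens are hypothetical, not the project's code). It shows why this approach is slow: in the worst case every entry has to be visited.

```javascript
// Hypothetical tokens; the real dictionaries are not shown in this thread.
const dictionary = new Set(['boulevard', 'bourg', 'rue', 'ruelle']);

// Return true if any entry starts with `prefix`.
// Scans entries one by one, so the worst case is linear in the dictionary size.
function hasPrefix(set, prefix) {
  for (const entry of set) {
    if (entry.startsWith(prefix)) return true;
  }
  return false;
}

console.log(hasPrefix(dictionary, 'bou'));
console.log(hasPrefix(dictionary, 'ave'));
```

A prefix-oriented structure such as an FST or a trie answers the same question in time proportional to the prefix length rather than the dictionary size, at the cost of extra memory.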

Performance overall is pretty good, although some complex queries are getting close to 10ms, which is not great.

missinglink · 2019-05-13T14:02:57Z

Added FST in #17

Joxit · 2019-05-13T14:26:38Z

I tried your branch; it uses much more memory (415MB) and seems to be about 4 times slower than Set 🤔

Here is the test: https://gist.github.com/Joxit/32ccd5b5f63b474f30e707d804cbda25

Maybe in the future we will want to add some metadata, e.g. if the token is Rue then it may be French; if it's Boulevard it may be English, French, Spanish...
