Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of Set for dictionaries #16

Open
missinglink opened this issue May 13, 2019 · 6 comments
Open

Use of Set for dictionaries #16

missinglink opened this issue May 13, 2019 · 6 comments

Comments

@missinglink
Copy link
Member

Initially, I used a js object and hasOwnProperty to do the hashmap lookups and then later used Set and has().

It would be nice to standardize this, I'm just not familiar with the performance of Set vs. Object, I think if Set is faster/the same then we should use it.

I think one benefit of Set is that Object can possibly have issues with numeric keys?

cc/ @Joxit thoughts?

@missinglink
Copy link
Member Author

I did some quick performance testing and it seems that Set performs better and the performance is more linear as the size of the dictionary increases:

https://jsperf.com/set-vs-object-as-sets/15

@Joxit
Copy link
Member

Joxit commented May 13, 2019

I also did some quick perf (Node v8.16.0) testing and it found that :

  • Memory print : Set uses less memory than Object (for localities Set = 59.9MB and Object = 61MB)
  • ops/sec : obj[value] faster than set.has(value) faster than obj.hasOwnProperty(value) 🤔

That's strange, in your test obj[value] seems to be the slowest.

@missinglink
Copy link
Member Author

missinglink commented May 13, 2019

I think it's because those benchmarks (which I copied from someone else) also have a value check ( ^ 0 or !!) which means they are doing two operations.

The problem with something like obj[key] is that it returns false for falsy values such as 0 (although in this case they are all strings so it probably doesn't matter).

It looks like they are pretty similar so let's just go all-in on Set? I like that it doesn't coerse the keys to strings and all the other things that Object does which are weird.

@missinglink
Copy link
Member Author

missinglink commented May 13, 2019

For the prefix checks, I will write a little FST memory structure which will make those much faster, in the meantime they can just use iterators and it will be slow.

Performance overall is pretty good, although some complex queries are getting near 10ms which is not great.

@missinglink
Copy link
Member Author

Added FST in #17

@Joxit
Copy link
Member

Joxit commented May 13, 2019

I tried your branch, it use much more memory (415MB) and seems to be slower than Set (4 times slower than Set) 🤔

Here is the test : https://gist.github.com/Joxit/32ccd5b5f63b474f30e707d804cbda25

Maybe in the future if we will want to add some metadata e.g if the token is Rue then it may be French. If it's Boulevard it may be English, French, Spanish....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants