Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using non-string hash{table,set} is harder than needed #130

Open
stefanct opened this issue Jan 19, 2020 · 2 comments
Open

Using non-string hash{table,set} is harder than needed #130

stefanct opened this issue Jan 19, 2020 · 2 comments

Comments

@stefanct
Copy link
Contributor

stefanct commented Jan 19, 2020

The documentation gives some hints that the default hashtable and hashset implementation are assuming C strings as keys. However, the "official" documentation use some defines that have been removed in the meantime, i.e. in 9b4c3cb. As of now using the hashing data structures with anything but strings as keys is rather a PITA. There is also no test case that would show how much boilerplate code is required (in comparison to string keys).

I propose re-adding the lost comparison functions and defines and adding new explicit functions that initialize hashsets and hashtables for use with pointers, strings (and maybe even a generic version if it makes sense) respectively, e.g. hashtable_new_str and hashtable_new_ptr.

@srdja
Copy link
Owner

srdja commented Jan 30, 2020

The documentation gives some hints that the default hashtable and hashset implementation are assuming C strings as keys. However, the "official" documentation use some defines that have been removed in the meantime, e.g. in 9b4c3cb.

The documentation is definitely a pain point and, in my opinion, should be deleted and replaced with code examples that build and link against the library. That way if they become outdated, they crash the build. (coming soon™)

There is also no test case that would show how much boilerplate code is required (in comparison to string keys).

Well, there is , but the boilerplate is separated out into setup and tear-down functions which makes it a poor substitute for documentation.

Lastly, I don't think the old pointer comparator should make a comeback because of the issue with it discussed here. That said, the hash table doesn't actually need a full comparator, it just needs a simple equality test, so that could be added. ...new_str and ...new_ptr could be also be a thing. I'm not sure what a "generic" one would look like though.

@stefanct
Copy link
Contributor Author

The documentation is definitely a pain point and, in my opinion, should be deleted and replaced with code examples that build and link against the library. That way if they become outdated, they crash the build. (coming soon™)

Yes, stale/wrong documentation is way worse than no documentation and having to figure out the code directly.

There is also no test case that would show how much boilerplate code is required (in comparison to string keys).

Well, there is , but the boilerplate is separated out into setup and tear-down functions which makes it a poor substitute for documentation.

I was referring to the hastset tests that use strings only AFAICT. Looking at the hashtable tests instead would have helped me actually.

Lastly, I don't think the old pointer comparator should make a comeback because of the issue with it discussed here. That said, the hash table doesn't actually need a full comparator, it just needs a simple equality test, so that could be added.

Well, the UB mentioned there is (I think) only relevant if you compare pointers to vastly different things (e.g., function pointers that might point to a completely separate address range in harvard architectures to stack addresses) and this would most probably never be a problem in the wild unless compilers exploit it (which is a big threat actually :). I guess one could convert the pointers to uintptr_t to get rid of the UB and compiler "optimizations" but this would need further investigations. If a simple comparison can be used that that's the way to go surely.

...new_str and ...new_ptr could be also be a thing. I'm not sure what a "generic" one would look like though.

A generic function would have an extra parameter to choose the actual implementation. It's no necessity of course but this would make it much clearer from the prototype(s) alone what's going on - the best documentation are well designed prototypes with good naming ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants