Using non-string hash{table,set} is harder than needed #130

stefanct · 2020-01-19T23:09:12Z

The documentation gives some hints that the default hashtable and hashset implementation are assuming C strings as keys. However, the "official" documentation use some defines that have been removed in the meantime, i.e. in 9b4c3cb. As of now using the hashing data structures with anything but strings as keys is rather a PITA. There is also no test case that would show how much boilerplate code is required (in comparison to string keys).

I propose re-adding the lost comparison functions and defines and adding new explicit functions that initialize hashsets and hashtables for use with pointers, strings (and maybe even a generic version if it makes sense) respectively, e.g. hashtable_new_str and hashtable_new_ptr.

The text was updated successfully, but these errors were encountered:

srdja · 2020-01-30T11:07:54Z

The documentation gives some hints that the default hashtable and hashset implementation are assuming C strings as keys. However, the "official" documentation use some defines that have been removed in the meantime, e.g. in 9b4c3cb.

The documentation is definitely a pain point and, in my opinion, should be deleted and replaced with code examples that build and link against the library. That way if they become outdated, they crash the build. (coming soon™)

There is also no test case that would show how much boilerplate code is required (in comparison to string keys).

Well, there is , but the boilerplate is separated out into setup and tear-down functions which makes it a poor substitute for documentation.

Lastly, I don't think the old pointer comparator should make a comeback because of the issue with it discussed here. That said, the hash table doesn't actually need a full comparator, it just needs a simple equality test, so that could be added. ...new_str and ...new_ptr could be also be a thing. I'm not sure what a "generic" one would look like though.

stefanct · 2020-01-30T11:28:34Z

The documentation is definitely a pain point and, in my opinion, should be deleted and replaced with code examples that build and link against the library. That way if they become outdated, they crash the build. (coming soon™)

Yes, stale/wrong documentation is way worse than no documentation and having to figure out the code directly.

There is also no test case that would show how much boilerplate code is required (in comparison to string keys).

Well, there is , but the boilerplate is separated out into setup and tear-down functions which makes it a poor substitute for documentation.

I was referring to the hastset tests that use strings only AFAICT. Looking at the hashtable tests instead would have helped me actually.

Lastly, I don't think the old pointer comparator should make a comeback because of the issue with it discussed here. That said, the hash table doesn't actually need a full comparator, it just needs a simple equality test, so that could be added.

Well, the UB mentioned there is (I think) only relevant if you compare pointers to vastly different things (e.g., function pointers that might point to a completely separate address range in harvard architectures to stack addresses) and this would most probably never be a problem in the wild unless compilers exploit it (which is a big threat actually :). I guess one could convert the pointers to uintptr_t to get rid of the UB and compiler "optimizations" but this would need further investigations. If a simple comparison can be used that that's the way to go surely.

...new_str and ...new_ptr could be also be a thing. I'm not sure what a "generic" one would look like though.

A generic function would have an extra parameter to choose the actual implementation. It's no necessity of course but this would make it much clearer from the prototype(s) alone what's going on - the best documentation are well designed prototypes with good naming ;)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using non-string hash{table,set} is harder than needed #130

Using non-string hash{table,set} is harder than needed #130

stefanct commented Jan 19, 2020 •

edited

srdja commented Jan 30, 2020

stefanct commented Jan 30, 2020

Using non-string hash{table,set} is harder than needed #130

Using non-string hash{table,set} is harder than needed #130

Comments

stefanct commented Jan 19, 2020 • edited

srdja commented Jan 30, 2020

stefanct commented Jan 30, 2020

stefanct commented Jan 19, 2020 •

edited