patricia should use heap offset pointers #18

jart · 2013-03-25T06:01:45Z

The memory overhead of your patricia container on 64-bit systems is suboptimal. You already have your own memory pools so you can very easily cut the patricia overhead in half by replacing the patricia_elem pointers with a uint32_t offset from node_heap. I'd send you a pull request, but you don't have any unit tests 👎

It would be even better to have a 24-bit offset pointer represented as (uint8_t << 16) | uint16_t. This way 32-bit systems would also be able to benefit from the lower memory usage. However such a pointer would place a limitation of having no more than 2^24 nodes. I think that's an acceptable statically-defined behavior since 2^24 patricia_elems would take up 896 megs of memory.

You should also create mowgli_patricia_sizeof() function so people can measure the costs of using the patricia data structure.

Also you should not call patricia a "hashtable" in the README because radix trees don't hash anything. You'd be better off describing it as "a radix tree dictionary that offers performance at the expense of using a hilarious amount of memory".

spb · 2013-03-25T09:54:25Z

And how much memory would this save in a real world example? I don't see it being enough to warrant the added complexity and additional assumptions that would have to be made about the underlying platform.

jart · 2013-03-25T21:25:24Z

It makes no assumptions about the platform, it's just using an int32_t instead of a pointer. There's no preprocessor macros required.

The memory savings are substantial because this reduces the node size from 141 to 73. That's huge.

spb · 2013-03-26T09:57:31Z

Talking about the node size takes no account of the size of what's actually stored in the container, which in many cases will vastly outweigh the size of the node itself. So, what's the real world memory saving for a reasonable application, like Atheme with several thousand users and accounts?

kaniini · 2013-03-27T18:22:50Z

Hello.

Please do not post macro images on bug reports in Atheme projects. It does not add any signal and serves only to rile up people.

Regarding the readme entry, I believe whoever made that documentation change refers to it accidentally as a hashtable there as it presents a hashtable-like API to the end-user (many comparisons may be made to GHashTable in glib, for example).

I am not convinced that we should use pointer offsets in this way because it may break the build on some embedded platforms where pointers are actually offsets into a lookup table. While Mowgli itself is not used directly on such platforms, variants of the underlying code are.

Do you believe your offset proposal will work in an environment where pointers do not represent direct memory addresses?

Also, in general, it is my view that a radix tree still uses less memory than a hashtable when empty due to a limited number of nodes being allocated, verses at least one pointer per bucket being allocated for a hashtable.

In general, I agree that 141 bytes on systems with 64-bit pointers is heavier than it should be for individual tree nodes. It should indeed be looked into.

kaniini · 2013-03-27T18:45:54Z

Also, it seems that using an offset instead of a pointer may break if a memory pool crosses more than one VMA on systems with split-range ASLR like with PaX. The reason being that the base address of the VMA may be from an entirely different range than the first VMA, for example VMAs are split evenly between 0x700000000000 VIRT_BASE and 0x300000 VIRT_BASE. I believe that your proposed change will not work safely in such an environment.

A basic test program exercising the patricia routines actually is available at src/examples/patriciatest and src/examples/patriciatest2, if you would like to evaluate your proposal.

kaniini · 2013-03-27T18:49:01Z

Regarding the image macro, @Aerdan observed that Github has finally added comment deletion. So, we have taken the liberty of removing the image macro and all posts related to it, as they are offtopic.

kaniini · 2013-03-27T18:55:18Z

Regarding mowgli_patricia_sizeof(), do you feel it should calculate the entire memory usage of the data structure (likely by traversing the tree using mowgli_patricia_stats()) or return the consumption of a single element?

I am assuming the former, but am seeking clarification in this regard. In the case of the former, I feel that mowgli_patricia_size() would be a better function name.

keroserene · 2013-03-27T21:26:13Z

Virtual memory translation tables + ASLR should have no relation to the offsets discussed here, since the heap is still contiguous. Even if you have non-contiguous allocations, that's fine because you can just index by "which block" (base address is likely a multiple of the kernel's page size) + internal block offset. Furthermore, since you'll probably have a maximum block size for dynamic growth, and therefore a maximum number of blocks which fit in your heap, your pair of offsets will occupy very few bits.

Offsets-rather-than-pointers is one of the easiest and most effective techniques (you typically see this in malloc implementations) to decrease internal fragmentation. Furthermore, I think this may in fact enhance the clarity of the code, since instead of arbitrary pointers you now have explicit information about which block you're dealing with.

Just my 2 cents.

kaniini · 2013-03-28T00:17:06Z

@keroserene actually, they do, because we have different mapped regions for every Nth page in the memory pool. thusly, it is possible that allocations are not contiguous, as linux will allocate from both the lowest 24 bits and highest 24 bits on amd64 (look at some stacktraces in gdb if you don't believe it).

once you cross that boundary on 64-bits, a 32-bit value cannot describe the offset. it is why ptrdiff_t is still 64-bits long on 64-bit systems.

however, @jart clarified what she was actually after on twitter, so i will work on it this weekend (maybe earlier, but we're busy with planning-related meetings at work, so i am not sure how much time i will have between then and now).

the requested change seems reasonable, the source of confusion was that from my initial read of the bug report that you wanted to store a 32-bit address offset.

we will have to do this in a couple of steps:

the block allocator will need index accessors written to access an individual element given a block.
the block allocator will need index accessors written to access individual blocks given an index number.
once the above is done, we can use these values instead of pointer addresses as described by you and also @jart on twitter earlier today.

i need to think a little on how to implement step 2, i have a few solutions but they're kinda ugly. i guess i could make the index class freestanding and make it not my problem.

kaniini added a commit that referenced this issue Mar 27, 2013

Fix description of mowgli_patricia routines. (ref #18)

0dff482

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

patricia should use heap offset pointers #18

patricia should use heap offset pointers #18

jart commented Mar 25, 2013

spb commented Mar 25, 2013

jart commented Mar 25, 2013

spb commented Mar 26, 2013

kaniini commented Mar 27, 2013

kaniini commented Mar 27, 2013

kaniini commented Mar 27, 2013

kaniini commented Mar 27, 2013

keroserene commented Mar 27, 2013

kaniini commented Mar 28, 2013

patricia should use heap offset pointers #18

patricia should use heap offset pointers #18

Comments

jart commented Mar 25, 2013

spb commented Mar 25, 2013

jart commented Mar 25, 2013

spb commented Mar 26, 2013

kaniini commented Mar 27, 2013

kaniini commented Mar 27, 2013

kaniini commented Mar 27, 2013

kaniini commented Mar 27, 2013

keroserene commented Mar 27, 2013

kaniini commented Mar 28, 2013