Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefix B+-tree #106

Open
andy-byers opened this issue Jul 1, 2023 · 1 comment
Open

Prefix B+-tree #106

andy-byers opened this issue Jul 1, 2023 · 1 comment

Comments

@andy-byers
Copy link
Owner

Is your feature request related to a problem? Please describe.
Not necessarily, this is more of an optimization.

Describe the solution you'd like
I'd like to experiment with converting the B+-tree into a prefix B+-tree as described in this paper.
This has the potential to save quite a bit of space in internal nodes, depending on how similar the keys are.
Also, if pivot keys are very long, we may be able to avoid having them spill onto overflow pages.

Describe alternatives you've considered
This seems to be the paper that introduced this concept, making it a good starting point.

Additional context

NOTE: The paper talks about B*-trees, rather than B+-trees, but there doesn't seem to be too much of a difference, at least for our purposes.

This feature can be implemented by modifying PayloadManager::promote(), which is used to convert a cell read from an external node into a pivot cell suitable for posting to an internal node.
It needs to accept the cell immediately to the left of the cell being promoted.
Additionally, the cell being promoted must be detached before being passed to PayloadManager::promote(), since it will be overwritten.
PayloadManager::promote() should be modified to compare the keys, s and t, of the 2 cells.
The shortest key, k, should be determined, such that s < k <= t.
This is the shortest key that is necessary to direct searches toward the correct external node.
The cell should then be modified to hold k (the key size and overflow ID should be corrected).
Note that k will never be longer than either s or t, so we may be able to avoid copying an overflow chain.

At some later point, Tree::redistribute_cells() should be further changed to consider multiple candidates around the chosen split point, selecting the one that produces the shortest pivot key.
This technique is described in the paper, but should be easier to implement here due to the presence of the indirection vector.

I'm pretty sure that's it for the "simple prefix B-tree" part of the paper. At this point, we can consider implementing the "prefix B-tree".
It would likely produce further fan-out improvement at the cost of some more complexity (pivots may need to be recomputed after SMOs).

andy-byers added a commit that referenced this issue Jul 2, 2023
In internal tree nodes, we store as little pivot key as possible. This addresses part of issue #106.
@andy-byers
Copy link
Owner Author

099ba10 added basic suffix truncation using Tree::make_pivot() and Tree::post_pivot(). Suffix truncation turned out to be less intrusive on the current design and was easy to implement.

I'm still considering different types of prefix compression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant