Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk Loading Algorithm(s) #2

Open
c-blake opened this issue Nov 7, 2019 · 1 comment
Open

Bulk Loading Algorithm(s) #2

c-blake opened this issue Nov 7, 2019 · 1 comment

Comments

@c-blake
Copy link

c-blake commented Nov 7, 2019

B-Trees have efficient algorithm(s) to populate a tree for later modification from a set of entries already in-order (or in-reverse order). The algorithm is fast both in constant factors and asymptotically and creates a dense tree. https://en.wikipedia.org/wiki/B-tree has details.

@c-blake
Copy link
Author

c-blake commented May 26, 2020

Well, I have an impl over in https://github.com/c-blake/adix of this. The usual B-Tree invariants are violated in the course of this initial phase and require a final touch up at the end. So, for example in Unix shell script-ese (I use Zsh , but it would probably work in Bash, too) you can do:

cd adix/tests
nim c ppss
nim c btshell
(for i in {1..100}; do echo A1 $i 0; done; echo D1 0; echo c; echo p)| ./ppss | ./btshell | less -R  # -R for color structure highlighting

The A1 $i 0s do the initial phase bulk add (with zero extra slots) while the D1 0 does the finalization/B-Tree invariant restoring part. You can also do A1 $i 1 and D1 1 to do only 18, not 19. (The default is a 10..20 link tree.) That could help avoid some splits if you think you might add a few elements here & there after initial construction but still get you 18/19 = 95% utilization. I suspect, but do not know for sure, that this algorithm can be generalized to allow any valid amount of spare space (well, less than half to satisfy the B-tree invariants anyway).

You'd need >~ 19*19=361 objects to start to see a third level which actually isnt' so bad if you pipe to a pager like less, but you could also just compile btshell with -d:btTall to get more structure sooner. Tall is often more useful for debugging than benchmarking/actual use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant