Added initial prolly tree ADL Spec #254
Conversation
I'll leave my $0.02 here.
@SionoiS By Entree do you mean Entry (I think English and French have different spelling)? I'm down to add wording to that extent. Should make some of the wording more terse. Mind elaborating on what you mean by "Doesn't matter if you hash stuff or not either. Multiple Entrees get then chunked and the tree built."
I'm still not sure about this since any intermediate node could be useful as a root. In particular if it contains a subset of the keyspace that's relevant for a particular use case.
Hi 👋 Following this, as I'm super interested in this as a possible solution for directory sharding in WNFS: wnfs-wg/spec#8
My thinking on this would be, there should be 'special' ("blessed") root nodes that contain the config. What I think is more likely is that there will be two prolly tree roots that share a subset of nodes. Perhaps because they were derived from the same original prolly tree or perhaps because one is a subset view of the other or something similar. In these cases you end up with both of these trees starting in these blessed root nodes which can contain the config.
Yes, you're right; entry/entries, as in dictionary entries or ledger entries, would be the right word.
I just meant that keys can be anything that can be ordered. Entries can be key-value pairs; just keys, if used as a set instead of a map; or links to other blocks of the tree with some kind of data used for ordering.
Yes, but only the builder of the actual tree will know which subset is relevant, not us.
Also, nothing prevents us from adding config to any new blocks as the tree is updated.
Please keep in mind that the overhead is one link (to a config) per node. I would like to argue that this is not at all that much. More importantly, what will happen is that root nodes will become non-root nodes in newer trees (e.g. when items are added). If we were to follow the strict distinction between root/non-root, we would have to change the prior root node into a non-root node, which would mean that links that point to the old root node are invalidated. If we don't change/invalidate the old root, we need to make a 'copy' of the old root to insert it into the new tree, which would be against the idea of Merkle-ization. If we don't distinguish between root and non-root, we simply create the new root node, link from it to the old one, and everything stays valid.
That's a good point. I'm convinced! Link at every node makes more sense.
I'm not quite convinced. In newer trees, the old root will actually always be split into multiple nodes, so that the new root links to multiple children. I think the only case when the root node becomes a perfect child is if you're appending the "perfect" key value pairs to e.g. the end of the sequence, but I think that's fairly unlikely? In any case - I don't think it's really worth arguing about. I wouldn't care much if there's a 40 byte overhead in each node. I'd be more concerned about the crosslinking perhaps, but even that tools should be able to handle.
I'd say, today: Add some signaling. E.g. some integer identifying the chunking algorithm in a table that you make up specifically for this spec. The table may only have a single entry today, but implementations will know how to error out properly if they encounter a value they don't expect.
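A minimal Python sketch of that signaling; the integer IDs and the "hash-suffix" name are made up for illustration, not taken from the spec:

```python
# Hypothetical signaling table; the IDs and the "hash-suffix" name are
# illustrative, not defined by the spec.
CHUNKING_STRATEGIES = {
    1: "hash-suffix",  # the table's only entry today
}

def decode_strategy(strategy_id):
    """Look up a chunking strategy, erroring out properly on values an
    implementation doesn't expect."""
    try:
        return CHUNKING_STRATEGIES[strategy_id]
    except KeyError:
        raise ValueError(f"unknown chunking strategy id: {strategy_id}")
```

The table may only have a single entry today, but an unknown value fails loudly instead of silently producing a differently-chunked tree.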
Hash the keys, not the values, right? If you'd hash the keys then an in-place update of a value could change the chunking.
A developer should configure an entry limit per node, similar to the 1000-entry limit for UnixFS directories, which is computed from the key-values given a max allowed directory name size. The developer should choose a limit that allows the block to stay within their chosen block size, e.g. 262kB or 1MB. That parameter informs the probability distribution function such that it cuts off at a certain point. Such a 'cutoff point' is inefficient, as you'll end up splitting and merging your node all the time, but that should be super rare given a well-chosen probability distribution function.
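As a sketch of how a hard entry cap could combine with the probabilistic boundary (all parameter values and the hashing scheme here are illustrative assumptions, not spec values):

```python
import hashlib

# Illustrative parameters, not spec values: a target average fanout and a
# hard entry cap chosen to keep a node under the desired block size.
TARGET_FANOUT = 32
MAX_ENTRIES = 1000

def is_boundary(key, value, entries_in_node):
    """Close the current node probabilistically (about 1 in TARGET_FANOUT
    entries) or unconditionally once the hard cutoff is reached."""
    if entries_in_node >= MAX_ENTRIES:
        return True  # the inefficient 'cutoff point' case
    digest = hashlib.sha256(key + value).digest()
    # Treat the last 4 bytes of the hash as a uniform random draw.
    draw = int.from_bytes(digest[-4:], "big")
    return draw % TARGET_FANOUT == 0
```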
The price we pay for adding an integer that signals the chunking strategy, even if there's just one, is really low IMO. So yeah, I'd say limit it to one, but add that signaling so there's room to grow eventually.
🤷‍♂️ Seems reasonable to me? Is there any value in linking to values at non-zero levels?
Perhaps? I wouldn't mind a lot if it's "missing" from the spec but then is mentioned in benchmarks/there are defaults in implementations.
Had a call with Mikeal today to talk about this. Some notes from what we talked about:
I'll think more about this and some of the other comments and propose some spec revisions to what we have so far. Our current spec is based on what we learned from Dolt, but I think there's room for changes to fit with IPLD more cleanly.
So, I've been talking to folks more and ruminating and have come up with some thoughts for the questions that were initially made:
One thing that really stuck out to me here was Mikeal's idea to have the tree structure itself be more general purpose than just prolly trees. With that in mind, I think we should remove the Prolly part of branch and leaf nodes, and have a general purpose "tree" with keys and values. Then we can have a prolly root that links to the config and the tree root. e.g.
We should include the hash function to be used in the chunking strategy config, in particular using the byte from the multihash table to represent it. One thing to keep in mind is that some chunking strategies require the ability to supply a salt (like Weibull or RollingHash). This means not all functions in the table will be viable for every strategy; this is probably something that will need to error out on write. We should also include the codec and other CID stuff in the config, since sniffing that from the root CID is kind of cumbersome and hacky. Having all the config options you need to encode/chunk a tree in a single object can make it easier to pass around during tree construction. We should also take @SionoiS's suggestion of using a keyed union representation for the configs. IMO, it'd be safe to suggest using CBOR to start unless folks have a better idea. We (folks working on the devgrant) were also talking about making a custom encoding with multicodec so that we can squeeze every bit of performance out, but we're going to put that off for future work and stick to what's fastest to get working now.
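To illustrate the shape such a config could take (the field names, the strategy names, and the salt rule are assumptions for illustration, not the spec's; 0x12 is sha2-256 in the multihash table and 0x71 is dag-cbor in the multicodec table):

```python
# Illustrative config shape; field names and the strategy table are
# assumptions, not the spec's.
STRATEGIES_ACCEPTING_SALT = {"weibull", "rolling-hash"}

def make_config(strategy, mh_code, codec, salt=None):
    """Build a single config object carrying everything needed to
    encode/chunk a tree; error out on write-time mismatches."""
    if salt is not None and strategy not in STRATEGIES_ACCEPTING_SALT:
        raise ValueError(f"strategy {strategy!r} does not accept a salt")
    # Keyed-union style: the strategy name keys its own parameters.
    chunking = {strategy: {} if salt is None else {"salt": salt}}
    return {"codec": codec, "mhCode": mh_code, "chunking": chunking}
```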
In order to support structures where the same key can appear multiple times, we should use both the key and the value in the hashing. It's also useful to introduce further randomness into the chunking to make it more difficult to create specially crafted keys that can overload the structure. So, the hashing should take both the key and the value as input. I don't think we can prevent large values, but there should be wording in there about fixed-size value structures being the best option, and if you need large data, you should use a CID that points to the actual data. This will also give us room for more fine-grained optimization on writes and on storage down the line. Regarding what @matheus23 said ("Hash the keys, not the values, right? If you'd hash the keys then an in-place update of a value could change the chunking."), I'm actually not sure why we shouldn't do this other than increasing randomness. @mikeal mentioned it was important, but I honestly don't remember why other than avoiding some attacks and multi-key scenarios.
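A hedged sketch of a boundary hash covering both key and value, keyed with a salt for the extra randomness mentioned above (the HMAC construction and the length prefix are my illustrative choices, not the spec's):

```python
import hashlib
import hmac

def boundary_hash(key, value, salt):
    """Hash both the key and the value, keyed with a salt, so that someone
    who doesn't know the salt can't craft pairs that all land on (or all
    avoid) chunk boundaries. The key is length-prefixed so that shifting
    bytes between key and value always changes the input."""
    msg = len(key).to_bytes(8, "big") + key + value
    return hmac.new(salt, msg, hashlib.sha256).digest()
```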
We should have some wording in the spec around "Hey, try to keep your blocks within this size, and here's some parameters that can help with that". 2 MiB is a limitation of the IPFS bitswap protocol, but it's not necessarily a limitation of all IPLD encoded blocks. With that in mind we should figure out some parameters to suggest folks to use as part of the spec. One thing that we should consider is whether the
Talking to folks, it sounds like having Weibull and rolling hash down the line is useful, but that it'd also be useful to start with something simple like the chunking factor based on the hash of the key-value pair. With that in mind, I'd propose removing the existing Weibull and rolling hash code and leaving room for it in subsequent PRs when we add it to the list of functions. Ideally we'll start with the hash suffix and do some tests on it, then add Weibull and rolling hash as we learn more. In particular, the Weibull one seems useful for controlling the tree size in a more fine-grained manner that's more resistant to carefully crafted key-value pairs making the chunk too big. One thing we might want to bikeshed is whether we should use a byte pointing to a table or a string. Personally I like strings since they're explicit and make it harder to have accidental collisions as we add new strategies.
I'm still not sure about this one. The tradeoffs are that having an extra field will lead to a couple extra bytes, but will lead to slightly cleaner code and structs where it's obvious what part is values and what part can be expected to be prolly trees. It was also mentioned that we could get rid of the So it could either look like one of the following. IMO the first one is easier to make clean code interfaces for since you have an obvious mapping
or
Maybe we can also use a keyed union representation? I'm not sure if that's possible along with a tuple representation (which is useful for saving on bytes for the property names + makes it easier to use whatever property names you want). I was also thinking that we could skip the
We should at least provide one viable config for folks to test out and test their implementations against.
@RangerMauve, this is a reply to your comment. To keep it shorter, I'll only comment on the things I care about and don't agree with.
At least for the
I'm strongly in favour of this one. Having only
Good work, that's really a good start.
I only had a quick look at the algorithms, I'd hope someone else would have a deeper look.
In addition to my inline comments I've a few general ones.
For the function definitions, I think it would be great to have a short description on what they actually do. Kind of the things you would put into your API docs.
I'm a bit confused about the images. They don't really seem to build upon each other, yet they are posted in sequence without further text. It's not really clear what each of them tries to explain (only partially).
At the beginning there are some references, but I think it would be great to have a section that explains the general idea: the role hashing plays in the chunking and why it leads to a deterministic tree.
Co-authored-by: Volker Mische <volker.mische@gmail.com>
K! I don't think we should merge this yet, but I think the spec is ready for a final review by @mikeal for the devgrant while we work on the initial ADL implementation. Ty to everyone that helped figure out the little details!
Co-authored-by: ch3 <72873632+che-ch3@users.noreply.github.com>
So, doing the differentiation via types isn't easy to do with how IPLD Schema works right now, so instead we'll change With this in place the spec should be good to go for now, and we'll leave it open as we get the Rust and Golang ADL implementations in place 👍 We might make changes to the spec as part of our learnings from those implementations, but it's likely the general structure won't change until we add new chunking strategies. 🎉
@RangerMauve I’m super excited about this spec. I’ve been looking into Prolly Trees for some time to see if they would work for what we do at Mintter. Have you looked at Merkle Search Trees which are used by Bluesky? It seems like they are very similar to Prolly Trees, if not exactly the same (I haven’t read that paper deeply yet). Maybe it’s worth mentioning the differences/similarities between these in the spec?
Prolly trees are based on Merkle Search Trees. The main difference is that Merkle Search Trees assign a "level" to the key-value pair based on the hash, and prolly trees assign chunk boundaries based on the key-value pair hash. In general prolly trees are better for sequential reads. Not sure about the approach Bluesky is taking, but I'd love to collaborate with them on this. 👍
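To make the contrast concrete, here is a toy Python sketch (details assumed for illustration, matching neither spec exactly): an MST derives a per-key level from its hash, while a prolly tree derives chunk boundaries from the key-value hash.

```python
import hashlib

def mst_level(key):
    """Merkle Search Tree style: the entry's level is the number of
    leading zero hex digits in the hash of its key."""
    digest = hashlib.sha256(key).hexdigest()
    return len(digest) - len(digest.lstrip("0"))

def prolly_is_boundary(key, value, fanout=16):
    """Prolly tree style: a chunk boundary falls wherever the hash of the
    whole key-value pair lands in a 1-in-fanout bucket."""
    digest = hashlib.sha256(key + value).digest()
    return digest[-1] % fanout == 0
```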
Golang ADL based on the spec is here: https://github.com/kenlabs/go-ipld-prolly-trees/pull/7/files#diff-7123ca137150a938ea6af380a465add25710f8f58e1b99c3a5187e505bf992b1
2023-01-24 maintainer conversation: we're going to let the implementation move forward and inform the spec.
### Note on Multiple and Single Values and Sets

The described tree can represent a data structure with multiple, single, or no values per key. However, given that IPLD Maps (which a Prolly Tree loosely maps to) only allow one value per key, implementations should merge duplicate keys into a single value. This is also important for consistent ordering of key-value pairs.
Is there a recommended approach to merging duplicate keys? If I have a database index over a recipe database for "contains meat", my queries will return rows organized by a boolean key, thus being identical for some or even all result rows.
I think this is very dependent on the application.
For some applications you can treat the value as a sort of CRDT and merge them; for some you might want last-write-wins with some sort of clock; in some cases you might want to have both key-value pairs beside each other and iterate over both when doing a search.
For indexes I usually have the document's primary key in the index key so that they can be unique.
Here's where I did stuff in my hyperbee based DB which is also a key-value store: https://github.com/RangerMauve/hyperbeedeebee/blob/default/index.js#L782
I don't know that it belongs in the spec, but where would you put guidance to library developers as to what the API for a prolly tree ADL should be?
Insert(key, value): Add the key-value pair if the key is not present
Update(key, value): Last writer wins if the key is present
Upsert(key, value): Last writer wins whether or not the key is present
Merge(merge_func(key, existing_value, incoming_value)): call the function with the old and new values and put the output in the key.
https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/swap!
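The four operations above can be sketched over a plain dict standing in for the tree (the names follow the comment; the semantics are my reading of it, not a settled API):

```python
def insert(tree, key, value):
    """Add the pair only if the key is not already present."""
    tree.setdefault(key, value)

def update(tree, key, value):
    """Last writer wins, but only if the key is already present."""
    if key in tree:
        tree[key] = value

def upsert(tree, key, value):
    """Last writer wins whether or not the key is present."""
    tree[key] = value

def merge(tree, key, value, merge_func):
    """Combine the existing and incoming values, like Clojure's swap!."""
    tree[key] = merge_func(key, tree.get(key), value)
```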
Little nit: prolly trees predate MSTs by several years.
@aboodman Snap, good to know. Mind sending some links for info on that so I can add them to the spec text? |
They were first documented in 2016: attic-labs/noms@8abe3a6#diff-dd95875d468bbbdd06a921cef4ad250818b906f448f642e868ade571f8a26c96R117 The implementation in that repo dates to probably the beginning of 2016 or late 2015.
Raw keys (key/value input from users) for leaf nodes. Key-value pairs are sorted by byte value, with the "larger" keys at the end. Comparison starts at the first byte and proceeds to the end. This means that keys that are just a prefix come before keys that are that prefix plus one more byte.
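This byte-wise ordering can be checked with Python's built-in comparison on `bytes`, which is exactly lexicographic:

```python
# Prefixes sort before the prefix plus extra bytes, and comparison runs
# from the first byte onward; Python's bytes ordering is lexicographic.
keys = [b"abc", b"a", b"b", b"ab"]
assert sorted(keys) == [b"a", b"ab", b"abc", b"b"]
```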
### `TreeNode.values`
Not sure about what is said here.
"Values corresponding to keys."
"Values can point to arbitrary IPLD nodes and it is up to applications to generate and process them."
"For non-leaf node these will be Links to other nodes."
Would make more sense IMO.
edit: What about when used as a set? Values should not exist, no?
### `ProllyTreeConfig.hashLength`

This is the multihash length parameter which should be used for generating CIDs. It can be set to `null` to use the default hash length from the hash function output.
What does null represent here? Zero?
AFAIK a tuple representation cannot have null/optional values.
cc @tabcat
This spec isn't 100% complete yet, but it's at a point where we have enough figured out that we can get some final feedback on the details from folks that have deep IPLD knowledge.
This is based on the first milestone of this devgrant.
We have an MVP implementation where we sketched some code up based on Dolt just to see if we could get it working with IPLD at all. It's here, and it has some tests and methods of rendering trees to graphviz.
Once we finish this spec we'll reimplement stuff based on the spec phrasing to make sure it's compliant (and expose as an ADL). kenlabs/ptree-bs#1
On the TODO list is:

- Whether the `value` is necessary to be hashed, and if so what sort of limitations we have with "large" values.
- Merging `values` and `links` into a single field and saying that if `level != 0` it must contain a CID to a ProllyNode.

cc @taoshengshi