Added initial prolly tree ADL Spec #254
Conversation
I'll leave my $0.02 here.
@SionoiS By Entree do you mean Entry (I think English and French have different spelling)? I'm down to add wording to that extent. Should make some of the wording more terse. Mind elaborating on what you mean by "Doesn't matter if you hash stuff or not either. Multiple Entrees get then chunked and the tree built."
I'm still not sure about this since any intermediate node could be useful as a root. In particular if it contains a subset of the keyspace that's relevant for a particular use case.
Hi 👋 Following this, as I'm super interested in this as a possible solution for directory sharding in WNFS: wnfs-wg/spec#8
My thinking on this would be, there should be 'special' ("blessed") root nodes that contain the config. What I think is more likely is that there will be two prolly tree roots that share a subset of nodes. Perhaps because they were derived from the same original prolly tree or perhaps because one is a subset view of the other or something similar. In these cases you end up with both of these trees starting in these blessed root nodes which can contain the config.
Yes, you're right; entry/entries, as in dictionary entries or ledger entries, would be the right word.
I just meant that keys can be anything that can be ordered. Entries can be key-value pairs; just keys, if used as a set instead of a map; or links to other blocks of the tree with some kind of data used for ordering.
Yes, but only the builder of the actual tree will know which subset is relevant, not us.
Also, nothing prevents us from adding config to any new blocks as the tree is updated.
Please keep in mind that the overhead is one link (to a config) per node. I would like to argue that this is not at all that much. More importantly, what will happen is that root nodes will become non-root nodes in newer trees (e.g. when items are added). If we were to follow the strict distinction between root/non-root, we would have to change the prior root node into a non-root node, which would mean that links that point to the old root node are invalidated. If we don't change/invalidate the old root, we need to make a 'copy' of the old root to insert it into the new tree, which would be against the idea of Merkle-ization. If we don't distinguish between root and non-root, we simply create the new root node, link from it to the old one, and everything stays valid.
That's a good point. I'm convinced! Link at every node makes more sense.
I'm not quite convinced. In newer trees, the old root will actually always be split into multiple nodes, so that the new root links to multiple children. I think the only case when the root node becomes a perfect child is if you're appending the "perfect" key value pairs to e.g. the end of the sequence, but I think that's fairly unlikely? In any case - I don't think it's really worth arguing about. I wouldn't care much if there's a 40 byte overhead in each node. I'd be more concerned about the crosslinking perhaps, but even that tools should be able to handle.
I'd say, today: Add some signaling. E.g. some integer identifying the chunking algorithm in a table that you make up specifically for this spec. The table may only have a single entry today, but implementations will know how to error out properly if they encounter a value they don't expect.
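A minimal Python sketch of that signaling; the integer IDs and the "hash-suffix" name are made up for illustration, not taken from the spec:

```python
# Hypothetical signaling table; the IDs and the "hash-suffix" name are
# illustrative, not defined by the spec.
CHUNKING_STRATEGIES = {
    1: "hash-suffix",  # the table's only entry today
}

def decode_strategy(strategy_id):
    """Look up a chunking strategy, erroring out properly on values an
    implementation doesn't expect."""
    try:
        return CHUNKING_STRATEGIES[strategy_id]
    except KeyError:
        raise ValueError(f"unknown chunking strategy id: {strategy_id}")
```

The table may only have a single entry today, but an unknown value fails loudly instead of silently producing a differently-chunked tree.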
Hash the keys, not the values, right? If you'd hash the keys then an in-place update of a value could change the chunking.
A developer should configure an entry limit per node, similar to the 1000-entry limit for UnixFS directories, which is computed from the key-values given a max allowed directory name size. The developer should choose a limit that allows the block to stay within their chosen block size, e.g. 262kB or 1MB. That parameter informs the probability distribution function such that it cuts off at a certain point. Such a 'cutoff point' is inefficient, as you'll end up splitting and merging your node all the time, but that should be super rare given a well-chosen probability distribution function.
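As a sketch of how a hard entry cap could combine with the probabilistic boundary (all parameter values and the hashing scheme here are illustrative assumptions, not spec values):

```python
import hashlib

# Illustrative parameters, not spec values: a target average fanout and a
# hard entry cap chosen to keep a node under the desired block size.
TARGET_FANOUT = 32
MAX_ENTRIES = 1000

def is_boundary(key, value, entries_in_node):
    """Close the current node probabilistically (about 1 in TARGET_FANOUT
    entries) or unconditionally once the hard cutoff is reached."""
    if entries_in_node >= MAX_ENTRIES:
        return True  # the inefficient 'cutoff point' case
    digest = hashlib.sha256(key + value).digest()
    # Treat the last 4 bytes of the hash as a uniform random draw.
    draw = int.from_bytes(digest[-4:], "big")
    return draw % TARGET_FANOUT == 0
```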
The price we pay for adding an integer that signals the chunking strategy, even if there's just one, is really low IMO. So yeah, I'd say limit it to one, but add that signaling so there's room to grow eventually.
🤷‍♂️ Seems reasonable to me? Is there any value in linking to values at non-zero levels?
Perhaps? I wouldn't mind a lot if it's "missing" from the spec but then is mentioned in benchmarks/there are defaults in implementations.
Had a call with Mikeal today to talk about this. Some notes from what we talked about:
I'll think more about this and some of the other comments and propose some spec revisions to what we have so far. Our current spec is based on what we learned from Dolt, but I think there's room for changes to fit with IPLD more cleanly.
So, I've been talking to folks more and ruminating and have come up with some thoughts for the questions that were initially made:
One thing that really stuck out to me here was Mikeal's idea to have the tree structure itself be more general purpose than just prolly trees. With that in mind, I think we should remove the Prolly part of branch and leaf nodes, and have a general purpose "tree" with keys and values. Then we can have a prolly root that links to the config and the tree root. e.g.
We should include the hash function to be used in the chunking strategy config, in particular using the byte from the multihash table to represent it. One thing to keep in mind is that some chunking strategies require the ability to supply a salt (like Weibull or RollingHash). This means not all functions in the table will be viable for every strategy; this is probably something that will need to error out on write. We should also include the codec and other CID stuff in the config, since sniffing that from the root CID is kind of cumbersome and hacky. Having all the config options you need to encode/chunk a tree in a single object can make it easier to pass around during tree construction. We should also take @SionoiS's suggestion of using a keyed union representation for the configs. IMO, it'd be safe to suggest using CBOR to start unless folks have a better idea. We (folks working on the devgrant) were also talking about making a custom encoding with multicodec so that we can squeeze every bit of performance out, but we're going to put that off for future work and stick to what's fastest to get working now.
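To illustrate the shape such a config could take (the field names, the strategy names, and the salt rule are assumptions for illustration, not the spec's; 0x12 is sha2-256 in the multihash table and 0x71 is dag-cbor in the multicodec table):

```python
# Illustrative config shape; field names and the strategy table are
# assumptions, not the spec's.
STRATEGIES_ACCEPTING_SALT = {"weibull", "rolling-hash"}

def make_config(strategy, mh_code, codec, salt=None):
    """Build a single config object carrying everything needed to
    encode/chunk a tree; error out on write-time mismatches."""
    if salt is not None and strategy not in STRATEGIES_ACCEPTING_SALT:
        raise ValueError(f"strategy {strategy!r} does not accept a salt")
    # Keyed-union style: the strategy name keys its own parameters.
    chunking = {strategy: {} if salt is None else {"salt": salt}}
    return {"codec": codec, "mhCode": mh_code, "chunking": chunking}
```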
In order to support structures where the same key can appear multiple times, we should use both the key and the value in the hashing. It's also useful to introduce further randomness into the chunking to make it more difficult to create specially crafted keys that can overload the structure. So, the hashing should take both the key and the value as input. I don't think we can prevent large values, but there should be wording in there about fixed-size value structures being the best option, and if you need large data, you should use a CID that points to the actual data. This will also give us room for more fine-grained optimization on writes and on storage down the line. Regarding what @matheus23 said ("Hash the keys, not the values, right? If you'd hash the keys then an in-place update of a value could change the chunking."), I'm actually not sure why we shouldn't do this other than increasing randomness. @mikeal mentioned it was important, but I honestly don't remember why other than avoiding some attacks and multi-key scenarios.
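A hedged sketch of a boundary hash covering both key and value, keyed with a salt for the extra randomness mentioned above (the HMAC construction and the length prefix are my illustrative choices, not the spec's):

```python
import hashlib
import hmac

def boundary_hash(key, value, salt):
    """Hash both the key and the value, keyed with a salt, so that someone
    who doesn't know the salt can't craft pairs that all land on (or all
    avoid) chunk boundaries. The key is length-prefixed so that shifting
    bytes between key and value always changes the input."""
    msg = len(key).to_bytes(8, "big") + key + value
    return hmac.new(salt, msg, hashlib.sha256).digest()
```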
We should have some wording in the spec around "Hey, try to keep your blocks within this size, and here's some parameters that can help with that". 2 MiB is a limitation of the IPFS bitswap protocol, but it's not necessarily a limitation of all IPLD encoded blocks. With that in mind we should figure out some parameters to suggest folks to use as part of the spec. One thing that we should consider is whether the
Talking to folks, it sounds like having Weibull and rolling hash down the line is useful, but that it'd also be useful to start with something simple like the chunking factor based on the hash of the key-value pair. With that in mind, I'd propose removing the existing Weibull and rolling hash code and leaving room for it in subsequent PRs when we add it to the list of functions. Ideally we'll start with the hash suffix and do some tests on it, then add Weibull and rolling hash as we learn more. In particular, the Weibull one seems useful for controlling the tree size in a more fine-grained manner that's more resistant to carefully crafted key-value pairs making the chunk too big. One thing we might want to bikeshed is whether we should use a byte pointing to a table or a string. Personally I like strings since they're explicit and make it harder to have accidental collisions as we add new strategies.
I'm still not sure about this one. The tradeoffs are that having an extra field will lead to a couple extra bytes, but will lead to slightly cleaner code and structs where it's obvious what part is values and what part can be expected to be prolly trees. It was also mentioned that we could get rid of the So it could either look like one of the following. IMO the first one is easier to make clean code interfaces for since you have an obvious mapping
or
Maybe we can also use a keyed union representation? I'm not sure if that's possible along with a tuple representation (which is useful for saving on bytes for the property names + makes it easier to use whatever property names you want). I was also thinking that we could skip the
We should at least provide one viable config for folks to test out and test their implementations against.
@RangerMauve, this is a reply to your comment. To keep it shorter, I'll only comment on the things I care about and don't agree with.
At least for the
I'm strongly in favour of this one. Having only
Good work, that's really a good start.
I only had a quick look at the algorithms, I'd hope someone else would have a deeper look.
In addition to my inline comments I've a few general ones.
For the function definitions, I think it would be great to have a short description on what they actually do. Kind of the things you would put into your API docs.
I'm a bit confused about the images. They don't really seem to build upon each other, yet they are posted in sequence without further text. It's not really clear what each of them tries to explain (only partially).
At the beginning there are some references, but I think it would be great to have a section that explains the general idea: the role hashing plays in the chunking and why it leads to a deterministic tree.
Co-authored-by: Volker Mische <volker.mische@gmail.com>
K! I don't think we should merge this yet, but I think the spec is ready for a final review by @mikeal for the devgrant while we work on the initial ADL implementation. Ty to everyone that helped figure out the little details!
Co-authored-by: ch3 <72873632+che-ch3@users.noreply.github.com>
So, doing the differentiation via types isn't easy to do with how IPLD Schema works right now, so instead we'll change With this in place the spec should be good to go for now, and we'll leave it open as we get the Rust and Golang ADL implementations in place 👍 We might make changes to the spec as part of our learnings from those implementations, but it's likely the general structure won't change until we add new chunking strategies. 🎉
@RangerMauve I’m super excited about this spec. I’ve been looking into Prolly Trees for some time to see if they would work for what we do at Mintter. Have you looked at Merkle Search Trees which are used by Bluesky? It seems like they are very similar to Prolly Trees, if not exactly the same (I haven’t read that paper deeply yet). Maybe it’s worth mentioning the differences/similarities between these in the spec?
Prolly trees are based on Merkle Search Trees. The main difference is that Merkle Search Trees assign a "level" to the key-value pair based on the hash, and prolly trees assign chunk boundaries based on the key-value pair hash. In general prolly trees are better for sequential reads. Not sure about the approach Bluesky is taking, but I'd love to collaborate with them on this. 👍
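To make the contrast concrete, here is a toy Python sketch (details assumed for illustration, matching neither spec exactly): an MST derives a per-key level from its hash, while a prolly tree derives chunk boundaries from the key-value hash.

```python
import hashlib

def mst_level(key):
    """Merkle Search Tree style: the entry's level is the number of
    leading zero hex digits in the hash of its key."""
    digest = hashlib.sha256(key).hexdigest()
    return len(digest) - len(digest.lstrip("0"))

def prolly_is_boundary(key, value, fanout=16):
    """Prolly tree style: a chunk boundary falls wherever the hash of the
    whole key-value pair lands in a 1-in-fanout bucket."""
    digest = hashlib.sha256(key + value).digest()
    return digest[-1] % fanout == 0
```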
Golang ADL based on the spec is here: https://github.com/kenlabs/go-ipld-prolly-trees/pull/7/files#diff-7123ca137150a938ea6af380a465add25710f8f58e1b99c3a5187e505bf992b1
2023-01-24 maintainer conversation: we're going to let the implementation move forward and inform the spec.
### Note on Multiple and Single Values and Sets

The described tree can represent a data structure with multiple, single, or no values per key. However, given that IPLD Maps (which a Prolly Tree loosely maps to) only allow one value per key, implementations should merge duplicate keys into a single value. This is also important for consistent ordering of key-value pairs.
Is there a recommended approach to merging duplicate keys? If I have a database index over a recipe database for "contains meat", my queries will return rows organized by a boolean key, thus being identical for some or even all result rows.
I think this is very dependent on the application.
For some applications you can treat the value as a sort of CRDT and merge them; for some you might want last-write-wins with some sort of clock; in some cases you might want to have both key-value pairs beside each other and iterate over both when doing a search.
For indexes I usually have the document's primary key in the index key so that they can be unique.
Here's where I did stuff in my hyperbee based DB which is also a key-value store: https://github.com/RangerMauve/hyperbeedeebee/blob/default/index.js#L782
I don't know that it belongs in the spec, but where would you put guidance to library developers as to what the API for a prolly tree ADL should be?
Insert(key, value): Add the key-value pair if the key is not present
Update(key, value): Last writer wins if the key is present
Upsert(key, value): Last writer wins whether or not the key is present
Merge(merge_func(key, existing_value, incoming_value)): call the function with the old and new values and put the output in the key.
https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/swap!
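The four operations above can be sketched over a plain dict standing in for the tree (the names follow the comment; the semantics are my reading of it, not a settled API):

```python
def insert(tree, key, value):
    """Add the pair only if the key is not already present."""
    tree.setdefault(key, value)

def update(tree, key, value):
    """Last writer wins, but only if the key is already present."""
    if key in tree:
        tree[key] = value

def upsert(tree, key, value):
    """Last writer wins whether or not the key is present."""
    tree[key] = value

def merge(tree, key, value, merge_func):
    """Combine the existing and incoming values, like Clojure's swap!."""
    tree[key] = merge_func(key, tree.get(key), value)
```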
Little nit: prolly trees predate MSTs by several years.
@aboodman Snap, good to know. Mind sending some links for info on that so I can add them to the spec text? |
They were first documented in 2016: attic-labs/noms@8abe3a6#diff-dd95875d468bbbdd06a921cef4ad250818b906f448f642e868ade571f8a26c96R117 The implementation in that repo dates to probably the beginning of 2016 or late 2015.
Raw keys (key/value input from users) for leaf nodes. Key-value pairs are sorted by byte value, with the "larger" keys at the end. Comparison starts at the first byte and proceeds to the end. This means that keys that are just a prefix come before keys that are that prefix plus one more byte.
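This byte-wise ordering can be checked with Python's built-in comparison on `bytes`, which is exactly lexicographic:

```python
# Prefixes sort before the prefix plus extra bytes, and comparison runs
# from the first byte onward; Python's bytes ordering is lexicographic.
keys = [b"abc", b"a", b"b", b"ab"]
assert sorted(keys) == [b"a", b"ab", b"abc", b"b"]
```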
### `TreeNode.values`
Not sure about what is said here.
"Values corresponding to keys."
"Values can point to arbitrary IPLD nodes and it is up to applications to generate and process them."
"For non-leaf node these will be Links to other nodes."
Would make more sense IMO.
edit: What about when used as a set? Values should not exist, no?
### `ProllyTreeConfig.hashLength`

This is the multihash length parameter which should be used for generating CIDs. It can be set to `null` to use the default hash length from the hash function output.
What does null represent here? Zero?
AFAIK a tuple representation cannot have null/optional values.
cc @tabcat
This spec isn't 100% complete yet, but it's at a point where we have enough figured out that we can get some final feedback on the details from folks that have deep IPLD knowledge.
This is based on the first milestone of this devgrant.
We have an MVP implementation where we sketched some code up based on Dolt just to see if we could get it working with IPLD at all. It's here, and it has some tests and methods of rendering trees to graphviz.
Once we finish this spec we'll reimplement stuff based on the spec phrasing to make sure it's compliant (and expose as an ADL). kenlabs/ptree-bs#1
On the TODO list is:

- Whether the `value` is necessary to be hashed, and if so what sort of limitations we have with "large" values.
- Merging `values` and `links` into a single field and saying that if `level != 0` it must contain a CID to a ProllyNode.

cc @taoshengshi