Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FAQ candidate: Can I build a doubly-linked or cyclic data structure in IPLD? #304

Open
lars-hellstrom opened this issue Sep 19, 2023 · 0 comments

Comments

@lars-hellstrom
Copy link

Not natively in the Data Model, no — it's a fundamental restriction of content-addressing: you cannot create two blocks that each contain the address of the other, because this address would be part of the content and thus affect your own address. (If you know how to create such a pair of blocks, then you have broken the hashing scheme.)

By chance I notice that this is mentioned in the docs, in the glossary entry for DAG. That is unlikely to be where anyone who has this question goes hunting for that information, though; the FAQ would be a better place to find it.

You may also want to give some thoughts to crafting advice for people who want to create cyclic data structures (maybe because they're porting something that makes heavy use of those). I personally like my data structures being acyclic, but I have at times encountered communities which seem to think a language is crap if it stops you from turning your data structures into a tangle of spaghetti. Better be prepared.

Tangential musings: PDF

It occurred to me that the kinds in IPLD are a very good match for the types in the data structure making up a PDF document: not only are there the two container types dictionary (Map) and array (List), there is also the reference (Link) as tool for building data structures. Both specs furthermore have booleans and the null as values distinct from numbers, and in principle distinguish integers from reals (but you shouldn't ever depend on that last part under either spec). What may seem like a difference is that PDF has these name values which are quite distinct from strings, but that works too: PDF strings are Bytes (the 8-bit nature of them is very ingrained into PDF), leaving String available to serve as counterpart of PDF names, which indeed they must be because of how IPLD and PDF both restrict what values are possible keys in Maps/dictionaries.

PDF streams may seem like the odd man out, but once you get past the filtering and other encoding trickeries (of which there are plenty) a stream is essentially a combination of Map and Bytes: it carries both kinds of data. Dunno if that is something which in the IPLD setting could be arranged with a custom codec; PDF streams must always be indirect objects, i.e., they are values to which there exists a reference, so there would have to be a separate block.

What kills the idea of mapping PDF straight into the IPLD data model instead sits one level up in the data structures: PDF is very fond of double-linking things — there are trees where children hold references back to their parent, doubly-linked lists, and cyclic lists. I have the feeling there is less of this in the more recent additions to the specification, but you cannot make a PDF document without a Pages tree, and that is doubly-linked.

Still, if you ever feel a need for test cases of in-the-wild data structures where a field sometimes is one kind of thing (say, a String) and other times another kind of thing (like a List of Strings, or a List mixing Strings and Maps, and there may at any point be an intermediate Link), then know this: there is plenty of that going on in PDF. Color spaces can be one reasonably self-contained, yet diverse, subsystem to look at for starters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant