
Missing support for Filecoin codecs #64

Open
lidel opened this issue Jul 21, 2020 · 11 comments
Labels
effort/days Estimated to take multiple days, but less than a week exp/intermediate Prior experience is likely helpful kind/enhancement A net-new feature or improvement to an existing feature need/analysis Needs further analysis before proceeding need/community-input Needs input from the wider community need/maintainers-input Needs input from the current maintainer(s) P1 High: Likely tackled by core team if no one steps up

Comments

@lidel
Collaborator

lidel commented Jul 21, 2020

Problem

Even after we update to the latest cids library, when a user enters a CID with a Filecoin-related codec they get an error, because no ipld-filecoin decoder exists:

(screenshot of the error, 2020-07-21 14:18:48)

Test CIDs from multiformats/multihash#129 (comment):

baga6ea4seaqggjjfh7whhdoxvhrix6jbcgobmdyhajcimfn33iedcp3kr23gruq 
baga6ea4seaqidbk23bub2dmg2hur4aawpe44wzuu2lccflgsbcqaokjzjb7wtgi 
bagboea4b5abcax5zbow3g7cyeg3nsvjguqnjkbdibnhhzc3whinr2sousvoijbrb 
bagboea4b5abcb245dcsepbaelwd7hrt46itun2mvv5nckntzkg5kf73m2ry4ja7r
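The error comes down to the codec code carried in the CID prefix. As a rough illustration (a dependency-free sketch, not the explorer's actual code path; `describeCid` and `CODE_NAMES` are hypothetical names), a base32 CIDv1 can be decoded far enough to read its version, codec, and multihash codes. The code values below come from the multiformats multicodec table.

```typescript
// Multicodec codes relevant to this issue, from the multiformats multicodec table.
const CODE_NAMES: Record<number, string> = {
  0x12: "sha2-256",
  0x71: "dag-cbor",
  0x1012: "sha2-256-trunc254-padded",
  0xb220: "blake2b-256",
  0xb401: "poseidon-bls12_381-a2-fc1",
  0xf101: "fil-commitment-unsealed",
  0xf102: "fil-commitment-sealed",
};

const BASE32 = "abcdefghijklmnopqrstuvwxyz234567"; // RFC 4648 alphabet, lowercase

// Decode unpadded lowercase base32 into bytes; trailing bits (< 8) are dropped.
function base32Decode(input: string): Uint8Array {
  const out: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (const ch of input) {
    const value = BASE32.indexOf(ch);
    if (value < 0) throw new Error(`invalid base32 character: ${ch}`);
    buffer = (buffer << 5) | value;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return new Uint8Array(out);
}

// Read an unsigned varint (LEB128) at `offset`; returns [value, nextOffset].
function readVarint(bytes: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let shift = 0;
  for (let i = offset; i < bytes.length; i++) {
    const b = bytes[i];
    value += (b & 0x7f) * 2 ** shift;
    shift += 7;
    if ((b & 0x80) === 0) return [value, i + 1];
  }
  throw new Error("truncated varint");
}

interface CidPrefix {
  version: number;
  codec: string;
  multihash: string;
  digestLength: number;
}

const nameOf = (code: number): string => CODE_NAMES[code] ?? `unknown (0x${code.toString(16)})`;

// Parse a "b"-prefixed (base32) CIDv1 and describe its codec and multihash.
function describeCid(cid: string): CidPrefix {
  if (!cid.startsWith("b")) throw new Error("only base32 CIDv1 strings are handled here");
  const bytes = base32Decode(cid.slice(1));
  const [version, afterVersion] = readVarint(bytes, 0);
  const [codec, afterCodec] = readVarint(bytes, afterVersion);
  const [hashCode, afterHash] = readVarint(bytes, afterCodec);
  const [digestLength] = readVarint(bytes, afterHash);
  return { version, codec: nameOf(codec), multihash: nameOf(hashCode), digestLength };
}

console.log(describeCid("baga6ea4seaqggjjfh7whhdoxvhrix6jbcgobmdyhajcimfn33iedcp3kr23gruq"));
console.log(describeCid("bagboea4b5abcax5zbow3g7cyeg3nsvjguqnjkbdibnhhzc3whinr2sousvoijbrb"));
```

For the first test CID this reports version 1, fil-commitment-unsealed, sha2-256-trunc254-padded, and a 32-byte digest; for the third, fil-commitment-sealed with poseidon-bls12_381-a2-fc1. Reading the prefix is easy; the missing piece is anything that can decode the data behind it.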

Solution

IPLD Explorer already supports Bitcoin and Ethereum (but we don't provide examples – see #62):

    "ipld-bitcoin": "^0.3.0",
    "ipld-ethereum": "^4.0.0",

I believe IPLD Explorer (included in IPFS WebUI/Desktop and on https://explore.ipld.io) should support Filecoin CIDs somehow, even if it's just one sentence with a link to a Filecoin-specific tool.

Ref. https://specs.ipld.io/data-structures/filecoin/

@ribasushi @vmx @rvagg – were there any prior/ongoing discussions regarding creating ipld-filecoin?

@ribasushi

ribasushi commented Jul 21, 2020

@mikeal ^^ this goes to the previously discussed question "where do PL-provided tools stop". One could render a commP/commD graph, though it would be of limited utility. A generic explorer for a sealed replica... I don't even see how to write something like that.

@lidel lidel transferred this issue from ipfs/ipld-explorer-components Jul 21, 2020
@lidel lidel added need/analysis Needs further analysis before proceeding need/community-input Needs input from the wider community need/maintainers-input Needs input from the current maintainer(s) need/triage Needs initial labeling and prioritization labels Jul 21, 2020
@rvagg
Member

rvagg commented Jul 24, 2020

So, some Filecoin context is important here. Sorry for getting into the weeds; I'll try to get to the big picture:

CIDs with the fil-commitment-unsealed and fil-commitment-sealed codecs point to the tips of very large merkle trees, in the commonly used sense of "merkle trees": trees of combined hashes of the levels below. fil-commitment-unsealed is strictly a binary tree, built on top of the base IPLD data, which is first packed into a CAR file, then zero-padded at the end until its size is a power-of-2 * (254/256), and then padded again by adding 2 bits per 254 bits so that the final result is an actual power of 2 and makes a neat binary merkle tree. The merkle tree is built, and the top yields a fil-commitment-unsealed CID with a multihash of sha2-256-trunc254-padded (a SHA2-256 with 2 bits lopped off to account for these 2-bit holes). This fil-commitment-unsealed w/ sha2-256-trunc254-padded CID taken off the top of the tree is "CommP" ("piece commitment"), and it is used for making deals and performing retrieval on the data. Unfortunately, it's not as useful as it might seem from an explorer POV, because the "block" it points to is simply 2 concatenated hashes, the top-1 level of the binary merkle tree. These could also be represented as CIDs, and you could do this all the way to the bottom. But sadly we lack a clear signal that we've reached the bottom unless a Filecoin miner that's storing the data tells us. And even then, you can't reverse all that padding unless you know the original length of the input CAR file.
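The size arithmetic and the tree hashing described above can be sketched as follows. This is a minimal illustration, not Filecoin's implementation: it assumes the ratio stated above (254 of every 256 bits carry payload, i.e. 127 of every 128 bytes) and models a sha2-256-trunc254-padded node as SHA-256 over the two concatenated 32-byte children with the top two bits of the final byte cleared. It does not perform the bit-level Fr32 padding itself, so it cannot compute a real CommP from a CAR file. `paddedPieceSize`, `nodeHash`, and `merkleRoot` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Smallest padded piece size (a power of two, minimum 128 bytes) whose
// unpadded capacity, padded * 127 / 128, can hold the payload.
function paddedPieceSize(payloadBytes: number): number {
  let padded = 128;
  while ((padded / 128) * 127 < payloadBytes) padded *= 2;
  return padded;
}

// One node of the binary tree: SHA-256 over left || right, with the two
// most significant bits of the last byte cleared ("trunc254").
function nodeHash(left: Uint8Array, right: Uint8Array): Uint8Array {
  const digest = createHash("sha256").update(left).update(right).digest();
  digest[31] &= 0x3f;
  return new Uint8Array(digest);
}

// Fold a power-of-two number of 32-byte leaves up to the root node.
function merkleRoot(leaves: Uint8Array[]): Uint8Array {
  const n = leaves.length;
  if (n === 0 || (n & (n - 1)) !== 0) throw new Error("leaf count must be a power of two");
  let level = leaves;
  while (level.length > 1) {
    const next: Uint8Array[] = [];
    for (let i = 0; i < level.length; i += 2) next.push(nodeHash(level[i], level[i + 1]));
    level = next;
  }
  return level[0];
}

const zeroLeaf = new Uint8Array(32);
console.log(paddedPieceSize(100_000)); // 131072
// Parent of two all-zero leaves:
console.log(Buffer.from(nodeHash(zeroLeaf, zeroLeaf)).toString("hex"));
// f5a5fd42d16a20302798ef6ed309979b43003d2320d9f0e8ea9831a92759fb0b
```

This also shows why an explorer gets little out of a CommP "block": each level is just two 32-byte hashes, and nothing in them says where the padding starts.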

Sectors in Filecoin are just a number of "pieces" together filling out whatever sector size is in use (e.g. 32 GiB or 64 GiB), and because together they form a binary merkle tree, the tip of that sector of unsealed data also ends up having a CID with fil-commitment-unsealed w/ sha2-256-trunc254-padded. But this CID represents "CommD" ("data commitment").

Then we have sealed data, "CommR" (replica commitment), which comes out as fil-commitment-sealed w/ poseidon-bls12_381-a2-fc1. This data is even more obfuscated and less useful outside of a miner, and the Poseidon-based tree doesn't have an easy-to-describe structure: it has different arity at different levels, or something like that (too much detail for me). But it also points to the next level of a merkle tree that's just a concatenation of hashes of the level below. You may be able to traverse down, but you probably won't find very interesting data there.

So CIDs with these fil-commitment-* multicodecs in them aren't strictly IPLD, which is why we didn't tag them as ipld in the multicodec table. Their utility as CIDs isn't so much that you can find a way to deserialize them; it's that they identify things in Filecoin and can be used in Filecoin-related communication.

If anyone reading this wants more gory details, see the discussion at multiformats/multicodec#161 and multiformats/multicodec#172 and the other issues linked from there.

Implications

So what to do with these codecs in the IPLD explorer? I don't think there's too much we can do for now with them other than simply say what they are: "Filecoin Unsealed Commitment" and "Filecoin Sealed Commitment". Perhaps there will be (or is?) an API to query a miner or client about these at some point to get more information that we could display, but we can't derive anything very interesting from them on their own. I think we're still yet to see exactly how these CIDs will be thrown around publicly, if at all. Perhaps @travisperson could suggest whether there's interesting information to be queried and then we could figure out if we want to go to the trouble of building "query a miner" functionality into the IPLD explorer.

Filecoin chain data

The other, related question is Filecoin chain data, such as https://explore.ipld.io/#/explore/bafy2bzaceb72fu7v56lsi2l45624kgag4cw2shnbweiqndv47p7tgaw5rqox2 that @travisperson shared.

That is the CID of a block in the calibration network's chain at height 6544; the height is visible at index 7 of the CBOR data.

The entries under index 5 (5/*) are links to the parent tipset.
Index 8 is the parent state root, index 9 the parent message receipts, and index 10 the block messages.

The Filecoin chain is all in DAG-CBOR, so the explorer can read it natively as it is. But without additional context it's difficult to know whether this information is from Filecoin or just some random thing that someone put into IPFS, since to the explorer it's just DAG-CBOR w/ blake2b-256.

@olizilla says that we're already doing some funky shape-checking for DAG-PB nodes, which raises the possibility of doing the same with DAG-CBOR. We happen to have been building a tool for just this kind of purpose: IPLD Schemas. One of the strict goals of IPLD Schemas is that they be fast to validate, the kind of fast you'd want if you were using them in a protocol: "does this chunk of data match, without needing anything other than the schema and the data?". So nothing in them requires links to be followed; there are no dependent types or logic, just shape information (plus some hints about what you're likely to find if you follow a link, but those are advisory).

There's some discussion in filecoin-project/specs#998 about bringing back IPLD Schemas to the Filecoin specs (there was a form of IPLD Schemas in an earlier iteration of these docs but it was ditched for plain Go structs). If we had all the schemas for the various blocks we could build a library in the IPLD Explorer and do quick validation as DAG-CBOR blocks are loaded to check whether they match the right shape or not. Of course there's a risk that a shape happens to also match someone else's data so we'd want to be careful about including ones that are too generic. Hopefully there are enough cases for the more complex blocks in the chain (🤞 the headers at least) that we could say with some high probability "this is a chunk of the Filecoin chain". If we also had a client, or a catalogue of the chain we may even be able to cross-reference and get information about where in the chain it is (height isn't encoded).
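To make the shape-checking idea concrete, here is a hand-written sketch in the spirit of an IPLD Schema check (not generated from any Filecoin schema, and not an existing explorer API). It assumes the block has already been decoded, with links represented the DAG-JSON way as `{"/": "<cid>"}`, and it checks only the positions named for the calibration-network block: a non-empty list of links at index 5, a non-negative integer height at index 7, and links at indices 8 to 10. The minimum length of 11 is an assumption. A check this loose can match unrelated data, which is exactly the false-positive risk described above.

```typescript
type Link = { "/": string };

// A link as DAG-JSON represents it: an object whose only key is "/".
function isLink(value: unknown): value is Link {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const keys = Object.keys(value);
  return keys.length === 1 && keys[0] === "/" && typeof (value as Record<string, unknown>)["/"] === "string";
}

// Hypothetical shape check: inspects positions only; no links are followed.
function looksLikeBlockHeader(node: unknown): boolean {
  if (!Array.isArray(node) || node.length < 11) return false;
  const parents: unknown = node[5];
  const height: unknown = node[7];
  return (
    Array.isArray(parents) && parents.length > 0 && parents.every(isLink) && // 5: parent tipset
    typeof height === "number" && Number.isInteger(height) && height >= 0 && // 7: height
    isLink(node[8]) && // parent state root
    isLink(node[9]) && // parent message receipts
    isLink(node[10]) // block messages
  );
}

// Synthetic input: only the checked positions carry meaningful values.
const link = (cid: string): Link => ({ "/": cid });
const synthetic = [null, null, null, null, null, [link("bafyparent")], null, 6544,
  link("bafystate"), link("bafyreceipts"), link("bafymessages")];
console.log(looksLikeBlockHeader(synthetic)); // true
console.log(looksLikeBlockHeader([1, 2, 3])); // false
```

Because the check never leaves the block, it stays cheap enough to run on every DAG-CBOR node the explorer loads; a real schema would pin down every field, which lowers the false-match risk.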

I started work on a validator with the JS IPLD Schema library but didn't get far enough for it to be useful. @mikeal made one too but I don't think it's complete enough to be useful yet either. But this might be the perfect use-case to get it all wired up and working, as long as we can get some solid IPLD Schemas written for Filecoin chain blocks.

@ribasushi

I think we're still yet to see exactly how these CIDs will be thrown around publicly, if at all.

FWIW, commD/commR show up in the status of a sector:

./lotus-miner sectors status 0
SectorID:	0
Status:		Proving
CIDcommD:	baga6ea4seaqdsvqopmj2soyhujb72jza76t4wpq5fzifvm3ctz47iyytkewnubq
CIDcommR:	bagboea4b5abcb3y33yn6e3dgnkudzbbroli6dy4sbraybz266svt64fciuitra2w
Ticket:		95182049a2a31467c9c8fb7b2e2b2e24c1475add885c993d078dbd899c0a3dae
TicketH:	886
Seed:		85c4b7669e1a69eb30b165db232164ac810099e3555a2d89ae7106c68e9e7a25
SeedH:		1966
Proof:		b836d164afd146d9401c05ad5ef7a0d49135c0666f91fef29134759a5549cb38cb051e82f2a7497b374a7b318fb5b113b5367f9e4ae352b7d2364bb549be874e516d3ffbeb2887870c83822771a7863127b0e77e86511509ccdc2fea0ee2d08f05df80ca8a776ae330a5c738af27fd3b0a09522ca0317e7e3f6efdf3861d93520f76269af243da3c050527a2f146ba128afa7ab81f0e93a0d1ab845fbbc1aef461b2872af93754b3faf178efeb253ea93c48224252cacd676b8d8da57d159d08
Deals:		[0]
Retries:	0
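Since the version, codec, multihash code, and 32-byte digest length all sit at the front of a CIDv1, every base32 CommD/CommP of this kind starts with the same `baga6ea4seaq` characters and every CommR with `bagboea4b5abc` (compare the test CIDs at the top of the issue). A tool scraping output like the above could label the commitments by prefix alone. This sketch assumes the `Key:<tabs>Value` layout shown here, which is not a documented or stable `lotus-miner` output format; `parseStatus` and `commitmentKind` are hypothetical helpers.

```typescript
// Split "Key:<whitespace>Value" lines into a map; other lines are skipped.
function parseStatus(text: string): Map<string, string> {
  const fields = new Map<string, string>();
  for (const line of text.split("\n")) {
    const match = /^([A-Za-z]+):\s*(.*)$/.exec(line.trim());
    if (match) fields.set(match[1], match[2].trim());
  }
  return fields;
}

type CommitmentKind = "unsealed (CommP/CommD)" | "sealed (CommR)" | "other";

// The first base32 characters encode version, codec, multihash code, and digest
// length, so they are constant for each commitment type with a 32-byte digest.
function commitmentKind(cid: string): CommitmentKind {
  if (cid.startsWith("baga6ea4seaq")) return "unsealed (CommP/CommD)";
  if (cid.startsWith("bagboea4b5abc")) return "sealed (CommR)";
  return "other";
}

const status = parseStatus(
  "SectorID:\t0\nStatus:\t\tProving\n" +
    "CIDcommD:\tbaga6ea4seaqdsvqopmj2soyhujb72jza76t4wpq5fzifvm3ctz47iyytkewnubq\n" +
    "CIDcommR:\tbagboea4b5abcb3y33yn6e3dgnkudzbbroli6dy4sbraybz266svt64fciuitra2w\n",
);
for (const key of ["CIDcommD", "CIDcommR"]) {
  console.log(key, commitmentKind(status.get(key) ?? ""));
}
```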

And commP of course shows up in deals:

./lotus-miner storage-deals list | jq '.[0].Proposal'
{
  "PieceCID": {
    "/": "baga6ea4seaqiv2kvf3wxcqu5oexgezqta5e5hforo3vpkis5ceraijgxppwvwca"
  },
  "PieceSize": 131072,
  "VerifiedDeal": false,
  "Client": "t3qpw7piwiyqu3urgzzhirx3yr4rhijwzrjxf2qujuhyyaaucocmxipp6f43zgwb4ktxvmzxz6gf54yfwzssvq",
  "Provider": "t03360",
  "Label": "",
  "StartEpoch": 11988,
  "EndEpoch": 39588,
  "StoragePricePerEpoch": "2000000",
  "ProviderCollateral": "130048",
  "ClientCollateral": "0"
}
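`PieceSize` in the proposal above is the padded size (131072 = 2^17). Assuming the 127-payload-bytes-per-128 ratio described earlier in the thread, a sketch that pulls the piece CID out of a proposal like this one and derives the payload capacity (only `PieceCID` and `PieceSize` are read; `pieceInfo` is a hypothetical helper):

```typescript
interface DealProposal {
  PieceCID: { "/": string };
  PieceSize: number;
}

interface PieceInfo {
  pieceCid: string;
  paddedSize: number;
  unpaddedSize: number; // payload capacity: 127 of every 128 padded bytes
}

function pieceInfo(proposalJson: string): PieceInfo {
  const proposal = JSON.parse(proposalJson) as DealProposal;
  const padded = proposal.PieceSize;
  // Padded sizes are powers of two (at least 128 bytes). Multiply up rather
  // than use bitwise tricks, which break for sizes above 2^31 (e.g. 64 GiB).
  let p = 128;
  while (p < padded) p *= 2;
  if (p !== padded) throw new Error(`unexpected PieceSize ${padded}`);
  return { pieceCid: proposal.PieceCID["/"], paddedSize: padded, unpaddedSize: padded - padded / 128 };
}

const info = pieceInfo(
  '{"PieceCID":{"/":"baga6ea4seaqiv2kvf3wxcqu5oexgezqta5e5hforo3vpkis5ceraijgxppwvwca"},"PieceSize":131072}',
);
console.log(info.pieceCid, info.unpaddedSize); // unpaddedSize is 130048
```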

@lidel
Collaborator Author

lidel commented Aug 6, 2020

Thank you for the context @rvagg and @ribasushi!
It makes it much easier for us to reason about the UX we can (should) deliver in IPLD Explorer.

Namely, I agree with:

So what to do with these codecs in the IPLD explorer? I don't think there's too much we can do for now with them other than simply say what they are: "Filecoin Unsealed Commitment" and "Filecoin Sealed Commitment".

@jessicaschilling @rafaelramalho19
TL;DR: we want IPLD Explorer to show meaningful info when someone pastes a CID produced by Filecoin, instead of an error. This feels like low-hanging fruit that we could ship in time for the Filecoin launch under the "onboarding improvements" umbrella. Some thoughts below.

In situations where we are currently unable to traverse the DAG (as with the codecs mentioned above), we could add a second, simpler visualization mode: instead of the object explorer and DAG visualization, it would show just a human-readable name, maybe a one-paragraph description, and a clickable link to learn more.

If we do that, there should be a static map where people can submit PRs adding name+URL mappings for custom codecs.
On top of a link to codec-specific docs, we could support an optional customExplorerUrl such as https://filscout.io/en/pc/tipset/t_detail?hash=%s (example).

That way people are pointed at a codec-specific explorer such as , decreasing the scope of what we need to support in IPLD Explorer while also making it more useful to users.
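One possible shape for that static map, as a sketch only: the field names (`name`, `description`, `docsUrl`, optional `customExplorerUrl` with `%s` standing for the CID) and the functions `viewModeFor` and `explorerUrl` are hypothetical and do not exist in ipld-explorer-components. The names follow the "Filecoin Unsealed Commitment" / "Filecoin Sealed Commitment" wording suggested above, and the docs URL is the ProtoSchool lesson mentioned later in this thread. The entries leave `customExplorerUrl` unset, because whether any Filecoin explorer accepts commitment CIDs is not established here.

```typescript
interface CodecInfo {
  name: string;
  description: string;
  docsUrl: string;
  customExplorerUrl?: string; // "%s" is replaced with the CID
}

// Hypothetical registry of codecs we can describe but not decode; extended via PRs.
const DESCRIBE_ONLY_CODECS: Record<number, CodecInfo> = {
  0xf101: {
    name: "Filecoin Unsealed Commitment",
    description: "Root of a binary merkle tree over padded data: CommP for a piece, CommD for a sector.",
    docsUrl: "https://proto.school/verifying-storage-on-filecoin/03",
  },
  0xf102: {
    name: "Filecoin Sealed Commitment",
    description: "Root of the merkle tree over a sealed replica (CommR).",
    docsUrl: "https://proto.school/verifying-storage-on-filecoin/03",
  },
};

type ViewMode = "explore" | "describe" | "error";

// Full explorer if a decoder exists, the simpler description view for known
// codecs, and the existing error otherwise.
function viewModeFor(codec: number, hasDecoder: (codec: number) => boolean): ViewMode {
  if (hasDecoder(codec)) return "explore";
  if (codec in DESCRIBE_ONLY_CODECS) return "describe";
  return "error";
}

function explorerUrl(info: CodecInfo, cid: string): string | undefined {
  return info.customExplorerUrl?.replace("%s", encodeURIComponent(cid));
}

const decoders = new Set([0x70, 0x71]); // e.g. dag-pb and dag-cbor
console.log(viewModeFor(0xf101, (c) => decoders.has(c))); // "describe"
```

Keeping the registry as plain data means adding a codec is a one-entry PR, and the explorer only needs one extra branch before it falls back to the error.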

Thoughts?

@jessicaschilling
Contributor

@lidel If I follow correctly, something like this?
(image: UI sketch)

@lidel
Collaborator Author

lidel commented Aug 7, 2020

@jessicaschilling Yes, exactly what I had in mind! 👌

@rvagg @ribasushi
Is there an official "Filecoin Explorer" we could use?
Are there docs at https://docs.filecoin.io we could link to?
(I did a quick search but couldn't find a "sealed commitment" page.)

@ribasushi

@lidel honestly the best doc I've seen about this is https://proto.school/verifying-storage-on-filecoin/03

@jessicaschilling
Contributor

Quick note on color: for sealed commitments let's use the official Filecoin blue #0090FF, and for unsealed #39C1CB (the light stop on the official gradient; at present there are no secondary colors in the Filecoin brand guide).

@jessicaschilling
Contributor

@ribasushi Are you still able to provide a replacement for that lipsum text? 😊 Thanks!

@jessicaschilling
Contributor

P.S. FWIW, in the sketch above I was reading "Filecoin explorer" as going to the filscout.io page @lidel linked to in his earlier comment.

@jessicaschilling jessicaschilling added exp/intermediate Prior experience is likely helpful effort/days Estimated to take multiple days, but less than a week kind/enhancement A net-new feature or improvement to an existing feature P1 High: Likely tackled by core team if no one steps up and removed need/triage Needs initial labeling and prioritization labels Oct 23, 2020
@SgtPooki
Contributor

This work would need to be done in ipld-explorer-components, and should be trivial to do now if there are JS libraries for the codecs and hashers.

Currently, we get a friendly error message indicating when things are not available:

(image: the current error message)

Projects
None yet
Development

No branches or pull requests

5 participants