Spike: Direct File CID #83

hopeyen · 2024-05-02T15:59:13Z

Problem statement

Consider replacing the File Manifest CID with a direct File CID, using the IPFS Merkle DAG for proofs.

Expectation proposal

Clearly list the benefits and costs of switching verification schemes.

hopeyen · 2024-05-03T01:05:49Z

Partial verification is possible with HTTP range requests. The path to get there is:

The client first downloads the root block from the provider and verifies it locally.

If the file is small, it consists of a single block, so the client has already downloaded the complete file.
If the file is big, the root block is a list of hashes and offsets, e.g. 0-999: Qmfoo, 1000-1999: Qmbar.
For a range request 0-512, the verifier client only downloads Qmfoo, which can itself be the root of a subtree, a leaf, or both.

Recursively verify intermediate blocks until reaching the leaf block.
At the leaf block, slice out the requested range, or simply take the entire block content.

Essentially, we must verify all blocks on the path from the root to the leaves; the leaves are hashed chunks of the actual data, whereas all nodes in the middle contain some kind of metadata.
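The walk described above can be sketched with a toy Merkle DAG. This is only an illustration under stated assumptions: the block encoding (a `b"L"`/`b"N"` tag plus JSON links), the names `build_dag` and `fetch_range`, and the tiny chunk size are all hypothetical and are not the IPFS UnixFS format. Ranges here are half-open `[start, end)`, unlike inclusive HTTP ranges.

```python
import hashlib
import json


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def build_dag(data: bytes, chunk: int = 4, fanout: int = 2):
    """Prover side (toy): chunk non-empty data into leaves, link them bottom-up.

    Returns (root_hash, store) where store maps hash -> block bytes.
    """
    store = {}
    level = []  # (start, end, hash) for every block on the current level
    for off in range(0, len(data), chunk):
        block = b"L" + data[off:off + chunk]
        store[digest(block)] = block
        level.append((off, min(off + chunk, len(data)), digest(block)))
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            links = level[i:i + fanout]
            block = b"N" + json.dumps(links).encode()
            store[digest(block)] = block
            parents.append((links[0][0], links[-1][1], digest(block)))
        level = parents
    return level[0][2], store


def fetch_range(root_hash: str, fetch, start: int, end: int) -> bytes:
    """Verifier side: fetch only blocks overlapping [start, end), checking each hash."""

    def walk(expected: str, lo: int) -> bytes:
        block = fetch(expected)
        if digest(block) != expected:
            raise ValueError("block does not match the hash that linked to it")
        kind, body = block[:1], block[1:]
        if kind == b"L":
            # Leaf: slice out the part of the requested range that falls in it.
            return body[max(start - lo, 0):max(end - lo, 0)]
        parts = []
        for child_lo, child_hi, child_hash in json.loads(body):
            if child_lo < end and start < child_hi:
                parts.append(walk(child_hash, child_lo))
        return b"".join(parts)

    return walk(root_hash, 0)


data = b"abcdefghijklmnop"
root, store = build_dag(data)
print(fetch_range(root, store.__getitem__, 2, 7))
```

Here `fetch` stands in for one request to the provider; because every block is checked against the hash in its parent, the client only needs to trust the root hash.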

The significance is that multiple exchanges between the Prover (server) and the Verifier (client) are required to verify a block in order to achieve trustless, concurrent, incrementally verifiable transfers. The number of exchanges between the prover and the verifier is $O(\log_2 n)$ for checking 1 leaf node when there are $n$ leaf nodes; the size of chunks in a node is usually 512 bytes, though configurable.
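To make the exchange count concrete, here is a small sketch that counts the blocks on one root-to-leaf path in a balanced tree. It assumes the verifier fetches one block per exchange and that every internal node has the same fanout; the function name `proof_rounds` is hypothetical. With a fanout of 2 this matches the $\log_2 n$ growth above, and a larger fanout shortens the path.

```python
def proof_rounds(n_leaves: int, fanout: int = 2) -> int:
    """Blocks on one root-to-leaf path (root and leaf included) in a balanced tree."""
    depth = 1
    capacity = 1  # number of leaves a tree of the current depth can hold
    while capacity < n_leaves:
        capacity *= fanout
        depth += 1
    return depth


for leaves in (1, 4, 1024):
    print(leaves, proof_rounds(leaves))
```

For 1024 leaves and a fanout of 2 this gives 11 blocks: 10 levels of links plus the leaf itself.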

Questions that arise:

If the server provides proof exchanges, should it charge for them? For example, a "premium" trustless service with intermediate block proofs versus a trusted service with posted hashes.
Does it make sense to post the block structure (DAG) publicly on IPFS and share it with the client, excluding the leaf nodes, as they contain the actual content? The memory requirement is roughly $\frac{32n}{mc}$ bytes, where $n$ is the number of bytes of content, $m$ is the number of bytes per chunk, and $c$ is the maximum number of children per node. This is a large memory requirement and only a constant-factor reduction relative to the actual content. It also defeats the purpose of using a direct File CID, since the file containing the DAG is shared.
Should there be a third-party role that handles verification for the client? How would pricing and consensus work? Such a trust model defeats being trustless.
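As a rough feel for the $\frac{32n}{mc}$ estimate in the second question, the sketch below plugs in numbers. It takes the formula as written in the comment; the helper name `dag_metadata_bytes` and the example values (1 GiB of content, 2 children per node) are assumptions chosen only for illustration.

```python
def dag_metadata_bytes(n: int, m: int, c: int, hash_len: int = 32) -> float:
    """Estimate from the comment: hash_len * n / (m * c) bytes of DAG metadata."""
    return hash_len * n / (m * c)


n = 2**30  # 1 GiB of content (assumed)
m = 512    # bytes per chunk, as in the comment
c = 2      # max children per node (assumed)

size = dag_metadata_bytes(n, m, c)
print(size, size / n)
```

With these values the estimate is 33554432 bytes (32 MiB), i.e. 1/32 of the content size, which shows the reduction is a constant factor that depends only on `m` and `c`.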

QmHash is CID v0; IPFS is now migrating to v1. These two formats seem bijective.
If we have a QmHash, we can explore the Merkle forest,
e.g. https://explore.ipld.io/#/explore/QmT5NvUtoM5nWFfrQdVrFtvGfKFmG7AHE8P34isapyhCxX
Docs on blocks: https://docs.ipfs.tech/how-to/work-with-blocks/
Rust CID: https://github.com/multiformats/rust-cid
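The CIDv0/CIDv1 relationship noted above can be checked with a standard-library sketch. It is in Python rather than Rust only to stay dependency-free; in the project itself rust-cid would be the natural tool. A CIDv0 is a base58btc sha2-256 multihash (bytes `0x12 0x20` plus a 32-byte digest), and the matching CIDv1 prepends version `0x01` and the dag-pb codec `0x70`, shown here in base32 with the `b` prefix. The function names are hypothetical. Note the mapping only round-trips for dag-pb + sha2-256: every CIDv0 has a CIDv1 form, but a CIDv1 with another codec or hash has no CIDv0 form.

```python
import base64

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(s: str) -> bytes:
    n = 0
    for ch in s:
        n = n * 58 + B58.index(ch)
    pad = len(s) - len(s.lstrip("1"))  # each leading '1' encodes a zero byte
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def b58encode(raw: bytes) -> str:
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = B58[r] + out
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + out


def cidv0_to_cidv1(cid_v0: str) -> str:
    multihash = b58decode(cid_v0)
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("not a sha2-256 CIDv0")
    raw = b"\x01\x70" + multihash  # version 1, dag-pb codec
    return "b" + base64.b32encode(raw).decode().rstrip("=").lower()


def cidv1_to_cidv0(cid_v1: str) -> str:
    if not cid_v1.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid_v1[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    if len(raw) != 36 or raw[:4] != b"\x01\x70\x12\x20":
        raise ValueError("only dag-pb + sha2-256 CIDv1 maps back to CIDv0")
    return b58encode(raw[2:])


v0 = "QmT5NvUtoM5nWFfrQdVrFtvGfKFmG7AHE8P34isapyhCxX"
v1 = cidv0_to_cidv1(v0)
print(v1)
print(cidv1_to_cidv0(v1) == v0)
```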

hopeyen · 2024-05-09T18:09:49Z

Other crates
https://github.com/ipfs-rust/ipfs-embed for temporary recursive pins when building DAGs, preventing races with the garbage collector
https://github.com/dariusc93/rust-ipfs with examples of DAG creation
https://github.com/dariusc93/ipfs-server built on rust-ipfs

hopeyen added the type:spike (Additional research required), size:medium (Medium), and p1 (High priority) labels May 2, 2024

hopeyen self-assigned this May 2, 2024
