Discussion: compression as part of the CARv2 standard #288

Open
aatifsyed opened this issue Jul 3, 2023 · 6 comments

@aatifsyed

aatifsyed commented Jul 3, 2023

Hello 👋 I'm coming from Forest, a Filecoin node implementation by ChainSafe, funded by the Filecoin Foundation, and I saw the current draft of CARv2.
I'd like to open a discussion for CARv2 being compression-aware.

We recently added support for randomly accessing CID data in an uncompressed CAR file: ChainSafe/forest#3085.
We do this by creating an index up front, and seeking to offsets in the file, which seems to rhyme with the goals for CARv2:

CARv2 is a minimal upgrade to the [CARv1](../carv1/) format with the primary aim of adding an optional index within the format for fast random-access to blocks.

See also docs for this feature
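
For context, here's a minimal sketch of what that offset-index lookup looks like in principle (illustrative only; the names and the explicit length parameter are not Forest's actual API):

use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

use cid::Cid;

/// Maps each CID to the byte offset of its block data in the uncompressed CAR.
struct OffsetIndex {
    offsets: HashMap<Cid, u64>,
}

impl OffsetIndex {
    /// Seek straight to the block for `cid` and read `len` bytes of its data.
    fn read_block(&self, car: &mut File, cid: &Cid, len: usize) -> io::Result<Vec<u8>> {
        let offset = *self
            .offsets
            .get(cid)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "cid not indexed"))?;
        car.seek(SeekFrom::Start(offset))?;
        let mut data = vec![0u8; len];
        car.read_exact(&mut data)?;
        Ok(data)
    }
}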

We publish zstd-compressed CAR "snapshots", as does Lotus, the reference Filecoin implementation.
Compression ratios hover around 2x for typical "calibnet" snapshots today:

                      567120_2023_05_17T16_13_00Z.car     | 2.2G 
                      567120_2023_05_17T16_13_00Z.car.zst | 1.2G 
forest_snapshot_calibnet_2023-06-29_height_690463.car     | 2.7G 
forest_snapshot_calibnet_2023-06-29_height_690463.car.zst | 1.5G 

Our users will care about:

  • The space their snapshot takes up on disk (compression)
  • The usability of their snapshots (random access)

We believe it's possible, and desirable, to integrate the two, and we will be working on this.
However, we think CARv2 would be the perfect opportunity to create a standard here.

Are you open to including this in CARv2? I'm happy to work on an implementation and proposal for the above!

@willscott
Member

Thanks for raising this.

How do you imagine compression / access working?

Would you imagine compression being applied per-block, or would you imagine a CARv2 index that's able to provide offsets / hints for the zstd-compressed CAR?

My understanding of the challenge here is that for most compression algorithms, once the data is compressed, there isn't an efficient way to read specific bytes of the underlying data, because the dictionary/compression table is built up dynamically from the data earlier in the stream.

I don't think there's any opposition to describing and standardizing a mechanism for compression of CAR data.

@aatifsyed
Author

I'm PoC-ing an implementation with the following:

  • Blocks are stored in CompressedGroups, which are compressed and decompressed as a unit
  • The index maps each CID to a group and an index within that group
use std::collections::HashMap;

use cid::Cid; // from the `cid` crate

struct Index {
    /// Maps each CID to the group that holds it and its position within that group.
    index: HashMap<Cid, Location>,
    /// Identifier for the compression algorithm used for the groups.
    compression_type: u64,
    roots: Vec<Cid>,
}

struct Location {
    /// Byte offset of the compressed group within the file.
    compressed_group_offset_into_file: u64,
    /// Position of the block within the decompressed group.
    block_index_in_group: u32,
}

/// A run of blocks that is compressed and decompressed as a single unit.
struct CompressedGroup {
    blocks: Vec<Block>,
}

struct Block {
    cid: Cid,
    data: Vec<u8>,
}
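
To make the intent concrete, here is a rough sketch of how a lookup against those structures could go, assuming each CompressedGroup is written as a single zstd frame and decoded by a hypothetical deserialize_group helper (both are assumptions, not part of the design above):

use std::io::{Read, Seek, SeekFrom};

fn read_block(car: &mut std::fs::File, index: &Index, cid: &Cid) -> std::io::Result<Option<Vec<u8>>> {
    let Some(location) = index.index.get(cid) else {
        return Ok(None);
    };
    // 1. Seek to the start of the compressed group.
    car.seek(SeekFrom::Start(location.compressed_group_offset_into_file))?;
    // 2. Decompress just that group (assuming one zstd frame per group).
    let mut group_bytes = Vec::new();
    zstd::Decoder::new(&mut *car)?
        .single_frame()
        .read_to_end(&mut group_bytes)?;
    // 3. Deserialize the group and pick out the block by its index.
    let group: CompressedGroup = deserialize_group(&group_bytes); // hypothetical helper
    Ok(group
        .blocks
        .into_iter()
        .nth(location.block_index_in_group as usize)
        .map(|block| block.data))
}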

My understanding of the challenge here is that for most compression algorithms, once the data is compressed, there isn't an efficient way to read specific bytes of the underlying data, because the dictionary/compression table is built up dynamically from the data earlier in the stream.

I think you're right - an alternative implementation might try to be clever with e.g. zstd frames, making sure that they line up with block boundaries in the CAR, but if there is going to be a new format anyway, we can be more principled about it.
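
For illustration, that frame-alignment idea could look something like this (a sketch only; the per-block framing and compression level are my assumptions):

// Each block becomes its own zstd frame. Concatenated frames are still a
// valid zstd stream, so the file stays decompressible end-to-end, while an
// index of frame offsets per CID allows seeking to a single block.
fn append_block_as_frame(out: &mut Vec<u8>, block_data: &[u8]) -> std::io::Result<u64> {
    let frame_offset = out.len() as u64; // record this offset in the CID index
    let frame = zstd::encode_all(block_data, 3)?; // one frame per block, level 3
    out.extend_from_slice(&frame);
    Ok(frame_offset)
}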

What would you like to see from a PoC here?

@willscott
Member

High-level points:

  • this feels like a different format than what gets called 'car' today
  • It's unclear this would be the format that one would want to transfer "on the wire" - since one could likely get better overall compression by sending an uncompressed car file over HTTP with transport-level compression.
  • the logic for selecting which blocks are grouped into which CompressionGroup for good results seems potentially non-trivial
  • For storing at rest, why would one want this mechanism, versus flatfs where each block is a file, and compression and deduplication are left to the file system? e.g. if one were storing the car data in such a way on a zfs-compressed file system, I would expect it to do okay.

@aatifsyed
Author

Thanks for the feedback!

  • It's unclear this would be the format that one would want to transfer "on the wire" - since one could likely get better overall compression by sending an uncompressed car file over HTTP with transport-level compression.

I think this warrants some investigation - I'm going to see what compression ratios we get if we e.g. compress each block vs. compress the entire file (a rough sketch of that comparison follows below).

  • the logic for selecting which blocks are grouped into which CompressionGroup for good results seems potentially non-trivial

I think this is obviously true

I didn't know about flatfs - I'll have a look at that too! I think we want to save newer users from e.g. setting up ZFS, which is why a one-file approach is appealing.

I'll come back with some results - perhaps per-block compression, or none at all
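
A rough harness for that comparison might look like the following (sketch only; how the blocks are iterated and the compression level are assumptions):

fn compare_ratios(blocks: &[Vec<u8>]) -> std::io::Result<(usize, usize, usize)> {
    // Raw size of all block data.
    let raw: usize = blocks.iter().map(|b| b.len()).sum();

    // Per-block: compress every block as its own zstd frame and sum the sizes.
    let per_block: usize = blocks
        .iter()
        .map(|b| zstd::encode_all(b.as_slice(), 3).map(|c| c.len()))
        .sum::<std::io::Result<usize>>()?;

    // Whole-file: one frame over the concatenation of all blocks.
    let whole_file = zstd::encode_all(blocks.concat().as_slice(), 3)?.len();

    Ok((raw, per_block, whole_file))
}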

@aatifsyed
Author

To close the loop on this, Forest is now using .forest.car.zst files, where we basically follow the above.

$ du forest_snapshot* --human-readable
3.6G    forest_snapshot_calibnet_2023-10-02_height_964087.car
2.0G    forest_snapshot_calibnet_2023-10-02_height_964087.car.zst
2.3G    forest_snapshot_calibnet_2023-10-02_height_964087.forest.car.zst

There is more information in the Forest documentation.

@rvagg
Member

rvagg commented Oct 10, 2023

I think maybe this should be titled "CARv3" since CARv2 is a thing and is pretty fixed. Unfortunately the characteristics array at the start doesn't really give us much room to manoeuvre wrt the byte layout of data so this is very unlikely to be something we can fit into the current design.

But, a couple of thoughts that are related to this proposal: first, it's my belief that any CARv3, or similar effort, should consider a divergence between the on-disk blockstore and/or archival-friendly format and the over-the-wire transport format. We effectively have that today, in fact, with CARv2 being the former and CARv1 being the latter; but we're in the unfortunate position of there being significant confusion (even in our code) about what should be used for what purpose, with people who don't want to care about such things just choosing the higher number because that's obviously going to be better! So a proposal for compression might be worth considering for an on-disk format, and we just isolate it from the transport format, where it likely makes more sense to rely on compression at other layers.

Having said that, one of the proposals (by @hannahhoward) for a CARv3 that's particularly transport-oriented introduces the possibility that each of what we call "sections" today would also carry an additional indicator that tells you what type of section it is, which may be a standard cid:block section, or it may be something else relevant to the transport of the data (attestation block, error message, "skipme/missing", BAO block, etc.). In such a format you could even make "header", "trailer" and perhaps even "index" one of these types to capture current CARv1 & 2 functionality. Such a format may lend itself to a compression format where you benefit more by grouping many blocks together and need to make compromise decisions about how much to group (e.g. when streaming you probably don't want to retain too many blocks while calculating, but you could send an indicator: "next 10 blocks form a compression group").
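
Purely for illustration, a typed-section layout along those lines might be shaped like this (the names and variants are my invention, not the actual proposal):

// Hypothetical sketch of typed sections; names and variants are illustrative.
enum SectionType {
    Header,
    CidBlock,                              // today's standard cid:block pair
    Index,
    CompressionGroup { block_count: u32 }, // "next N blocks form a compression group"
    Missing,                               // "skipme/missing"
    Error,
    Attestation,
    Trailer,
}

struct Section {
    kind: SectionType,
    payload: Vec<u8>,
}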
