Add some format documentation.
unknownbrackets committed Nov 1, 2014
1 parent cc55a61 commit e6631c0
Showing 4 changed files with 143 additions and 3 deletions.
9 changes: 6 additions & 3 deletions README.md
@@ -22,7 +22,7 @@ Features
* Processes multiple files in one command.
* Can take a CSO or DAX file as a source.
* Able to output at larger block sizes.
* Support for experimental cso formats using [lz4][] (faster decompression)
* Support for experimental [CSO v2][] and [ZSO][] formats using [lz4][] (faster decompression.)
* Tuning of deflate or lz4 compression threshold.


@@ -97,7 +97,7 @@ Platforms

maxcso has only been tested on Windows so far. The code was written to be portable, however.
If you'd like to port it to another platform, pull requests are accepted. It may just compile
out of the box with a Makefile or similar.
out of the box with a Makefile or similar, but 7-zip is probably the biggest problem.

### Windows

@@ -116,6 +116,7 @@ libraries. Licensing is as follows:
* [Zopfli][] is licensed under Apache 2.0.
* [libuv][] is licensed under MIT.
* [zlib][] is licensed under zlib.
* [lz4][] is licensed under BSD.


Other tools
@@ -136,4 +137,6 @@ Other tools
[CisoMC]: http://wololo.net/talk/viewtopic.php?f=20&t=32659
[ciso]: http://sourceforge.net/projects/ciso/
[ciso-python]: http://virtuousflame.blog.163.com/blog/static/177177172201111833413485/
[lz4]: https://code.google.com/p/lz4/
[lz4]: https://code.google.com/p/lz4/
[CSO v2]: README_CSO.md
[ZSO]: README_ZSO.md
83 changes: 83 additions & 0 deletions README_CSO.md
@@ -0,0 +1,83 @@
CSO format
===========

The original CSO format was created by BOOSTER.

This document includes an experimental v2 format of CSO, proposed by Unknown W. Brackets.


Overview
===========

A CSO file consists of a file header, index section, and data section.

Typically, the file extension .cso is used.


Format (version 1)
===========

The header is as follows (little endian):

    char[4]  magic;             // Always "CISO".
    uint32_t header_size;       // Does not always contain a reliable value.
    uint64_t uncompressed_size; // Total size of original ISO.
    uint32_t block_size;        // Size of each block, usually 2048.
    uint8_t  version;           // May be 0 or 1.
    uint8_t  index_shift;       // Indicates left shift of index values.
    uint8_t  unused[2];         // May contain any values.
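
As a rough sketch, the 24-byte header above could be read from a byte buffer like this. The names `CsoHeader`, `ReadLE`, and `ParseCsoHeader` are made up for illustration and are not taken from maxcso. The fields are read byte by byte so the result does not depend on host endianness or struct padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical in-memory form of the header fields listed above.
struct CsoHeader {
    char magic[4];
    uint32_t header_size;
    uint64_t uncompressed_size;
    uint32_t block_size;
    uint8_t version;
    uint8_t index_shift;
    uint8_t unused[2];
};

// Reads a little-endian unsigned integer of `bytes` bytes starting at p.
static uint64_t ReadLE(const uint8_t *p, int bytes) {
    uint64_t v = 0;
    for (int i = bytes - 1; i >= 0; --i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Returns false if the buffer is shorter than 24 bytes or the magic is not "CISO".
// header_size is copied but not trusted, since v1 files may not set it reliably.
bool ParseCsoHeader(const uint8_t *buf, size_t len, CsoHeader &out) {
    assert(buf != nullptr);
    if (len < 24 || std::memcmp(buf, "CISO", 4) != 0) {
        return false;
    }
    std::memcpy(out.magic, buf, 4);
    out.header_size = static_cast<uint32_t>(ReadLE(buf + 4, 4));
    out.uncompressed_size = ReadLE(buf + 8, 8);
    out.block_size = static_cast<uint32_t>(ReadLE(buf + 16, 4));
    out.version = buf[20];
    out.index_shift = buf[21];
    out.unused[0] = buf[22];
    out.unused[1] = buf[23];
    return true;
}
```

The field offsets (4, 8, 16, 20, 21, 22) follow directly from the field sizes in the listing, which add up to 0x18 bytes.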

Following that are index entries, which are each a uint32_t (little endian). The number of
index entries can be found by taking `ceil(uncompressed_size / block_size) + 1`.
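
One way to compute that count without floating point is sketched below. The function name `CsoIndexEntryCount` is hypothetical. Note that plain integer division would truncate before `ceil` is applied, so the remainder is checked explicitly.

```cpp
#include <cassert>
#include <cstdint>

// One index entry per block, plus the final entry that marks the end of data.
uint64_t CsoIndexEntryCount(uint64_t uncompressed_size, uint32_t block_size) {
    assert(block_size != 0);
    uint64_t blocks = uncompressed_size / block_size;
    if (uncompressed_size % block_size != 0) {
        ++blocks;  // A partial trailing block still needs its own entry.
    }
    return blocks + 1;
}
```

For example, a 4096 byte image with 2048 byte blocks has two blocks and therefore three index entries.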

The lower 31 bits of each index entry, when shifted left by `index_shift`, indicate the
position within the file of the block's compressed data. The length of the block is the
difference between this entry's offset and the following index entry's offset value.

Note that this size may be larger than the compressed or uncompressed data, if `index_shift` is
greater than 0. The space between blocks may be padded with any byte, but NUL is recommended.
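
The offset and length rules above might be implemented along these lines. `CsoBlockSpan`, `CsoEntryOffset`, and `CsoBlockSpanAt` are illustrative names, and the sketch assumes the index has already been read into memory as host-order values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CsoBlockSpan {
    uint64_t offset;  // Position of the block's data within the file.
    uint64_t length;  // Stored length, which may include padding.
};

// Masks off the high bit, then applies index_shift. The result is widened to
// 64 bits first so the shift cannot overflow.
uint64_t CsoEntryOffset(uint32_t entry, uint8_t index_shift) {
    return static_cast<uint64_t>(entry & 0x7FFFFFFFu) << index_shift;
}

// The stored length is the distance to the next entry's offset.
CsoBlockSpan CsoBlockSpanAt(const std::vector<uint32_t> &index, size_t block, uint8_t index_shift) {
    assert(block + 1 < index.size());
    uint64_t start = CsoEntryOffset(index[block], index_shift);
    uint64_t end = CsoEntryOffset(index[block + 1], index_shift);
    assert(end >= start);
    return CsoBlockSpan{start, end - start};
}
```

Because the length can include padding, a reader should treat it as an upper bound on the compressed size rather than an exact value.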

Note also that this means index entry offsets must be in increasing order, so reordering or
deduplication of blocks is not supported.

When the high bit of an index entry is set, the block is stored uncompressed.

When compressed, blocks are compressed using the raw [deflate][] algorithm, with a window size
of 15 bits (when using zlib, pass -15 as `windowBits` so that no zlib header is expected.)

The final index entry marks the end of the data section, which is normally also the end of the
file.


Format (version 2)
===========

The header is more strictly defined:

    char[4]  magic;             // Always "CISO".
    uint32_t header_size;       // Must always be 0x18.
    uint64_t uncompressed_size; // Total size of original ISO.
    uint32_t block_size;        // Size of each block.
    uint8_t  version;           // Must be 2.
    uint8_t  index_shift;       // Indicates left shift of index values.
    uint8_t  unused[2];         // Must be 0.
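
A reader that wants to enforce the stricter version 2 rules could check them as sketched here. `IsStrictCsoV2Header` is a hypothetical helper that works on the first 0x18 bytes of a file. It only covers the constraints stated in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true only if the magic, header_size, version, and unused bytes all
// match the version 2 requirements.
bool IsStrictCsoV2Header(const uint8_t *buf, size_t len) {
    assert(buf != nullptr);
    if (len < 0x18 || std::memcmp(buf, "CISO", 4) != 0) {
        return false;
    }
    // header_size is a little-endian uint32_t at byte offset 4.
    uint32_t header_size = static_cast<uint32_t>(buf[4])
        | (static_cast<uint32_t>(buf[5]) << 8)
        | (static_cast<uint32_t>(buf[6]) << 16)
        | (static_cast<uint32_t>(buf[7]) << 24);
    uint8_t version = buf[20];
    return header_size == 0x18 && version == 2 && buf[22] == 0 && buf[23] == 0;
}
```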

The index data follows the same format as version 1, but the interpretation of the size and high
bit is handled differently.

In version 2, when the length of a compressed block (that is, the difference between two index
entry offset values) is >= `block_size`, the block must not be compressed.

Note again that when `index_shift` is greater than 0, the size may include additional padding.
If the compressed size plus this padding would result in `block_size` or more bytes, the data
must not be compressed (or decompressed.) This won't result in any observed file size
difference, because the padding would have been wasted bytes anyway.

When the size of the compressed block is less than `block_size`, the data is always compressed.
The high bit of the index entry indicates which compression method has been used. When it is
set, the data is compressed with [lz4][], otherwise it is compressed with [deflate][].

The final index entry must not have the high bit set.
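
The version 2 decision for a single block can be summarized in a small function. This is only a sketch: `CsoV2Method` and `CsoV2BlockMethod` are illustrative names, and `stored_length` is assumed to be the difference between two shifted index offsets, so it includes any padding.

```cpp
#include <cassert>
#include <cstdint>

enum class CsoV2Method { Raw, Deflate, LZ4 };

// entry: the raw index entry for the block (high bit included).
// stored_length: distance to the next entry's offset, padding included.
CsoV2Method CsoV2BlockMethod(uint32_t entry, uint64_t stored_length, uint32_t block_size) {
    assert(block_size != 0);
    if (stored_length >= block_size) {
        // Size alone decides this case. The high bit is not consulted.
        return CsoV2Method::Raw;
    }
    return (entry & 0x80000000u) != 0 ? CsoV2Method::LZ4 : CsoV2Method::Deflate;
}
```

Checking the size before the high bit matters, because in version 2 the bit no longer means "uncompressed".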


[lz4]: https://code.google.com/p/lz4/
[deflate]: https://www.ietf.org/rfc/rfc1951.txt
53 changes: 53 additions & 0 deletions README_ZSO.md
@@ -0,0 +1,53 @@
ZSO format
===========

Please note that this format is not final, and is experimental.

This format has been proposed by [codestation][] in a patch to [procfw][].


Overview
===========

The general format is the same as the CSO v1 format. It consists of a file header, index
section, and data section.

Unlike the original CSO format, blocks are compressed using [lz4][] rather than [deflate][].
Additionally, the magic bytes differ (ZISO), and the preferred extension is "zso".
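
Since the two formats share a layout, a reader can tell them apart from the first four bytes. A minimal sketch, with `ImageFormat` and `DetectByMagic` as made-up names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class ImageFormat { Unknown, CSO, ZSO };

// Looks only at the magic bytes. It does not validate the rest of the header.
ImageFormat DetectByMagic(const uint8_t *buf, size_t len) {
    assert(buf != nullptr);
    if (len < 4) {
        return ImageFormat::Unknown;
    }
    if (std::memcmp(buf, "CISO", 4) == 0) {
        return ImageFormat::CSO;
    }
    if (std::memcmp(buf, "ZISO", 4) == 0) {
        return ImageFormat::ZSO;
    }
    return ImageFormat::Unknown;
}
```

Relying on the magic rather than the file extension avoids picking the wrong decompressor for a renamed file.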


Format
===========

The header is as follows (little endian):

    char[4]  magic;             // Always "ZISO".
    uint32_t header_size;       // Always 0x18.
    uint64_t uncompressed_size; // Total size of original ISO.
    uint32_t block_size;        // Size of each block, usually 2048.
    uint8_t  version;           // Always 1.
    uint8_t  index_shift;       // Indicates left shift of index values.
    uint8_t  unused[2];         // Always 0.

Following that are index entries, which are each a uint32_t (little endian). The number of
index entries can be found by taking `ceil(uncompressed_size / block_size) + 1`.

The lower 31 bits of each index entry, when shifted left by `index_shift`, indicate the
position within the file of the block's compressed data. The length of the block is the
difference between this entry's offset and the following index entry's offset value.

Note that this size may be larger than the compressed or uncompressed data, if `index_shift` is
greater than 0. The space between blocks may be padded with any byte, but NUL is recommended.
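
On the writing side, the padding comes from aligning each block's start so that its position survives the right shift into the index. The following is a sketch of that arithmetic, with `AlignForIndex` and `EncodeIndexOffset` as hypothetical helper names. It does not set the high bit.

```cpp
#include <cassert>
#include <cstdint>

// Rounds pos up to the next multiple of (1 << index_shift). The bytes skipped
// are the padding between blocks.
uint64_t AlignForIndex(uint64_t pos, uint8_t index_shift) {
    assert(index_shift < 32);
    uint64_t mask = (uint64_t(1) << index_shift) - 1;
    return (pos + mask) & ~mask;
}

// Encodes an aligned position into the lower 31 bits of an index entry.
uint32_t EncodeIndexOffset(uint64_t aligned_pos, uint8_t index_shift) {
    assert(index_shift < 32);
    assert((aligned_pos & ((uint64_t(1) << index_shift) - 1)) == 0);
    uint64_t value = aligned_pos >> index_shift;
    assert(value <= 0x7FFFFFFFu);  // Must fit below the high bit.
    return static_cast<uint32_t>(value);
}
```

With `index_shift` of 0 the 31-bit offsets can only address 2 GiB of file, so a larger shift trades a little padding for the ability to index bigger files.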

Note also that this means index entry offsets must be in increasing order, so reordering or
deduplication of blocks is not supported.

When the high bit of an index entry is set, the block is stored uncompressed.

The final index entry marks the end of the data section, which is normally also the end of the
file.


[codestation]: https://github.com/codestation
[procfw]: https://code.google.com/p/procfw/
[lz4]: https://code.google.com/p/lz4/
[deflate]: https://www.ietf.org/rfc/rfc1951.txt
1 change: 1 addition & 0 deletions src/input.cpp
@@ -375,6 +375,7 @@ bool Input::DecompressSectorDeflate(uint8_t *dst, const uint8_t *src, unsigned i
}

bool Input::DecompressSectorLZ4(uint8_t *dst, const uint8_t *src, int dstSize, std::string &err) {
    // Must use fast, because we don't know the size of the input data. It could include padding.
    if (LZ4_decompress_fast(reinterpret_cast<const char *>(src), reinterpret_cast<char *>(dst), dstSize) < 0) {
        err = "LZ4 decompression failed.";
        return false;