CSO vs CHD #73
Given the quality and accuracy of information about CHD (and what it supports or does), it wouldn't surprise me if it was just using lzma after all, regardless of what this info tool says. I'll also note that it's using larger block sizes than you're using with CSO - at least 10x larger, which will lead to better compression; you should try that. It might be that there are duplicate blocks in the ISO and that CHD is detecting and reusing those blocks, which CSO as a format doesn't support (but could - of course, clients of it would need to change). A zip file might not handle this situation well if the blocks were far apart. That said, I've heard (and seen from a couple of tests of different ISOs; I don't own the Japanese release of that game) that most of the time, a CSO with larger block sizes compresses about as well. Anyway, this doesn't really sound like an issue about maxcso. -[Unknown]
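The block-size effect is easy to demonstrate: when each block is deflated independently (as both CSO and CHD's zlib codec do), larger blocks waste less per-block overhead and give deflate more context to find matches in. A minimal sketch, using synthetic redundant data standing in for ISO contents (an assumption for illustration; real ISO data varies):

```python
import zlib

# Synthetic, somewhat redundant data standing in for ISO contents.
data = (b"PSP sector payload " * 4096) + bytes(range(256)) * 512

def block_compress_size(data: bytes, block_size: int) -> int:
    """Total size when each block is deflated independently,
    the way CSO-style formats compress fixed-size blocks."""
    total = 0
    for off in range(0, len(data), block_size):
        total += len(zlib.compress(data[off:off + block_size], 9))
    return total

small_blocks = block_compress_size(data, 2048)    # CSO's default block size
large_blocks = block_compress_size(data, 32768)   # larger, CHD-style blocks
print(small_blocks, large_blocks)
```

On data like this, the larger-block total comes out smaller, since there are fewer per-block headers and longer match windows.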
For CHD, I did not use the default compression parameters, because the defaults would involve lzma encoding. If lzma is involved, the overall file size will be smaller and encoding slower.
No, the default block size is 2048 for files smaller than 2GB, such as this one. If you got a file smaller by 9% with default settings, it was some other parameter. Also, your understanding that PSP ISO files contain error correction bits is incorrect. That's only for PSP CD ISOs, in MODE2, which -[Unknown]
By reading the comments in the code, I finally understood that "copy from self" does not simply copy the data block without compression. Instead, it checks data blocks during compression and, when a block has the same hash as one seen before, logically reuses that earlier block instead of taking up more storage space. This gives a significant improvement in compression efficiency when an image contains multiple identical files, or identical parts of files. This is also the reason why, in this sample, CHD's compression efficiency can significantly exceed both CSO and 7z deflate. When this condition is not met, CHD's compression efficiency is not much higher than CSO's. So, could this simple duplicate-block check also be introduced into CSO?
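The "copy from self" behavior described above - hashing blocks during compression and logically reusing an earlier identical block - can be sketched like this (a toy model of the idea, not CHD's actual data structures):

```python
import hashlib

def dedup_blocks(data: bytes, block_size: int = 2048):
    """Toy model of 'copy from self': store each unique block once and
    record a logical reference for any later block with the same content."""
    seen = {}        # block hash -> index in `stored`
    stored = []      # unique block payloads that would actually be written
    block_map = []   # per-block: ('new', idx) or ('ref', idx)
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        h = hashlib.sha1(block).digest()
        # Also compare the bytes, so a hash collision can't corrupt output.
        if h in seen and stored[seen[h]] == block:
            block_map.append(('ref', seen[h]))
        else:
            seen[h] = len(stored)
            block_map.append(('new', len(stored)))
            stored.append(block)
    return stored, block_map

data = b"A" * 2048 + b"B" * 2048 + b"A" * 2048  # third block repeats the first
stored, block_map = dedup_blocks(data)
print(len(stored), block_map)  # 2 unique blocks; the third is a reference
```

The per-block map is what makes this work with random access: the reader looks up block N, and if the entry is a reference it reads the earlier block's payload instead.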
The CSO that is supported by various tools, as a format, doesn't support that. A new experimental CSO format (i.e. like a CSOv2) could be created to do that, though, yes. Software would need to be updated to support it (much like software would need to be updated if new features were added to CHD, or even PNG, or any other format.)

There are more tricks a new format could use. Most compression formats have a minimum overhead of at least a few bytes, so zero-byte and 1-byte block sizes could have special meanings. A four-byte block size could indicate a reference to another block. Zstd could be used fairly trivially. The trickiest thing is deciding if (and how precisely) a dictionary should be used to improve compression. This could all be done while maintaining decompression speed and keeping blocks and sectors aligned, for efficiency. PPSSPP already uses zstd, for example, so it wouldn't even add much to support such a format.

Anyway, it hasn't been high on my priority list, as I'm usually investigating specific behaviors of the PSP and making PPSSPP's emulation more accurate. Adding a new variant of CSO (which likely wouldn't be supported on PSP or PS2 hardware) and the confusion that might cause makes me more likely to just work on PPSSPP instead with the time I have. -[Unknown]
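As a toy illustration of the four-byte back-reference idea mentioned above (every layout detail here is invented for illustration, not a real CSOv2 specification):

```python
import struct
import zlib

BLOCK = 2048  # keeping CSO's usual block size for the example

def pack_blocks(data: bytes):
    """Hypothetical packer: each block is stored as deflate output, as raw
    bytes when deflate doesn't help (as CSO already does), or as exactly
    4 bytes naming an identical earlier block."""
    prior = {}      # block contents -> block index of first occurrence
    payloads = []
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK]
        if block in prior:
            payloads.append(struct.pack("<I", prior[block]))  # back-reference
            continue
        prior[block] = len(payloads)
        comp = zlib.compress(block, 9)
        payloads.append(comp if len(comp) < len(block) else block)
    return payloads

data = b"\x00" * BLOCK + bytes(range(256)) * 8 + b"\x00" * BLOCK
payloads = pack_blocks(data)
print([len(p) for p in payloads])  # the third entry is only 4 bytes
```

A real format would still need the usual index of block offsets, and a reader would interpret a 4-byte payload as "decompress block N instead" - which keeps decompression fast and random access intact.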
Can you update the compression libs of maxcso? The existing libs are too old; some newer versions improve compression efficiency.
They were updated in 2021 - DEFLATE is a stable format, and lz4 hasn't really changed much (I think ARM64 decoding has gotten faster, but that matters for the decoder - maxcso would compress the same files either way.) As noted, switching to a different compression algorithm (i.e. zstd) wouldn't "just work" - it'd create a new version of the format that other tools would have to support. -[Unknown]
Test sample: Jeanne d'Arc (Japan).iso - 1,245,249,536 bytes
Compressed to Jeanne d'Arc (Japan).zip using 7zip, deflate: 700,926,052 bytes
Compressed to Jeanne d'Arc (Japan).cso using maxcso, default parameters: 752,663,981 bytes
Because address-seeking information is added during CSO compression, it's a bit bigger, which is reasonable.
Compressed to CHD using parameters -c cdzl,cdfl: 486,284,966 bytes
I did not use lzma encoding - only deflate - to create the same conditions as the two opponents above and ensure a fair comparison.
CHD also supports address seeking, yet its size is much smaller than the previous two - even far smaller than direct zip compression of the file. How could that be?
All three are lossless compression; after decompression, they all match the original file's checksum.
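For reference, the sizes above work out to the following compression ratios (a quick calculation using only the numbers reported in this thread):

```python
# Sizes in bytes, copied from the comparison above.
original = 1_245_249_536
results = {
    "7zip deflate (.zip)":     700_926_052,
    "maxcso default (.cso)":   752_663_981,
    "chdman cdzl,cdfl (.chd)": 486_284_966,
}
for name, size in results.items():
    print(f"{name}: {size / original:.1%} of the original ISO")
```

So the CHD lands around 39% of the original, versus roughly 56% for the zip and 60% for the CSO - the gap the duplicate-block discussion above is explaining.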