[chdman] some Dreamcast CHDs are missing some data after going back to CUE/BIN #11903

Closed
alucryd opened this issue Jan 4, 2024 · 22 comments · Fixed by #11913

Comments

@alucryd
Contributor

alucryd commented Jan 4, 2024

MAME version

0.261

System information

Arch Linux

INI configuration details

No response

Emulated system/software

No response

Incorrect behaviour

Some Redump dumps of Dreamcast games are missing data after being converted to CHD, then back to CUE/BIN.

For instance, WWF Royal Rumble (USA) is 352800 bytes smaller after going back and forth. Haven't yet tried to see if the BIN is still usable despite the missing piece.

Other games like Ecco the Dolphin - Defender of the Future (USA) (En,Fr,De,Es) are fine. Can't spot any obvious difference between a working game and a non-working one, at least not by looking at the CUE sheet alone.

Expected behaviour

CHDs converted back to CUE/BIN should be a perfect match to the original files.

Steps to reproduce

  • Get your hands on a Redump dump of WWF Royal Rumble (USA)
  • Convert it to CHD
  • Convert it back to CUE/BIN (make sure to pass both -o and -ob to separate the TOC)
  • Observe that the resulting BIN is 352800 bytes smaller than the sum of all source BINs

Additional details

No response

@alucryd
Contributor Author

alucryd commented Jan 4, 2024

Looks like the missing data is 352,800 bytes of zero padding at the start of track 2. Hope this helps.

Edit: Confirmed, adding the missing zeroes at the start of track 2 restores the file to be an exact match to the source.
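
The 352,800 missing bytes are exactly 150 raw sectors of 2,352 bytes each, which is the standard 2-second pregap at 75 sectors per second. Below is a minimal C++ sketch of the manual repair described above. It assumes a merged BIN with 2,352-byte sectors and no subchannel data, and it assumes the byte offset where track 2 begins is already known. `restore_zero_pregap` is a hypothetical helper written for illustration and is not part of chdman.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Raw CD sector size in a Redump-style BIN (assumption: no subchannel data).
constexpr std::size_t kRawSectorSize = 2352;
// Standard pregap length: 150 sectors, i.e. 2 seconds at 75 sectors/second.
constexpr std::size_t kPregapSectors = 150;

// Hypothetical helper: re-insert `sectors` zero-filled raw sectors into a
// merged image at byte offset `track_start` (the first byte of track 2 here).
void restore_zero_pregap(std::vector<std::uint8_t>& image,
                         std::size_t track_start,
                         std::size_t sectors = kPregapSectors)
{
    assert(track_start <= image.size());
    image.insert(image.begin() + static_cast<std::ptrdiff_t>(track_start),
                 sectors * kRawSectorSize, std::uint8_t{0});
}
```

In practice, the whole BIN would be read into memory, patched, and written back out. The result should then hash-match the concatenated source tracks only if the stripped pregap really was all zeros.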

@987123879113
Contributor

I checked it out just to make sure it wasn't a possibly more widespread issue, but it seems to be limited to the GD-ROM-related code, which I don't have any knowledge of or experience with.

This stood out to me at a quick glance though. The header for parse_gdicue specifically states it changes the layout of Redump cue/bin to match a TOSEC gdi. This could be what you're seeing.

mame/src/lib/util/cdrom.cpp

Lines 2591 to 2592 in 7aca06f

* TOSEC layout is preferred and this code adjusts the TOC and INFO generated by a Redump .cue to match the
* layout from a TOSEC .gdi.

mame/src/lib/util/cdrom.cpp

Lines 2917 to 2954 in 7aca06f

/*
 * Strip pregaps from Redump tracks and adjust the LBA offset to match TOSEC layout
 */
for (trknum = 1; trknum < outtoc.numtrks; trknum++)
{
    uint32_t prev_pregap = outtoc.tracks[trknum-1].pregap;
    uint32_t prev_offset = prev_pregap * (outtoc.tracks[trknum-1].datasize + outtoc.tracks[trknum-1].subsize);
    uint32_t this_pregap = outtoc.tracks[trknum].pregap;
    uint32_t this_offset = this_pregap * (outtoc.tracks[trknum].datasize + outtoc.tracks[trknum].subsize);
    if (outtoc.tracks[trknum-1].pgtype != CD_TRACK_AUDIO)
    {
        // pad previous DATA track to match TOSEC layout
        outtoc.tracks[trknum-1].frames += this_pregap;
        outtoc.tracks[trknum-1].padframes += this_pregap;
    }
    if (outtoc.tracks[trknum-1].pgtype == CD_TRACK_AUDIO && outtoc.tracks[trknum].pgtype == CD_TRACK_AUDIO)
    {
        // shift previous AUDIO track to match TOSEC layout
        outinfo.track[trknum-1].offset += prev_offset;
        outtoc.tracks[trknum-1].splitframes += prev_pregap;
    }
    if (outtoc.tracks[trknum-1].pgtype == CD_TRACK_AUDIO && outtoc.tracks[trknum].pgtype != CD_TRACK_AUDIO)
    {
        // shrink previous AUDIO track to match TOSEC layout
        outtoc.tracks[trknum-1].frames -= prev_pregap;
        outinfo.track[trknum-1].offset += prev_offset;
    }
    if (outtoc.tracks[trknum].pgtype == CD_TRACK_AUDIO && trknum == outtoc.numtrks-1)
    {
        // shrink final AUDIO track to match TOSEC layout
        outtoc.tracks[trknum].frames -= this_pregap;
        outinfo.track[trknum].offset += this_offset;
    }
}
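
The following simplified C++ model makes the four rules in that loop easier to follow. `Track` and `strip_redump_pregaps` are hypothetical stand-ins that fold MAME's separate `outtoc`/`outinfo` fields into a single struct, so treat this as a sketch of the logic rather than MAME's code. The example uses an illustrative data/audio/data layout with 150-frame pregaps. In that layout, the first data track gains 150 pad frames, and the audio track loses 150 frames while its source offset advances by 150 × 2,352 = 352,800 bytes. That is the same amount reported missing at the start of track 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the per-track fields the loop touches;
// these are not MAME's real types.
struct Track {
    bool audio;                 // pgtype == CD_TRACK_AUDIO
    std::uint32_t pregap;       // pregap length in frames
    std::uint32_t sector_bytes; // datasize + subsize
    std::uint32_t frames;       // track length in frames
    std::uint32_t padframes;    // padding frames added to the track
    std::uint32_t splitframes;
    std::uint64_t offset;       // byte offset into the track's source file
};

// Applies the same four rules as the quoted loop, in the same order.
void strip_redump_pregaps(std::vector<Track>& t)
{
    for (std::size_t n = 1; n < t.size(); n++)
    {
        Track& prev = t[n - 1];
        Track& cur = t[n];
        const std::uint64_t prev_offset = std::uint64_t(prev.pregap) * prev.sector_bytes;
        const std::uint64_t this_offset = std::uint64_t(cur.pregap) * cur.sector_bytes;

        if (!prev.audio)                    // pad previous DATA track
        {
            prev.frames += cur.pregap;
            prev.padframes += cur.pregap;
        }
        if (prev.audio && cur.audio)        // shift previous AUDIO track
        {
            prev.offset += prev_offset;
            prev.splitframes += prev.pregap;
        }
        if (prev.audio && !cur.audio)       // shrink previous AUDIO track
        {
            assert(prev.frames >= prev.pregap);
            prev.frames -= prev.pregap;
            prev.offset += prev_offset;     // skip the pregap bytes in the source
        }
        if (cur.audio && n == t.size() - 1) // shrink final AUDIO track
        {
            assert(cur.frames >= cur.pregap);
            cur.frames -= cur.pregap;
            cur.offset += this_offset;
        }
    }
}
```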

@aguyfromuranus

Don't know if this is related, but I get a similar issue with sa1 and sa2: the hashes of the split bins extracted with binmerge --split don't match. Apparently these same bins are hash-perfect when Redump's gdi sheet is used instead of the cue as createcd input, so it clearly has something to do with the cue specifically.

@alucryd
Copy link
Contributor Author

alucryd commented Jan 5, 2024

Thanks for the quick reply; so it's a feature, then. That's unfortunate in my case because the reverted files won't verify against the Redump database, so I will need to "repair" the files after reversing them. I'll try to reach out to Redump so I can get more input before working on something.

@tjanas

tjanas commented Jan 5, 2024

CHD does not support preservation of ISRC values that may be present in a cuesheet, such as those present in the audio tracks for Ghost Blade.

http://redump.org/disc/70116/

As a result, CHD isn’t truly a lossless preservation of CD-based media. Furthermore, I believe the game itself depends on these ISRC values as a means of copy-protection.

CHD is also lacking in that it may not fully preserve CD tracks that may contain multiple indexes within a single track, and other attributes that may be represented in a cuesheet.

@tjanas

tjanas commented Jan 5, 2024

flyinghead/flycast#906

@tjanas

tjanas commented Jan 5, 2024

@rb6502
Contributor

rb6502 commented Jan 5, 2024

@tjanas None of those things have anything to do with what is being discussed here.

@alucryd
Contributor Author

alucryd commented Jan 5, 2024

I was under the impression that CHD was a lossless format (at least for what is supported in v5), but that deliberate stripping says otherwise indeed.

Redump confirmed they were intentionally keeping the 150 sectors gap for the sake of accuracy, and because they sometimes include actual data, not just zeroes.

It would be nice if chdman could preserve them as well. I'm interested in the rationale behind the TOSEC preference, since losing data intentionally sounds counter-intuitive. Maybe the fact that the gap can contain data wasn't known at the time of writing.
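
Redump says the 150-sector gaps sometimes hold real data rather than just zeroes. A tool that strips them could therefore check first whether anything would be lost. The sketch below is minimal and assumes the track file is already in memory with 2,352-byte raw sectors and no subchannel data. `pregap_is_all_zero` is a hypothetical name, not a chdman function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kRawSectorSize = 2352; // raw sector, no subchannel data

// Returns true only if the first `pregap_sectors` sectors of a track image are
// all zero bytes; false means stripping that pregap would discard real data.
bool pregap_is_all_zero(const std::vector<std::uint8_t>& track,
                        std::size_t pregap_sectors)
{
    const std::size_t len = pregap_sectors * kRawSectorSize;
    assert(len <= track.size());
    return std::all_of(track.begin(),
                       track.begin() + static_cast<std::ptrdiff_t>(len),
                       [](std::uint8_t b) { return b == 0; });
}
```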

@tjanas

tjanas commented Jan 5, 2024

TOSEC doesn't contain MIL-CD based games in any of its dats as far as I am aware. TOSEC predates Redump and has looser standards than Redump. Also, the scope of Redump is limited to video game optical media, while TOSEC also includes magnetic media, digital-only media, etc.

The issue with CHD is not limited to Dreamcast discs but others as well (such as audio CDs). It also has a challenge with Atari Jaguar CDs (those are multisession discs where the data is mastered as redbook audio tracks). CHD does not preserve the multisession structure or even the DCP flags from the cuesheet.

Example: http://redump.org/disc/74613/
Using chdman from mame0261 with the above redump bin/cue:

REM SESSION 01
FILE "Space Ace (USA) (Track 1).bin" BINARY
  TRACK 01 AUDIO
    FLAGS DCP
    INDEX 01 00:00:00
REM SESSION 02
FILE "Space Ace (USA) (Track 2).bin" BINARY
  TRACK 02 AUDIO
    FLAGS DCP
    INDEX 01 00:00:00
FILE "Space Ace (USA) (Track 3).bin" BINARY
  TRACK 03 AUDIO
    FLAGS DCP
    INDEX 00 00:00:00
    INDEX 01 00:01:74
FILE "Space Ace (USA) (Track 4).bin" BINARY
  TRACK 04 AUDIO
    FLAGS DCP
    INDEX 00 00:00:00
    INDEX 01 00:01:74
FILE "Space Ace (USA) (Track 5).bin" BINARY
  TRACK 05 AUDIO
    FLAGS DCP
    INDEX 00 00:00:00
    INDEX 01 00:01:74

Cuesheet generated from CHD extractcd:

FILE "Space Ace (USA).bin" BINARY
  TRACK 01 AUDIO
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    INDEX 01 00:46:38
  TRACK 03 AUDIO
    INDEX 00 00:52:36
    INDEX 01 00:54:35
  TRACK 04 AUDIO
    INDEX 00 47:42:36
    INDEX 01 47:44:35
  TRACK 05 AUDIO
    INDEX 00 47:50:33
    INDEX 01 47:52:32
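
Cue sheet positions are MM:SS:FF timestamps at 75 frames (sectors) per second, so converting them to frame counts makes the two sheets above easier to compare. For example, `INDEX 01 00:01:74` in the Redump sheet means a 149-frame gap from the start of each track file. The positions in the extracted sheet, by contrast, count from the start of the single merged BIN. The C++ sketch below shows the conversion; `msf_to_frames` is an illustrative helper, not part of chdman.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

constexpr int kFramesPerSecond = 75; // CD frames (sectors) per second

// Convert a cue-sheet "MM:SS:FF" timestamp to an absolute frame count.
// Returns -1 if the text is malformed or a field is out of range.
int msf_to_frames(const std::string& msf)
{
    int m = 0, s = 0, f = 0;
    char tail = 0;
    // Exactly three fields must parse; a fourth match means trailing junk.
    if (std::sscanf(msf.c_str(), "%d:%d:%d%c", &m, &s, &f, &tail) != 3)
        return -1;
    if (m < 0 || s < 0 || s >= 60 || f < 0 || f >= kFramesPerSecond)
        return -1;
    return (m * 60 + s) * kFramesPerSecond + f;
}
```

Multiplying a frame count by 2,352 gives the corresponding byte offset into a raw BIN.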

For the sake of preservation and accuracy, keeping an original redump-verified bin/cue is recommended vs. only keeping a derived CHD.

To be clear, Redump bin/cue isn't perfect for all gameplay purposes; it is simply the best available CD disc image format that is deterministic, with reproducible dumps for datting. Some PC-based discs that use copy-protection methods like SecuROM v4, StarForce v3.x, etc. may need mds/mdf dumps with highly accurate DPM capture, which is beyond the scope of both CHD and Redump bin/cue. It is also not perfect for CD+G audio CDs, since the CD+G subchannel instructions are not captured by that format (something like CloneCD ccd/img/sub would be more appropriate, but it is currently impossible to produce a deterministic sub dump).

@rb6502
Contributor

rb6502 commented Jan 5, 2024

I will say again louder: as the person who created the optical media support in CHDMAN, it was not intended to be a 100% archival format, just that you can roundtrip the data for common arcade CDs, including Naomi (when not converting to a different format like importing GDI and exporting bin/cue or something).

The plan for CHDv6 was that CHD would be a compressed wrapper around the AARU (formerly DiscImageChef) universal format and that we would immediately gain multisession and a lot of other support. Unfortunately the "libaaru" library that would enable that is (completely understandably) not Claunia's priority so it seems like that's not happening.

So PRs to improve the current situation would be great.

@tjanas

tjanas commented Jan 5, 2024

Not sure if Redumper has library functionality that would be better suited than AARU for optical discs?

https://github.com/superg/redumper

@rb6502
Contributor

rb6502 commented Jan 5, 2024

I don't immediately see any kind of library API in Redumper, and we are trying to avoid GPLv3.

@alucryd
Contributor Author

alucryd commented Jan 5, 2024

Managed to patch chdman so that it recreates matching files, will submit a PR tomorrow. Hopefully it's acceptable even if it's not TOSEC compliant.

@alucryd
Contributor Author

alucryd commented Jan 6, 2024

There you go: #11913

I don't speak C++, but the changes were straightforward. Verified working on a couple of Redump CUE/BIN dumps.

@TheRealGusBus

It seems chdman now throws an error when compressing bin/cues with more than ~3 tracks. Verified on the Redump versions of "4x4 Evo (USA)" and "102 Dalmatians - Puppies to the Rescue (UK)"

@TomTurbine

Not sure if you should close this one just yet.

Ready 2 Rumble Boxing (USA) (RE)
Tee Off (USA)

They do not decompress into the same things that went back in.

@987123879113
Contributor

@TomTurbine Can you elaborate? I just tested using chdman from latest master and everything works as expected. The same data that went in comes back out as can be seen in the SHA-1 sums of the extracted .bin compared to a file made up of the combined .bins of the separate tracks.

Ready 2 Rumble Boxing (USA) (RE):

>./chdman createcd -i Ready\ 2\ Rumble\ Boxing\ \(USA\)\ \(RE\).cue -o ready2rumble_usa_re.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0252-3947-gc2c61bf29c3)
Output CHD:   ready2rumble_usa_re.chd
Input file:   Ready 2 Rumble Boxing (USA) (RE).cue
Input tracks: 7
Input length: 122:02:00
Compression:  cdlz (CD LZMA), cdzl (CD Deflate), cdfl (CD FLAC)
Logical size: 1,344,343,680
Compression complete ... final ratio = 28.0%

>./chdman extractcd -i ready2rumble_usa_re.chd -o ready2rumble_usa_re.cue
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0252-3947-gc2c61bf29c3)
Output TOC:   ready2rumble_usa_re.cue
Output Data:  ready2rumble_usa_re.bin
Input CHD:    ready2rumble_usa_re.chd
Warning: extracting GD-ROM CHDs as bin/cue is not fully supported and will result in an unusable CD-ROM cue file.
Extraction complete

>sha1sum -b Ready\ 2\ Rumble\ Boxing\ \(USA\)\ \(RE\).cue ready2rumble_usa_re_source.bin ready2rumble_usa_re.bin
4bcb43cb73c46077a0fb9a410cd38a49590b2ccb *Ready 2 Rumble Boxing (USA) (RE).cue
40b92e81e906c9e6f382fa6d3471eb2fecc480f2 *ready2rumble_usa_re_source.bin
40b92e81e906c9e6f382fa6d3471eb2fecc480f2 *ready2rumble_usa_re.bin

Tee Off (USA):

>./chdman createcd -i Tee\ Off\ \(USA\).cue -o teeoff.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0252-3947-gc2c61bf29c3)
Output CHD:   teeoff.chd
Input file:   Tee Off (USA).cue
Input tracks: 3
Input length: 122:02:00
Compression:  cdlz (CD LZMA), cdzl (CD Deflate), cdfl (CD FLAC)
Logical size: 1,344,333,888
Compression complete ... final ratio = 27.2%

>./chdman extractcd -i teeoff.chd -o teeoff.cue
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0252-3947-gc2c61bf29c3)
Output TOC:   teeoff.cue
Output Data:  teeoff.bin
Input CHD:    teeoff.chd
Warning: extracting GD-ROM CHDs as bin/cue is not fully supported and will result in an unusable CD-ROM cue file.
Extraction complete

>sha1sum -b Tee\ Off\ \(USA\).cue teeoff_source.bin teeoff.bin
a86d22eb15a1157770ea310329d7dcc0ea40a7d4 *Tee Off (USA).cue
410f583bf39d89a6be0eab59f9dbc3d07f6c1e66 *teeoff_source.bin
410f583bf39d89a6be0eab59f9dbc3d07f6c1e66 *teeoff.bin
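
The comparison above hashes a file built by joining the per-track BINs against chdman's single-file output. When the hashes differ, it helps to know where the data first diverges and by how much the sizes differ, which would point directly at something like the missing track 2 pregap reported earlier. The C++ sketch below is minimal and assumes the track BINs are supplied in cue order. `concat_tracks` and `compare_images` are hypothetical helpers written for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <vector>

// Append each track image, in cue order, to one output stream
// (the equivalent of building the combined "_source.bin").
void concat_tracks(const std::vector<std::istream*>& tracks, std::ostream& out)
{
    for (std::istream* in : tracks)
        std::copy(std::istreambuf_iterator<char>(*in),
                  std::istreambuf_iterator<char>(),
                  std::ostreambuf_iterator<char>(out));
}

struct ImageDiff {
    bool identical;
    std::uint64_t first_mismatch; // first differing byte, or the shorter input's length
    std::int64_t size_delta;      // size(b) - size(a) in bytes
};

// Byte-by-byte comparison reporting where two images first diverge.
ImageDiff compare_images(std::istream& a, std::istream& b)
{
    std::istreambuf_iterator<char> ia(a), ib(b), end;
    std::uint64_t pos = 0;
    while (ia != end && ib != end && *ia == *ib)
    {
        ++ia;
        ++ib;
        ++pos;
    }
    const bool identical = (ia == end && ib == end);
    std::uint64_t size_a = pos, size_b = pos;
    for (; ia != end; ++ia) ++size_a; // count the remainder of each input
    for (; ib != end; ++ib) ++size_b;
    return {identical, pos,
            static_cast<std::int64_t>(size_b) - static_cast<std::int64_t>(size_a)};
}
```

For real files, each BIN would be opened as `std::ifstream(path, std::ios::binary)`. With the source image as `a` and the extracted image as `b`, the WWF Royal Rumble case above would be expected to show a `size_delta` of -352800.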

@TomTurbine

Not sure what to say; I can try again later. I just know I compressed them to CHD from the Redump-verified versions. I used the GDI files, then tried to verify with NKit, and it failed for those 2 games, so I extracted back to a cue file and that failed too. I used the GDI files because I know chdman was having issues with the plain cue sheet not too long ago.

Don't know the technical details, just know I tried that. Sorry I can't be of much more help.

Looking at your log, you used the cue sheet. I take it the issues with chdman and Dreamcast cue sheets (as opposed to GDI files) have been resolved?

@987123879113
Contributor

987123879113 commented Mar 18, 2024

Extracting Dreamcast games back out to .bin/.cue is still broken (chdman gives a warning since it doesn't generate a valid Dreamcast cue, so the output has no chance of verifying at all), but chdman no longer discards data from the Redump input when creating the CHDs, so it will be possible to restore it in the future.

I covered some of this in my comment on my PR: #12087 (comment)

If extracting the Dreamcast CHDs back into a format that can be verified against Redump is important to you then I don't recommend using CHDs to store your Dreamcast games for now until Dreamcast .cue exporting is properly implemented. If you create a CHD using Redump .gdi and then extract back into .gdi then you should get fully matched data (the .gdi will have different formatting, but it should have all of the same information as the original Redump .gdi). I wouldn't recommend using Redump .gdis to create CHDs though because it's impossible to tell them apart from TOSEC .gdis and so the data can't be rearranged internally into the format MAME/chdman expects for GD-ROMs, and I wouldn't be surprised if Redump .gdi CHDs don't work in emulators.

@tjanas

tjanas commented Mar 18, 2024

Redump uses bin/cue.
TOSEC uses gdi.

@angelosa
Member

angelosa commented Mar 19, 2024

Gotta love attempts at verifying software while not even populating the SW list, as per #12154.
If the MAME driver cannot boot a raw .bin/.cue (*), then I'm mildly curious to check why, assuming there's a "known working on another emulator" case; if not, I'm not even sure why MAME should care at all.

(*) which I'm not sure why you should; it has been intended to be used with .gdi specifically for DC, and converting a .cue to .gdi doesn't require rocket science.
