Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors when ingesting these files #245

Open
rickj opened this issue Nov 2, 2022 · 11 comments
Open

Errors when ingesting these files #245

rickj opened this issue Nov 2, 2022 · 11 comments

Comments

@rickj
Copy link
Contributor

rickj commented Nov 2, 2022

When ingesting all of the tests, and using EPUBCheck 5.0.0 beta, these errors were observed:
Error File
fatal epubcheck errors pkg-version-backward
bad XML markup pub-foreign_bad-fallback
bad XML markup pub-foreign_json-spine
XML parsing aborted pub-xml-external-id
fatal epubcheck errors pub-xml-names
fatal epubcheck errors pub-xml-non-validating_unclosed

@rickj
Copy link
Contributor Author

rickj commented Nov 2, 2022

Update... these errors are NOT from EPUBCheck 5.

Details:
pub-xml-external-id
Error parsing XHTML: parsing aborted
line: 4
file: epub://paginated.epub/EPUB/content_001.xhtml

pub-foreign_bad-fallback
Error parsing XHTML: not well-formed (invalid token)
line: 1
file: epub://paginated.epub/EPUB/foo.dmg

pub-foreign_json-spine
Error parsing XHTML: not well-formed (invalid token)
line: 1
file: epub://paginated.epub/EPUB/moby.json

these are errors specific to the VST ingestion workflow, not EPUBCheck. Essentially if there's a spine item, we expect it to be HTML/XHTML, and if we are trying to parse files like DMG and JSONs in there, these files will never ingest into our system.

Is this expected behavior?

@dauwhe
Copy link
Contributor

dauwhe commented Nov 2, 2022

Several of the tests involve deliberately-invalid EPUBs in order to test aspects of the specification. Some of the deliberate errors are quite extreme, for example to test what happens in fallback chains. I fully expect some of these files to not work in various reading systems--in fact I hope they don't.

@rickj
Copy link
Contributor Author

rickj commented Nov 2, 2022

should those we expect to fail be identified in the contributing guide for testing? At least the tester could know to look for other options (like side loading) for testing

@rickj rickj changed the title Errors when running EPUBCheck 5.0.0 beta against these files Errors when ingesting these files Nov 2, 2022
@iherman
Copy link
Member

iherman commented Nov 3, 2022

should those we expect to fail be identified in the contributing guide for testing? At least the tester could know to look for other options (like side loading) for testing

I agree. I would think that a very explicit statement should be present in the 'description' field in the package file (which is then taken over on the test description).

@dauwhe, as far as I could see, these are your tests, could you add such descriptions to the metadata? I am afraid if I do those, I may misunderstand the exact goals of the tests...

@iherman
Copy link
Member

iherman commented Nov 3, 2022

@dauwhe, if you can't do it I will do something... sometimes in the coming days.

@iherman
Copy link
Member

iherman commented Nov 4, 2022

@rickj

pub-xml-external-id Error parsing XHTML: parsing aborted line: 4 file: epub://paginated.epub/EPUB/content_001.xhtml

pub-foreign_bad-fallback Error parsing XHTML: not well-formed (invalid token) line: 1 file: epub://paginated.epub/EPUB/foo.dmg

pub-foreign_json-spine Error parsing XHTML: not well-formed (invalid token) line: 1 file: epub://paginated.epub/EPUB/moby.json

I have looked at these. I have created PR #246 adding a remark in the package file of the first two cases, whereby an RS may reject the respective EPUB document altogether. In my view, the RS does pass the test in that case.

I do have a problem with the third case, though. What happens there is that the spine item is a JSON file but a fallback is also specified for it to a bona fide HTML file. In my view, this is a correct EPUB document per spec, but it is based on the assumption that the RS implements the fallback. Put it another way, the RS error message shows that the RS does not implement fallbacks, which means that the test should fail.

Cc-ing the other testers here, too, to check whether my view is correct...

cc @mattgarrish @bduga @davemanhall @danielweck @wareid

@iherman
Copy link
Member

iherman commented Nov 4, 2022

I was also looking at the other three cases (pkg-version-backward, pub-xml-names, and pub-xml-non-validating_unclosed): the epubcheck errors are "intentional", too, in the sense that the EPUB documents are indeed meant to be invalid.

I think this falls under a situation we have already discussed elsewhere; these tests are aimed at RS-s that allow sideloading, and they should react in some proper way to those errors. On the other hand, it is understood that some RS-s are solely based on an ingestion flow that includes epubcheck; this is one of the situations for which we introduced lately the "n/a" (for "non-applicable") case in the test report. I guess this is exactly your case for those tests, @rickj.

@rickj
Copy link
Contributor Author

rickj commented Nov 30, 2022

Finishing up our testing, and applying 'n/a' for files that fail epubcheck, however, for files that pass epubcheck but do not open in a reading system, is 'n/a' the appropriate test result also?

@iherman
Copy link
Member

iherman commented Nov 30, 2022

Are these the cases when the test relies on fallback but they do not open because fallback is not implemented? If that is the case then, I am afraid, the tests fail due to missing functionalities...

@rickj
Copy link
Contributor Author

rickj commented Dec 3, 2022

Are these the cases when the test relies on fallback but they do not open because fallback is not implemented? If that is the case then, I am afraid, the tests fail due to missing functionalities...

Possibly. We will do more testing for these as side loaded and update where needed. Right now not passing epubcheck prevents them from even getting to the reading system

@iherman
Copy link
Member

iherman commented Dec 4, 2022

Are these the cases when the test relies on fallback but they do not open because fallback is not implemented? If that is the case then, I am afraid, the tests fail due to missing functionalities...

Possibly. We will do more testing for these as side loaded and update where needed. Right now not passing epubcheck prevents them from even getting to the reading system

Ah. That was not clear to me. Based on what you told before, that would mean a "n/a" then, because your implementation does not have side loading and everything is filtered through epubcheck...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants