RUPS gives no visual indication of duplicate dictionary keys #34

Open
petervwyatt opened this issue Nov 15, 2022 · 6 comments

@petervwyatt

Yes, PDFs with duplicate keys are not allowed by ISO 32000, but it would be useful to have some form of visualization for them, since a duplicate may be the cause of some other error, for example when the duplicated keys have different values. At the moment this is invisible...

Example PDF: https://assets.devoted.com/plan-documents/2022/DH-DisenrollmentForm-2022-ENG.pdf
Object 19 has two /h.32tc8hbyo16k keys with different values. AFAICT RUPS uses the last one in the dictionary.

@MatthiasValvekens
Contributor

Without looking at the code: I'm almost completely sure that RUPS simply lets iText Core do (most of) the heavy lifting when it comes to parsing, so RUPS only ever sees iText representations of PDF objects. I'm not sure how easy this feature would be to add without making sweeping changes in Core.

One way I could see this being implemented is by using a "secondary" parser specifically written to collect information about problems with object serialisation/representation. The code probably won't be pretty, though...
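A minimal sketch of what such a "secondary" pass might look like, assuming it only needs to walk the raw dictionary syntax and report repeated key names. `DuplicateKeyScanner` is a hypothetical helper, not part of RUPS or iText Core, and it deliberately ignores comments, streams and cross-reference handling. The dictionary values in the usage note are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: reports key names repeated within one PDF dictionary. */
final class DuplicateKeyScanner {
    private final String src;
    private int pos;
    private final List<String> duplicates = new ArrayList<>();

    private DuplicateKeyScanner(String src) { this.src = src; }

    /** Scans raw PDF object syntax and returns each duplicated key once per dictionary. */
    static List<String> findDuplicateKeys(String raw) {
        DuplicateKeyScanner s = new DuplicateKeyScanner(raw);
        s.skipWhitespace();
        while (s.pos < s.src.length()) {
            s.readObject();
            s.skipWhitespace();
        }
        return s.duplicates;
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || "()<>[]/%".indexOf(c) >= 0;
    }

    /** Reads a run of regular characters: a name body, a number or a keyword. */
    private String readRegularToken() {
        int start = pos;
        while (pos < src.length() && !isDelimiter(src.charAt(pos))) pos++;
        return src.substring(start, pos);
    }

    /** Skips one object, recursing into dictionaries and arrays. */
    private void readObject() {
        skipWhitespace();
        if (pos >= src.length()) return;
        char c = src.charAt(pos);
        if (src.startsWith("<<", pos)) { pos += 2; readDictionary(); }
        else if (c == '[') { pos++; readArray(); }
        else if (c == '(') { readLiteralString(); }
        else if (c == '<') { int end = src.indexOf('>', pos); pos = end < 0 ? src.length() : end + 1; }
        else if (c == '/') { pos++; readRegularToken(); }
        else { int before = pos; readRegularToken(); if (pos == before) pos++; } // skip stray delimiter
    }

    private void readDictionary() {
        Set<String> seen = new HashSet<>();
        while (true) {
            skipWhitespace();
            if (pos >= src.length()) return;
            if (src.startsWith(">>", pos)) { pos += 2; return; }
            if (src.charAt(pos) == '/') {
                pos++;
                String key = "/" + readRegularToken();
                if (!seen.add(key) && !duplicates.contains(key)) duplicates.add(key);
                skipWhitespace();
                if (!src.startsWith(">>", pos)) readObject(); // first token of the value
            } else {
                readObject(); // rest of a multi-token value such as "0 R"
            }
        }
    }

    private void readArray() {
        while (true) {
            skipWhitespace();
            if (pos >= src.length()) return;
            if (src.charAt(pos) == ']') { pos++; return; }
            readObject();
        }
    }

    /** Skips a literal string, honouring nested parentheses and backslash escapes. */
    private void readLiteralString() {
        int depth = 0;
        while (pos < src.length()) {
            char c = src.charAt(pos++);
            if (c == '\\') pos++;
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return;
        }
    }
}
```

Calling `DuplicateKeyScanner.findDuplicateKeys("<< /h.32tc8hbyo16k [1 0 R /XYZ 0 700 0] /h.32tc8hbyo16k [2 0 R /XYZ 0 500 0] >>")` returns a list containing only `/h.32tc8hbyo16k`. Because the scanner never builds iText objects, it could run alongside the normal parse without touching Core.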

@Minothor

Matthias hit the nail on the head there: the dictionary-collapsing behaviour is a product of iText Core representing dictionaries in a standards-compliant manner and backing them with a HashMap accordingly.
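To illustrate why a map-backed dictionary hides the duplicate, here is a minimal sketch using plain `java.util` collections. It is not iText's actual code; `CollapseDemo` and its parameters are hypothetical names chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo: how a map-backed dictionary collapses duplicate keys. */
final class CollapseDemo {
    /**
     * Loads key/value pairs in file order. Any key that was already present
     * is appended to {@code overwritten}, which is the only trace of the duplicate.
     */
    static Map<String, String> load(String[][] entries, List<String> overwritten) {
        Map<String, String> dict = new LinkedHashMap<>();
        for (String[] entry : entries) {
            // put() replaces the old value and returns it; ignoring the
            // return value is what makes the duplicate invisible.
            String previous = dict.put(entry[0], entry[1]);
            if (previous != null) overwritten.add(entry[0]);
        }
        return dict;
    }
}
```

With two entries for the same key, the resulting map holds one entry carrying the last value, which matches the "last key wins" behaviour reported above. The return value of `put` is the natural place to notice the collision, if the parser chose to look at it.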

Michaël and I have had a chat about how we could implement this in RUPS without having to change Core to be "looser" in its standards implementation.
Again, Matthias is right on the not-pretty part. Some solutions would be prettier than others, but more like a parade of breed-standard English bulldogs and pugs: in the eye of the beholder.

@petervwyatt
Author

Is there any way that just the presence of such an issue might be flagged or indicated (messages in the log? pop-up dialog?) rather than the heavy load of having to support duplicate keys in the PDF DOM tree?

Even a non-specific message simply stating that the PDF contained one or more duplicate key names gives PDF forensic investigators something to start looking for, even if iText Core cannot report the object or key name (obviously more info is better but I understand the complexity issue).

@MatthiasValvekens
Contributor

The way I would handle that, if it were up to me, would be to implement a "recoverable error recorder" on PdfReader or PdfDocument in Core, and have iText write messages to that thing whenever it makes an explicit decision to ignore some situation that isn't allowed by the spec. RUPS could then query that. Still requires changes in Core, but at least it doesn't change current API behaviour...
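A rough sketch of that idea, assuming the recorder is a simple append-only list that the parser writes to and a viewer reads afterwards. Every name here (`RecorderSketch`, `RecoverableErrorRecorder`, `LenientDictionary`) is hypothetical; nothing like this is known to exist in iText Core's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a "recoverable error recorder"; not an iText API. */
final class RecorderSketch {

    /** Collects notes about spec violations the parser chose to tolerate. */
    static final class RecoverableErrorRecorder {
        private final List<String> messages = new ArrayList<>();

        void record(String message) { messages.add(message); }

        boolean hasProblems() { return !messages.isEmpty(); }

        List<String> getMessages() { return Collections.unmodifiableList(messages); }
    }

    /** Stand-in for a parsed dictionary: last value wins, but the decision is recorded. */
    static final class LenientDictionary {
        private final Map<String, String> entries = new HashMap<>();
        private final RecoverableErrorRecorder recorder;

        LenientDictionary(RecoverableErrorRecorder recorder) { this.recorder = recorder; }

        void put(String key, String value) {
            String previous = entries.put(key, value);
            if (previous != null) {
                // Existing behaviour is unchanged; only a note is added.
                recorder.record("Duplicate key " + key + ": kept '" + value
                        + "', discarded '" + previous + "'");
            }
        }

        String get(String key) { return entries.get(key); }
    }
}
```

A viewer could then check `hasProblems()` after loading and show the messages in a log pane or dialog. The dictionary itself still exposes a single value per key, so callers that never look at the recorder see no difference.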

@Minothor

Michaël and I had a similar discussion to the same effect: we could expand the logging from Core and listen to the log events for situations in which Core has overridden invalid elements of the raw document.

Still requires changes to Core to make it more verbose in the logs but won't require functionality changes.
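As a sketch of the listening side only, the following uses `java.util.logging` from the JDK as a stand-in so that the example is self-contained. The logging framework Core actually uses may differ, and both the logger name and the message text in the usage note are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical listener that keeps warnings emitted by a parser's logger. */
final class ParserWarningCollector extends Handler {
    private final List<String> warnings = new ArrayList<>();
    // Strong reference: the LogManager only holds named loggers weakly.
    private Logger logger;

    /** Attaches a new collector to the named logger and returns it. */
    static ParserWarningCollector attachTo(String loggerName) {
        ParserWarningCollector collector = new ParserWarningCollector();
        collector.logger = Logger.getLogger(loggerName);
        collector.logger.addHandler(collector);
        return collector;
    }

    @Override
    public void publish(LogRecord record) {
        // Keep WARNING and above; lower levels are ignored.
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            warnings.add(record.getMessage());
        }
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }

    List<String> getWarnings() { return warnings; }
}
```

For example, after `ParserWarningCollector c = ParserWarningCollector.attachTo("example.pdf.parser")`, a call such as `Logger.getLogger("example.pdf.parser").warning("Duplicate key /h.32tc8hbyo16k")` ends up in `c.getWarnings()`. A viewer could attach the collector before opening a document and surface the collected messages once loading finishes.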

@petervwyatt
Author

Just FYI and shameless self-promotion: I have hacked my Arlington TestGrammar PoC (C++) to now detect and report duplicate keys (mostly reliably) when using the hacked copy of pdfium that is in that repo. It is unfortunately finding more PDFs than I expected. Thankfully, in most of them (but not all!) the duplicated keys have the same value.
