PDF format #611

Open
heinrich5991 opened this issue Jun 13, 2022 · 3 comments

Comments

@heinrich5991

Specification: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf
Sample: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf

@armijnhemel
Collaborator

Please note: PDF is typically parsed from the end of the file, using an index of byte offsets (the cross-reference table). This is difficult with Kaitai Struct, because you first have to skip over all the data, search for the index, and only then parse the file using the information from the index.
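To illustrate the "parse from the end" order, here is a minimal Python sketch, not a Kaitai Struct spec. It assumes a classic, non-incremental PDF whose `startxref` keyword sits within the last 1024 bytes; the helper name `find_startxref` and the synthetic sample bytes are made up for this example.

```python
import re


def find_startxref(data: bytes) -> int:
    """Return the byte offset written after the last 'startxref' keyword."""
    tail = data[-1024:]  # assumption: the trailer lives near the end of the file
    idx = tail.rfind(b"startxref")
    if idx < 0:
        raise ValueError("no startxref keyword found")
    match = re.match(rb"startxref\s+(\d+)", tail[idx:])
    if match is None:
        raise ValueError("startxref is not followed by an offset")
    return int(match.group(1))


# Tiny synthetic file, just enough to exercise the lookup (not a valid PDF).
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)  # the cross-reference table starts right after the body
sample = (
    body
    + b"xref\n0 1\n0000000000 65535 f \n"
    + b"trailer\n<< /Size 1 >>\n"
    + b"startxref\n" + str(xref_offset).encode() + b"\n%%EOF\n"
)

offset = find_startxref(sample)
```

The returned offset points at the `xref` keyword, and only from there can the individual object offsets be read. This "search backwards first, then seek" step is the part that does not map naturally onto a purely sequential, declarative description.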

@Kreijstal

If you want to understand PDF better, use qpdf.

@rillig

rillig commented Mar 31, 2024

It would definitely be interesting to see how far Kaitai Struct can model the PDF format, given these peculiarities:

  • Embedded streams that can be decoded into other file formats (TTF, PNG, JPEG)
  • Multiple references to the same PDF object
  • Possible gaps in the file that could be garbage-collected or used for steganography
  • Circular references between PDF objects
  • Textual PDF operators such as l, m, and Tj
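As a rough illustration of the last point, the sketch below splits a decoded content stream into operator/operand groups in Python. It is an assumption-laden simplification: the names `TOKEN`, `NUMBER`, and `parse_ops` are invented here, and nested parentheses, arrays, dictionaries, hex strings, and inline images are deliberately not handled.

```python
import re

# A token is either a (non-nested) string literal or a run of regular characters.
TOKEN = re.compile(rb"\((?:\\.|[^\\()])*\)|[^\s()]+")
NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")


def parse_ops(stream: bytes):
    """Group tokens as (operator, operands); operands precede their operator."""
    ops = []
    operands = []
    for tok in TOKEN.findall(stream):
        is_operand = (
            tok.startswith(b"(")      # string literal
            or tok.startswith(b"/")   # name object
            or NUMBER.fullmatch(tok) is not None
        )
        if is_operand:
            operands.append(tok)
        else:
            ops.append((tok, operands))
            operands = []
    return ops


content = b"10 20 m 30 40 l S BT /F1 12 Tf (Hello) Tj ET"
ops = parse_ops(content)
```

Because the operands come before the operator (postfix notation), a parser only learns what a group of values means once it reaches the trailing keyword, which is awkward for formats described field by field.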
