Invalid xref stream for lazy: true #79

Open
fulf opened this issue Jan 18, 2021 · 0 comments

Ruby: 2.5.1
Origami: 2.1.0

When reading some PDFs with lazy: true, the parser raises an exception and stops. The same PDFs are read without problems with lazy: false, and no errors are reported.

Origami::PDF.read(pdf_content_stream, lazy: true, verbosity: Origami::Parser::VERBOSE_TRACE)
[info ] ...Reading header...
[error] Breaking on: "\xBF\xBD\xEF\xBF\xBD\x04|\r\xEF\xBF..." at offset 0x3445c
[error] Last exception: [Origami::InvalidObjectError] Object shall begin with '%d %d obj' statement
[debug] Skipping this indirect object.
[trace] Read Stream object, 33 0 R
Origami::Parser::ParsingError: Invalid xref stream
from /.rvm/gems/ruby-2.5.1/gems/origami-2.1.0/lib/origami/parsers/pdf/lazy.rb:159:in `parse_revision_from_xrefstm'

I've traced the error to the snippet below: parse_object fails on its first attempt, logging the two [error]s, and then successfully returns an Origami::Stream object. Since that object is a plain Origami::Stream and not an Origami::XRefStream, the is_a?(XRefStream) check fails and the exception is raised. Interestingly, XRefStream < Stream, i.e. XRefStream is a subclass of Stream.

# lib/origami/parsers/pdf/lazy.rb:157
def parse_revision_from_xrefstm(revision)
  xrefstm = parse_object
  raise ParsingError, "Invalid xref stream" unless xrefstm.is_a?(XRefStream)
  # ...
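
To make the class relationship concrete, here is a minimal sketch of why the check above rejects a plain stream even though XRefStream < Stream holds. SketchStream and SketchXRefStream are hypothetical stand-ins, not Origami's classes; they only mirror the inheritance described above.

```ruby
# Hypothetical stand-ins that mirror the inheritance relationship
# described in this issue; they are NOT the real Origami classes.
class SketchStream; end
class SketchXRefStream < SketchStream; end

plain = SketchStream.new
xref  = SketchXRefStream.new

# The subclass relationship holds at the class level...
subclass_relation = (SketchXRefStream < SketchStream)

# ...and a subclass instance passes is_a? for the parent class...
xref_is_stream = xref.is_a?(SketchStream)

# ...but an instance of the parent class is never an instance of the subclass,
# which is the situation the is_a?(XRefStream) guard runs into.
plain_is_xref = plain.is_a?(SketchXRefStream)

puts subclass_relation  # true
puts xref_is_stream     # true
puts plain_is_xref      # false
```

So the guard only passes if parse_object builds the more specific class; an object that was built as the generic parent class will always be rejected.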

I don't know much about PDF files, so I don't know whether this is working as intended. In any case, what would be the proper way to read such a file? Is there a better solution than the fallback below?

begin
  Origami::PDF.read(pdf_content_stream, lazy: true)
rescue Origami::Parser::ParsingError
  Origami::PDF.read(pdf_content_stream, lazy: false)
end
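
One detail worth checking in the fallback above: if pdf_content_stream is an IO-like object, the failed lazy attempt may already have consumed it, so the second attempt could start at the end of the stream. The sketch below shows the same fallback with a rewind in between. It is written against stand-ins so it runs without the gem: SketchParsingError, read_with_fallback and fake_reader are hypothetical names, and whether Origami actually leaves the stream consumed is an assumption I have not verified.

```ruby
require 'stringio'

# Hypothetical stand-in for Origami::Parser::ParsingError,
# so that this sketch runs without the gem installed.
class SketchParsingError < StandardError; end

# Tries the lazy reader first; on a parsing error, rewinds the input
# (if it supports rewinding) and retries with the non-lazy reader.
# `reader` is any callable accepting (io, lazy:).
def read_with_fallback(io, reader)
  reader.call(io, lazy: true)
rescue SketchParsingError
  io.rewind if io.respond_to?(:rewind)
  reader.call(io, lazy: false)
end

# Fake reader that consumes the stream and always fails in lazy mode,
# imitating the behaviour reported in this issue.
fake_reader = lambda do |io, lazy:|
  data = io.read
  raise SketchParsingError, "Invalid xref stream" if lazy
  "parsed #{data.bytesize} bytes (lazy: #{lazy})"
end

result = read_with_fallback(StringIO.new("%PDF-1.5 fake"), fake_reader)
puts result
```

With the real library, the lambda would presumably be replaced by a call to Origami::PDF.read and the rescue clause by Origami::Parser::ParsingError, as in the snippet above.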