extract_text goes on forever. #23

Open
Rammurthy5 opened this issue Aug 4, 2020 · 3 comments

Comments

@Rammurthy5

I installed the latest PDFBox on my Mac via pip.
I imported it and called the extract_text() method, but the call never returns, even for a 196 KB file.
Please help.

>>> import pdfbox as p, os
>>> os.path.exists(f)  # f is the file path
True
>>> pp = p.PDFBox()
>>> pp.extract_text(f)


extract_text(f) never returns; it runs indefinitely.
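
One way to confirm the hang without tying up the interactive session is to run the same call in a separate interpreter under a time limit. The following is only a sketch: `run_snippet`, the 60-second limit, and the `example.pdf` path are illustrative assumptions, not part of the package. Only `pdfbox.PDFBox().extract_text()` comes from the session above.

```python
import subprocess
import sys


def run_snippet(code, timeout_s):
    """Run `code` in a fresh Python interpreter.

    Returns (finished, output): finished is False if the time limit
    was reached before the interpreter exited.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child interpreter before raising.
        return False, f"no result after {timeout_s} s"
    return True, proc.stdout + proc.stderr


if __name__ == "__main__":
    pdf_path = "example.pdf"  # hypothetical path; substitute the real file
    snippet = (
        "import pdfbox\n"
        f"pdfbox.PDFBox().extract_text({pdf_path!r})\n"
    )
    finished, output = run_snippet(snippet, timeout_s=60)
    print("finished" if finished else "timed out")
    print(output)
```

If the call times out here as well, the hang is not specific to the interactive session. Note that any Java process started by the child interpreter may need to be stopped separately.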

@lebedov
Owner

lebedov commented Aug 5, 2020

What versions of Python, Java, and macOS are you running? Can you attach the file you are trying to process? As noted in #14, I haven't been able to reproduce the problem.
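
The three versions asked for above can be collected in one step. This is a minimal sketch using only the Python standard library; the helper name `collect_versions` is an assumption for illustration, and it reports whichever `java` is first on the PATH, which may differ from the one the package actually uses.

```python
import platform
import shutil
import subprocess


def collect_versions():
    """Return Python, macOS, and Java version strings for a bug report."""
    info = {
        "python": platform.python_version(),
        # mac_ver() returns an empty release string on other systems.
        "macos": platform.mac_ver()[0] or "not macOS",
    }
    java = shutil.which("java")
    if java is None:
        info["java"] = "java not found on PATH"
        return info
    try:
        proc = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
        # `java -version` writes its banner to stderr, not stdout.
        lines = (proc.stderr or proc.stdout).strip().splitlines()
        info["java"] = lines[0] if lines else "unknown"
    except (OSError, subprocess.SubprocessError) as exc:
        info["java"] = f"could not run java: {exc}"
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```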

@Rammurthy5
Author

macOS: 10.15.6
Python: 3.7.1
Java: 1.8.0_202
pdf copy.pdf
File attached.

@lebedov
Owner

lebedov commented Aug 6, 2020

I didn't encounter any errors with the file you posted using the package versions in #14. Can you try using OpenJDK 14 rather than Oracle's Java?
