Some missing words from converting PDF to Image #282

Open
jason-ng-zq99 opened this issue Apr 8, 2024 · 0 comments
jason-ng-zq99 commented Apr 8, 2024

Hi, I am currently encountering the titled issue when using the convert_from_bytes function.

On my Mac, this happens specifically when I open a fillable PDF and fill it in using Preview.
Text entered this way is missing from the converted image.
Screenshot 2024-04-08 at 20 58 08

Screenshot 2024-04-08 at 21 00 40

If I pass strict=True, or run the pdftoppm -r 200 -jpeg sample_pdf.pdf out command in my terminal,
I get the following error messages:

Syntax Error: Unknown font tag 'ArialMT'
Syntax Error: Unknown font tag 'ArialMT'
Syntax Error (69): No font in show
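
To see which font tags are affected across several files, the stderr output can be collected and parsed. The following is a minimal sketch, not part of pdf2image: the helper names (`pdftoppm_command`, `unknown_font_tags`, `render_and_report`) are hypothetical, and it assumes pdftoppm is on the PATH and prints the messages in the format shown above.

```python
import re
import subprocess

# Matches lines such as: Syntax Error: Unknown font tag 'ArialMT'
_UNKNOWN_FONT = re.compile(r"Unknown font tag '([^']+)'")


def pdftoppm_command(pdf_path, out_prefix, dpi=200):
    """Build the same command line used in this report."""
    return ["pdftoppm", "-r", str(dpi), "-jpeg", pdf_path, out_prefix]


def unknown_font_tags(stderr_text):
    """Return the distinct font tags reported as unknown, in first-seen order."""
    seen = []
    for tag in _UNKNOWN_FONT.findall(stderr_text):
        if tag not in seen:
            seen.append(tag)
    return seen


def render_and_report(pdf_path, out_prefix, dpi=200):
    """Run pdftoppm and return (return code, unknown font tags found in stderr)."""
    proc = subprocess.run(
        pdftoppm_command(pdf_path, out_prefix, dpi),
        capture_output=True,
        text=True,
    )
    return proc.returncode, unknown_font_tags(proc.stderr)
```

Calling `render_and_report("sample_pdf.pdf", "out")` on each problem file would give a quick list of the tags involved (ArialMT, Helvetica, and so on) without reading the output by hand.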

I have also gotten Unknown font tag 'Helvetica' on other files.

I have also verified that these fonts are present on my system using the fc-match ArialMT command, which returns the matched font, in this case Verdana.ttf: "Verdana" "Regular"
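
One caveat on this check: fc-match returns the best available match even when the requested font is not installed, so a result of Verdana for ArialMT may be a fallback rather than proof that Arial is present. Below is a rough sketch for flagging that case. The helper names are hypothetical, the name comparison is only a heuristic, and it assumes fc-match prints its default `file: "family" "style"` line as shown above.

```python
import re
import subprocess

# Default fc-match output looks like: Verdana.ttf: "Verdana" "Regular"
_FC_LINE = re.compile(r'^(?P<file>[^:]+):\s*"(?P<family>[^"]*)"\s*"(?P<style>[^"]*)"')


def fc_match(name):
    """Run fc-match for a font name and return its raw output line."""
    proc = subprocess.run(["fc-match", name], capture_output=True, text=True)
    return proc.stdout.strip()


def parse_fc_match(output):
    """Split an fc-match line into (file, family, style), or None if unparseable."""
    m = _FC_LINE.match(output.strip())
    if m is None:
        return None
    return m.group("file"), m.group("family"), m.group("style")


def _norm(name):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def looks_like_fallback(requested, output):
    """Heuristic: True when the matched family does not resemble the requested name."""
    parsed = parse_fc_match(output)
    if parsed is None:
        return True
    family = _norm(parsed[1])
    wanted = _norm(requested)
    if not family or not wanted:
        return True
    # "arialmt" starts with "arial", so Arial counts as a real match for ArialMT.
    return not (wanted.startswith(family) or family.startswith(wanted))
```

With the output quoted above, `looks_like_fallback("ArialMT", 'Verdana.ttf: "Verdana" "Regular"')` reports a fallback, which suggests fontconfig substituted Verdana instead of finding Arial.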

Interestingly, text added via Preview's text box tool is still converted correctly, as seen below:
Screenshot 2024-04-08 at 21 03 48
Screenshot 2024-04-08 at 21 03 59

This problem was first found in my Debian GNU/Linux 11 Docker container, which shows exactly the same behavior.

I have already tried installing fonts such as fonts-freefont-ttf, fonts-liberation, fonts-liberation2, and ttf-mscorefonts-installer, but the same issue persists.

P.S. Suspecting it might be an issue with editable fields, I also tried flattening the PDF with fillpdf before calling convert_from_path, but the same issue remains.

Problem replicated on two systems:

  • OS: macOS Sonoma 14.2.1

  • pdf2image version: 1.17.0

  • pdftoppm/pdftotext version: 24.03.0

  • OS: Debian GNU/Linux 11

  • Poppler version: 22.11.0

  • Poppler-data version: 0.4.10

Thanks in advance!
