Missing characters like "Ň č ľ 付" when converting a fillable pdf form to image #254

ae-f · 2023-02-01T05:45:03Z

Running pdftoppm -r 200 -jpeg x.pdf out produces an error similar to: Syntax Error: AnnotWidget::layoutText, cannot convert U+0147.

The DejaVu font is available on the system, and poppler-data is also installed.

OS: Ubuntu
Pdf2image version: 1.16.0
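
To see which characters are affected, the stderr output of pdftoppm can be scanned for the code points named in these messages. The following is only a minimal sketch: it assumes the message format matches the one quoted above, and the helper name missing_codepoints is hypothetical, not part of pdf2image or poppler.

```python
import re

# Matches the code point in messages such as
# "Syntax Error: AnnotWidget::layoutText, cannot convert U+0147"
# (format assumed from the error quoted in this report).
_CANNOT_CONVERT = re.compile(r"cannot convert U\+([0-9A-Fa-f]{4,6})")


def missing_codepoints(stderr_text):
    """Return the unique characters named in 'cannot convert' messages,
    in order of first appearance."""
    seen = []
    for match in _CANNOT_CONVERT.finditer(stderr_text):
        ch = chr(int(match.group(1), 16))
        if ch not in seen:
            seen.append(ch)
    return seen


sample = "Syntax Error: AnnotWidget::layoutText, cannot convert U+0147"
print(missing_codepoints(sample))
```

Feeding it the captured stderr of a real pdftoppm run would list every character that the form-field rendering could not convert.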


jjbiggins · 2023-02-08T13:12:39Z

It's probably related to the version of poppler you're using. Here's the discussion on a related issue. https://gitlab.freedesktop.org/poppler/poppler/-/issues/1070

ae-f · 2023-02-08T13:17:53Z

Hello, thank you for your response. I have tried with v20.12.1 and v23.01.0-0. I am facing a similar issue in both versions: cannot convert U+0147.

jjbiggins · 2023-02-08T14:33:55Z

If you share the PDF causing the issue, I can take a deeper dive investigating.

ae-f · 2023-02-13T05:13:52Z

Hello @jjbiggins,
I have attached links to the base PDF, the flattened PDF, and the output from pdf2image. Please have a look at the missing characters when the PDF is converted to an image.

Base PDF: https://drive.google.com/file/d/19GmVt1EzuTrhZS21Xxac74Y_VM31Uwdx/view?usp=share_link

Flattened PDF: https://drive.google.com/file/d/1CnoQbkVtIywK7zTyEd0j-MtNY-C54Tw-/view?usp=share_link

Output from pdf2image (windows): https://drive.google.com/file/d/1Hwy2UVBhExXIb3cKQM8QflzL0PqhKnTz/view?usp=share_link

Please open Base PDF and Flattened PDF outside drive PDF viewer so that input fields can be seen.

hash3liZer · 2024-04-01T22:23:13Z

@ae-f Were you able to sort this out? I am facing a similar issue as well.

hash3liZer · 2024-04-03T12:59:40Z

For anyone who's stuck on this issue: after spending days on it, this is how I sorted out the problem.

Print the PDF using Firefox (FYI: I tried Chrome as well, but the characters were jumbled up on Ubuntu):

from time import sleep

from helium import start_firefox
from selenium.webdriver import FirefoxOptions

# Run Firefox headless and print silently to a PDF file
options = FirefoxOptions()
options.add_argument("--headless")
options.set_preference("print.always_print_silent", True)
options.set_preference("print.printer_Mozilla_Save_to_PDF.print_to_file", True)
options.set_preference("print_printer", "Mozilla Save to PDF")

# Open the fillable PDF in Firefox's built-in viewer
driver = start_firefox("file:///path/to/firefox.pdf", options=options)

driver.execute_script("window.print();")
sleep(5)  # A short wait is needed for the print to finish rendering; otherwise the output file is corrupted

driver.quit()

And then use tools like pdftoppm or pdftocairo to flatten the pdf file produced by firefox.
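
For that last step, something like the following could work. It is only a sketch under assumptions: the file name printed.pdf and the helper names build_pdftoppm_cmd and rasterize are hypothetical, and pdftoppm is assumed to be on PATH. The -r and -jpeg flags are the same ones used in the original report.

```python
import subprocess


def build_pdftoppm_cmd(pdf_path, out_prefix, dpi=200, fmt="jpeg"):
    """Build the pdftoppm argument list for rasterizing a PDF."""
    if fmt not in ("jpeg", "png"):
        raise ValueError("fmt must be 'jpeg' or 'png'")
    return ["pdftoppm", "-r", str(dpi), "-" + fmt, pdf_path, out_prefix]


def rasterize(pdf_path, out_prefix, dpi=200, fmt="jpeg"):
    """Run pdftoppm and return (exit code, stderr) so that any
    'cannot convert' messages can be inspected afterwards."""
    result = subprocess.run(
        build_pdftoppm_cmd(pdf_path, out_prefix, dpi, fmt),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


# "printed.pdf" stands in for the file saved by Firefox (hypothetical name).
print(build_pdftoppm_cmd("printed.pdf", "out"))
```

Calling rasterize("printed.pdf", "out") would write one image per page using the out prefix. Since the Firefox print step has already drawn the field text into the page content, the form-field warnings should no longer appear in the returned stderr, though that is worth checking on your own file.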
