Compatibility pdftools + R 4.2.1/4.2.2 #120

Open
susannabolz opened this issue Jan 24, 2023 · 1 comment


@susannabolz

I have a problem using pdftools::pdf_text() with some PDFs under R 4.2.x.
For most PDFs everything works, but for some a fatal error occurs. I'm using RStudio 2022.12.0+353 "Elsbeth Geranium". I tried to find a characteristic shared by the PDFs that trigger the fatal error, but did not find one. In case it is related to the PDFs themselves, I have attached the respective files.
The same problem occurs with R 4.2.1.
With an older version of R (I tried 4.1.3), everything works as expected.

sessionInfo()

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_1.0.10 data.table_1.14.6 stringr_1.5.0 pdftools_3.3.2

loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 rstudioapi_0.14 magrittr_2.0.3 tidyselect_1.2.0 timechange_0.2.0
[6] R6_2.5.1 rlang_1.0.6 fansi_1.0.3 tools_4.2.2 utf8_1.2.2
[11] cli_3.6.0 DBI_1.1.3 askpass_1.1 assertthat_0.2.1 tibble_3.1.8
[16] lifecycle_1.0.3 qpdf_1.3.0 vctrs_0.5.1 glue_1.6.2 stringi_1.7.12
[21] compiler_4.2.2 pillar_1.8.1 generics_0.1.3 lubridate_1.9.0 pkgconfig_2.0.3

PDF_no error.pdf
PDF_fatal error.pdf

@ugfmoritz

I have the same problem!

It must be some kind of protection that pdftools cannot handle. I identified one PDF with this problem and tried to remove the protection programmatically with pikepdf (Python). That had some effect, in that the file is no longer protected, but pdftools still hits a fatal error when opening it. I also unprotected the file with the ilovepdf online tool, and that version works. So I guess it is some protection mechanism.
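
One rough way to test the protection hypothesis across many files is to look for an `/Encrypt` key in the raw bytes of each PDF. Encrypted PDFs reference their encryption dictionary through that key in the trailer (or cross-reference stream) dictionary, which is stored uncompressed. The sketch below uses only the Python standard library. The function name `looks_encrypted` is made up for illustration, and the check is a heuristic, not a real PDF parse: it can report a false positive, for example when an older trailer with `/Encrypt` is still present in the file.

```python
from pathlib import Path


def looks_encrypted(pdf_path):
    """Heuristic check: does the file contain an /Encrypt key anywhere?

    This scans the raw bytes only. It does not parse the PDF structure,
    so treat a True result as a hint, not as proof of protection.
    """
    data = Path(pdf_path).read_bytes()
    return b"/Encrypt" in data
```

Calling `looks_encrypted()` on the failing and working files and comparing the results could show whether the fatal error lines up with the presence of an encryption dictionary.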

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C LC_TIME=German_Germany.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] janitor_2.1.0 installr_0.23.4 forcats_0.5.2 stringr_1.5.0 dplyr_1.0.10 purrr_1.0.0 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
[12] parsedate_1.3.1 data.table_1.14.6 lubridate_1.9.0 timechange_0.1.1 pdftools_3.3.2

loaded via a namespace (and not attached):
[1] qpdf_1.3.0 tidyselect_1.2.0 haven_2.5.1 gargle_1.2.1 snakecase_0.11.0 colorspace_2.0-3 vctrs_0.5.1 generics_0.1.3 utf8_1.2.2 rlang_1.0.6
[11] pillar_1.8.1 glue_1.6.2 withr_2.5.0 DBI_1.1.3 dbplyr_2.2.1 modelr_0.1.10 readxl_1.4.1 lifecycle_1.0.3 munsell_0.5.0 gtable_0.3.1
[21] cellranger_1.1.0 rvest_1.0.3 tzdb_0.3.0 fansi_1.0.3 broom_1.0.2 Rcpp_1.0.9 scales_1.2.1 backports_1.4.1 googlesheets4_1.0.1 jsonlite_1.8.4
[31] fs_1.5.2 askpass_1.1 hms_1.1.2 stringi_1.7.8 grid_4.2.2 cli_3.5.0 tools_4.2.2 magrittr_2.0.3 crayon_1.5.2 pkgconfig_2.0.3
[41] ellipsis_0.3.2 xml2_1.3.3 reprex_2.0.2 googledrive_2.0.0 assertthat_0.2.1 httr_1.4.4 rstudioapi_0.14 R6_2.5.1 compiler_4.2.2

--

All three documents are attached:
DE000A0WMPJ6_Q1_2015_unlocked_works.pdf
DE000A0WMPJ6_Q1_2015_original.pdf
DE000A0WMPJ6_Q1_2015_unlocked_doesnt_work.pdf
