
update python-pdfbox to support PDFBox 3.* #27

Open
lebedov opened this issue Apr 2, 2021 · 5 comments

Comments

@lebedov
Owner

lebedov commented Apr 2, 2021

The command-line interface to PDFBox was changed in version 3.*.
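As an illustration of the kind of change involved: the 2.x app jar takes a tool name such as `ExtractText` followed by positional input and output files. The 3.x app jar instead uses subcommands such as `export:text` with `-i`/`-o` options. The sketch below builds the argument list for each form. The helper name `extract_text_cmd` is hypothetical and is not python-pdfbox's own code. The exact 3.x option set should be checked against the official PDFBox command-line tools documentation.

```python
def extract_text_cmd(jar: str, pdf: str, out: str, major: int) -> list[str]:
    """Build a `java -jar` command line for PDFBox text extraction.

    Hypothetical helper, sketched under the assumption that:
      * PDFBox 2.x uses the `ExtractText` tool with positional files;
      * PDFBox 3.x uses the `export:text` subcommand with -i/-o options.
    """
    base = ["java", "-jar", jar]
    if major == 2:
        # 2.x: ExtractText <inputfile> [output-text-file]
        return base + ["ExtractText", pdf, out]
    if major == 3:
        # 3.x: export:text -i <infile> -o <outfile>
        return base + ["export:text", "-i", pdf, "-o", out]
    raise ValueError(f"unsupported PDFBox major version: {major}")


if __name__ == "__main__":
    print(extract_text_cmd("pdfbox-app-2.0.28.jar", "in.pdf", "out.txt", 2))
    print(extract_text_cmd("pdfbox-app-3.0.0.jar", "in.pdf", "out.txt", 3))
```

Keeping the version switch in one place means a wrapper could support both major versions instead of pinning to one.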

@fakabbir
Contributor

fakabbir commented Jun 6, 2023

Currently, a fork of python-pdfbox is available that works smoothly:
pip install python-pdfbox-v2

@mara004

mara004 commented Jun 22, 2023

@fakabbir Your fork currently just pins to pdfbox 2.0.28. That is a workaround, not a solution.
(FWIW, I believe porting the CLI entrypoint calls shouldn't be too difficult.)

Apart from that, removing the pdfbox download logic as seen in 613a1a5 isn't nice; it would be better to adjust it to download from the v2 release branch only. If offline distributability is desired, you could build a wheel package bundling the jar.
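One way to restrict downloads to the 2.x line is sketched below. It assumes the jar is fetched from Maven Central (artifact `org.apache.pdfbox:pdfbox-app`) rather than from the PDFBox download page, and it is not python-pdfbox's actual download logic. The approach is to read the artifact's `maven-metadata.xml`, keep only plain numeric `2.*` releases, and pick the highest. All function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Maven Central layout for the pdfbox-app artifact (assumed download source).
MAVEN_BASE = "https://repo1.maven.org/maven2/org/apache/pdfbox/pdfbox-app"


def _version_key(version: str) -> tuple[int, ...]:
    # "2.0.28" -> (2, 0, 28), so that 2.0.28 sorts above 2.0.9.
    return tuple(int(part) for part in version.split("."))


def latest_release(metadata_xml: str, major: int = 2) -> str:
    """Return the highest plain numeric release with the given major version."""
    root = ET.fromstring(metadata_xml)
    candidates = []
    for node in root.iter("version"):
        v = (node.text or "").strip()
        parts = v.split(".")
        # Skip pre-releases such as "3.0.0-RC1" (non-numeric components).
        if all(p.isdigit() for p in parts) and int(parts[0]) == major:
            candidates.append(v)
    if not candidates:
        raise LookupError(f"no {major}.x release found")
    return max(candidates, key=_version_key)


def jar_url(version: str) -> str:
    return f"{MAVEN_BASE}/{version}/pdfbox-app-{version}.jar"


def fetch_latest_v2_url() -> str:
    # Network access happens only here; the parsing above is testable offline.
    with urllib.request.urlopen(f"{MAVEN_BASE}/maven-metadata.xml") as resp:
        xml_text = resp.read().decode("utf-8")
    return jar_url(latest_release(xml_text, major=2))
```

Comparing versions as integer tuples rather than strings avoids picking `2.0.9` over `2.0.28`.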

@mara004

mara004 commented Jun 26, 2023

@fakabbir Also, python-pdfbox main already had a better workaround with #29, so why did you submit #32 after that? See also #32 (comment)

@fakabbir
Contributor

@mara004 As far as I remember, #29 was not merged or working when I discovered the breaking changes due to pdfbox v3. If #29 is working now, it's great and we can discard #32.

The other issue is that the package looks for the jar at runtime, which makes it unpredictable and network-dependent. To resolve that, I created the fork as pdf-box-v2.

I think the best approach would be to offer the following:

  • An option to download the jar file at pip install time, so that it is available statically.
  • A fallback to a jar file bundled in the package, in case it cannot be downloaded from the internet.
  • Migration to pdfbox-v3 should be available.
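The first two points amount to a lookup order for the jar. The sketch below assumes a jar that may be bundled inside the package directory (for example, placed there when building a wheel) and a per-user cache directory. `find_pdfbox_jar`, the `PDFBOX_JAR` environment variable and the `pdfbox-app-*.jar` file layout are all hypothetical and are not part of python-pdfbox.

```python
import os
from pathlib import Path
from typing import Callable, Optional


def find_pdfbox_jar(
    package_dir: Path,
    cache_dir: Path,
    download: Optional[Callable[[Path], None]] = None,
    env: Optional[dict] = None,
) -> Path:
    """Locate a PDFBox app jar, touching the network only as a last resort.

    Hypothetical lookup order:
      1. explicit override via the PDFBOX_JAR environment variable,
      2. a jar bundled next to the package code (e.g. shipped in a wheel),
      3. a jar previously downloaded into the cache directory,
      4. a fresh download into the cache, only if a downloader is supplied.
    """
    env = os.environ if env is None else env
    override = env.get("PDFBOX_JAR")
    if override and Path(override).is_file():
        return Path(override)

    for directory in (package_dir, cache_dir):
        if directory.is_dir():
            # sorted() makes the choice deterministic if several jars exist.
            jars = sorted(directory.glob("pdfbox-app-*.jar"))
            if jars:
                return jars[-1]

    if download is None:
        raise FileNotFoundError("no bundled or cached PDFBox jar, and downloads are disabled")
    cache_dir.mkdir(parents=True, exist_ok=True)
    download(cache_dir)  # caller-supplied; expected to write pdfbox-app-*.jar
    jars = sorted(cache_dir.glob("pdfbox-app-*.jar"))
    if not jars:
        raise FileNotFoundError("download did not produce a pdfbox-app jar")
    return jars[-1]
```

Passing the downloader in as a parameter keeps the lookup itself offline. It also lets an install-time or wheel-build step reuse the same function with a real downloader.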

I am not sure @lebedov is still maintaining the project, so as a workaround, only for non-production, low-risk environments, python-pdfbox-v2 exists.

Do you have any plans to maintain this repo in the future?

@mara004

mara004 commented Jun 26, 2023

Thank you, those are all good considerations and I think I'm on the same page.

Concerning the python-pdfbox < 3 workaround, I just tested the latest release on PyPI and it seems to work without any problems. Actually #29 just looks like a minor fixup that has not been released yet, the main code for this already existed previously.

> The other issue is that the package looks for the jar at runtime, which makes it unpredictable and network-dependent. To resolve that, I created the fork as pdf-box-v2.

>   • An option to download the jar file at pip install time, so that it is available statically.
>   • Migration to pdfbox-v3 should be available.

Agreed. As I hinted at above and in #10, I think it would be best to refactor the code to download pdfbox on setup and also build wheels which bundle pdfbox.

> I am not sure @lebedov is still maintaining the project,

Hmm, yes, looks as if python-pdfbox development might have halted.

> so as a workaround, only for non-production, low-risk environments, python-pdfbox-v2 exists.

Ah, I see. Sorry. Yes, for the purpose of avoiding downloads on runtime, such a fork makes sense as workaround.

> Do you have any plans to maintain this repo in the future?

Not this repo, but I have a weak ambition to more or less restart from scratch with setup infrastructure and a few API-based helpers. I very much doubt I will have the time, though, and if I do, I won't be able to put in as much effort as I did for pypdfium2.

See also pypdfium2-team/pypdfium2#230
I've also experimented with a few gists:
https://gist.github.com/mara004/51c3216a9eabd3dcbc78a86d877a61dc
https://gist.github.com/mara004/881d0c5a99b8444fd5d1d21a333b70f8
