Extraction of tables might include digital watermark #517

Open
skwskwskwskw opened this issue Feb 20, 2023 · 1 comment

Comments


skwskwskwskw commented Feb 20, 2023

I am working with a PDF that contains a watermark, and the watermark text sometimes ends up in the extracted tables. The watermark can appear at different locations on the page. I am considering two approaches, but I am not sure how to implement either:

  1. Don't extract words that are rotated.
  2. Exclude the watermark by its absolute location as it appears in the PDF. However, Tabula reports the watermark at a different location from where it appears on the page.

The watermark looks like this (the rotated number):

[Image: screenshot of the rotated watermark number]
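Both approaches can be sketched independently of Tabula by filtering character records before they are grouped into table cells. This is a minimal illustration under stated assumptions, not Tabula's API: it assumes each character is a dict with hypothetical keys `text`, `matrix` (the six text-matrix entries a, b, c, d, e, f) and `bbox` (x0, top, x1, bottom in points, with `top` measured from the top edge of the page). Libraries such as pdfplumber expose per-character data of this kind, but the field names here are assumptions. Approach 1 drops any character whose baseline angle, `atan2(b, a)`, is more than a tolerance away from 0°; approach 2 drops characters whose box overlaps a hypothetical `exclude_boxes` region.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical character record (field names are assumptions), e.g.
# {"text": "A", "matrix": (a, b, c, d, e, f), "bbox": (x0, top, x1, bottom)}
Char = Dict[str, Any]
Box = Tuple[float, float, float, float]  # (x0, top, x1, bottom) in points


def rotation_degrees(char: Char) -> float:
    """Baseline angle in [0, 360), derived from text-matrix entries a and b."""
    a, b = char["matrix"][0], char["matrix"][1]
    return math.degrees(math.atan2(b, a)) % 360.0


def is_upright(char: Char, tolerance: float = 1.0) -> bool:
    """True if the baseline is within `tolerance` degrees of horizontal."""
    angle = rotation_degrees(char)
    # Distance to 0 degrees, wrapping around at 360.
    return min(angle, 360.0 - angle) <= tolerance


def overlaps(char: Char, box: Box) -> bool:
    """True if the character's bbox intersects `box` (same coordinate system)."""
    x0, top, x1, bottom = char["bbox"]
    bx0, btop, bx1, bbottom = box
    return x0 < bx1 and x1 > bx0 and top < bbottom and bottom > btop


def drop_watermark(chars: Sequence[Char], exclude_boxes: Sequence[Box] = ()) -> List[Char]:
    """Approach 1: drop rotated characters. Approach 2: drop characters inside exclude_boxes."""
    return [
        c for c in chars
        if is_upright(c) and not any(overlaps(c, box) for box in exclude_boxes)
    ]


# Made-up sample data on a 612 x 792 pt page.
sample_chars = [
    {"text": "A", "matrix": (10, 0, 0, 10, 50, 700), "bbox": (50, 82, 56, 92)},
    # A 45-degree watermark digit: a = b = 10 * cos(45 degrees).
    {"text": "7", "matrix": (7.071, 7.071, -7.071, 7.071, 300, 400), "bbox": (300, 378, 314, 392)},
    {"text": "9", "matrix": (10, 0, 0, 10, 500, 100), "bbox": (500, 682, 506, 692)},
]
watermark_box = (480, 600, 612, 792)  # hypothetical bottom-right region

print([c["text"] for c in drop_watermark(sample_chars)])                   # ['A', '9']
print([c["text"] for c in drop_watermark(sample_chars, [watermark_box])])  # ['A']
```

One possible reason a watermark seems to sit at a different location: PDF user space puts the origin at the bottom-left corner, while many extraction tools measure positions from the top-left, and a page `/Rotate` entry changes the mapping again. An exclusion box therefore has to be expressed in the same coordinate system as the character boxes. If the watermark always sits in a margin, restricting extraction to the table region with tabula-py's `area` option may be the simpler route.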


germainepym commented Jul 28, 2023

Hey, just wondering whether you found a solution or workaround for this? I have a similar PDF with a text watermark along the side.
