Traverse PDF layers and classes #543

Open
ChavdarVojvoda opened this issue Dec 16, 2023 · 1 comment

Comments

@ChavdarVojvoda

I have architecture plans in which different data is stored in different layers and classes, and I would like to fetch the data from one particular class only.
As there are only a limited number of examples for the library, I was wondering whether there is a way to do this without manually converting the data into a traversable form, e.g. a dictionary.

@jbarlow83
Member

PDF is a very complex format. In this case, the PDFs you have are being used to store custom, application-specific data. There is a sort of standard way to do layers. I don't know what "classes" would be in PDF internal data structures; there are many ways one could express that.

If you're familiar with HTML, it's sort of like looking at a block of text and wondering how it got its particular formatting: maybe there are some CSS rules that select it, maybe JavaScript dynamically modified the rules, maybe there's inline CSS, maybe the text is rendered with SVG or Canvas. Maybe it's an iframe. Without getting into the details of how a specific HTML application works, you can't answer that question. And a different application that looks identical to the user may have an entirely different technical implementation.

You could use a tool like iText RUPS, together with the PDF reference manual, to inspect the structure of the PDF and find where the data you want to access is located. Then pikepdf gives you an efficient way to retrieve that information.
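
For the "sort of standard way to do layers", a minimal sketch might look like the following. It assumes the file uses PDF optional content groups, which are listed under `/OCProperties` → `/OCGs` in the document catalog; whether your plans actually store layers this way is something to confirm in RUPS first. The helper names `layer_names` and `layer_names_from_file` are made up for this illustration and are not part of pikepdf. The sketch only lists layer names; finding which page content belongs to a layer would additionally require walking the content streams, which is not shown here.

```python
def layer_names(catalog):
    """Return the names of optional content groups (layers) in a catalog.

    `catalog` is assumed to be a pikepdf document catalog (pdf.Root), but any
    mapping with a dict-style .get() and "/Key" style keys works as well.
    """
    oc_props = catalog.get("/OCProperties")
    if oc_props is None:
        # No optional content at all: the PDF has no standard layers.
        return []

    ocgs = oc_props.get("/OCGs")
    if ocgs is None:
        return []

    names = []
    for ocg in ocgs:
        name = ocg.get("/Name")
        if name is not None:
            # str() converts a PDF string object to a Python str.
            names.append(str(name))
    return names


def layer_names_from_file(path):
    """Hypothetical convenience wrapper: open a PDF and list its layers."""
    import pikepdf  # third-party dependency, imported lazily

    with pikepdf.open(path) as pdf:
        return layer_names(pdf.Root)
```

Calling `layer_names_from_file("plan.pdf")` (the file name is a placeholder) would return a plain list of strings, or an empty list if the PDF does not use optional content groups, in which case the "layers" are likely stored in some application-specific structure instead.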
