The `getTextContent` API (see https://github.com/mozilla/pdf.js/blob/master/examples/node/getinfo.js#L45 for a usage example) only gives you the text content of a single page; it carries no further semantics. This is mainly because, in the PDF format, text is just a series of glyphs and positions, and in general no more information is included. The exception is tagged PDFs, which we don't support yet; support is tracked in #6269.
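To illustrate what "glyphs and positions" means in practice, here is a rough sketch of how one might rebuild lines of text from the items that `getTextContent` returns. It assumes each item has a `str` and a six-element `transform` whose last two entries are the x and y position. The helper name `groupItemsIntoLines`, the `tolerance` parameter, and the sample data are made up for this example and are not part of pdf.js; real documents (columns, rotated text, tables) would need more than this.

```javascript
// Hypothetical helper (not part of pdf.js): rebuild lines of text from
// pdf.js-style text items. Each item is assumed to look like
// { str: "...", transform: [a, b, c, d, x, y] }.
function groupItemsIntoLines(items, tolerance = 2) {
  const lines = [];
  for (const item of items) {
    const x = item.transform[4];
    const y = item.transform[5];
    // Reuse an existing line if its baseline is within `tolerance` units.
    let line = lines.find((l) => Math.abs(l.y - y) <= tolerance);
    if (!line) {
      line = { y, parts: [] };
      lines.push(line);
    }
    line.parts.push({ x, str: item.str });
  }
  // PDF y coordinates grow upward, so sort descending to read top to bottom.
  lines.sort((a, b) => b.y - a.y);
  // Within a line, order the pieces left to right and join them.
  return lines.map((l) =>
    l.parts
      .sort((a, b) => a.x - b.x)
      .map((p) => p.str)
      .join(" ")
  );
}

// Made-up sample data standing in for `(await page.getTextContent()).items`.
const sampleItems = [
  { str: "World", transform: [1, 0, 0, 1, 120, 700] },
  { str: "Hello", transform: [1, 0, 0, 1, 72, 700] },
  { str: "Second line", transform: [1, 0, 0, 1, 72, 680] },
];

console.log(groupItemsIntoLines(sampleItems));
```

Anything beyond this kind of positional guesswork (headings, paragraphs, reading order across columns) is not present in the extracted items and has to be inferred heuristically.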
I found an example of how to extract text in a Stack Overflow thread.
This is the example code linked in the thread
However, it just grabs all the text without keeping the semantics. Is there an API method provided by pdf.js to extract text semantically?
Thanks