Derek-Jones/pdf-2-csv

pdf to csv

Lots of data is held in graphs within pdf files.

Some of these graphs are represented using an image format, e.g., jpeg, while others are created using pdf operations (e.g., draw a cross at 10, 20).

If the pdf operations that create a graph are known, it is possible to extract the coordinates of the points in the graph (proof of concept).
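The idea can be illustrated with a minimal Python sketch. It is not this project's implementation. It assumes the page's content stream has already been decompressed to text, that each plotted marker is drawn as one stroked path built from `m` (moveto) and `l` (lineto) operators, and that no `cm` transformation is in effect. The function names `marker_centres` and `to_csv` are made up for this example.

```python
import csv
import io
import re

# Matches pdf numeric operands such as 10, -3, 7.5 and .5
NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")


def marker_centres(content_stream):
    """Return the centre (x, y) of every stroked path in a content stream.

    Assumption: one stroked path per marker, so the centre of the path's
    bounding box is taken as the position of the plotted point.
    """
    operands = []  # numbers waiting for their operator
    path = []      # points of the path under construction
    centres = []
    for token in content_stream.split():
        if NUMBER.match(token):
            operands.append(float(token))
            continue
        if token in ("m", "l") and len(operands) >= 2:
            path.append((operands[-2], operands[-1]))
        elif token == "S" and path:
            xs = [x for x, _ in path]
            ys = [y for _, y in path]
            centres.append(((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2))
            path = []
        elif token == "n":  # path ended without being painted
            path = []
        operands = []  # every operator consumes its operands
    return centres


def to_csv(points):
    """Format a list of (x, y) pairs as csv text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["x", "y"])
    writer.writerows(points)
    return buffer.getvalue()


# Two crosses, centred on (10, 20) and (30, 45)
stream = """
7 20 m 13 20 l 10 17 m 10 23 l S
27 45 m 33 45 l 30 42 m 30 48 l S
"""
print(to_csv(marker_centres(stream)))
```

The coordinates produced are positions on the page, not data values; mapping them to the values on the graph's axes needs at least two known reference points per axis.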

This project aims to add an option to Mozilla's pdf renderer to extract the x/y coordinates of all the points appearing in a graph highlighted by the user.

pdf disassemblers

qpdf does an excellent job of mapping the contents of a pdf to text.
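As a rough illustration, assuming the `qpdf` command line tool is installed and on the PATH, its `--qdf` mode writes a copy of a pdf whose content streams are uncompressed and can be read in a text editor. The Python wrapper below is only a sketch; the function names and file names are made up for this example.

```python
import subprocess


def qdf_command(src, dst):
    """Build the qpdf command that writes an uncompressed, readable copy of src."""
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]


def to_qdf(src, dst):
    """Run qpdf; raises CalledProcessError on any non-zero exit status."""
    subprocess.run(qdf_command(src, dst), check=True)


# Example (hypothetical file names): to_qdf("paper.pdf", "paper-qdf.pdf")
```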

pdffigures extracts figures from pdfs.

Related tools

Manual conversion to svg and then automatic conversion from svg.

chemdataextractor, as the name suggests, is oriented towards extracting chemical information from pdfs, e.g., chemical names and formulae.

utopia attempts to extract structural features of an article, including citations.

pdfgrep

pdftabextract

xpdf is used as a library by many tools.

poppler is a popular pdf rendering library.

About

Extracts points in graphs to csv
