Skip to content

pdfliberation/pdf_table_extraction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

experiment_table-extraction

using poppler, pdf-table-extract and numpy to extract tabular data

visuals

The "visuals" folder contains the output of pdftotext -bbox with additional css and js code to visualize the locations of all detected text.

pdftoppm -r 150 -png -f 1 -l 1 pdf_documents/DHS.TSA.2013.pdf visuals/DHS.TSA.2013.png

pdftotext -r 150 -bbox -f 1 -l 1 pdf_documents/DHS.TSA.2013.pdf visuals/DHS.TSA.2013.html

table-extract

The "table-extract" folder looks at the performance of the pdf-table-extract library on various source documents. Also some rough code for generating pandas dataframes from the output.

About

experimenting with pdf2text and python pdf-table-extract

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published