alexbrandsen/pdf-to-nested-xml

Python script to convert a folder of Dutch archaeological reports (PDFs) to XML files nested by section, chapter, heading.

Requires:

- pdf-extract (https://github.com/CrossRef/pdfextract)
- pdftohtml (http://pdftohtml.sourceforge.net/)
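As a rough illustration of how the pdftohtml dependency might be driven over a folder of reports, the sketch below builds one `pdftohtml -xml` command per PDF. It is not the script in this repository: the function names `build_pdftohtml_command` and `convert_folder`, the folder layout, and the `-i` (ignore images) choice are assumptions made for the example, and the pdf-extract step and the nesting by section, chapter and heading are not shown.

```python
from pathlib import Path
import subprocess


def build_pdftohtml_command(pdf_path, out_dir):
    """Return the argument list for converting one PDF to pdftohtml's XML output.

    Hypothetical helper: -xml asks pdftohtml for XML output and -i skips images.
    """
    pdf_path = Path(pdf_path)
    # pdftohtml takes an output base name; with -xml it typically writes <base>.xml
    out_base = Path(out_dir) / pdf_path.stem
    return ["pdftohtml", "-xml", "-i", str(pdf_path), str(out_base)]


def convert_folder(in_dir, out_dir, dry_run=True):
    """Build (and, unless dry_run is set, execute) one command per PDF in in_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = [
        build_pdftohtml_command(pdf, out_dir)
        for pdf in sorted(Path(in_dir).glob("*.pdf"))
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if pdftohtml exits non-zero
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_folder("reports", "xml_out", dry_run=False)` would run pdftohtml on every PDF in a hypothetical `reports` folder; leaving `dry_run=True` only returns the commands so they can be inspected first.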
