v1.19

Latest
@Shahabks Shahabks released this 24 Mar 05:47
· 41 commits to master since this release
8739377

Converter-pdf-files-to-.txt-or-.html
PDFs are notoriously difficult to scrape. This program converts them to *.txt or *.html format. It has been tested on text in Latin alphabets and in Japanese.

+ Download testpdf2txt.exe from the releases branch below.

- Note: This program cannot open encrypted PDFs. Decrypt your PDF file before using this program.
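Because encrypted files cannot be opened, it can help to check a PDF before running the converter. The snippet below is a minimal sketch of such a check and is not part of testpdf2txt.exe: the function name `looks_encrypted` is hypothetical, and the test is only a heuristic that looks for the `/Encrypt` key that encrypted PDFs carry in their trailer. It can report a false positive if those bytes happen to appear elsewhere in the file.

```python
from pathlib import Path


def looks_encrypted(pdf_path):
    """Heuristically report whether a PDF appears to be encrypted.

    Hypothetical helper, not part of testpdf2txt.exe. Encrypted PDFs
    reference an /Encrypt dictionary in their trailer, so this simply
    searches the raw bytes for that key.
    """
    data = Path(pdf_path).read_bytes()
    return b"/Encrypt" in data
```

If `looks_encrypted("myfile.pdf")` returns `True`, the file probably needs to be decrypted first.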

Introduction
I built this package on the work of Gorkovenko (Stanford University) and Greenfield (Harvard University) to convert *.pdf to *.txt or *.html. testpdf2txt.exe is a standalone executable version of the package. You can download and use it even if you do not have Python 3 installed on your machine.

You can save the program anywhere on your computer and run it by double-clicking it.

Put your PDF file in a folder.
Double-click the program and follow the instructions on the screen.
To save the *.txt and *.html files in a different directory, enter the path to that directory.
Enter the filename of your PDF.
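The steps above take three inputs: the folder holding the PDF, an optional output directory, and the PDF filename. As a rough illustration of how those inputs could map to output files, here is a minimal sketch. It is not the program's actual code: the function name `output_paths` is hypothetical, and it assumes the output files reuse the PDF's base name.

```python
from pathlib import Path


def output_paths(pdf_folder, pdf_name, out_dir=None):
    """Return (pdf_path, txt_path, html_path) for one conversion.

    Hypothetical helper. Assumes outputs are named after the PDF and
    go next to it unless a separate output directory is given.
    """
    pdf_path = Path(pdf_folder) / pdf_name
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError("expected a .pdf file, got: " + pdf_name)
    # Fall back to the PDF's own folder when no output directory is entered.
    target = Path(out_dir) if out_dir else pdf_path.parent
    txt_path = target / (pdf_path.stem + ".txt")
    html_path = target / (pdf_path.stem + ".html")
    return pdf_path, txt_path, html_path
```

For example, `output_paths("docs", "report.pdf", "converted")` yields paths ending in `report.txt` and `report.html` inside `converted`.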
Converting Multiple PDFs to .txt