Textify-PDF

Textify-PDF is a Python script that extracts the text from every PDF file in a specified folder and saves each result as a .txt file. It uses the Tika library for the text extraction.

Installation

Clone the repository or download the ZIP file and extract it to a folder.
Install the required Python libraries using pip: pip install -r requirements.txt

Usage

Open a terminal or command prompt in the folder where you extracted the files.
Run the script using the command: python main.py
Enter the path to the folder containing the PDF files when prompted, e.g. 'D:\CODES\textify-pdf'

The script will extract text from all PDF files in the specified folder and save them as .txt files in a "txt" subfolder. It also generates a zip file containing all the processed .txt files.
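
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of main.py: the function names, the injectable `extract` parameter, and the archive name `txt_files.zip` are all assumptions made for the example. The only Tika call used is `parser.from_file`, which needs a Java runtime available when it is actually invoked.

```python
import zipfile
from pathlib import Path


def extract_with_tika(pdf_path):
    """Return the text of one PDF using the tika package (requires Java at runtime)."""
    from tika import parser  # imported lazily so the rest runs without Tika installed

    parsed = parser.from_file(str(pdf_path))
    return parsed.get("content") or ""


def textify_folder(folder, extract=extract_with_tika):
    """Write one .txt per PDF into <folder>/txt, then zip the .txt files."""
    folder = Path(folder)
    out_dir = folder / "txt"
    out_dir.mkdir(exist_ok=True)

    written = []
    for pdf_path in sorted(folder.glob("*.pdf")):
        txt_path = out_dir / (pdf_path.stem + ".txt")
        txt_path.write_text(extract(pdf_path), encoding="utf-8")
        written.append(txt_path)

    # Hypothetical archive name; the real script's zip name is not documented here.
    zip_path = folder / "txt_files.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for txt_path in written:
            archive.write(txt_path, arcname=txt_path.name)
    return written, zip_path
```

Passing the extractor in as a parameter is only a convenience for this sketch: it lets the folder and zip handling be tried out with a stand-in function before Tika and Java are set up.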

New feature: The script now detects password-protected PDF files. It does not decrypt them: when it encounters one, it skips the file and logs a warning message.
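
One way to get this skip-and-warn behaviour is sketched below. How the real script detects a protected file is not stated here, so this example assumes that a protected PDF shows up either as an exception from the extractor or as empty output. The helper name `extract_or_skip` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("textify_pdf")  # hypothetical logger name


def extract_or_skip(pdf_path, extract):
    """Return the extracted text, or None (after logging a warning) if unusable."""
    try:
        text = extract(pdf_path)
    except Exception as exc:  # broad on purpose: failure types depend on the backend
        logger.warning("Skipping %s: extraction failed (%s)", pdf_path, exc)
        return None
    if not text or not text.strip():
        # Assumption: an encrypted PDF yields no text rather than raising.
        logger.warning(
            "Skipping %s: no text extracted (possibly password-protected)", pdf_path
        )
        return None
    return text
```

A caller would write the .txt file only when the return value is not None, so one unreadable file does not stop the rest of the folder from being processed.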

Usage screenshot Samples

License

Textify-PDF is licensed under the MIT License.
