Skip to content

Converts a directory of PNGs to text. Used specifically for converting screenshots of an eBook (with numerically ordered filenames) into text through Tesseract OCR, passing to Google Text to Speech, and outputs an MP3 file.

bmw2621/pngToMP3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 

Repository files navigation

This utility is designed to convert a directory of images to MP3 files by way of Tesseract OCR and the Google Text to Speech API. To use this script the user must have installed Tesseract OCR, the pytesseract package available in the pip repository, and the gTTS package which is also available in the pip repository.

For the pytesseract package to work, the Tesseract binary must be available in your system path using the call "tesseract".

This script accomplishes its task by walking through all files in the directory the script is run in, passes each image to pytesseract to convert to text, and passes the generated text to Google Text to Speech to generate an mp3. Script assumes mpg123 is users playback program, if not, update script with appropriate call on line 52.

Ensure there are no other files in the directory of pngs.

Required Packages:
opencv-python (pip)
pytesseract (pip)
gTTS (pip)
Tesseract OCR - https://github.com/tesseract-ocr/tesseract/wiki/Downloads or in most Linux distribution repositories

Optional Package:
mpg123 https://www.mpg123.de/download.shtml or in most Linux distribution repositories or change system call to preferred program

About

Converts a directory of PNGs to text. Used specifically for converting screenshots of an eBook (with numerically ordered filenames) into text through Tesseract OCR, passing to Google Text to Speech, and outputs an MP3 file.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages