A small, simple CLI OCR script that automatically OCRs an image, or splits a PDF into page images and then OCRs each page.
Updated Feb 10, 2024 · Python
Performs very fast OCR on a list of images (file path, URL, base64, bytes, NumPy array, PIL image, ...) using Tesseract and returns the recognized text, its coordinates, and line-based word grouping in a DataFrame.
Text extraction from images through OCR.
The app extracts tabular data from PNG, JPG, or PDF files uploaded by the user and converts it into a downloadable CSV file.
This repository contains a document scanner app that can perform Optical Character Recognition (OCR).
Scripts to convert low-quality scanned PDFs to text files using Google Cloud Vision, with GPT-3 for spellchecking.
Windows application for text decoding using the TesseractOCR library.
Multiprocessing OCR with Tesseract
A Baybayin OCR software package. These algorithms aim to recognize Baybayin texts at the character, word, and block levels.
A Python-based project that extracts text from images using Optical Character Recognition (OCR) techniques, leveraging the Tesseract OCR engine.
A simple desktop application to extract text from images using OpenCV and the Pytesseract OCR module for Python 3. The GUI is implemented using the Tkinter module.
BizCardX is a Streamlit-based tool that uses OCR to extract and manage business card data. Easily upload cards, extract information, and store it in a PostgreSQL database.
Sample project showing how to integrate the Docutain SDK into a Windows Forms application.
Sample project showing how to integrate the Docutain SDK into a WPF application.
Intelligent File Delivery Tool