mmatiaschek/pypdfocr-docker

pypdfocr-docker

PyPDFOCR on Docker

get rid of your paperwork...

What is PyPDFOCR

PyPDFOCR converts a scanned PDF into an OCR'ed PDF using Tesseract-OCR and Ghostscript.

Dockerfile

Trusted Build

This Docker image is based on the official Ubuntu base image.

It incorporates a patch for issue #41 of pypdfocr 0.9.0, which is likely to be fixed in 0.9.1.

How to use this image

docker run --rm mmatiaschek/pypdfocr [-h] [-d] [-v] [-m] [-l LANG] [--preprocess]
                [--skip-preprocess] [-w WATCH_DIR] [-f] [-c CONFIGFILE] [-e]
                [-n]
                [pdf_filename]

Case 1: Single Document

docker run -v ~/:/media --rm mmatiaschek/pypdfocr /media/filename.pdf

This mounts your home directory at /media, reads filename.pdf from it, and generates filename_ocr.pdf.
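
Mounting the whole home directory works, but it exposes more to the container than a single file needs. As a minimal sketch, the wrapper below mounts only the directory that contains the PDF. The function name `ocr_pdf` and the `DOCKER` override variable are assumptions made for illustration; neither is part of this image.

```shell
# Hypothetical helper, not shipped with the image.
# Mounts only the PDF's own directory at /media and runs PyPDFOCR on that file.
# Set DOCKER=echo to preview the command instead of running it.
ocr_pdf() (
  pdf=$1
  dir=$(cd "$(dirname "$pdf")" && pwd) || exit 1   # absolute host directory
  base=$(basename "$pdf")                          # file name without the path
  ${DOCKER:-docker} run --rm -v "$dir:/media" mmatiaschek/pypdfocr "/media/$base"
)
```

Called as `ocr_pdf ~/Documents/Paper/invoice.pdf`, this should behave like the command above, with only ~/Documents/Paper visible inside the container.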

Case 2: Watch folder

docker run -v ~/Documents/Paper:/media --rm mmatiaschek/pypdfocr -w /media -f -c /media/config.yaml

For a sample config, see config.yaml or the pypdfocr author's repository.
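
Because the watcher keeps running until it is stopped, it can be convenient to start it in the background. The sketch below uses the standard Docker options `-d`, `--name`, and `--restart`. The function name `start_watch`, the container name `pypdfocr-watch`, and the `DOCKER` override variable are assumptions made for illustration. `--rm` is left out because Docker rejects it in combination with `--restart`.

```shell
# Hypothetical helper, not shipped with the image.
# Starts the watch-folder container detached, restarting it unless stopped by hand.
# $1 is the host folder to watch; it defaults to ~/Documents/Paper as in Case 2.
# Set DOCKER=echo to preview the command instead of running it.
start_watch() (
  watch_dir=${1:-"$HOME/Documents/Paper"}
  ${DOCKER:-docker} run -d --name pypdfocr-watch --restart unless-stopped \
    -v "$watch_dir:/media" mmatiaschek/pypdfocr -w /media -f -c /media/config.yaml
)
```

With a container started this way, `docker logs -f pypdfocr-watch` follows its output and `docker stop pypdfocr-watch` ends the watch.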

Help

docker run --rm mmatiaschek/pypdfocr -h

Interactive Shell

docker run --entrypoint=/bin/bash -t -i mmatiaschek/pypdfocr

How I use this image

  1. I use Scanner Pro on iOS (or Scanbot on Android) to scan documents and upload them, without OCR, to a WebDAV folder.
  2. The WebDAV folder is hosted on my Synology DiskStation NAS, served over HTTPS, and shared between devices with CloudStation.
  3. I run this PyPDFOCR Docker image manually on Mac OS X, or hosted on a local server.

This way my personal documents never have to leave my own hardware or network (my personal cloud).
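
For the manual runs in step 3, it helps to know which scans still lack an OCR'ed copy. The sketch below relies only on the filename_ocr.pdf naming described under Case 1. The function name `pending_pdfs` is an assumption made for illustration.

```shell
# Hypothetical helper, not shipped with the image.
# Prints each PDF in folder $1 that has no matching *_ocr.pdf next to it.
pending_pdfs() (
  cd "$1" || exit 1
  for f in *.pdf; do
    [ -e "$f" ] || continue                  # folder has no PDFs: glob stayed literal
    case $f in *_ocr.pdf) continue ;; esac   # skip files that are OCR results
    [ -e "${f%.pdf}_ocr.pdf" ] || printf '%s\n' "$f"
  done
)

# Possible usage, feeding each pending file to the image:
#   pending_pdfs ~/Documents/Paper | while IFS= read -r f; do
#     docker run --rm -v ~/Documents/Paper:/media mmatiaschek/pypdfocr "/media/$f"
#   done
```

Checking for the sibling file keeps repeated manual runs from processing the same scan twice.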
