tesseract-ocr-lambda-function

AWS Lambda function that runs Tesseract (an optical character recognition engine) on Base64-encoded images.
The function also detects horizontal whitespace using Tesseract's TSV output.
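
To illustrate how whitespace detection from TSV output can work, here is a minimal sketch. It is not the code in ocr.py. It assumes Tesseract's standard TSV columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text); the function name `lines_with_spacing` and the `gap_factor` threshold are hypothetical choices made for this example.

```python
import csv
import io
from collections import defaultdict


def lines_with_spacing(tsv_text, gap_factor=1.5):
    """Rebuild text lines from Tesseract TSV, widening large horizontal gaps.

    gap_factor is a hypothetical threshold: a gap wider than gap_factor
    average character widths is rendered as multiple spaces.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lines = defaultdict(list)
    for row in reader:
        # Level 5 rows describe individual words; skip everything else.
        if row["level"] != "5" or not (row["text"] or "").strip():
            continue
        key = tuple(
            int(row[name])
            for name in ("page_num", "block_num", "par_num", "line_num")
        )
        lines[key].append((int(row["left"]), int(row["width"]), row["text"]))

    result = []
    for key in sorted(lines):
        words = sorted(lines[key])  # order words left to right
        # Estimate the width of one character from the words on this line.
        char_width = sum(w for _, w, _ in words) / sum(len(t) for _, _, t in words)
        out = words[0][2]
        for (prev_left, prev_width, _), (left, _, text) in zip(words, words[1:]):
            gap = left - (prev_left + prev_width)
            if gap > gap_factor * char_width:
                spaces = max(1, round(gap / char_width))
            else:
                spaces = 1
            out += " " * spaces + text
        result.append(out)
    return result
```

Each returned string is one recognized line, with wide gaps between words (for example, between table columns) preserved as runs of spaces.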

Dependencies

Tesseract-OCR
- Pre-compiled Tesseract binaries for Amazon Linux 1 and Windows are included in this repo.
Windows
- Required for local testing.
Python 3.7.5
- Required for local testing.
- The Tesseract Linux binary in this repo is compatible only with Amazon Linux 1, which is the operating system used by the Lambda Python 3.7 runtime and earlier.

Testing

Local testing is supported only on Windows.
Add test cases to the /test_suite directory.
Run ocr_tester.py to execute the tests.

Usage

Include a Base64-encoded image in the function invocation payload.
The function returns a JSON response with the following fields:

text - String containing the recognized text or error information.
statusCode - Integer indicating whether the function succeeded. See the table below:

Result	Status Code
Success	200
Invalid Base64	400
OCR error	500
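
The request and response contract above can be sketched as follows. This is an illustration under stated assumptions, not the code in lambda_handler.py: the payload key `image`, the helper names `build_payload` and `handle`, and the error message strings are all hypothetical, and the OCR step is passed in as a function so the sketch runs without Tesseract installed.

```python
import base64
import json


def build_payload(image_bytes):
    """Client side: wrap raw image bytes in a JSON payload.

    The key name "image" is an assumption made for this example.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})


def handle(event, run_ocr):
    """Server side: map outcomes to the status codes in the table above."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        image_bytes = base64.b64decode(event["image"], validate=True)
    except (KeyError, TypeError, ValueError):
        # binascii.Error is a subclass of ValueError.
        return {"statusCode": 400, "text": "Invalid Base64 input"}
    try:
        return {"statusCode": 200, "text": run_ocr(image_bytes)}
    except Exception as error:
        return {"statusCode": 500, "text": f"OCR error: {error}"}
```

A caller would check `statusCode` first and read `text` either as the recognized text (200) or as an error description (400 or 500).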

Deployment

Create a ZIP file with the following structure (tested using 7-Zip):

    .
    ├── lambda_handler.py
    ├── ocr.py
    └── dependencies
        └── tesseract_ocr_linux
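
As an alternative to a GUI archiver, the layout above can be produced with Python's standard `zipfile` module. The sketch below is an untested-on-Lambda illustration: the function name `build_deployment_zip` and its arguments are hypothetical, and it assumes that lambda_handler.py, ocr.py, and dependencies/tesseract_ocr_linux all sit directly under `source_dir`.

```python
import zipfile
from pathlib import Path


def build_deployment_zip(source_dir, zip_path):
    """Write a ZIP whose root contains the handler, ocr.py, and dependencies."""
    source = Path(source_dir)
    members = [source / "lambda_handler.py", source / "ocr.py"]
    # Include every file under the bundled Linux Tesseract directory.
    linux_dir = source / "dependencies" / "tesseract_ocr_linux"
    members += sorted(p for p in linux_dir.rglob("*") if p.is_file())

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in members:
            # Store paths relative to the source root, using forward slashes.
            archive.write(path, path.relative_to(source).as_posix())
    return zip_path
```

For example, `build_deployment_zip(".", "function.zip")` run from the repository root writes the archive next to the source files.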

Sign up for an AWS account.
Create an S3 bucket.
Upload the ZIP file to the S3 bucket.
Create a Lambda function.
Configure the Lambda function with the following settings:

Setting	Value
Runtime	Python 3.7
Handler	lambda_handler.lambda_handler
Timeout	30 seconds or more

Import the source code from the ZIP file in the S3 bucket.
Ready to use!
