Having trouble with simple OCR #4088

Open
chaudhryfaisal opened this issue Jun 12, 2023 · 1 comment

Current Behavior

!tesseract score.jpg test --oem 1 -l eng --psm 11; cat test.txt

Yields:

Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 335
LAJOVIC

0 ty

Beat)

0

Expected Behavior

Proper prediction

LAJOVIC...0 0
CRESSY 0 15

Suggested Fix

No response

tesseract -v

tesseract 4.1.1
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4

Operating System

Ubuntu 20.04 Focal

Other Operating System

NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

uname -a

Linux 8a4ef56ba3d1 5.15.107+ #1 SMP Sat Apr 29 09:15:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

[Attached image: score]

naourass commented Aug 25, 2023

You need to preprocess the image for OCR to work properly, in particular by binarizing (thresholding) it:
https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#binarisation
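
To make the binarization suggestion concrete, here is a minimal sketch of Otsu thresholding in plain Python. It is an illustration under assumptions, not what Tesseract runs internally and not a tested fix for this particular image: the names `otsu_threshold` and `binarize` are hypothetical helpers, and the input is assumed to be an 8-bit grayscale image already loaded as a list of rows. The `invert` flag is there because Tesseract generally expects dark text on a light background, so an image with light text on a dark background may need flipping.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold (0-255) for a flat iterable of 8-bit gray values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of gray values of those pixels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue              # no pixels in the lower class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                 # every pixel is in the lower class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (unnormalized); Otsu picks the t that maximizes it.
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(rows, invert=False):
    """Map each pixel to 0 or 255 using the Otsu threshold of the whole image.

    rows: list of rows, each a list of ints in 0..255 (assumed grayscale).
    invert: set True when the text is lighter than the background.
    """
    t = otsu_threshold(p for row in rows for p in row)
    high, low = (0, 255) if invert else (255, 0)
    return [[high if p > t else low for p in row] for row in rows]
```

In practice you would load the pixels with an image library, write the binarized result back to a file, and run `tesseract` on that file instead of the original `score.jpg`.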
