Wrong text recognition when there is a mix of numbers and characters #34

Open
efollana-sistel opened this issue Nov 30, 2018 · 1 comment

efollana-sistel commented Nov 30, 2018

Environment

Tesseract version (32-bit build):
tesseract v4.0.0-beta.1.20180608
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0

Platform:
Windows 10, 32-bit and 64-bit

Current Behavior:

Tesseract does not extract the text correctly from the following image:
[Image: test9]

The output text is as follows:

180448013
706618
72.150/17
16.01.17
25495

More details:

The image has a resolution of 300 dpi.
I used the "best" tessdata data files (tessdata_best).

Command line:

tesseract.exe "test9.png" "[MyPath]\output" -l spa --psm 3 --oem 3 pdf

Expected Behavior:

Tesseract should extract "18044801J" instead of "180448013": the last character is the letter J, not the digit 3.
Note: Using the data files from https://github.com/tesseract-ocr/tessdata gives the correct result.
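
For anyone trying to reproduce this, here is a rough Python sketch of how the two sets of data files could be compared on the same image. It is only an illustration under some assumptions: `tesseract` is on the PATH, the directory paths passed to `run_ocr` are placeholders for wherever each tessdata set is unpacked, and the helper names (`build_command`, `run_ocr`, `char_mismatches`) are made up for this example. It uses the `--tessdata-dir` option and writes the text to stdout instead of producing a PDF.

```python
import subprocess
from typing import List, Tuple


def build_command(image: str, tessdata_dir: str, lang: str = "spa") -> List[str]:
    """Build a tesseract command line that prints the recognized text to stdout.

    Uses the same language, --psm and --oem values as the command in this
    report; tessdata_dir is a placeholder for a local tessdata directory.
    """
    return [
        "tesseract", image, "stdout",
        "--tessdata-dir", tessdata_dir,
        "-l", lang, "--psm", "3", "--oem", "3",
    ]


def run_ocr(image: str, tessdata_dir: str, lang: str = "spa") -> str:
    """Run tesseract and return its text output (requires tesseract installed)."""
    result = subprocess.run(
        build_command(image, tessdata_dir, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def char_mismatches(expected: str, actual: str) -> List[Tuple[int, str, str]]:
    """Return (index, expected_char, actual_char) for each differing position.

    Only the overlapping prefix is compared, so a length difference between
    the two strings is not reported here.
    """
    return [
        (i, e, a)
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]


# The first line from this report: expected value vs. what was recognized.
print(char_mismatches("18044801J", "180448013"))
```

Calling `run_ocr("test9.png", <dir>)` once per tessdata directory and passing the first line of each result to `char_mismatches` would show whether the J/3 confusion appears with only one of the two model sets.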

efollana-sistel (Author) commented

@Shreeshrii any update on this?
