Wrong text recognition when there is a mix of numbers and characters #34

efollana-sistel · 2018-11-30T09:36:19Z

Environment

Tesseract Version:
Tesseract 32-bit version:
tesseract v4.0.0-beta.1.20180608
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0

Platform:
Windows 10, 32-bit and 64-bit

Current Behavior:

Tesseract cannot extract the text correctly from the following image:

The output text is as follows:

180448013
706618
72.150/17
16.01.17
25495

More details:

The image has a resolution of 300 dpi.
I used the "best" tessdata data files.

Command line:

tesseract.exe "test9.png" "[MyPath]\output" -l spa --psm 3 --oem 3 pdf

Expected Behavior:

Tesseract should extract "18044801J" instead of "180448013" (the trailing letter "J" is read as the digit "3").
Note: Using the data files from https://github.com/tesseract-ocr/tessdata works fine.
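
To narrow down which model set causes the misread, the same image can be run against both sets of data files and the outputs compared line by line. The following is a minimal sketch, not a confirmed reproduction: it assumes `tesseract` is on the PATH, that the two data sets are unpacked into directories passed via `--tessdata-dir`, and the helper names (`build_command`, `run_ocr`, `differing_lines`, `compare_models`) and directory names are hypothetical.

```python
import subprocess


def build_command(image, tessdata_dir, lang="spa", psm=3, oem=3):
    """Build a tesseract command that writes plain text to stdout.

    Mirrors the options from the report (-l spa --psm 3 --oem 3), but
    prints text instead of producing a PDF so the result can be diffed.
    """
    return [
        "tesseract", str(image), "stdout",
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
        "--psm", str(psm),
        "--oem", str(oem),
    ]


def run_ocr(image, tessdata_dir):
    """Run tesseract and return the recognized text (raises on failure)."""
    result = subprocess.run(
        build_command(image, tessdata_dir),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def differing_lines(text_a, text_b):
    """Return (line_number, a, b) for each pair of non-blank lines that differ.

    Blank lines are ignored; if one output has more lines than the other,
    the extra lines are not compared.
    """
    lines_a = [ln.strip() for ln in text_a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in text_b.splitlines() if ln.strip()]
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(lines_a, lines_b), start=1)
        if a != b
    ]


def compare_models(image, dir_a, dir_b):
    """OCR the same image with two data directories and list the differences."""
    return differing_lines(run_ocr(image, dir_a), run_ocr(image, dir_b))
```

Calling `compare_models("test9.png", "tessdata_best", "tessdata")` (both directory names are placeholders) would list only the lines where the two model sets disagree, which makes a single-character difference such as "3" versus "J" easy to spot.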

efollana-sistel · 2018-12-13T08:46:16Z

@Shreeshrii any update on this?
