Missing GREEK LUNATE SIGMA SYMBOL in grc and script/Greek models #55

Open
nisbet-hubbard opened this issue Dec 21, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@nisbet-hubbard

Current Behavior

A lunate sigma (ϲ, U+03F2) is recognised under language ‘grc’ but is output as a regular sigma (σ or ς).

Expected Behavior

Outputting it as U+03F2.
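
For reference, the three characters involved can be told apart by code point. The following is a minimal sketch using only Python's standard `unicodedata` module; the helper name `describe` is an illustrative assumption and not part of Tesseract.

```python
import unicodedata


def describe(chars):
    """Return (character, code point, Unicode name) for each character."""
    return [(c, f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


# Lunate sigma, regular (non-final) sigma, final sigma.
for char, code_point, name in describe("ϲσς"):
    print(char, code_point, name)
```

Running a check like this over the OCR output is one way to confirm which sigma variant was actually emitted.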

Suggested Fix

No response

tesseract -v

5.3.0-6-g76ae

Operating System

No response

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

@stweil
Contributor

stweil commented Dec 21, 2023

That's not an issue with Tesseract itself but with the model, which does not include the GREEK LUNATE SIGMA SYMBOL (see the unicharsets for grc and script/Greek). I am therefore moving this issue to langdata_lstm.
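
Whether a model's character set covers a given symbol can be checked by scanning its unicharset file. The sketch below assumes the usual plain-text layout (a count on the first line, then one entry per line whose first space-separated field is the character); the function names and the sample lines are made up for illustration and are not the real grc data.

```python
def unicharset_symbols(lines):
    """Collect the first field of every entry line, skipping the count header."""
    entries = iter(lines)
    next(entries, None)  # first line is assumed to hold the number of entries
    symbols = set()
    for line in entries:
        first_field = line.rstrip("\n").split(" ", 1)[0]
        if first_field:
            symbols.add(first_field)
    return symbols


def has_symbol(lines, char="\u03f2"):
    """Report whether `char` (default: GREEK LUNATE SIGMA SYMBOL) is listed."""
    return char in unicharset_symbols(lines)


# Made-up sample in the assumed layout, NOT the actual grc unicharset.
sample = ["3\n", "NULL 0 Common 0\n", "σ 3 Greek 1\n", "ς 3 Greek 2\n"]
print(has_symbol(sample))
```

For a real file, the same function can be given an open file handle, for example `has_symbol(open(path, encoding="utf-8"))`, where `path` points at a local copy of the unicharset.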

stweil transferred this issue from tesseract-ocr/tesseract on Dec 21, 2023
stweil changed the title from "Lunate sigma recognised as normal sigma" to "Missing GREEK LUNATE SIGMA SYMBOL in grc and script/Greek models" on Dec 21, 2023
@stweil
Contributor

stweil commented Dec 21, 2023

The symbol is not recognized because it was not part of the training data. Tesseract therefore detects another symbol that looks somewhat similar.

stweil added the bug label on Dec 21, 2023
@nisbet-hubbard
Author

nisbet-hubbard commented Dec 21, 2023

Thanks for moving it. If I understand you correctly, I’m seeing the regular sigmas σ (when non-final) and ς (when final) in the OCR text wherever a lunate sigma ϲ appears in the image not because the lunate sigma is actually recognised as a sigma, but simply because ϲ looks similar to σ/ς.

@stweil
Contributor

stweil commented Dec 22, 2023

Yes, that's right.
