Decreased accuracy in Kraken 5.x compared to same setup in Kraken 4.x with Arabic language model #589

Open
bmwmy opened this issue Apr 18, 2024 · 2 comments

Comments

@bmwmy

bmwmy commented Apr 18, 2024

Hi,
I ran the same page with the same setup on both Kraken 5.x and Kraken 4.x, using the provided arabic_best.mlmodel, and the latest version (5.x) produces more errors. I think this is related to changes in the segmenter, which has been modified to allow curly (curved) segments; that is probably not good for Arabic (I cannot find the issue #).

@mittagessen
Owner

mittagessen commented Apr 18, 2024 via email

@bmwmy
Author

bmwmy commented Apr 18, 2024

This is the command I used:
kraken -i "yarab_deskewed.png" "yarab.txt" segment -bl ocr -m arabic_best.mlmodel
Kraken_Dated_07-09-2022.pdf
Kraken_4.13.20.pdf
kraken_5dev23.pdf

yarab_deskewed (the original file being OCRed)

With every major update of kraken, a decrease in accuracy has been noted.
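
To put a number on the difference between versions, the character error rate (CER) of each output against a hand-checked transcription can be compared. The following is only a minimal sketch: it assumes a ground-truth transcription of the page exists, the function names `levenshtein` and `cer` are illustrative and not part of kraken, and the sample strings are toy data, not taken from the attached files.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Normalize both sides so equivalent Unicode encodings compare equal.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Toy data standing in for a ground truth and the output of two versions.
ground_truth = "كتاب"
output_old = "كتاب"
output_new = "كتب"
print("old CER:", cer(ground_truth, output_old))
print("new CER:", cer(ground_truth, output_new))
```

In practice the strings would be read from the text files each kraken version writes (such as yarab.txt from the command above), with one output file per version, so the two CER values can be compared directly.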
