Add Indic numerals and missing punctuation to Arabic #131

Open
mustafa0x opened this issue Jul 12, 2018 · 4 comments

Comments

mustafa0x commented Jul 12, 2018

Previously: #71 and tesseract-ocr/tessdata_best#11 (also contains a pertinent discussion on how well the different traineddata deal with these characters).

• Indic (Arabic-Indic) numerals, U+0660 to U+0669: (٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩)
• Punctuation: (؛, ،, ﴿﴾)
• Also, a ligature very commonly found in Arabic texts: ﷺ
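
One way to check whether a given model actually emits these characters is to scan its OCR output for them. The following is a minimal sketch under stated assumptions: the names `REQUESTED`, `missing_characters`, and `describe` are hypothetical helpers invented for illustration, and the code points are written as escapes for the characters listed above.

```python
import unicodedata

# Characters requested in this issue, written as escapes to avoid
# right-to-left rendering problems in editors:
#   U+0660..U+0669  Arabic-Indic digits
#   U+061B, U+060C  Arabic semicolon and comma
#   U+FD3F, U+FD3E  ornate parentheses
#   U+FDFA          the honorific ligature
REQUESTED = (
    "".join(chr(cp) for cp in range(0x0660, 0x066A))
    + "\u061B\u060C"
    + "\uFD3F\uFD3E"
    + "\uFDFA"
)


def missing_characters(text):
    """Return the requested characters that never occur in `text`."""
    present = set(text)
    return [ch for ch in REQUESTED if ch not in present]


def describe(ch):
    """Format a character as its code point plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


if __name__ == "__main__":
    # Stand-in for OCR output: three Arabic-Indic digits and an Arabic comma.
    sample = "\u0661\u0662\u0663\u060C"
    for ch in missing_characters(sample):
        print(describe(ch))
```

Running this on the recognized text of a test page that is known to contain all of the characters gives a quick list of which ones a traineddata file fails to produce. It says nothing about accuracy on characters that do appear.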

If I can do this myself, please point me in the right direction.

CC @Shreeshrii

@Shreeshrii
Contributor

Please see tesseract-ocr/tesseract#2263 (comment) and test whether the traineddata files linked there cover all the required characters.

wewark commented Feb 11, 2020

Is this fixed? I've tried the latest version and it didn't detect any Indic numerals.

@ShroukMansour

@wewark you have to use the Arabic.traineddata file. It recognizes Arabic and English letters, as well as both Arabic-Indic and Arabic numerals.
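
For anyone trying this suggestion, here is a rough sketch of building the command line that selects that model. It assumes `Arabic.traineddata` sits directly in the tessdata directory so that `-l Arabic` resolves to it; if it lives in a `script/` subfolder, the language name would likely be `script/Arabic` instead. `build_tesseract_command` and `page.png` are hypothetical names used only for illustration.

```python
import shlex


def build_tesseract_command(image_path, lang="Arabic", tessdata_dir=None):
    """Build a tesseract CLI invocation that prints recognized text to stdout.

    `lang` is the traineddata name without its extension. This assumes
    Arabic.traineddata is directly inside the tessdata directory.
    """
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if tessdata_dir is not None:
        # Point tesseract at a non-default folder of traineddata files.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


if __name__ == "__main__":
    # Show the command as it would be typed in a shell.
    print(shlex.join(build_tesseract_command("page.png")))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` to capture the recognized text, provided the `tesseract` binary is installed and on the PATH.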

@AhmedElsayedTaha

@ShroukMansour I use ara.traineddata, and the text is not recognized accurately; the numbers are not accurate either. Is there a solution for this?
