Arabic training text is only 80 lines #6

Open
Shreeshrii opened this issue Nov 28, 2018 · 2 comments

Comments

@Shreeshrii (Contributor)

The training text in langdata_lstm/ara is only 80 lines or so.

@Shreeshrii (Contributor, Author)

Training text for other languages is thousands of lines.

It seems the Arabic training text in this repo is the same as, or very similar to, the one in langdata (for 3.04).

https://github.com/tesseract-ocr/langdata_lstm/blob/master/ara/ara.training_text

https://github.com/tesseract-ocr/langdata/blob/master/ara/ara.training_text

@Shreeshrii (Contributor, Author)

Other languages with small training_texts:

     5826 Jun 18 09:48 tgl/tgl.training_text
     6022 Jun 18 09:48 afr/afr.training_text
     7386 Jun 18 09:48 ara/ara.training_text
     7544 Jun 18 09:48 kur/kur.training_text
    38579 Jun 18 09:48 amh/amh.training_text
   143591 Jun 18 09:48 asm/asm.training_text
   412473 Jun 18 09:48 bih/bih.training_text
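The listing above looks like `ls -l` output sorted by size. A rough way to produce a similar table that also reports line counts is sketched below in Python. It assumes a local clone of langdata_lstm laid out as `<lang>/<lang>.training_text`, matching the URLs above; the function name `training_text_stats` is hypothetical and is not part of any Tesseract tooling.

```python
from pathlib import Path


def training_text_stats(langdata_dir):
    """Return (bytes, lines, relative path) for every <lang>/*.training_text, smallest first.

    Hypothetical helper: assumes the langdata_lstm directory layout.
    """
    root = Path(langdata_dir)
    stats = []
    for path in root.glob("*/*.training_text"):
        data = path.read_bytes()
        # Count newline bytes, the same way `wc -l` counts lines.
        stats.append((len(data), data.count(b"\n"), path.relative_to(root).as_posix()))
    # Tuples sort by byte size first, so the smallest training texts come first.
    stats.sort()
    return stats


if __name__ == "__main__":
    import sys

    # Usage: python training_text_stats.py path/to/langdata_lstm
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for size, lines, rel in training_text_stats(target):
        print(f"{size:>10} {lines:>8} {rel}")
```

Reporting lines alongside bytes makes it easier to compare against the "80 lines" versus "thousands of lines" observation, since byte size alone varies with script and average line length.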
