Is it possible to train a model for multiple types of sources? #332
Comments
We don't know exactly how the standard models were trained because that was done by Google. Only some hints are available.
But have you ever trained, or do you know of any case in which, using a dataset of images, the accuracy turned out to be greater than or equal to the standard model's? This kind of information is very scarce; I would like a rough idea of the dataset size needed to get a reasonably functional model.
Yes, we have trained lots of models in the meantime. See https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR for examples.
Wow, that's amazing! How have I not seen this before? But I still have some doubts:
1. I saw that you use XML in the dataset. Is this XML used only to extract the words (saved as PNG plus .gt.txt), or is the XML used together with the whole image?
2. What is the order of magnitude of the dataset that you usually use (100k, 1M, 10M)?
3. Do you do a lot of data augmentation to improve recognition?
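As an illustration of the PNG + .gt.txt pairing asked about above: tesstrain's documented convention is one single-line image per transcription file, both sharing a base name inside a `MODEL-ground-truth` directory. A minimal sketch (the directory and file names here are placeholders, and the `.png` is an empty stand-in for a real line image):

```python
from pathlib import Path

# tesstrain pairs each line image with a transcription sharing the
# same base name: line_0001.png <-> line_0001.gt.txt
gt_dir = Path("data/mymodel-ground-truth")
gt_dir.mkdir(parents=True, exist_ok=True)

# Placeholder file; in a real dataset this is a PNG/TIFF of one text line.
(gt_dir / "line_0001.png").touch()
(gt_dir / "line_0001.gt.txt").write_text("The quick brown fox\n", encoding="utf-8")

print(sorted(p.name for p in gt_dir.iterdir()))
# → ['line_0001.gt.txt', 'line_0001.png']
```

Whether the XML (e.g. PAGE XML as used for GT4HistOCR-style corpora) is consumed directly or only used to cut out line images and transcriptions depends on the tooling in front of tesstrain; tesstrain itself consumes the paired files shown above.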
One last question, I swear. Does training a model with multiple fonts have much impact on accuracy? Several images with different fonts, always keeping the proportion between them, of course.
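The "keeping the proportion between them" idea from the question can be sketched as weighted sampling when generating synthetic training lines. The font names and weights below are purely hypothetical examples, not values from any Tesseract training recipe:

```python
import random

# Hypothetical font mix; the weights keep the proportion between
# fonts fixed across the generated dataset.
FONT_WEIGHTS = {"serif": 0.4, "sans": 0.4, "mono": 0.2}

def sample_fonts(n, seed=0):
    """Draw n font names according to the fixed proportions."""
    rng = random.Random(seed)
    names = list(FONT_WEIGHTS)
    weights = [FONT_WEIGHTS[f] for f in names]
    return rng.choices(names, weights=weights, k=n)

counts = {f: 0 for f in FONT_WEIGHTS}
for f in sample_fonts(10_000):
    counts[f] += 1
print(counts)  # roughly 4000 / 4000 / 2000
```

Each sampled font name would then drive the rendering of one synthetic line image for the ground-truth directory.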
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I would like to know how the default model is trained: whether it is trained with many real images (and if so, of what order of magnitude), or whether images are generated automatically with different fonts.
I want to train a model of my own starting from the default one, and using my images it seems that the default model reads low-resolution images better, even as I add more and more data. I'm training with images of varying DPI containing characters, words, and phrases. Should I be doing this differently?
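One detail relevant to the varying-DPI experiments described above: Tesseract's LSTM recognizer reportedly normalizes each line image to a fixed pixel height (around 36 px) before recognition, so the raw DPI mostly determines how much detail survives that rescaling. A small sketch of the point-size/DPI arithmetic (the 12 pt size is an arbitrary example):

```python
# 1 point = 1/72 inch, so a glyph's pixel height scales linearly with DPI.
POINT_SIZE = 12  # nominal font size in points (hypothetical example)

def line_height_px(dpi, point_size=POINT_SIZE):
    """Approximate rendered height in pixels of text at a given DPI."""
    return round(point_size * dpi / 72)

for dpi in (72, 150, 300):
    print(dpi, line_height_px(dpi))
# → 72 12 / 150 25 / 300 50
```

At 72 DPI a 12 pt line is only ~12 px tall, well below the normalization height, which is one plausible reason low-resolution scans behave differently from high-DPI synthetic data during training.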