Bad box coordinates in boxfile string! #338
Comments
Please have a look at https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip to see how to prepare custom data for training.
@zdenop thanks for your reply. This data does not provide any box files at all, so how does tesseract know which character is which?
Did you try to follow the instructions on https://github.com/tesseract-ocr/tesstrain/?
@zdenop Thanks, after I removed the
This was generated for the following image: And I only put the files
I just wonder how it works and whether there is an article about this process. I have not found anything about version 5, which seems relatively new, right? There are a lot of tutorials and examples for version 4, but they are different and so is the process.
P.S. The model created after the training was able to recognize characters it did not recognize before the training (I just used the model
Did you read and follow https://github.com/tesseract-ocr/tesstrain?
yes
@zdenop no, tesstrain first created the *.box files itself, and this is not mentioned in tesstrain's README.
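Since tesstrain generates the *.box files itself, the ground truth folder only needs line images and matching .gt.txt transcriptions. A minimal sketch of a pre-flight check for that folder follows. It assumes images end in .tif or .png and transcriptions share the image's base name with a .gt.txt extension; the function name `check_ground_truth` and the report keys are hypothetical and not part of tesstrain.

```python
from pathlib import Path

# Assumed naming convention: <name>.tif or <name>.png next to <name>.gt.txt.
IMAGE_SUFFIXES = (".tif", ".png")


def check_ground_truth(directory):
    """Report unpaired files and pre-existing box files in a ground truth folder.

    Hypothetical helper for illustration; it only inspects file names.
    """
    images, transcripts, boxes = set(), set(), set()
    for path in Path(directory).iterdir():
        name = path.name
        if name.endswith(".gt.txt"):
            transcripts.add(name[: -len(".gt.txt")])
        elif name.endswith(".box"):
            boxes.add(name[: -len(".box")])
        else:
            for suffix in IMAGE_SUFFIXES:
                if name.endswith(suffix):
                    images.add(name[: -len(suffix)])
                    break
    return {
        "missing_transcript": sorted(images - transcripts),
        "missing_image": sorted(transcripts - images),
        # Hand-written box files that could be removed so they get regenerated.
        "existing_box": sorted(boxes),
    }
```

Calling `check_ground_truth("data/Chechen-ground-truth")` (the folder name here is an assumption based on MODEL_NAME=Chechen) would list any hand-written box files under `existing_box`, which are candidates for removal before running `make training`.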
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I have prepared the following ground truth files:
The box files are based on WordStr. Here is the content of the file 1.box, for example:
In the file 1.gt.txt I then have the corresponding text:
And here is the image:
Running the command
make training MODEL_NAME=Chechen START_MODEL=rus TESSDATA=../tesseract/tessdata
gives me an error:
I'm using tesseract version 5.3.0.
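For anyone hitting the same message with hand-written WordStr box files, a rough sanity check of each WordStr line can help narrow things down. The sketch below assumes the line layout `WordStr <left> <bottom> <right> <top> <page> #<text>`. The function `check_wordstr_line` and the checks it applies are illustrative assumptions, not the validation tesseract itself performs, and the sample lines in the usage note are made up, not taken from this issue.

```python
import re

# Assumed layout of a WordStr line: WordStr left bottom right top page #text
WORDSTR_RE = re.compile(r"^WordStr (-?\d+) (-?\d+) (-?\d+) (-?\d+) (\d+) #(.*)$")


def check_wordstr_line(line, width=None, height=None):
    """Return a list of problems found in one WordStr box line (empty if none).

    Hypothetical helper; width and height are the image size in pixels,
    if known.
    """
    match = WORDSTR_RE.match(line.rstrip("\n"))
    if match is None:
        return ["line does not match the assumed WordStr layout"]
    left, bottom, right, top = (int(match.group(i)) for i in range(1, 5))
    problems = []
    if min(left, bottom, right, top) < 0:
        problems.append("negative coordinate")
    if left >= right:
        problems.append("left must be smaller than right")
    if bottom >= top:
        problems.append("bottom must be smaller than top")
    if width is not None and right > width:
        problems.append("right exceeds image width")
    if height is not None and top > height:
        problems.append("top exceeds image height")
    return problems
```

For example, `check_wordstr_line("WordStr 0 0 200 50 0 #some text", width=100)` reports that the right edge exceeds the image width, while the same line with `width=200` passes all of these checks.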