Incorrect/outdated documentation in README.md #316

Open
pratheesh-prakash opened this issue Oct 5, 2022 · 4 comments
Labels: enhancement (New feature or request)

Comments


pratheesh-prakash commented Oct 5, 2022

In general, the documentation in README.md is very vague and doesn't explain the training parameters or their impact on the output model.

Apart from the above, some of the information in README.md is incorrect or outdated. Here are the major issues I have noticed.

Line 126 of README.md says

FINETUNE_TYPE Finetune Training Type - Impact, Plus, Layer or blank. Default: ''

However, the Makefile doesn't seem to make use of this parameter anywhere. The help text (available through 'make help') also omits it. Is this because the option is unavailable in later versions, or because the Makefile is outdated? Additionally, there is no information whatsoever on how these values (i.e. Impact, Plus, Layer or '') would influence the training.
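
One way to confirm whether a documented variable is actually used is to search the Makefile text for it. The following is only a minimal sketch: the helper name `find_variable_references` is hypothetical and not part of the project, and the sample Makefile content is made up for illustration.

```python
import re


def find_variable_references(makefile_text: str, name: str) -> list[int]:
    """Return the 1-based line numbers on which `name` appears as a whole word."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [
        number
        for number, line in enumerate(makefile_text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Made-up Makefile fragment, used only to illustrate the check.
sample_makefile = "MODEL_NAME = foo\nSTART_MODEL =\n\t@echo $(MODEL_NAME)\n"

model_name_hits = find_variable_references(sample_makefile, "MODEL_NAME")
finetune_hits = find_variable_references(sample_makefile, "FINETUNE_TYPE")
```

An empty result for a variable that the README documents would suggest that the README and the Makefile have drifted apart. To run this against a real checkout, read the file first, for example with `pathlib.Path("Makefile").read_text()`.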

To plot the CER, README.md says the user must run './plot/plot_cer.sh'. Unfortunately, no such shell script exists in 'plot'. Additionally, the Python scripts provided in 'plot' work only if the log file has first been parsed into a CSV.
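
The log-to-CSV step could look roughly like the sketch below. It rests on assumptions that this issue does not confirm: that the training log contains lines of the form "At iteration N/N/N, ... BCER train=X%" (or "char train=X%" in older logs), and that a two-column CSV is enough. The column names and the function `log_to_csv` are hypothetical, and the scripts in 'plot' may expect a different layout.

```python
import csv
import io
import re

# Assumed log-line shape; adjust the pattern to match your actual training log.
LINE_RE = re.compile(r"At iteration (\d+)/\d+/\d+,.*?(?:BCER|char) train=([\d.]+)%")


def log_to_csv(log_text: str) -> str:
    """Return CSV text with one row per log line that matches LINE_RE."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["iteration", "cer_percent"])  # hypothetical column names
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            writer.writerow([int(match.group(1)), float(match.group(2))])
    return out.getvalue()


# Made-up log excerpt, used only to illustrate the parsing.
sample_log = (
    "Loaded file x\n"
    "At iteration 100/100/100, Mean rms=1.5%, delta=3.2%, BCER train=12.5%, "
    "BWER train=30.1%, skip ratio=0%\n"
    "At iteration 200/200/200, Mean rms=1.0%, delta=2.0%, char train=8.25%, "
    "word train=20%, skip ratio=0%\n"
)
csv_text = log_to_csv(sample_log)
```

Lines that do not match the pattern are skipped silently, so an empty CSV usually means the pattern needs adjusting for the log at hand.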

The documentation also does not cover how to interpret the results, how to optimise the hyperparameters, or how to improve the training data (e.g. how to prevent 'Compute CTC targets failed' errors).

It would be great if README.md were updated with the latest information and gave a clearer, more detailed explanation of the various parameters.


stale bot commented Nov 13, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Nov 13, 2022

stweil (Collaborator) commented Nov 15, 2022

@pratheesh-prakash, do you want to send a pull request which improves that documentation?

stale bot removed the stale label Nov 15, 2022
stweil added the enhancement label Nov 15, 2022

pratheesh-prakash (Author) commented

@stweil: I really wish I could contribute to tesseract-ocr, but I do not have in-depth knowledge of the issues I have raised. I consulted the documentation precisely to clarify those doubts, and found the information either missing or outdated. I would suggest that the update be done by one of the developers.

zdenop (Contributor) commented Feb 20, 2023

Some details and an explanation of what happened are in #257.
