
[feature request] Reproducible trainings #580

Open · stweil opened this issue Mar 17, 2024 · 4 comments

stweil (Contributor) commented Mar 17, 2024

Ideally, a training process should be reproducible, as good scientific practice requires.

Currently, kraken training is not reproducible: two recognition trainings with the same ground truth and the same base model give different results (number of epochs, accuracies of the intermediate models).

eScriptorium shuffles the ground truth randomly but always uses the same seed, so the resulting training and validation sets are reproducible. The training itself, however, appears to shuffle the training set once more, and that second shuffle does not seem to be reproducible.
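For illustration, a seeded shuffle like the one eScriptorium performs is reproducible by construction. This is a minimal stdlib sketch, not eScriptorium's actual code; the function name, the seed value, and the 90/10 ratio are assumptions:

```python
import random

def split_ground_truth(lines, seed=42, ratio=0.9):
    """Shuffle ground-truth lines with a fixed seed and split them.

    Using a private random.Random(seed) instance makes the split
    independent of the global RNG state, so repeated calls with the
    same inputs always yield the same training and validation sets.
    """
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

gt = [f"line_{i:03d}.png" for i in range(100)]
train_a, val_a = split_ground_truth(gt)
train_b, val_b = split_ground_truth(gt)
assert train_a == train_b and val_a == val_b  # identical across runs
```

Any *later* shuffle that draws from an unseeded (or differently seeded) RNG breaks this guarantee, which is what the issue describes.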

stweil (Contributor, Author) commented Mar 17, 2024

I just found my previous issue #302 for that. Is this an eScriptorium issue, i.e. does eScriptorium not use the kraken API correctly?

mittagessen (Owner) commented Mar 18, 2024 via email

stweil (Contributor, Author) commented Mar 18, 2024

I am currently struggling with eScriptorium trainings that end with a model claiming 100 % accuracy even though all epochs show accuracies below 99 %. When I export the final model and examine its metadata, I can see that it is always the model from epoch 0 (eScriptorium counts epochs from 0, so it is the result of the first epoch).
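Checking which epoch an exported model came from can be done by reading the metadata embedded in the model file. Kraken models are Core ML files; the sketch below is a hedged illustration in which the metadata key `"kraken_meta"` and the `hyper_params`/`completed_epochs` layout are assumptions, not a confirmed kraken format:

```python
import json

def completed_epochs(user_defined_metadata):
    """Extract the completed-epochs counter from a model's metadata
    mapping, assuming a JSON blob under the 'kraken_meta' key."""
    meta = json.loads(user_defined_metadata["kraken_meta"])
    return meta.get("hyper_params", {}).get("completed_epochs")

# With coremltools installed, the mapping would come from the model file:
#   import coremltools as ct
#   mlmodel = ct.models.MLModel("exported_model.mlmodel")
#   print(completed_epochs(mlmodel.user_defined_metadata))

# Stand-in mapping mirroring the symptom described above (epoch 0):
fake = {"kraken_meta": json.dumps({"hyper_params": {"completed_epochs": 0}})}
assert completed_epochs(fake) == 0
```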

mittagessen (Owner) commented

Hmm, you can set deterministic=warn on the KrakenTrainer object in eScriptorium, which should eliminate most non-deterministic behavior but won't get rid of it completely. Shuffling the training data twice shouldn't really have an impact, as the state of the RNG remains the same between two training runs (if you restart the workers); otherwise we'd need to re-seed it for each task. IIRC the CUDA CTC implementation is always non-deterministic.
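The point about RNG state can be illustrated with a small stdlib sketch: two shuffles per run stay reproducible as long as each run starts from the same seed, and reproducibility only breaks when the RNG is *not* re-seeded between runs (e.g. long-lived workers that are never restarted). The seed value and data are illustrative:

```python
import random

def double_shuffle(seed):
    """Shuffle twice from a fresh, seeded RNG, as a fresh worker would."""
    rng = random.Random(seed)
    data = list(range(10))
    rng.shuffle(data)   # first shuffle (e.g. train/val split)
    rng.shuffle(data)   # second shuffle (e.g. epoch ordering)
    return data

run_a = double_shuffle(seed=42)
run_b = double_shuffle(seed=42)
assert run_a == run_b  # two shuffles, still reproducible across runs

# Without re-seeding (worker not restarted), the RNG has advanced,
# so the "second run" sees different shuffles:
rng = random.Random(42)
first = list(range(10)); rng.shuffle(first); rng.shuffle(first)
second = list(range(10)); rng.shuffle(second); rng.shuffle(second)
assert first != second
```

Separately, PyTorch Lightning's `Trainer` (which `KrakenTrainer` builds on) accepts `deterministic="warn"`, which requests deterministic algorithms and warns where only non-deterministic implementations exist, such as the CUDA CTC loss mentioned above.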
