
[feature request] Reproducible trainings #580

Open · stweil opened this issue Mar 17, 2024 · 4 comments

stweil (Contributor) commented Mar 17, 2024

Ideally, a training process should be reproducible, as good scientific practice requires.

Currently, kraken training is not reproducible: two recognition trainings with the same ground truth and the same base model give different results (number of epochs, accuracies of the intermediate models).

eScriptorium shuffles the ground truth randomly but always uses the same seed, so the resulting training and validation sets are reproducible. The training itself, however, appears to shuffle the training set once more, and that second shuffle does not seem to be reproducible.
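For illustration, a seeded shuffle like the one eScriptorium performs is reproducible by construction. This is a minimal stdlib sketch, not eScriptorium's actual code; the function name, the seed value, and the 90/10 ratio are assumptions:

```python
import random

def split_ground_truth(lines, seed=42, ratio=0.9):
    """Shuffle ground-truth lines with a fixed seed and split them.

    Using a private random.Random(seed) instance makes the split
    independent of the global RNG state, so repeated calls with the
    same inputs always yield the same training and validation sets.
    """
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

gt = [f"line_{i:03d}.png" for i in range(100)]
train_a, val_a = split_ground_truth(gt)
train_b, val_b = split_ground_truth(gt)
assert train_a == train_b and val_a == val_b  # identical across runs
```

Any *later* shuffle that draws from an unseeded (or differently seeded) RNG breaks this guarantee, which is what the issue describes.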

stweil (Contributor, Author) commented Mar 17, 2024

I just found my previous issue #302 for that. Is this an eScriptorium issue, i.e. does eScriptorium not use the kraken API correctly?

mittagessen (Owner) commented Mar 18, 2024 via email

stweil (Contributor, Author) commented Mar 18, 2024

I am currently struggling with eScriptorium trainings that end with a model claiming 100 % accuracy even though all epochs show accuracies below 99 %. When I export the final model and examine its metadata, I can see that it is always the model from epoch 0 (eScriptorium counts epochs from 0, so it is the result of the first epoch).
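Checking which epoch an exported model came from can be done by reading the metadata embedded in the model file. Kraken models are Core ML files; the sketch below is a hedged illustration in which the metadata key `"kraken_meta"` and the `hyper_params`/`completed_epochs` layout are assumptions, not a confirmed kraken format:

```python
import json

def completed_epochs(user_defined_metadata):
    """Extract the completed-epochs counter from a model's metadata
    mapping, assuming a JSON blob under the 'kraken_meta' key."""
    meta = json.loads(user_defined_metadata["kraken_meta"])
    return meta.get("hyper_params", {}).get("completed_epochs")

# With coremltools installed, the mapping would come from the model file:
#   import coremltools as ct
#   mlmodel = ct.models.MLModel("exported_model.mlmodel")
#   print(completed_epochs(mlmodel.user_defined_metadata))

# Stand-in mapping mirroring the symptom described above (epoch 0):
fake = {"kraken_meta": json.dumps({"hyper_params": {"completed_epochs": 0}})}
assert completed_epochs(fake) == 0
```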

mittagessen (Owner) commented

Hmm, you can set deterministic=warn on the KrakenTrainer object in eScriptorium, which should eliminate most non-deterministic behavior but won't get rid of it completely. Shuffling the training data twice shouldn't really have an impact, as the state of the RNG remains the same between two training runs (if you restart the workers); otherwise we'd need to re-seed it for each task. IIRC the CUDA CTC implementation is always non-deterministic.
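The point about RNG state can be illustrated with a small stdlib sketch: two shuffles per run stay reproducible as long as each run starts from the same seed, and reproducibility only breaks when the RNG is *not* re-seeded between runs (e.g. long-lived workers that are never restarted). The seed value and data are illustrative:

```python
import random

def double_shuffle(seed):
    """Shuffle twice from a fresh, seeded RNG, as a fresh worker would."""
    rng = random.Random(seed)
    data = list(range(10))
    rng.shuffle(data)   # first shuffle (e.g. train/val split)
    rng.shuffle(data)   # second shuffle (e.g. epoch ordering)
    return data

run_a = double_shuffle(seed=42)
run_b = double_shuffle(seed=42)
assert run_a == run_b  # two shuffles, still reproducible across runs

# Without re-seeding (worker not restarted), the RNG has advanced,
# so the "second run" sees different shuffles:
rng = random.Random(42)
first = list(range(10)); rng.shuffle(first); rng.shuffle(first)
second = list(range(10)); rng.shuffle(second); rng.shuffle(second)
assert first != second
```

Separately, PyTorch Lightning's `Trainer` (which `KrakenTrainer` builds on) accepts `deterministic="warn"`, which requests deterministic algorithms and warns where only non-deterministic implementations exist, such as the CUDA CTC loss mentioned above.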
