How to ensure reproducible data augmentation #38
Comments
Thanks for finding that! Do you have an example / some code for that? Which augmentation methods are you looking at?
I ran experiments based on the IXI_HH_sex_classification_resnet example code. Setting the random seed in the train function did not make the data augmentation reproducible: the results of flip and extract_random_example_array changed from run to run. I then tried setting np.random.seed within the reader_fn, and the same random examples were extracted. However, this might hurt training, since the data are then generated in the same pattern every time. I also tried passing a seed to reader_fn through reader_params and recreating train_input_fn with a different seed each time nn.train is run, but training failed and the accuracy kept decreasing from 1.
For now I am able to work around this problem by setting np.random.seed within the reader_fn. Specifically, I pass a seed to reader_fn through reader_params and recreate train_input_fn with a different seed each time nn.train is run. I tested with the code as below:
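The code from the original comment is not shown here, but a minimal sketch of the seeded-reader approach described above might look like the following. The function signature loosely follows the DLTK reader convention; the `seed` key in `params` and the stand-in image loading are assumptions for illustration, not the actual project code:

```python
import numpy as np

def reader_fn(file_references, mode, params):
    # Seed numpy inside the reader so that, for a fixed seed, the sequence
    # of random augmentations (flips, crops, ...) is reproducible.
    np.random.seed(params['seed'])
    for f in file_references:
        img = np.random.rand(16, 16)  # stand-in for an actual loaded image
        # Random flip, analogous to the flip augmentation in the example.
        if np.random.rand() > 0.5:
            img = img[::-1]
        yield img

refs = ['subj_a', 'subj_b']
run1 = list(reader_fn(refs, 'train', {'seed': 0}))
run2 = list(reader_fn(refs, 'train', {'seed': 0}))  # same seed -> same examples
run3 = list(reader_fn(refs, 'train', {'seed': 1}))  # new seed -> new examples
assert all(np.array_equal(x, y) for x, y in zip(run1, run2))
assert not all(np.array_equal(x, y) for x, y in zip(run1, run3))
```

Recreating `train_input_fn` with a fresh seed before each `nn.train` call then gives different augmentations per training round while keeping any single round reproducible.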
Then I saved the images to out* and checked whether they were different. It turned out that the flip and crop worked as expected, so the data augmentation process is reproducible (so far). As for the training process, the problem was that I did not shuffle the data set adequately. You also want to pass a seed to dataset.shuffle and to tf.estimator.RunConfig to ensure reproducibility. However, the results are still not reproducible with my own data: I cannot keep the differences between runs acceptably small. I am still working on that.
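Seeding the remaining sources of randomness mentioned above might look like the following sketch (TF 1.x estimator API, as used by DLTK; the buffer size, batch size, and seed value are placeholders, and the dataset contents are dummy data):

```python
import numpy as np
import tensorflow as tf

SEED = 42

def train_input_fn():
    # Dummy data standing in for the real reader output.
    data = np.zeros([10, 4], dtype=np.float32)
    dataset = tf.data.Dataset.from_tensor_slices(data)
    # Pass an explicit seed so the shuffle order is reproducible across runs;
    # buffer_size should cover enough of the dataset to shuffle adequately.
    dataset = dataset.shuffle(buffer_size=64, seed=SEED)
    return dataset.batch(8)

# tf_random_seed sets the graph-level seed for ops run by the estimator.
run_config = tf.estimator.RunConfig(tf_random_seed=SEED)
```

Note that even with all of these seeds fixed, some GPU ops are non-deterministic, which can still cause small run-to-run differences in training results.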
In my experiments, I noticed that the data augmentation was not deterministic across runs, even with both the numpy and TensorFlow random seeds set.