How to ensure reproducible data augmentation #38

Open
FreemanG opened this issue Feb 28, 2019 · 3 comments

Comments

@FreemanG

In my experiments, I noticed that the data augmentation was not deterministic between different runs, even with both the numpy and TensorFlow random seeds set.
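
For context, I set the seeds at the top of the training script roughly like this (a minimal sketch using the TF 1.x API that DLTK targets; the exact placement may differ in my script):

import numpy as np
import tensorflow as tf

SEED = 42
np.random.seed(SEED)       # global numpy RNG, used by the augmentation helpers
tf.set_random_seed(SEED)   # TF graph-level seed (TF 1.x API)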

@pawni
Contributor

pawni commented Mar 1, 2019

Thanks for finding that! Do you have an example / some code for that? Which augmentation methods are you looking at?

@FreemanG
Author

FreemanG commented Mar 2, 2019

My experiments were based on the IXI_HH_sex_classification_resnet example code.

I found that setting the random seed in the train function did not ensure reproducible data augmentation: the results of flip and extract_random_example_array changed from run to run.

Then I tried setting np.random.seed within the reader_fn, and the same random examples were extracted. However, this might hurt training, since the data were then generated in the same pattern on every pass.
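
As a minimal sketch of what I mean (plain numpy stand-ins for flip and extract_random_example_array, a placeholder instead of the real image loading, and a simplified yield structure; only the seeding line matters here):

import numpy as np

def read_fn(file_references, mode, data_path, params=None):
    # Seed once per call: the augmentation helpers draw from np.random,
    # so a fixed seed makes every pass yield the same flips and crops.
    np.random.seed(params.get('seed', 42))

    for f in file_references:
        img = np.zeros((96, 96, 96), dtype=np.float32)   # placeholder for the real image loading

        if np.random.rand() > 0.5:                       # stand-in for flip()
            img = img[:, :, ::-1]

        # stand-in for extract_random_example_array(): random crop along axis 0
        z = np.random.randint(0, img.shape[0] - params['example_size'][0] + 1)
        example = img[z:z + params['example_size'][0]]

        yield {'features': {'x': example}}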

I also tried passing a seed to reader_fn through reader_params and recreating train_input_fn with a different seed each time nn.train was run. But the training failed: the accuracy kept decreasing from 1.

@FreemanG
Author

FreemanG commented Mar 2, 2019

For now, I seem to be able to solve this problem by setting np.random.seed within the reader_fn. Specifically, I pass a seed to reader_fn through reader_params and recreate train_input_fn with a different seed each time nn.train is run.
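
The training loop then looks roughly like the sketch below, where nn is the tf.estimator.Estimator from train.py and build_train_input_fn is a placeholder for however the script builds its input_fn from the Reader (neither name is DLTK API):

def train_with_fresh_seeds(nn, build_train_input_fn, reader_params,
                           base_seed=42, rounds=10, steps_per_round=100):
    # Re-create train_input_fn with a new reader seed before every nn.train
    # call, so each round sees different augmentations while any single
    # round stays reproducible given its seed.
    for r in range(rounds):
        params = dict(reader_params, seed=base_seed + r)
        nn.train(input_fn=build_train_input_fn(params), steps=steps_per_round)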

I tested with the code below:

reader_params = {'n_examples': 2, 'example_size': example_size, 'extract_examples': True, 'seed': 42}
out1 = read_fn(all_filenames, mode, data_path, params=reader_params)
out2 = read_fn(all_filenames, mode, data_path, params=reader_params)

Then I saved the images from out1 and out2 and checked whether they differed. The flip and crop worked as expected, so the data augmentation process is reproducible (so far).
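
Roughly, the check was (assuming the yielded dicts keep the example's 'features'/'x' layout):

import numpy as np

ex1 = [d['features']['x'] for d in out1]
ex2 = [d['features']['x'] for d in out2]

# with the same 'seed' in reader_params, both passes should match exactly
assert len(ex1) == len(ex2)
assert all(np.array_equal(a, b) for a, b in zip(ex1, ex2))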

As for the training process, the problem was that I did not shuffle the dataset adequately. You also want to pass a seed to dataset.shuffle and to tf.estimator.RunConfig to ensure reproducibility.
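
A minimal sketch of both, where dataset, model_fn, model_path and nn_params refer to objects the example script already builds and the values are only illustrative:

import tensorflow as tf

# fix the shuffle order of the tf.data input pipeline
dataset = dataset.shuffle(buffer_size=64, seed=42)

# fix TF's graph-level seed for the Estimator
config = tf.estimator.RunConfig(tf_random_seed=42)
nn = tf.estimator.Estimator(model_fn=model_fn,
                            model_dir=model_path,
                            params=nn_params,
                            config=config)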

However, the results are still not reproducible with my own data: I cannot keep the differences between runs acceptably small. I am still working on that.
