How to ensure reproducible data augmentation #38

Open
FreemanG opened this issue Feb 28, 2019 · 3 comments

Comments

@FreemanG

In my experiments, I noticed that the data augmentation was not deterministic between different runs, even with both the numpy and TensorFlow random seeds set.
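
For context, I set the seeds at the top of the training script roughly like this (a minimal sketch using the TF 1.x API that DLTK targets; the exact placement may differ in my script):

import numpy as np
import tensorflow as tf

SEED = 42
np.random.seed(SEED)       # global numpy RNG, used by the augmentation helpers
tf.set_random_seed(SEED)   # TF graph-level seed (TF 1.x API)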

@pawni
Contributor

pawni commented Mar 1, 2019

Thanks for finding that! Do you have an example / some code for that? Which augmentation methods are you looking at?

@FreemanG
Author

FreemanG commented Mar 2, 2019

My experiments were based on the IXI_HH_sex_classification_resnet example code.

I found that setting the random seed in the train function did not ensure reproducible data augmentation: the results of flip and extract_random_example_array changed from run to run.

Then I tried setting np.random.seed within the reader_fn, and the same random examples were extracted. However, this might hurt training, since the data were then generated in the same pattern on every pass.
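
As a minimal sketch of what I mean (plain numpy stand-ins for flip and extract_random_example_array, a placeholder instead of the real image loading, and a simplified yield structure; only the seeding line matters here):

import numpy as np

def read_fn(file_references, mode, data_path, params=None):
    # Seed once per call: the augmentation helpers draw from np.random,
    # so a fixed seed makes every pass yield the same flips and crops.
    np.random.seed(params.get('seed', 42))

    for f in file_references:
        img = np.zeros((96, 96, 96), dtype=np.float32)   # placeholder for the real image loading

        if np.random.rand() > 0.5:                       # stand-in for flip()
            img = img[:, :, ::-1]

        # stand-in for extract_random_example_array(): random crop along axis 0
        z = np.random.randint(0, img.shape[0] - params['example_size'][0] + 1)
        example = img[z:z + params['example_size'][0]]

        yield {'features': {'x': example}}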

I also tried passing a seed to reader_fn through reader_params and recreating train_input_fn with a different seed each time nn.train was run. But the training failed: the accuracy kept decreasing from 1.

@FreemanG
Author

FreemanG commented Mar 2, 2019

For now, I seem to be able to solve this problem by setting np.random.seed within the reader_fn. Specifically, I pass a seed to reader_fn through reader_params and recreate train_input_fn with a different seed each time nn.train is run.
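
The training loop then looks roughly like the sketch below, where nn is the tf.estimator.Estimator from train.py and build_train_input_fn is a placeholder for however the script builds its input_fn from the Reader (neither name is DLTK API):

def train_with_fresh_seeds(nn, build_train_input_fn, reader_params,
                           base_seed=42, rounds=10, steps_per_round=100):
    # Re-create train_input_fn with a new reader seed before every nn.train
    # call, so each round sees different augmentations while any single
    # round stays reproducible given its seed.
    for r in range(rounds):
        params = dict(reader_params, seed=base_seed + r)
        nn.train(input_fn=build_train_input_fn(params), steps=steps_per_round)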

I tested with the code below:

reader_params = {'n_examples': 2, 'example_size': example_size, 'extract_examples': True, 'seed': 42}
out1 = read_fn(all_filenames, mode, data_path, params=reader_params)
out2 = read_fn(all_filenames, mode, data_path, params=reader_params)

Then I saved the images from out1 and out2 and checked whether they differed. The flip and crop worked as expected, so the data augmentation process is reproducible (so far).
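
Roughly, the check was (assuming the yielded dicts keep the example's 'features'/'x' layout):

import numpy as np

ex1 = [d['features']['x'] for d in out1]
ex2 = [d['features']['x'] for d in out2]

# with the same 'seed' in reader_params, both passes should match exactly
assert len(ex1) == len(ex2)
assert all(np.array_equal(a, b) for a, b in zip(ex1, ex2))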

As for the training process, the problem was that I did not shuffle the dataset adequately. You also want to pass a seed to dataset.shuffle and to tf.estimator.RunConfig to ensure reproducibility.
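
A minimal sketch of both, where dataset, model_fn, model_path and nn_params refer to objects the example script already builds and the values are only illustrative:

import tensorflow as tf

# fix the shuffle order of the tf.data input pipeline
dataset = dataset.shuffle(buffer_size=64, seed=42)

# fix TF's graph-level seed for the Estimator
config = tf.estimator.RunConfig(tf_random_seed=42)
nn = tf.estimator.Estimator(model_fn=model_fn,
                            model_dir=model_path,
                            params=nn_params,
                            config=config)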

However, the results are still not reproducible with my own data: I cannot keep the differences between runs acceptably small. I am still working on that.
