
shuffle buffer issue? #7

Closed

jfb54 opened this issue Mar 15, 2021 · 8 comments

Comments

@jfb54

jfb54 commented Mar 15, 2021

I suspect that your reader code is affected by the meta-dataset shuffle buffer issue (#54). I did a full run with your reader and the results were mostly consistent with what I get using the official meta-dataset reader, except for traffic signs (and a couple of other datasets), where the results were more optimistic, as they would be if the data were not shuffled. From a quick look through your code, it seems that the shuffle buffer mechanism is not used.

@mboudiaf
Owner

mboudiaf commented Mar 17, 2021

Hmm, that's interesting, thanks a lot for bringing this up! Let me look into it and come back to you with more answers! :)

Update: I may have found the cause of the problem, which comes from the TFRecordDataset reading the stream linearly, without shuffling, by default. By passing the argument shuffle_queue_size, the reader always keeps some samples in a buffer and yields them in random order, which should solve the problem (it may, however, require a lot of memory if shuffle_queue_size is set too high). Please let me know if you're now able to get results consistent with the official implementation :) Thanks!
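For context, a minimal sketch of what this looks like with the standalone tfrecord reader this repo builds on (the data path and record description below are illustrative, not the repo's actual values):

```python
from tfrecord.torch.dataset import TFRecordDataset

# shuffle_queue_size keeps a reservoir of serialized examples and yields them
# in random order; a larger queue shuffles better but uses more memory.
dataset = TFRecordDataset(
    data_path="records/aircraft_train.tfrecords",   # hypothetical file
    index_path=None,
    description={"image": "byte", "label": "int"},  # assumed record layout
    shuffle_queue_size=10,
)
sample = next(iter(dataset))
```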

mboudiaf added a commit that referenced this issue Apr 1, 2021
…licate data when num_workers > 0 + attempt to solve the problem of generator pickling #5
@jfb54
Author

jfb54 commented Apr 5, 2021

I did a run using your reader with the latest changes, and I believe the shuffle buffer issue is still present. I haven't looked carefully at how you implemented the shuffling, but I do know that the recommended shuffle buffer size for examples drawn from a class is 1000.

@mboudiaf
Owner

mboudiaf commented Apr 5, 2021

Hey,
Thanks for testing this again. The only change I made for this problem is at this line:

shuffle_queue_size=self.shuffle_queue_size)

By default I've set it to 10 so as not to cause memory problems, but you can easily hard-code it to 1000 by modifying that line. Please let me know if that changes anything, and thanks again!

@jfb54
Author

jfb54 commented Apr 10, 2021

Unfortunately, setting this to 1000 does run into memory issues. To do proper Meta-Dataset training and evaluation, you need 19 iterators (1 for training, 8 for the validation datasets, and 10 for the test datasets). When I ran this on one GPU of an 8-GPU cluster, it used so many resources that the jobs were automatically killed by the system. I'm not sure how to work around this.

@mboudiaf
Owner

Hi,

I've solved the problem of shuffling by completely getting rid of the idea of keeping a buffer :). The idea is simply to pre-create an index file for each .tfrecords file that records the (start_byte, end_byte) of each sample in that file. Then, once the iterator is queried, it generates a random ordering of the samples and only loads 1 sample into memory at a time by seeking to the right bytes on disk. This adds zero memory overhead, is fast, and should scale to an arbitrary number of datasets (a sketch of the idea follows the list below). Concretely:

  1. To create the .index files for all 10 datasets, I've written a script that you can simply execute:
bash make_index_files.sh [PATH_TO_CONVERTED_DATA]
  2. There is no more "shuffle_queue_size"; there is only the binary "shuffle" option. If you activate it, each class dataset will be read in a random order. Once all samples have been processed, a new random permutation is generated, and so on.
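For intuition, here is a minimal sketch of this index-based random access (the index format and names are illustrative, assuming each index line stores a record's byte offset and length; the repo's actual code may differ):

```python
import numpy as np

def load_index(index_path):
    # Assumed format: one "offset length" pair per line, one line per record.
    offsets, lengths = [], []
    with open(index_path) as f:
        for line in f:
            off, length = line.split()
            offsets.append(int(off))
            lengths.append(int(length))
    return offsets, lengths

def iterate_shuffled(tfrecord_path, index_path):
    offsets, lengths = load_index(index_path)
    order = np.random.permutation(len(offsets))  # fresh permutation each pass
    with open(tfrecord_path, "rb") as f:
        for i in order:
            f.seek(offsets[i])
            # Only one serialized record is ever held in memory at a time.
            yield f.read(lengths[i])
```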

Please let me know if you're able to make it work with this modification !
Best

@jfb54
Author

jfb54 commented Apr 23, 2021

I'm testing this now. Looking good so far. Will report back soon.

@jfb54
Author

jfb54 commented May 1, 2021

I have done extensive testing between the "official" dataset reader and yours.

The good news: The training curves are almost identical and the shuffle buffer issue has been solved. Thanks!

The bad news: Accuracies on ilsvrc_2012 and mscoco are lower by a few percent on both test and validation. All other datasets give accuracies that are very consistent with the official reader.

Datapoint: I took the model trained with your reader and tested it using the official reader. The ilsvrc_2012 accuracy jumped up to what I would expect.

I suspect that ilsvrc_2012 tasks are somehow being sampled differently from the official code (and maybe only in the test or validation splits?). I have noticed that the performance on ilsvrc_2012 and mscoco is correlated, as they have similar content. This makes me think the differences are due entirely to how ilsvrc_2012 is handled by your code (maybe in the hierarchical sampling?).

@mboudiaf
Owner

mboudiaf commented May 6, 2021

Hi @jfb54,

Thanks a lot for the update! That is strange; the sampling part is really the one I took care to "copy/paste", as I did not want to interfere with that code. But I can double-check it again.

On my end, I actually observed the exact opposite: when training with my loader and testing with the official one, the performance decreased. I traced this to the difference between the TensorFlow resizing function (inside decoder.py) and the PyTorch resize that I use: they have different default behaviors, first in how the preservation of aspect ratio is handled, and second in the anti-aliasing.

Anti-aliasing is activated by default in PyTorch and deactivated in TensorFlow, which causes a significant feature shift. Below is an example (left: images from the original loader; right: images from my implementation).
[image: side-by-side comparison of resized samples from the original loader (left) and this implementation (right)]
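To make the difference concrete, here is a hedged sketch of the two resize behaviors being compared (the 84×84 target size and dummy input are illustrative; the exact decoder code in either repo may differ):

```python
import numpy as np
import tensorflow as tf
from PIL import Image

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)  # dummy image

# TensorFlow: anti-aliasing is OFF by default in tf.image.resize.
tf_resized = tf.image.resize(img, [84, 84], method="bilinear", antialias=False)

# PIL (what the PyTorch loader goes through): bilinear resampling of a PIL
# image applies anti-aliasing, hence the feature shift between the two.
pil_resized = Image.fromarray(img).resize((84, 84), resample=Image.BILINEAR)

# Setting antialias=True on the TF side brings the two much closer together.
tf_resized_aa = tf.image.resize(img, [84, 84], method="bilinear", antialias=True)
```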

I have benchmarked the SimpleShot method with a ResNet-18 using my loader. To give you an idea, when testing on ILSVRC_2012 I get the following results: original: 52.7, original + anti-aliasing: 59.7, mine: 60.0.

Does the method you're working on require episodic training?
