nvidia.dali.fn.readers.webdataset supports reading from multiple tar files, specified as a list of paths
How is reading from multiple sources performed? Are all sources read sequentially one after another?
What happens when random_shuffle parameter is set to True? Are samples drawn to buffer from one source or from all sources with some distribution?
Thank you
Check for duplicates
I have searched the open bugs/issues and have found no duplicates for this bug report
Thank you for reaching out.
Answering your questions:
How is reading from multiple sources performed? Are all sources read sequentially one after another?
Abstracting away sharding (where each pipeline is assigned a separate, non-overlapping shard of the data), each pipeline reads its sources sequentially, one after another.
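The interplay of sharding and sequential reading can be sketched in plain Python. This is an illustrative model, not DALI internals: the sample names and the `shard_indices` helper are made up, but the idea matches the description above, where the tar files form one logical sequence that is split into contiguous, non-overlapping shards, and each pipeline walks its shard in order.

```python
import itertools

def shard_indices(total, shard_id, num_shards):
    # Contiguous, non-overlapping slice of [0, total) for one pipeline
    # (illustrative only; mirrors the usual shard_id/num_shards split).
    start = total * shard_id // num_shards
    end = total * (shard_id + 1) // num_shards
    return range(start, end)

# Hypothetical sample lists extracted from two tar archives.
tar_a = ["a/0.jpg", "a/1.jpg", "a/2.jpg"]
tar_b = ["b/0.jpg", "b/1.jpg", "b/2.jpg"]

# The sources are concatenated into one logical dataset ...
dataset = list(itertools.chain(tar_a, tar_b))

# ... and each pipeline reads its own shard sequentially.
for shard_id in range(2):
    shard = [dataset[i] for i in shard_indices(len(dataset), shard_id, 2)]
    print(shard_id, shard)
```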
What happens when random_shuffle parameter is set to True? Are samples drawn to buffer from one source or from all sources with some distribution?
DALI uses an internal buffer of fixed size (the initial_fill parameter) that is filled by reading data sequentially; when a batch is created, samples are drawn from this buffer at random. Datasets stored in containers (RecordIO, TFRecord, or webdataset) are expected to be pre-shuffled; otherwise samples belonging to one class are grouped together, and the first batches may represent only a small fraction of the classes in the whole dataset.
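The buffered-shuffle behavior described above can be modeled with a short stdlib sketch. This is a simplified illustration, not DALI's implementation: the `buffered_shuffle` helper is hypothetical, but it captures the mechanism of filling a fixed-size buffer (initial_fill) sequentially and then emitting a random element while replacing it with the next sequential sample.

```python
import random

def buffered_shuffle(samples, initial_fill, seed=0):
    """Illustrative model of a fixed-size shuffle buffer:
    fill the buffer sequentially, then repeatedly emit a random
    element and replace it with the next sequential sample."""
    rng = random.Random(seed)
    it = iter(samples)
    buf = []
    # Phase 1: fill the buffer with the first initial_fill samples.
    for s in it:
        buf.append(s)
        if len(buf) == initial_fill:
            break
    # Phase 2: draw a random slot, emit it, refill from the stream.
    for s in it:
        i = rng.randrange(len(buf))
        yield buf[i]
        buf[i] = s
    # Drain what is left in the buffer at the end of the epoch.
    rng.shuffle(buf)
    yield from buf

out = list(buffered_shuffle(range(10), initial_fill=4))
print(out)
```

Note that early outputs can only come from the first initial_fill samples, which is why pre-shuffled datasets are expected: with a class-sorted dataset and a small buffer, early batches would contain only a few classes.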