Doc polish: "Data Loaders" --> "Datasets" #222

ZhitingHu · 2019-10-04T18:12:03Z

The section is titled "Data Loaders" https://texar-pytorch.readthedocs.io/en/latest/code/data.html#data-loaders

Would "Datasets" be better? Or does "Data Loaders" fit the PyTorch convention better?

AvinashBukkittu · 2019-10-04T18:20:29Z

I personally like "Datasets" as the section heading here. All the classes described under it are datasets provided by Texar. Our data iterators share similarities with PyTorch's data loaders. Also, I see that we are missing the doc for SingleDatasetIterator. I don't know if this was intentional.

ZhitingHu · 2019-10-04T19:18:10Z

The doc of Args is missing for Batch https://texar-pytorch.readthedocs.io/en/latest/code/data.html#texar.torch.data.Batch

huzecong · 2019-10-04T19:19:25Z

I like Dataset as well. I think the terms people use to describe data-related modules are pretty messy, so as long as we're being consistent it's fine. Let me reiterate our definitions:

A data source is something that reads and returns raw data examples one by one. Typical data sources include Python lists and iterators (SequenceDataSource and IterDataSource), lines from text files (TextLineDataSource), and pickled objects from binary files (PickleDataSource).
A dataset (or data loader) defines how data examples are preprocessed into a format suitable for the task, and how these processed examples can be batched. These are called *Data in our framework for compatibility with the TF version (although I kind of prefer names like MonoTextData to MonoTextDataset because they're shorter and still to the point). Note that a dataset does not perform any of these operations by itself.
A data iterator executes the process and batch operations defined in a dataset. PyTorch calls this a "data loader".
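The three roles above can be sketched in plain Python. This is only an illustration of the division of labor, not the Texar implementation: `ListSource`, `ToyTextData`, and `ToyIterator` are hypothetical names invented for this sketch, and the lowercase-and-split processing and `<pad>` padding are assumptions.

```python
from typing import Iterator, List


class ListSource:
    """Data source (hypothetical): returns raw examples one by one."""

    def __init__(self, items: List[str]) -> None:
        self._items = list(items)

    def __iter__(self) -> Iterator[str]:
        return iter(self._items)


class ToyTextData:
    """Dataset (hypothetical): only defines how to process and batch.

    Nothing is executed here; the methods are called by the iterator.
    """

    def __init__(self, source: ListSource, batch_size: int) -> None:
        self.source = source
        self.batch_size = batch_size

    def process(self, raw: str) -> List[str]:
        # Assumed preprocessing: lowercase and split on whitespace.
        return raw.lower().split()

    def collate(self, examples: List[List[str]]) -> List[List[str]]:
        # Assumed batching: pad every example to the longest in the batch.
        max_len = max(len(ex) for ex in examples)
        return [ex + ["<pad>"] * (max_len - len(ex)) for ex in examples]


class ToyIterator:
    """Data iterator (hypothetical): executes the dataset's operations."""

    def __init__(self, dataset: ToyTextData) -> None:
        self.dataset = dataset

    def __iter__(self) -> Iterator[List[List[str]]]:
        buffer: List[List[str]] = []
        for raw in self.dataset.source:
            buffer.append(self.dataset.process(raw))
            if len(buffer) == self.dataset.batch_size:
                yield self.dataset.collate(buffer)
                buffer = []
        if buffer:  # emit the final, possibly smaller, batch
            yield self.dataset.collate(buffer)


source = ListSource(["Hello world", "A longer example sentence", "Bye"])
data = ToyTextData(source, batch_size=2)
batches = list(ToyIterator(data))
```

Constructing `ToyTextData` does no work; examples are read, processed, and batched only when `ToyIterator` is iterated, which mirrors the point that a dataset does not perform any operations by itself.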

It is intentional that we don't include the doc for SingleDatasetIterator. Users are expected to only use the DataIterator interface.

ZhitingHu · 2019-10-04T19:30:44Z

Thanks for the clarification. Can these definitions be added somewhere in the doc?

ZhitingHu · 2019-10-04T19:34:42Z

We can probably have an "Overview" page for each set of modules, to give an overview and highlight key features. Like in TF: https://www.tensorflow.org/api_docs/python/tf/data

huzecong · 2019-10-04T22:00:06Z

Sure. I'll get on it.

ZhitingHu added the "enhancement" and "topic: docs" labels Oct 4, 2019

huzecong self-assigned this Oct 4, 2019
