
[Feature Request] Array of dicts of tensors structure #27

Open
vadimkantorov opened this issue Nov 7, 2022 · 3 comments
@vadimkantorov

Storing arrays of dicts of tensors in "columnar format" can be more compact in some circumstances: for example, they become copy-on-write safe in a multiprocessing context, since the whole structure is stored as a very small number of tensors whose count does not depend on the "dataset" size: https://gist.github.com/vadimkantorov/86c3a46bf25bed3ad45d043ae86fff57
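For reference, a minimal sketch of what "columnar format" means here (hypothetical code, not taken from the linked gist): each key is stored as one concatenated tensor plus an offsets tensor, so the dataset is only a handful of tensor objects regardless of how many items it holds.

```python
import torch

# per-item dicts of tensors (e.g. detection targets), ragged across items
items = [
    {"boxes": torch.rand(3, 4), "labels": torch.tensor([1, 2, 3])},
    {"boxes": torch.rand(5, 4), "labels": torch.tensor([0, 1, 0, 2, 2])},
]

# columnar layout: one concatenated tensor per key plus item offsets,
# so the number of tensor objects does not grow with the number of items
columnar = {
    "boxes": torch.cat([it["boxes"] for it in items]),
    "labels": torch.cat([it["labels"] for it in items]),
    "offsets": torch.tensor([0] + [len(it["labels"]) for it in items]).cumsum(0),
}

print({k: tuple(v.shape) for k, v in columnar.items()})
# {'boxes': (8, 4), 'labels': (8,), 'offsets': (3,)}
```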

vadimkantorov added the enhancement (New feature or request) label Nov 7, 2022
vadimkantorov changed the title from "[Feature Request] Array of dicts of tensors" to "[Feature Request] Array of dicts of tensors structure" Nov 7, 2022
vmoens (Contributor) commented Nov 8, 2022

Thanks for this @vadimkantorov.
How do you see this interacting with TensorDict?
Should arrays of dicts be a possible data type stored by TensorDict? Do you have a typical use case in mind?

vadimkantorov (Author) commented Nov 8, 2022

I don't know much about the TensorDict project. I just wanted to share a use case I had for dicts of tensors: representing a dataset in a way that avoids copy-on-write problems: pytorch/pytorch#13246

I represented this array of dicts of tensors as a columnar dict of tensors: each key maps to a tensor that concatenates all per-item tensors related to that key.

One way it could integrate with TensorDict: provide a constructor/util function and an indexing/getitem method/util that slices all keys in the TensorDict and returns a new, "per-item" TensorDict. These could be just recipes in the docs, or util functions plus tests verifying that no copy-on-write/memory expansion actually happens and that such a structure is safely shared in multiprocessing/dataloading without any copies. A sketch of what such a getitem util could look like follows below.
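A hedged sketch of such a per-item getitem over the columnar layout above; the helper name and keys are hypothetical, not an existing TensorDict API:

```python
import torch

# a tiny columnar dict of tensors: one concatenated tensor per key plus item offsets
columnar = {
    "labels": torch.tensor([1, 2, 3, 0, 1, 0, 2, 2]),
    "offsets": torch.tensor([0, 3, 8]),
}

def getitem_columnar(col, i):
    """Hypothetical per-item getitem: slice every concatenated key at item i."""
    lo, hi = int(col["offsets"][i]), int(col["offsets"][i + 1])
    return {k: v[lo:hi] for k, v in col.items() if k != "offsets"}

print(getitem_columnar(columnar, 1)["labels"])  # tensor([0, 1, 0, 2, 2])
```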

A similar use case is collecting partial results from a validation loop. Usually one would store them in a list of dicts of tensors and then analyze it somehow. If such a structure were implemented in an extendable way (as proposed in pytorch/pytorch#64359), it could be useful.

vmoens (Contributor) commented Nov 8, 2022

> A similar use case is collecting partial results from a validation loop. Usually one would store them in a list of dicts of tensors and then analyze it somehow. If such a structure were implemented in an extendable way (as proposed in pytorch/pytorch#64359), it could be useful.

That is something we have, I think.

Here's an example:

>>> import torch
>>> from tensordict import TensorDict
>>> tensordict1 = TensorDict({"a": torch.zeros(1, 1)}, [1])
>>> tensordict2 = TensorDict({"a": torch.ones(1, 1)}, [1])
>>> tensordict = torch.stack([tensordict1, tensordict2], 0)
>>> 
>>> tensordict
LazyStackedTensorDict(
    fields={
        a: Tensor(torch.Size([2, 1, 1]), dtype=torch.float32)},
    batch_size=torch.Size([2, 1]),
    device=None,
    is_shared=False)
>>>
>>> tensordict[0] is tensordict1
True
>>> tensordict["a"]
tensor([[[0.]],

        [[1.]]])

The LazyStackedTensorDict does not currently support appending, but we might consider that.
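For the validation-loop use case mentioned above, a minimal usage sketch: accumulate per-batch TensorDicts in a plain Python list and stack them once at the end. Only torch.stack over TensorDicts, as shown in the example above, is taken from this thread; the metric names are hypothetical.

```python
import torch
from tensordict import TensorDict

results = []
for step in range(4):
    # pretend these are per-sample metrics for a batch of 8 validation samples
    batch_metrics = TensorDict(
        {"loss": torch.rand(8, 1), "pred": torch.randint(0, 10, (8,))}, [8]
    )
    results.append(batch_metrics)

# stack the per-step TensorDicts into a single [4, 8] structure, as in the example above
stacked = torch.stack(results, 0)
print(stacked["loss"].shape)  # torch.Size([4, 8, 1])
```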
