[Feature Request] Array of dicts of tensors structure #27
Comments
Thanks for this @vadimkantorov
I don't know much about the TensorDict project; I just wanted to share a use case I had for dicts of tensors: representing a dataset in a way that avoids copy-on-write problems (pytorch/pytorch#13246). I represented this array of dicts of tensors as a columnar dict of tensors, where each key maps to a single tensor that concatenates all per-item tensors for that key (a rough sketch follows below).

One way this could integrate with TensorDict: provide a constructor/util function plus an indexing/getitem method or util that slices all keys of the TensorDict and returns a new, per-item TensorDict. These could be just recipes in the docs, or util functions with tests confirming that no copy-on-write/memory expansion actually happens and that such a structure can be safely shared in multiprocessing/dataloading without any copies.

A similar use case is collecting partial results from a validation loop. Usually one would store them in a list of dicts of tensors and then analyze it somehow. If such a structure were implemented in some extendable way (as proposed in pytorch/pytorch#64359), it could be useful.
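A minimal sketch of the columnar layout described above, assuming fixed-size per-item tensors so that plain stacking along dim 0 works; the `to_columnar`/`get_item` helpers are hypothetical illustrations, not part of TensorDict:

import torch

# The whole dataset is stored as one big tensor per key, so only a handful
# of tensor objects exist regardless of dataset size (which is what makes it
# copy-on-write friendly when shared with DataLoader worker processes).

def to_columnar(items):
    # items: list of dicts of tensors, all items sharing the same keys/shapes
    return {key: torch.stack([item[key] for item in items], dim=0) for key in items[0]}

def get_item(columns, index):
    # Slice every key to recover a single per-item dict of tensors (views, no copies).
    return {key: value[index] for key, value in columns.items()}

if __name__ == "__main__":
    dataset = [{"image": torch.rand(3, 4, 4), "label": torch.tensor(i)} for i in range(10)]
    columns = to_columnar(dataset)   # {"image": (10, 3, 4, 4), "label": (10,)}
    item = get_item(columns, 3)      # per-item view into the big tensors
    assert item["image"].shape == (3, 4, 4)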
That is something we have, I think. Here's an example:

>>> tensordict1 = TensorDict({"a": torch.zeros(1, 1)}, [1])
>>> tensordict2 = TensorDict({"a": torch.ones(1, 1)}, [1])
>>> tensordict = torch.stack([tensordict1, tensordict2], 0)
>>> tensordict
LazyStackedTensorDict(
    fields={
        a: Tensor(torch.Size([2, 1, 1]), dtype=torch.float32)},
    batch_size=torch.Size([2, 1]),
    device=None,
    is_shared=False)
>>> tensordict[0] is tensordict1
True
>>> tensordict["a"]
tensor([[[0.]],
        [[1.]]])
Storing them in a "columnar format" can be more compact in some circumstances. For example, it is copy-on-write safe in a multiprocessing context because the data is stored as a very small number of tensors, independent of the dataset size: https://gist.github.com/vadimkantorov/86c3a46bf25bed3ad45d043ae86fff57
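A hedged sketch of how such a columnar store could stay copy-on-write safe inside a DataLoader when items have variable lengths: each field is one flat tensor plus integer offsets, so worker processes only touch a few Python objects instead of one object per item. The `ConcatColumn`/`ColumnarDataset` names and the offset scheme are illustrative assumptions, not necessarily what the linked gist does:

import torch
from torch.utils.data import Dataset, DataLoader

class ConcatColumn:
    """One variable-length field stored as a single flat tensor plus offsets."""
    def __init__(self, tensors):
        # tensors: list of 1D tensors of possibly different lengths
        lengths = torch.tensor([t.numel() for t in tensors])
        self.offsets = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)])
        self.data = torch.cat(tensors)

    def __getitem__(self, i):
        # Returns a view into the flat buffer; no per-item tensor objects are kept around.
        return self.data[self.offsets[i]:self.offsets[i + 1]]

class ColumnarDataset(Dataset):
    def __init__(self, columns):
        self.columns = columns  # dict: key -> ConcatColumn

    def __len__(self):
        return len(next(iter(self.columns.values())).offsets) - 1

    def __getitem__(self, i):
        return {key: col[i] for key, col in self.columns.items()}

if __name__ == "__main__":
    tokens = [torch.randint(0, 100, (n,)) for n in (3, 5, 2)]
    ds = ColumnarDataset({"tokens": ConcatColumn(tokens)})
    # With num_workers > 0, only the few large tensors are shared with the workers,
    # avoiding per-item copy-on-write memory growth.
    for batch in DataLoader(ds, batch_size=1, num_workers=0):
        print(batch["tokens"].shape)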