Select transitions from high-reward trajectories. This could be used to analyze how data selection affects MBRL, objective mismatch, etc.
Select transitions at random from the replay buffer so that the training and validation sets have a fixed size.
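The two selection modes above could look roughly like the following. This is a minimal sketch, not mbrl-lib code: the function names, the flat `rewards` list, and the `(start, end)` trajectory pairs (end exclusive) are all assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def top_trajectory_indices(
    rewards: Sequence[float],
    trajectories: Sequence[Tuple[int, int]],
    keep_fraction: float = 0.5,
) -> List[int]:
    """Return transition indices from the highest-return trajectories.

    `trajectories` holds hypothetical (start, end) index pairs, end exclusive.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Rank trajectories by total reward, highest first.
    ranked = sorted(
        trajectories,
        key=lambda se: sum(rewards[se[0]:se[1]]),
        reverse=True,
    )
    num_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    selected: List[int] = []
    for start, end in ranked[:num_keep]:
        # Keep whole trajectories so episode boundaries stay intact.
        selected.extend(range(start, end))
    return selected


def fixed_size_sample(indices: Sequence[int], size: int, seed: int = 0) -> List[int]:
    """Uniformly sample at most `size` transition indices without replacement."""
    rng = random.Random(seed)
    pool = list(indices)
    return rng.sample(pool, min(size, len(pool)))
```

The returned index lists could then be used to slice the buffer's arrays before handing the data to an iterator.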
Thanks @RaghuSpaceRajan. cc'ing @natolambert since this is highly relevant to his work. I think this proposal is the most straightforward way to do this on the data management side.
Yes, I have a version of this in my private repo; I will create a PR for it soon. The way I did it was by associating a "weight" with each transition, and part of the core functionality was a function to "update weights" for each trajectory. When updating the weights, it would be easy to create a ranking or heuristic mapping of some sort.
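For readers following along, here is a minimal sketch of what a per-transition weighting scheme of this kind might look like. It is not the code from the private repo mentioned above; the class name, method names, and the `(start, end)` trajectory format are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


class WeightedTransitionSampler:
    """Hypothetical sampler that keeps one non-negative weight per transition."""

    def __init__(self, num_transitions: int, seed: int = 0):
        # Start from uniform weights.
        self.weights: List[float] = [1.0] * num_transitions
        self._rng = random.Random(seed)

    def update_weights(
        self,
        trajectories: Sequence[Tuple[int, int]],
        rewards: Sequence[float],
        heuristic: Callable[[Sequence[float]], float],
    ) -> None:
        """Give every transition in a trajectory the weight heuristic(trajectory rewards)."""
        for start, end in trajectories:
            weight = heuristic(rewards[start:end])
            if weight < 0:
                raise ValueError("weights must be non-negative")
            for i in range(start, end):
                self.weights[i] = weight

    def sample(self, batch_size: int) -> List[int]:
        """Draw transition indices with replacement, proportionally to their weights."""
        return self._rng.choices(
            range(len(self.weights)), weights=self.weights, k=batch_size
        )
```

For example, passing `heuristic=lambda r: max(sum(r), 0.0)` weights each trajectory by its clipped return; a rank-based mapping could be swapped in the same way.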
Related comment: I think it may be worthwhile to have an optional "rich logging" mode, where things like candidate actions, action sequences (plans) at each step, trajectories, and more are saved for every trial in the learning process. It accumulates a lot of data, but having access to it is useful for debugging.
🚀 Feature Request
Create Replay Buffer Iterators that can select training and validation data in various "interesting" ways, similar to `TransitionIterator` and `BootStrapIterator` in https://github.com/facebookresearch/mbrl-lib/blob/b0aabd79941efe8b56bcabbd1b43bf497b9b1746/mbrl/replay_buffer.py
Examples:
Motivation
This would make analysis similar to https://arxiv.org/abs/2002.04523 and https://arxiv.org/abs/2102.13651 easy to perform.
Pitch
It should be fairly easy to implement, similar to `TransitionIterator` and `BootStrapIterator` above. (Taking care of trajectory/episodic boundaries could be a bit tricky.)
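One way to handle the boundary issue is to split by whole trajectories rather than by individual transitions. The sketch below assumes the buffer can report complete trajectories as `(start, end)` index pairs (end exclusive); that format and the function name are assumptions, and a buffer that overwrites old data would need extra care for partially overwritten episodes.

```python
import random
from typing import List, Sequence, Tuple


def split_by_trajectory(
    trajectories: Sequence[Tuple[int, int]],
    val_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Split transition indices into train/validation without cutting any trajectory."""
    rng = random.Random(seed)
    order = list(trajectories)
    rng.shuffle(order)
    # Number of whole trajectories assigned to validation.
    num_val = int(val_ratio * len(order))
    val_idx: List[int] = []
    train_idx: List[int] = []
    for k, (start, end) in enumerate(order):
        target = val_idx if k < num_val else train_idx
        target.extend(range(start, end))
    return train_idx, val_idx
```

Because the split is over trajectories, the validation set size in transitions varies with episode length; a fixed-size variant would need to subsample afterwards.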