[Feature Request] Store next observations and dones in RolloutBuffer #1273

Open
1 task done
taufeeque9 opened this issue Jan 11, 2023 · 1 comment · May be fixed by #1267
Labels
enhancement New feature or request

Comments

@taufeeque9

🚀 Feature

Add next_observations and dones fields to the RolloutBuffer and the DictRolloutBuffer classes, similar to how it is done in the ReplayBuffer class.

Motivation

Currently, on-policy algorithms do not store next observations or dones in their buffer when collecting rollouts (the collect_rollouts method), because none of the algorithms in stable-baselines3 need these fields. However, they must be available in the buffer to implement the original variant of the AIRL algorithm in imitation.

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
@taufeeque9 taufeeque9 added the enhancement New feature or request label Jan 11, 2023
@taufeeque9 taufeeque9 linked a pull request Jan 11, 2023 that will close this issue
16 tasks
@araffin
Member

araffin commented Jan 12, 2023

> Add next_observations and dones fields to the RolloutBuffer and the DictRolloutBuffer classes, similar to how it is done in the ReplayBuffer class.

dones are already stored in episode_starts (shifted by one step), and next_observations can be retrieved as observations[i+1] (except for terminal observations, which are not stored).
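To make the indexing concrete, here is a minimal NumPy sketch of that recovery. It assumes arrays laid out as (n_steps, n_envs, ...), and it assumes the caller also has the observation and done flags that follow the last stored step (called `last_obs` and `last_dones` here; both names and the helper function are hypothetical, not part of SB3). Where a step ended an episode, observations[i+1] is the reset observation rather than the true terminal one, so the sketch returns a mask marking which next observations can be trusted.

```python
import numpy as np


def recover_dones_and_next_obs(observations, episode_starts, last_obs, last_dones):
    """Rebuild dones / next observations from rollout-style arrays (hypothetical helper).

    observations:   (n_steps, n_envs, *obs_shape)
    episode_starts: (n_steps, n_envs), True where the stored obs begins an episode
    last_obs:       (n_envs, *obs_shape), observation after the final stored step
    last_dones:     (n_envs,), done flags produced by the final stored step
    """
    observations = np.asarray(observations)
    episode_starts = np.asarray(episode_starts).astype(bool)
    last_obs = np.asarray(last_obs)
    last_dones = np.asarray(last_dones).astype(bool)

    # The done flag of step i is the episode-start flag of step i + 1.
    dones = np.concatenate([episode_starts[1:], last_dones[None]], axis=0)
    # The next observation of step i is the observation stored at step i + 1.
    next_observations = np.concatenate([observations[1:], last_obs[None]], axis=0)
    # After a done, the stored "next" obs is a reset obs, not the terminal one.
    valid_next_obs = ~dones
    return dones, next_observations, valid_next_obs


# Toy rollout: one env, the episode ends at step 2, and 100.0 is the reset obs.
observations = np.array([0.0, 1.0, 2.0, 100.0]).reshape(4, 1, 1)
episode_starts = np.array([1, 0, 0, 1]).reshape(4, 1)
last_obs = np.array([[101.0]])
last_dones = np.array([False])

dones, next_observations, valid_next_obs = recover_dones_and_next_obs(
    observations, episode_starts, last_obs, last_dones
)
```

The mask is the practical catch: any method that needs the true terminal observation has to obtain it from somewhere other than the shifted array.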

> Alternatives

Why not implement a custom buffer for your use case?
(You can fill it using a callback or a custom SB3 version.)
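As a rough illustration of the custom-buffer route, the sketch below stores observations, next observations and dones side by side. The class name `TransitionBuffer` and its `add` signature are hypothetical; the only SB3 behaviour it relies on is that vectorized environments put the final observation of a finished episode in `info["terminal_observation"]`. How `add` gets called (from a callback or a modified rollout loop) is left out here.

```python
import numpy as np


class TransitionBuffer:
    """Hypothetical side buffer keeping (obs, next_obs, done) for every step."""

    def __init__(self, buffer_size, n_envs, obs_shape):
        self.buffer_size = buffer_size
        self.observations = np.zeros((buffer_size, n_envs, *obs_shape), dtype=np.float32)
        self.next_observations = np.zeros_like(self.observations)
        self.dones = np.zeros((buffer_size, n_envs), dtype=bool)
        self.pos = 0

    def reset(self):
        self.pos = 0

    def add(self, obs, new_obs, dones, infos):
        if self.pos >= self.buffer_size:
            raise RuntimeError("TransitionBuffer is full; call reset() first")
        # Copy so the caller's array is never modified in place.
        next_obs = np.array(new_obs, dtype=np.float32)
        for env_idx, done in enumerate(dones):
            # On episode end, new_obs is already the reset observation, so
            # take the true final observation from the info dict if present.
            if done and "terminal_observation" in infos[env_idx]:
                next_obs[env_idx] = infos[env_idx]["terminal_observation"]
        self.observations[self.pos] = obs
        self.next_observations[self.pos] = next_obs
        self.dones[self.pos] = dones
        self.pos += 1


# Two steps in one env; the second step ends the episode.
buf = TransitionBuffer(buffer_size=2, n_envs=1, obs_shape=(1,))
buf.add(np.array([[0.0]]), np.array([[1.0]]), np.array([False]), [{}])
buf.add(
    np.array([[1.0]]),
    np.array([[10.0]]),  # reset observation returned by the vectorized env
    np.array([True]),
    [{"terminal_observation": np.array([2.0])}],
)
```

Unlike shifting observations by one index, this keeps the true terminal observation, at the cost of storing every observation twice.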

> However, these fields are required to be stored in the buffer to implement the original variant of the AIRL algorithm in imitation.

Do you have a code example of that?
