Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] MBPO - Epoch loop missing steps_epoch reset #159

Open
matthiaskiller opened this issue Jul 27, 2022 · 2 comments
Open

[Bug] MBPO - Epoch loop missing steps_epoch reset #159

matthiaskiller opened this issue Jul 27, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@matthiaskiller
Copy link
Contributor

Observed

mbpo.py - line 199

in the for loop that is incrementing the steps of the current epoch, the steps_epoch iteration variable is not being reset after we observe a termination (done=True).
This will mean the next epoch will start from the steps_epoch+1 in which the previous epoch ended.
And the next epoch will be shorter than the actual epoch length (i.e. epoch_length - steps_epoch)

 for steps_epoch in range(cfg.overrides.epoch_length):
            if steps_epoch == 0 or done:
                obs, done = env.reset(), False

Is this behaviour desired?

Expected

If it's not desired we would propose something like this to set steps_epoch=0 :

 for steps_epoch in range(cfg.overrides.epoch_length):
            if steps_epoch == 0 or done:
                steps_epoch = 0
                obs, done = env.reset(), False

Thanks!:)

@matthiaskiller matthiaskiller added the bug Something isn't working label Jul 27, 2022
@luisenp
Copy link
Contributor

luisenp commented Jul 28, 2022

This wasn't intentional, good catch! Would you be interested in submitting a PR with the fix? If not, that's OK, thanks a lot for the bug report!

@matthiaskiller
Copy link
Contributor Author

Yes, sure!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants