Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPO Implementation Ignores Time Limits #20

Open
bheijden opened this issue Mar 30, 2024 · 4 comments
Open

PPO Implementation Ignores Time Limits #20

bheijden opened this issue Mar 30, 2024 · 4 comments

Comments

@bheijden
Copy link

Hi,

The current PPO implementation does not seem to account for time limits. While the EpisodeWrapper from brax is used, which tracks a truncation flag (source) in the info dictionary for correct termination handling, it appears this aspect is overlooked in the implementation.

Related Information:

Could this be an oversight, or am I missing a part of the implementation that addresses this?

@luchris429
Copy link
Owner

I believe there's ongoing discussion on this for CleanRL, though I've not caught up with the latest.

vwxyzjn/cleanrl#198

My understanding is that properly handling this does not usually result in significant performance differences.

sail-sg/envpool#194 (comment)

@luchris429
Copy link
Owner

That being said, if you would be interested in doing a PR for this with another file (say, ppo_time_limits.py), that would be great!

@bheijden
Copy link
Author

bheijden commented Apr 6, 2024

Thanks, that clears things up. Wasn't sure if it was perhaps handled elsewhere.

Concerning the ablation:
It looks like those benchmarks were done using Atari games, which, as far as I understand, aren't impacted by truncation—they usually just end or terminate. Truncation is more about situations where you have endless tasks, which is common in robotics scenarios like with the Ant or Cheetah. So, I'd be cautious about basing any conclusions solely on studies from Atari games. In fact, there are simpler settings that absolutely require proper truncation management to be solved, like the example from Time Limits in RL (arXiv) in the infinite horizon case:

image

If I end up requiring truncation, I'll see if I can cook up a PR.

@luchris429
Copy link
Owner

That's a good point! I think this could be worth doing in a separate file so people can see the differences. There is a significant downside of doubling the observation size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants