
[Feature Request] Allow users to define gradient steps as a fraction of rollout time-steps #1920

Open
1 of 2 tasks
janakact opened this issue May 6, 2024 · 3 comments
Labels: check the checklist, enhancement

Comments


janakact commented May 6, 2024

🚀 Feature

Currently, SB3 algorithms let you set the number of gradient steps to $-1$, which translates into the number of timesteps in the rollout; let's call it $k$.

However, allowing the user to set gradient_steps $=-2, -4$ or $-8$, translating into $k/2, k/4, k/8$ gradient steps respectively, would be more useful. It would also let users write simpler hyperparameter tuning code.

The implementation is pretty simple: change off_policy_algorithm.py L344:

gradient_steps = self.gradient_steps if self.gradient_steps >= 0 else rollout.episode_timesteps
# should be changed to (integer division, floored to at least one step):
gradient_steps = self.gradient_steps if self.gradient_steps >= 0 else max(rollout.episode_timesteps // -self.gradient_steps, 1)
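To make the proposed semantics concrete, here is a minimal sketch of the mapping as a standalone function (the helper name `resolve_gradient_steps` is hypothetical, not part of SB3):

```python
def resolve_gradient_steps(gradient_steps: int, rollout_timesteps: int) -> int:
    """Map the proposed gradient_steps values to an actual step count.

    gradient_steps >= 0 is used as-is; a negative value -n means
    rollout_timesteps // n, floored to at least 1 so an update always runs.
    """
    if gradient_steps >= 0:
        return gradient_steps
    return max(rollout_timesteps // -gradient_steps, 1)

# With a rollout of k = 64 timesteps:
print(resolve_gradient_steps(-1, 64))  # 64 (current SB3 behaviour)
print(resolve_gradient_steps(-2, 64))  # 32 (proposed: k/2)
print(resolve_gradient_steps(-8, 64))  # 8  (proposed: k/8)
```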

Motivation

Simplify hyperparameter tuning. We always take the number of gradient steps as a fraction of the training frequency (see the RL Zoo code).

Pitch

This wouldn't break existing functionality, and the code change is pretty simple.

Alternatives

No response

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
  • If I'm requesting a new feature, I have proposed alternatives
@janakact janakact added the enhancement New feature or request label May 6, 2024
@araffin araffin added the check the checklist You have checked the required items in the checklist but you didn't do what is written... label May 6, 2024
@araffin
Member

araffin commented May 6, 2024

Hello,
please don't forget about the alternatives section too (you ticked the box).

@janakact
Author

@araffin Sorry, I couldn't think of any alternatives. Apologies for the mistake; I have unticked the checkbox.

@araffin
Member

araffin commented May 23, 2024

Simplify hyperparameter tuning. We always take the number of gradient steps as a fraction of the training frequency. (See rl-zoo code)

The code in the RL Zoo is only one line, so I'm not sure how much this would simplify things.

However, allowing the user to define gradient_steps $=-2, -4$ or $-8$, which translates into $k/2, k/4, k/8$ gradient steps

I have to say that using -2 to mean /2 seems rather unintuitive (-1 is a special case, as it is intentionally an invalid value).

Alternatives

As you pointed out:

n_steps = train_freq * n_envs
sub_factor = 2
gradient_steps = max(n_steps // sub_factor, 1)

seems a pretty simple and flexible alternative.
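For reference, the alternative above wired into a complete computation. The hyperparameter values here are illustrative, and `sub_factor` is the RL Zoo-style divisor, not an SB3 parameter:

```python
# Compute gradient_steps up front from the rollout size,
# instead of encoding the fraction in a negative sentinel value.
train_freq = 64   # timesteps collected per rollout, per environment
n_envs = 4        # number of parallel environments
sub_factor = 2    # take half as many gradient steps as collected timesteps

n_steps = train_freq * n_envs
gradient_steps = max(n_steps // sub_factor, 1)
print(gradient_steps)  # 128
```

The resulting `gradient_steps` value can then be passed directly to an off-policy algorithm's constructor, keeping the fraction logic in the tuning script.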
