
[Feature Request] Allow users to define gradient steps as a fraction of rollout time-steps #1920

Open
1 of 2 tasks
janakact opened this issue May 6, 2024 · 3 comments
Labels: check the checklist, enhancement

Comments


janakact commented May 6, 2024

🚀 Feature

Currently, SB3 algorithms let you set the number of gradient steps to $-1$, which translates into the number of timesteps in the rollout; let's call it $k$.

However, allowing the user to set gradient_steps $=-2, -4$ or $-8$, translating into $k/2, k/4, k/8$ gradient steps respectively, would be more useful. It would also let users write simpler hyperparameter tuning code.

The implementation is pretty simple: change off_policy_algorithm.py L344:

gradient_steps = self.gradient_steps if self.gradient_steps >= 0 else rollout.episode_timesteps
# should be changed to (integer division, floored to at least one step):
gradient_steps = self.gradient_steps if self.gradient_steps >= 0 else max(rollout.episode_timesteps // -self.gradient_steps, 1)
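To make the proposed semantics concrete, here is a minimal sketch of the mapping as a standalone function (the helper name `resolve_gradient_steps` is hypothetical, not part of SB3):

```python
def resolve_gradient_steps(gradient_steps: int, rollout_timesteps: int) -> int:
    """Map the proposed gradient_steps values to an actual step count.

    gradient_steps >= 0 is used as-is; a negative value -n means
    rollout_timesteps // n, floored to at least 1 so an update always runs.
    """
    if gradient_steps >= 0:
        return gradient_steps
    return max(rollout_timesteps // -gradient_steps, 1)

# With a rollout of k = 64 timesteps:
print(resolve_gradient_steps(-1, 64))  # 64 (current SB3 behaviour)
print(resolve_gradient_steps(-2, 64))  # 32 (proposed: k/2)
print(resolve_gradient_steps(-8, 64))  # 8  (proposed: k/8)
```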

Motivation

Simplify hyperparameter tuning. We always take the number of gradient steps as a fraction of the training frequency (see the RL Zoo code).

Pitch

This wouldn't break existing functionality, and the code change is pretty simple.

Alternatives

No response

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
  • If I'm requesting a new feature, I have proposed alternatives
@janakact janakact added the enhancement New feature or request label May 6, 2024
@araffin araffin added the check the checklist You have checked the required items in the checklist but you didn't do what is written... label May 6, 2024
@araffin
Member

araffin commented May 6, 2024

Hello,
please don't forget about the alternatives section too (you ticked the box).

@janakact
Author

@araffin Sorry, I couldn't think of any alternatives. Apologies for the mistake; I have unticked the checkbox.

@araffin
Member

araffin commented May 23, 2024

Simplify hyperparameter tuning. We always take the number of gradient steps as a fraction of the training frequency. (See rl-zoo code)

The code in the RL Zoo is only one line, so I'm not sure how much this would simplify things.

However, allowing the user to define gradient_steps $=-2, -4$ or $-8$, which translates into $k/2, k/4, k/8$ gradient steps

I have to say that using -2 to mean /2 seems rather unintuitive (-1 is a special case, as it is intentionally an invalid value).

Alternatives

As you pointed out:

n_steps = train_freq * n_envs
sub_factor = 2
gradient_steps = max(n_steps // sub_factor, 1)

seems a pretty simple and flexible alternative.
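For reference, the alternative above wired into a complete computation. The hyperparameter values here are illustrative, and `sub_factor` is the RL Zoo-style divisor, not an SB3 parameter:

```python
# Compute gradient_steps up front from the rollout size,
# instead of encoding the fraction in a negative sentinel value.
train_freq = 64   # timesteps collected per rollout, per environment
n_envs = 4        # number of parallel environments
sub_factor = 2    # take half as many gradient steps as collected timesteps

n_steps = train_freq * n_envs
gradient_steps = max(n_steps // sub_factor, 1)
print(gradient_steps)  # 128
```

The resulting `gradient_steps` value can then be passed directly to an off-policy algorithm's constructor, keeping the fraction logic in the tuning script.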
