[Feature Request] Allow users to define gradient steps as a fraction of rollout time-steps #1920
Open
1 of 2 tasks
Labels
check the checklist
You have checked the required items in the checklist but you didn't do what is written...
enhancement
New feature or request
馃殌 Feature
Currently, SB3 algorithms allow you to define the number of gradient steps$= -1$ , which will translate into the number of timesteps in the rollout, let's call it $k$ .
However, allowing the user to define gradient_steps$=-2, -4$ or $-8$ which translates into consecutively $k/2, k/4, k/8$ gradient steps, will be more beneficial. This will allow users to write simpler hyper-parameter tuning code as well.
Implementation is pretty simple, change off-policy-algorithm L344:
Motivation
Simplify hyperparameter tuning. We always take the number of gradient steps as a fraction of the training frequency. (See rl-zoo code)
Pitch
This wouldn't break existing functionalities, and the code change is pretty simple.
Alternatives
No response
Additional context
No response
Checklist
The text was updated successfully, but these errors were encountered: