[Feature Request] Add capability to log on a step-based interval in OffPolicyAlgorithm #1708

tobiabir · 2023-10-06T11:47:52Z

🚀 Feature

At the time of writing the logging interval is controlled by the log_interval argument of the learn method. Permitted are integers. In OnPolicyAlgorithm this is the number of rounds (environment interaction + training steps) and in OffPolicyAlgorithm the number of episodes between logging.

Since episodes in general do not have fixed length or an end at all, logging on an episode-basis is not always practical (see Motivation).
Could we add the capability to log on a step-based interval?

Motivation

The main motivation is experiment tracking.
It is good practice to run experiments multiple times with different random seeds and display training plots with confidence intervals or min and max.
If you want to plot against environment steps (e.g. when you are interested sample complexity) you can't really do that properly if the logs are not done on a step-basis.
This is because with variable episode length the logs will not be aligned and it is difficult to compute the confidence intervals (e.g. what should the interval at step x be if you have a log at step x - 3 and x + 5).
So it would be great if this could be added.

Note also that the documentation is currently stating for all algorithms that log_interval is "The number of episodes before logging.", which is not true for on-policy algorithms. Issue #725 is related to that.

Pitch

I propose we change the OffPolicyAlgorithm case to be the same as the OnPolicyAlgorithm case.
Then logging on a step-basis can be done using a train_freq on a step-basis.
This has the additional benefit of more consistency between on-policy and off-policy algorithms.

While we are at it I also propose to move logging to after training in both OnPolicyAlgorithm and OffPolicyAlgorithm.
That way we can get all the information of one round in the same log.
As it is the information of the last training steps will not be logged.

Alternatives

The least invasive (to sb3) alternative is to modify the environments of interest to have fixed episode length.
However, this might not be practical for all environments and seems ugly.

Another alternative would be to allow log_interval to be a tuple (e.g. (4, "episodes")) like train_freq.
This also seems ugly.

Update:
Custom callback.
See discussion below.

Additional context

No response

Checklist

I have checked that there is no similar issue in the repo
If I'm requesting a new feature, I have proposed alternatives

The text was updated successfully, but these errors were encountered:

araffin · 2023-10-06T12:24:44Z

Hello,
have you considered callbacks as an alternative? (see doc and section on tensorboard)
they should allow you to log every n steps or every k iterations.

tobiabir · 2023-10-06T19:44:02Z

Hi.

Thank you for the quick answer and for that pointer!

Indeed, with a custom callback, logging on step-basis can be done.
I quickly tested this on SAC with the following callback.

class TensorboardCallback(sb3.common.callbacks.BaseCallback):
    def __init__(self, log_interval):
        super().__init__()
        self.log_interval = log_interval

    def _on_step(self):
        if self.model.num_timesteps % self.log_interval == 0:
            self.model._dump_logs()

Some minor draw backs remain with this approach:

One has to avoid interference from the other logs. This can be done by setting log_interval in learn to np.inf.
Everyone who needs this has to reimplement a custom callback even though logging is otherwise nicely integrated into the code base.
The logging is still done before the training. So to get the information of the last training steps one has to also implement callback _on_training_end.

Would you still consider some of the proposed changes? I think it could clean things up and improve consistency.
At least I think the documentation of log_interval should be fixed as for on-policy algorithms it is currently incorrect.

araffin · 2023-10-16T18:40:38Z

Adding a callback to do step based logging to the collection provided by sb3 would be a good addition i think.
Logging more often shouldn't be a problem (or solvable as you describe).

tobiabir · 2023-10-17T18:20:13Z

Could you elaborate on your vision for this? Like what set of features should this callback support? Should it be a minimal callback for only step based logging or should it be more general logging callback supporting all sorts of logging applications?

araffin · 2023-10-18T06:10:06Z

Could you elaborate on your vision for this?

Have a simple LogEveryNSteps callback that calls self.logger.dump() every n steps (n calls to env.step(), it might correspond to more than one steps when using multiple envs, as shown in the doc).

general logging callback supporting all sorts of logging applications?

We leave general purpose/custom callbacks to the user.

tobiabir added the enhancement New feature or request label Oct 6, 2023

tobiabir added a commit to tobiabir/stable-baselines3 that referenced this issue Oct 6, 2023

Fix log_interval handling in OffPolicyAlgorithm (DLR-RM#1708)

8712997

tobiabir added a commit to tobiabir/stable-baselines3 that referenced this issue Oct 6, 2023

Update changelog (DLR-RM#1708)

574307a

tobiabir added a commit to tobiabir/stable-baselines3 that referenced this issue Oct 6, 2023

Fix order of logging and training (DLR-RM#1708)

43dc8f9

tobiabir added a commit to tobiabir/stable-baselines3 that referenced this issue Oct 6, 2023

Update log_interval default values (DLR-RM#1708)

d0d1912

tobiabir linked a pull request Oct 6, 2023 that will close this issue

Add capability to log on a step-based interval in OffPolicyAlgorithm #1709

Open

16 tasks

araffin added the help wanted Help from contributors is welcomed label Oct 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request] Add capability to log on a step-based interval in OffPolicyAlgorithm #1708

[Feature Request] Add capability to log on a step-based interval in OffPolicyAlgorithm #1708

tobiabir commented Oct 6, 2023 •

edited

araffin commented Oct 6, 2023

tobiabir commented Oct 6, 2023

araffin commented Oct 16, 2023

tobiabir commented Oct 17, 2023

araffin commented Oct 18, 2023

[Feature Request] Add capability to log on a step-based interval in OffPolicyAlgorithm #1708

[Feature Request] Add capability to log on a step-based interval in OffPolicyAlgorithm #1708

Comments

tobiabir commented Oct 6, 2023 • edited

🚀 Feature

Motivation

Pitch

Alternatives

Additional context

Checklist

araffin commented Oct 6, 2023

tobiabir commented Oct 6, 2023

araffin commented Oct 16, 2023

tobiabir commented Oct 17, 2023

araffin commented Oct 18, 2023

tobiabir commented Oct 6, 2023 •

edited