
Create optimizer in OnPolicyAlgorithm only after the device is set #1771

Open · wants to merge 1 commit into base: master

Conversation

@cmangla commented Dec 4, 2023

Attempt to fix #1770 in a fully backward-compatible manner.

Description

In PPO, the policy's optimizer is created before the algorithm's computation device is set. That is a problem for optimizers that check the target device on initialization, such as PyTorch's fused Adam, which requires its parameters to already be on a CUDA device. This fix defers optimizer creation until after the policy has been moved to its device, and is fully backward compatible.
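
For illustration, a minimal sketch of the ordering bug and the fix; the `Policy` class and `_build_optimizer` helper below are hypothetical stand-ins, not SB3's actual code:

```python
import torch as th


class Policy(th.nn.Module):
    """Hypothetical stand-in for an SB3 policy, not the library's real class."""

    def __init__(self, init_optimizer: bool = True):
        super().__init__()
        self.net = th.nn.Linear(4, 2)
        if init_optimizer:
            self._build_optimizer()

    def _build_optimizer(self) -> None:
        # With fused=True, Adam checks at construction time that every
        # parameter already lives on a supported (CUDA) device.
        self.optimizer = th.optim.Adam(self.parameters(), lr=3e-4, fused=True)


# Buggy order: the optimizer is built while the parameters are still on the
# CPU, so fused Adam raises in __init__ even though .to("cuda") comes later:
#     policy = Policy().to("cuda")
#
# Fixed order: move the module to its device first, then build the optimizer.
policy = Policy(init_optimizer=False).to("cuda")
policy._build_optimizer()
```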

Motivation and Context

Fixes #1770. One can now use the `fused` option of the Adam optimizer on CUDA devices, which, according to the PyTorch documentation, is faster.
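
As a usage sketch: the `policy_kwargs`/`optimizer_kwargs` plumbing is SB3's documented API, while `fused=True` assumes a CUDA device and a PyTorch version that supports fused Adam.

```python
from stable_baselines3 import PPO

# optimizer_kwargs are forwarded to the optimizer constructor. With this fix,
# the policy is moved to the GPU before Adam is created, so fused=True works.
model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(optimizer_kwargs=dict(fused=True)),
    device="cuda",
)
model.learn(total_timesteps=10_000)
```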

  • I have raised an issue to propose this change

Types of changes

  • Bug fix

Checklist

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

@cmangla marked this pull request as draft December 4, 2023 17:57
@cmangla marked this pull request as ready for review December 5, 2023 10:27
@cmangla changed the title from "Create optimizer in PPO only after the device is set" to "Create optimizer in OnPolicyAlgorithm only after the device is set" Dec 5, 2023
@cmangla marked this pull request as draft December 5, 2023 12:16
@cmangla marked this pull request as ready for review December 5, 2023 14:01
@cmangla (Author) commented Dec 5, 2023

@araffin This is ready for the CI tests now and potentially also to merge.

@@ -885,6 +895,7 @@ def __init__(
     normalize_images: bool = True,
     optimizer_class: Type[th.optim.Optimizer] = th.optim.Adam,
     optimizer_kwargs: Optional[Dict[str, Any]] = None,
+    _init_optimizer=True,  # Currently unused, see PR #1771
@cmangla (Author) commented:
@araffin I'm currently testing enabling this one too. I will update this PR accordingly, hence switching it back to draft.

@cmangla marked this pull request as draft December 6, 2023 12:07
@cmangla marked this pull request as ready for review December 7, 2023 15:03
@cmangla (Author) commented Dec 7, 2023

@araffin Looks good now

Successfully merging this pull request may close these issues:

[Bug:] Cannot use the fused flag in default optimizer of PPO (#1770)