SACD Discrete Soft Actor Critic #203

splatter96 · 2023-08-07T12:43:17Z

This PR introduces the Soft Actor Critic for discrete actions (SACD) algorithm.

Description

This PR implements the SAC-Discrete algorithm as described in this paper: https://arxiv.org/abs/1910.07207. The implementation borrows code from the paper's original implementation (https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch), as well as from the implementation provided by the author of the issue requesting this feature in Stable Baselines3 (https://github.com/toshikwa/sac-discrete.pytorch).
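For readers unfamiliar with the paper, the core difference from continuous SAC can be sketched as follows. This is a minimal NumPy sketch, not the code in this PR: with a discrete action space the policy outputs a full distribution over actions, so the expectations in the soft value and policy objectives are computed exactly over all actions instead of by sampling. The function names, array shapes and the `1e-8` epsilon are illustrative assumptions. Taking the minimum over the first axis of the stacked critic outputs is one way to support more than 2 critics.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last (action) axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def critic_target(rewards, dones, next_logits, next_q_targets, alpha, gamma=0.99):
    """Soft TD target for SAC-Discrete (illustrative sketch).

    next_q_targets has shape (n_critics, batch, n_actions): target-network
    Q-values at s'. The min over axis 0 works for any number of critics.
    """
    probs = softmax(next_logits)           # pi(a|s'), shape (batch, n_actions)
    log_probs = np.log(probs + 1e-8)       # epsilon avoids log(0)
    min_q = next_q_targets.min(axis=0)     # clipped double-Q over all critics
    # Exact expectation over actions: V(s') = sum_a pi(a|s') (Q(s', a) - alpha * log pi(a|s'))
    next_v = (probs * (min_q - alpha * log_probs)).sum(axis=-1)
    return rewards + gamma * (1.0 - dones) * next_v


def actor_loss(logits, q_values, alpha):
    """Policy loss: E_s[ sum_a pi(a|s) (alpha * log pi(a|s) - min_i Q_i(s, a)) ]."""
    probs = softmax(logits)
    log_probs = np.log(probs + 1e-8)
    min_q = q_values.min(axis=0)
    return (probs * (alpha * log_probs - min_q)).sum(axis=-1).mean()
```

With two critics whose Q-values at s' are `[1, 2]` and `[3, 0]`, the element-wise minimum is `[1, 0]`; under a uniform policy and `alpha=0`, the target for reward 1 is `1 + 0.99 * 0.5`.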

Context

I have raised an issue to propose this change (required)
Original issue in the Stable Baselines3 repo: [Feature request] Implement SAC-Discrete DLR-RM/stable-baselines3#157

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

Note: we are using a maximum length of 127 characters per line


araffin · 2023-08-12T07:55:03Z

Hello,
thanks for the PR =)

The functionality/performance matches that of the source (required for new training algorithms or training-related features).

Please don't forget that part (see the contributing guide).
I think there is also a discussion about the results here: vwxyzjn/cleanrl#270

splatter96 · 2023-09-01T13:17:39Z

Hello,
thanks for the feedback :)
Sorry for the late reply! Should I add a performance comparison with the source, similar to what is done on the official Stable Baselines3 algorithm pages? That is, create an RL Baselines3 Zoo config for it and add the plots to this PR?

araffin · 2023-09-01T13:25:42Z

yes please =)

Paul Auerbach and others added 9 commits July 31, 2023 16:07

Added first version of SAC Discrete, which currently runs but does not learn

a14ae69

Fixed bugs that led to wrong results; currently only working with 2 critics

875b8bc

Reworked code to work with more than 2 critic networks

7711813

Code style changes

4a37f58

Prepared files for merge request (minor cleanup)

fca2c6d

Added run test for SACD

610fd3d

Added doc page for SACD

d97dbc7

Added save_load test for SACD

bc08ee9

Merge branch 'Stable-Baselines-Team:master' into master

4e99b74
