SAC alpha update problem #346

Shapeno · 2024-02-29T09:35:58Z

In obj_alpha = (self.alpha_log * (self.target_entropy - log_prob).detach()).mean(), when alpha_log = 0, alpha will be 1 forever.
The correct way is obj_alpha = (self.alpha * (self.target_entropy - log_prob).detach()).mean().

This problem is also found in rlkit.

For the algorithm details, see the softlearning source code:
https://github.com/rail-berkeley/softlearning/blob/13cf187cc93d90f7c217ea2845067491c3c65464/softlearning/algorithms/sac.py#L256
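
To make the two variants easier to compare, here is a minimal sketch that evaluates both objectives and estimates their gradients with respect to alpha_log. It is not the ElegantRL code: the function names, the sample log_prob values, and the target entropy are made up for illustration, and a central finite difference stands in for autograd so the snippet needs only the standard library. The (target_entropy - log_prob) term is treated as a constant, which is what .detach() does in the original expression.

```python
import math


def obj_alpha_log_form(alpha_log, log_probs, target_entropy):
    """Variant quoted in the issue: the coefficient multiplies alpha_log."""
    coefs = [target_entropy - lp for lp in log_probs]  # constants (detached)
    return sum(alpha_log * c for c in coefs) / len(coefs)


def obj_alpha_form(alpha_log, log_probs, target_entropy):
    """Proposed variant: the coefficient multiplies alpha = exp(alpha_log)."""
    alpha = math.exp(alpha_log)
    coefs = [target_entropy - lp for lp in log_probs]  # constants (detached)
    return sum(alpha * c for c in coefs) / len(coefs)


def grad_wrt_alpha_log(fn, alpha_log, log_probs, target_entropy, eps=1e-6):
    """Central finite-difference estimate of d(fn)/d(alpha_log)."""
    hi = fn(alpha_log + eps, log_probs, target_entropy)
    lo = fn(alpha_log - eps, log_probs, target_entropy)
    return (hi - lo) / (2.0 * eps)


# Hypothetical batch values, chosen only for illustration.
log_probs = [-1.2, -0.7, -2.1]
target_entropy = -1.0
mean_coef = sum(target_entropy - lp for lp in log_probs) / len(log_probs)

for alpha_log in (0.0, 0.5, -1.0):
    g_log = grad_wrt_alpha_log(obj_alpha_log_form, alpha_log, log_probs, target_entropy)
    g_alpha = grad_wrt_alpha_log(obj_alpha_form, alpha_log, log_probs, target_entropy)
    print(f"alpha_log={alpha_log:+.1f}  grad(log form)={g_log:.6f}  grad(alpha form)={g_alpha:.6f}")
```

In this sketch the gradient of the first variant is the batch mean of (target_entropy - log_prob) regardless of the current alpha_log, while the gradient of the second variant is that same mean scaled by alpha. Plugging in values from a real training batch would show how each variant moves alpha_log.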


Shapeno · 2024-02-29T09:36:52Z

https://github.com/AI4Finance-Foundation/ElegantRL/blob/b4b9d662b9f9cb7cc368ac2b1036b5119eb20be4/elegantrl/agents/AgentSAC.py#L48C13-L48C23
