Big bug in PPO2 #35

Vinson-sheep · 2022-02-16T09:18:06Z

In dist = Normal(mu, sigma), sigma must be positive, but the actor_net output is unconstrained and can be negative, so action_log_prob = dist.log_prob(action) can be nan.

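Try:

import torch
from torch.distributions import Normal

a = torch.FloatTensor([1]).cuda()   # mu
b = torch.FloatTensor([-1]).cuda()  # negative sigma, as actor_net can output
# validate_args=False: recent PyTorch versions check scale > 0 by default
# and would raise a ValueError here instead of silently returning nan.
dist = Normal(a, b, validate_args=False)
action = dist.sample()
action_log_prob = dist.log_prob(action)

print(action.cpu().numpy())
print(action_log_prob.item())  # nan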
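
For reference, the nan can be traced to the log-density of a Gaussian, written here with a for the action (the symbols follow mu and sigma above). This is the standard formula, not something taken from the repo's code:

```latex
\log p(a \mid \mu, \sigma)
  = -\frac{(a - \mu)^2}{2\sigma^2} - \log \sigma - \frac{1}{2}\log(2\pi)
```

The first term only sees sigma squared, so the sign does not matter there, but log sigma has no real value for sigma < 0, and torch returns nan for the log of a negative number.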


jzl20 · 2022-04-15T12:53:43Z

So how can I fix the bug?

flyinglife001 · 2022-09-15T08:40:54Z

return sigma*sigma

WhiteNightSleepless · 2023-04-07T06:36:57Z

You can add an activation function to the sigma output of the actor network. softplus maps sigma to a strictly positive value. relu only makes it non-negative, so sigma can still be exactly 0; add a small constant if you use it. Hope it helps.
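
A minimal sketch of the softplus approach, assuming an actor with separate mu and sigma heads. The class name Actor, the layer sizes, and the min_sigma offset are illustrative assumptions, not the repo's actual network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal


class Actor(nn.Module):
    """Hypothetical actor; sizes are illustrative, not taken from the repo."""

    def __init__(self, state_dim=3, hidden_dim=64, min_sigma=1e-5):
        super().__init__()
        self.fc = nn.Linear(state_dim, hidden_dim)
        self.mu_head = nn.Linear(hidden_dim, 1)
        self.sigma_head = nn.Linear(hidden_dim, 1)
        self.min_sigma = min_sigma  # assumed floor to keep sigma away from 0

    def forward(self, state):
        x = F.relu(self.fc(state))
        mu = self.mu_head(x)
        # softplus maps any real input to a positive value; the small offset
        # guards against underflow to 0 for very negative inputs.
        sigma = F.softplus(self.sigma_head(x)) + self.min_sigma
        return mu, sigma


actor = Actor()
state = torch.randn(4, 3)  # batch of 4 dummy states
mu, sigma = actor(state)
dist = Normal(mu, sigma)
action = dist.sample()
action_log_prob = dist.log_prob(action)
```

With this, sigma is positive for any network output, so log_prob stays finite. Squaring the output (return sigma*sigma) also removes the sign, but it can still produce exactly 0, so a small offset would be needed there as well.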
