maybe a small bug in the function `explore_vec_env` of discretePPO and discreteA2C? #340

DranZohn · 2023-12-06T14:03:21Z

In the function `explore_vec_env` of AgentPPO, the variable `actions` has shape [horizon_len, self.num_envs, 1], but the expression `convert(action)` returns a 1-D tensor of shape [num_envs]. It should be [num_envs, 1], as it is in `explore_vec_env` of AgentD3QN. This mismatch does cause the demo examples/demo_A2C_PPO.py to fail.

The following change works for me:

# ActorDiscretePPO of net.py
  def get_action(self, state: Tensor) -> (Tensor, Tensor):
      state = self.state_norm(state)
      a_prob = self.soft_max(self.net(state))
      a_dist = self.ActionDist(a_prob)
      action = a_dist.sample()
      logprob = a_dist.log_prob(action)
      return action.unsqueeze(1), logprob  # unsqueeze the action
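
To illustrate the shape mismatch described above, here is a minimal standalone sketch using only PyTorch. It is not the library's code: the `actions` buffer, the sizes, and the write `actions[0] = action` are assumptions standing in for what `explore_vec_env` does, and `Categorical` is assumed to play the role of `self.ActionDist`.

```python
import torch
from torch.distributions import Categorical

# Hypothetical sizes chosen only for illustration.
horizon_len, num_envs, action_dim = 4, 3, 2

# Stand-in for the rollout buffer with shape [horizon_len, num_envs, 1].
actions = torch.zeros((horizon_len, num_envs, 1), dtype=torch.long)

# Stand-in for the actor output: one probability row per environment.
a_prob = torch.softmax(torch.randn(num_envs, action_dim), dim=1)
a_dist = Categorical(a_prob)

action = a_dist.sample()            # shape [num_envs]
logprob = a_dist.log_prob(action)   # shape [num_envs]

# Writing a [num_envs] tensor into a [num_envs, 1] slot cannot broadcast
# when num_envs > 1, so the assignment raises a RuntimeError.
try:
    actions[0] = action
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# With the extra trailing dimension the shapes match and the write succeeds.
fixed = action.unsqueeze(1)         # shape [num_envs, 1]
actions[0] = fixed
```

Note that with a single environment (num_envs = 1) the unfixed assignment would broadcast silently, which may be why the problem only shows up with vectorized environments.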
