
/chapter9/chapter9_questions&keywords #59

Open
qiwang067 opened this issue May 24, 2021 · 6 comments

Comments

@qiwang067
Contributor

https://datawhalechina.github.io/easy-rl/#/chapter9/chapter9_questions&keywords

Description

@Strawberry47

Thanks♪(・ω・)ノ

@Strawberry47

Is actor-critic off-policy?

@qiwang067
Contributor Author

Is actor-critic off-policy?

Hello, both A2C and A3C are on-policy.
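
As a brief sketch of why: the actor-critic policy gradient is an expectation over states and actions sampled from the current policy $\pi_\theta$,

$$\nabla_\theta J(\theta) = \mathbb{E}_{(s_t, a_t) \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A^{\pi_\theta}(s_t, a_t)\big],$$

so once the policy is updated, old samples no longer come from $\pi_\theta$ and cannot be reused without an off-policy correction such as importance sampling.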

@15138922051

Is there A3C code available? Thanks.

@qiwang067
Contributor Author

Is there A3C code available? Thanks.

There is A2C code here:
https://github.com/datawhalechina/easy-rl/tree/master/codes/A2C

@chenjiaqiang-a

The A2C implementation in the repository differs somewhat from the theoretical formulas. I implemented a version that follows the theory, and it trains reasonably well. Could you take a look and tell me whether this implementation has any problems?

import numpy as np
import torch
from torch.distributions import Categorical

def update(self):
    # Draw the whole rollout buffer, then clear it (on-policy: samples are used once)
    state_pool, action_pool, reward_pool, next_state_pool, done_pool = self.memory.sample(len(self.memory), True)
    self.memory.clear()

    states = torch.tensor(state_pool, dtype=torch.float32, device=self.device)
    actions = torch.tensor(action_pool, dtype=torch.long, device=self.device)  # discrete action indices
    next_states = torch.tensor(next_state_pool, dtype=torch.float32, device=self.device)
    rewards = torch.tensor(reward_pool, dtype=torch.float32, device=self.device)
    masks = torch.tensor(1.0 - np.float32(done_pool), device=self.device)  # 0 at terminal transitions

    # Shared network returns action probabilities (actor head) and state values (critic head)
    probs, values = self.model(states)
    _, next_values = self.model(next_states)

    dist = Categorical(probs)
    log_probs = dist.log_prob(actions)
    # One-step TD error as the advantage estimate: r + gamma * V(s') - V(s)
    advantages = rewards + self.gamma * next_values.squeeze().detach() * masks - values.squeeze()
    # Policy-gradient loss; the advantage is detached so it does not backpropagate into the critic
    actor_loss = -(log_probs * advantages.detach()).mean()
    # Critic loss: squared TD error regresses V(s) toward the TD target
    critic_loss = advantages.pow(2).mean()
    # Entropy bonus encourages exploration
    loss = actor_loss + self.critic_factor * critic_loss - self.entropy_coef * dist.entropy().mean()

    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
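
For reference, a sketch of the objective this update appears to implement, using the one-step TD error as the advantage estimate (here $c_v$ and $c_e$ stand for critic_factor and entropy_coef):

$$A_t \approx r_t + \gamma\,(1 - d_t)\, V(s_{t+1}) - V(s_t)$$

$$\mathcal{L} = -\mathbb{E}\big[\log \pi_\theta(a_t \mid s_t)\, A_t\big] + c_v\, \mathbb{E}\big[A_t^2\big] - c_e\, \mathbb{E}\big[\mathcal{H}\big(\pi_\theta(\cdot \mid s_t)\big)\big]$$

This matches the standard A2C loss with a TD(0) advantage rather than a multi-step or GAE estimate.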
