
MonteCarlo code error #120

Open
beifeng1937 opened this issue Dec 19, 2022 · 1 comment
@beifeng1937

There is a bug in the train function in MonteCarlo.ipynb: the call agent.update(one_ep_transition) should be outside the inner for-loop, so that the agent is updated once per complete episode. The corrected code is:

def train(cfg, env, agent):
    print('Start training!')
    print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}')
    rewards = []  # record the reward of each episode
    for i_ep in range(cfg.train_eps):
        ep_reward = 0  # reward accumulated over this episode
        one_ep_transition = []
        state = env.reset(seed=cfg.seed)  # reset the environment, i.e. start a new episode
        for _ in range(cfg.max_steps):
            action = agent.sample_action(state)  # sample an action from the agent's policy
            next_state, reward, terminated, info = env.step(action)  # take one step in the environment
            one_ep_transition.append((state, action, reward))  # store the transition
            state = next_state  # move to the next state
            ep_reward += reward
            if terminated:
                break
        agent.update(one_ep_transition)  # update the agent once per episode, outside the step loop
        rewards.append(ep_reward)
        print(f"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.1f}")
    print('Training finished!')
    return {"rewards": rewards}
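The reason the update must sit outside the step loop is that Monte Carlo methods need the complete episode: the return G_t for each step is computed by sweeping backwards from the terminal step. A minimal sketch of what such an agent's update might look like (this is a hypothetical first-visit MC agent written for illustration, not the notebook's actual implementation):

```python
from collections import defaultdict

class FirstVisitMCAgent:
    """Hypothetical first-visit Monte Carlo agent, for illustration only."""
    def __init__(self, gamma=0.9):
        self.gamma = gamma
        self.Q = defaultdict(float)            # state-action value estimates
        self.returns_count = defaultdict(int)  # visit counts for incremental averaging

    def update(self, one_ep_transition):
        # Requires the COMPLETE episode: returns are accumulated backwards
        # from the terminal step, which is why update() can only run once
        # per episode, after the step loop has finished.
        first_visit = {}
        for t, (state, action, _) in enumerate(one_ep_transition):
            first_visit.setdefault((state, action), t)
        G = 0.0
        for t in reversed(range(len(one_ep_transition))):
            state, action, reward = one_ep_transition[t]
            G = self.gamma * G + reward  # discounted return from step t
            if first_visit[(state, action)] == t:  # first-visit check
                self.returns_count[(state, action)] += 1
                n = self.returns_count[(state, action)]
                # incremental mean of the observed returns
                self.Q[(state, action)] += (G - self.Q[(state, action)]) / n
```

If update() were called inside the step loop instead, it would repeatedly process truncated episode prefixes, and the computed returns would be wrong.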
@beifeng1937
Author

There is also a bug in the train function of Sarsa.ipynb: the line action = agent.sample(state) immediately after while True should be deleted, since the action for the current step is already carried over as next_action from the previous iteration. The corrected code is:

def train(cfg, env, agent):
    print('Start training!')
    print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}')
    rewards = []  # record the reward of each episode
    for i_ep in range(cfg.train_eps):
        ep_reward = 0  # reward accumulated over this episode
        state = env.reset()  # reset the environment, i.e. start a new episode
        action = agent.sample(state)  # sample the first action of the episode
        while True:
            # action = agent.sample(state)   <- this line should be deleted
            next_state, reward, done, _ = env.step(action)  # take one step in the environment
            next_action = agent.sample(next_state)  # sample the action for the next step
            agent.update(state, action, reward, next_state, next_action, done)  # SARSA update
            state = next_state  # move to the next state
            action = next_action  # carry the sampled action into the next iteration
            ep_reward += reward
            if done:
                break
        rewards.append(ep_reward)
        print(f"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.1f}, Epsilon: {agent.epsilon}")
    print('Training finished!')
    return {"rewards": rewards}
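The duplicate sampling matters because SARSA is on-policy: the action passed to update() as next_action must be the very action executed at the next step. Re-sampling a fresh action at the top of the loop would break that (state, action, reward, next_state, next_action) chain. A minimal sketch of the update rule this loop assumes (a hypothetical agent for illustration, not the notebook's actual class):

```python
from collections import defaultdict

class SarsaAgentSketch:
    """Hypothetical minimal SARSA agent, for illustration only."""
    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.Q = defaultdict(float)

    def update(self, state, action, reward, next_state, next_action, done):
        # On-policy TD target: uses Q(next_state, next_action), where
        # next_action is the action the loop will actually execute next.
        # If the loop re-sampled a new action before env.step(), the
        # stored transition would not match the executed behavior.
        target = reward if done else reward + self.gamma * self.Q[(next_state, next_action)]
        self.Q[(state, action)] += self.alpha * (target - self.Q[(state, action)])
```

This is exactly why the corrected loop samples once before the while and then reuses next_action as the following step's action.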
