How to fix: Broken with latest gym pip package #16

Open
catid opened this issue Jan 15, 2024 · 1 comment

Comments

catid commented Jan 15, 2024

The env.step return values changed in the latest gym releases, so this is how to get the code going now:

        # Number of timesteps run so far this batch
        t = 0 
        while t < self.timesteps_per_batch:
            # Rewards this episode
            ep_rews = []

            obs = self.env.reset()
            if isinstance(obs, tuple):
                obs = obs[0]  # Assuming the first element of the tuple is the relevant data

            terminated = False
            for ep_t in range(self.max_timesteps_per_episode):
                # Increment timesteps ran this batch so far
                t += 1
                # Collect observation
                batch_obs.append(obs)
                action, log_prob = self.get_action(obs)

                obs, rew, terminated, truncated, _ = self.env.step(action)
                if isinstance(obs, tuple):
                    obs = obs[0]  # Assuming the first element of the tuple is the relevant data

                # Collect reward, action, and log prob
                ep_rews.append(rew)
                batch_acts.append(action)
                batch_log_probs.append(log_prob)

            if terminated or truncated:
                break

Note that you now have to check both the terminated and truncated return values. The latest documentation is here: https://www.gymlibrary.dev/api/core/
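
For reference, here is a minimal standalone loop against the new API. This is only a sketch, assuming gym >= 0.26 (or gymnasium), where reset returns (obs, info) and step returns a five-element tuple:

    import gym

    env = gym.make('Pendulum-v1')
    obs, info = env.reset()
    done = False
    while not done:
        # Random action just to exercise the API; a real agent would pick its own
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        # The episode ends when the env terminates it or the time limit truncates it
        done = terminated or truncated
    env.close()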

Without this change, if you follow along with the blog post, it will fail at the end of Blog 3 at this step:

    import gym
    env = gym.make('Pendulum-v1')
    model = PPO(env)
    model.learn(10000)

Also, you need to update Pendulum-v0 to Pendulum-v1.

@Lorenzo69420

Shouldn't the "if terminated or truncated: break" be inside the for loop?
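
For comparison, here is a sketch of the inner loop with the break moved inside the for loop, so the episode ends as soon as the environment reports terminated or truncated. The surrounding batch bookkeeping is assumed to be the same as in the snippet above:

    for ep_t in range(self.max_timesteps_per_episode):
        # Increment timesteps ran this batch so far
        t += 1
        # Collect observation
        batch_obs.append(obs)
        action, log_prob = self.get_action(obs)

        obs, rew, terminated, truncated, _ = self.env.step(action)
        if isinstance(obs, tuple):
            obs = obs[0]  # Assuming the first element of the tuple is the relevant data

        # Collect reward, action, and log prob
        ep_rews.append(rew)
        batch_acts.append(action)
        batch_log_probs.append(log_prob)

        # Stop this episode as soon as it is terminated or truncated
        if terminated or truncated:
            break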
