Error in Parallel environment processing BrokenPipeError:[WinError 109] #844

Open
roeslib opened this issue May 22, 2023 · 2 comments

roeslib commented May 22, 2023

Could someone please help me? I am training my PPO model with 128 parallel environments, and at step 2340992 the following error stops the script. I tried reducing the number of parallel environments, but the error persists.

Traceback (most recent call last):
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\lib\multiprocessing\connection.py", line 301, in _recv_bytes
    ov, err = _winapi.ReadFile(self._handle, bsize,
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 17, in
    train_model(args, writer,run_name)
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\PPO_3DCONV2DCONV\modeltraining.py", line 102, in train_model
    next_obs, reward, done, info = envs.step(action.cpu().numpy())
  File "c:\users\libia\anaconda3\envs\rlenvironment\baselines\baselines\common\vec_env\vec_env.py", line 108, in step
    return self.step_wait()
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\PPO_3DCONV2DCONV\vectorizedenvs_test.py", line 125, in step_wait
    obs, reward, done, info = self.venv.step_wait()
  File "c:\users\libia\anaconda3\envs\rlenvironment\baselines\baselines\common\vec_env\vec_normalize.py", line 27, in step_wait
    obs, rews, news, infos = self.venv.step_wait()
  File "c:\users\libia\anaconda3\envs\rlenvironment\baselines\baselines\common\vec_env\shmem_vec_env.py", line 76, in step_wait
    outs = [pipe.recv() for pipe in self.parent_pipes]
  File "c:\users\libia\anaconda3\envs\rlenvironment\baselines\baselines\common\vec_env\shmem_vec_env.py", line 76, in
    outs = [pipe.recv() for pipe in self.parent_pipes]
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError
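
For anyone reading this traceback later: an EOFError from pipe.recv() in the parent generally means the other end of the pipe was closed, which with worker processes usually points to a worker that exited or crashed. The snippet below is only a minimal sketch of that behaviour, not code from baselines. It closes one end of a pipe in the same process to stand in for a dead worker, and recv_or_report is a hypothetical helper name.

```python
from multiprocessing import Pipe


def recv_or_report(pipe, worker_index):
    """Receive from a worker pipe, turning a bare EOFError into a clearer message.

    Hypothetical helper for illustration; not part of baselines.
    """
    try:
        return pipe.recv()
    except EOFError as exc:
        raise RuntimeError(
            f"worker {worker_index} closed its pipe; it probably crashed or exited"
        ) from exc


parent_end, child_end = Pipe()

# While the other end is alive, recv() returns what was sent.
child_end.send("obs")
first = recv_or_report(parent_end, 0)

# Closing the child end stands in for a worker process that died.
child_end.close()
try:
    recv_or_report(parent_end, 0)
except RuntimeError as err:
    message = str(err)
```

The point is that the parent-side error only says the pipe ended. The actual cause has to be found in the worker, for example by running a single environment in the main process.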

Member

tfboyd commented May 23, 2023

You are not going to like this answer, and I am sorry. We do not really support Windows. We moved to using Reverb as our replay buffer, and we only compile it for Linux. No one on the core team uses Windows.

I think you might have the wrong project. This is tf-agents, not baselines; baselines is a separate RL project. I don't use Anaconda, so I am not totally sure what you have installed. After 10 seconds of searching I am not sure what rlenvironment is, but baselines is something I am familiar with, so that is my best guess. Those paths are also not ones I recognize from this code base. I could be wrong, but that is my off-the-cuff read.

Author

roeslib commented May 23, 2023

Thank you for your comment. I solved my problem: it was a programming error in how my class inherited from baselines.bench.Monitor. After fixing it I let the agent train, and it reached 10M steps. I am currently working on Windows, but I am moving to Ubuntu to use other RL frameworks.

You are right, I made a mistake: this is the wrong project. I should have posted in the baselines questions and answers forum. Should I delete my post? How can I do it?
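
Since the root cause here was a bug in an environment wrapper, one way to make such bugs easier to spot is to have the worker catch exceptions and send the traceback to the parent instead of dying silently. This is a rough sketch under assumptions, not how baselines implements its workers. run_step_safely, unwrap, and broken_step are hypothetical names, the AttributeError is an invented stand-in for the real bug (which is not shown in this thread), and the pipe is used within one process to keep the example short.

```python
import traceback
from multiprocessing import Pipe


def run_step_safely(step_fn, action):
    """Worker side: run one step and report failures as data, not as a crash."""
    try:
        return ("ok", step_fn(action))
    except Exception:
        return ("error", traceback.format_exc())


def unwrap(message):
    """Parent side: return the result or re-raise with the worker's traceback."""
    status, payload = message
    if status == "error":
        raise RuntimeError("environment worker failed:\n" + payload)
    return payload


def broken_step(action):
    # Hypothetical stand-in for a wrapper with an inheritance bug.
    raise AttributeError("'MyMonitor' object has no attribute 'rewards'")


parent_end, child_end = Pipe()
child_end.send(run_step_safely(broken_step, 0))
try:
    unwrap(parent_end.recv())
except RuntimeError as err:
    report = str(err)  # contains the worker's original traceback text
```

With this pattern the parent sees the original exception type and line from the worker rather than only a closed pipe.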
