PPOTrainer errors with "_restore() takes 3 positional arguments but 4 were given" #191

Open
duncanldavis opened this issue May 19, 2022 · 2 comments

@duncanldavis

Latest Ray libraries, installed via pip on Python 3.8.

Code that breaks:

trainer = PPOTrainer(config=config)
trainer.restore(checkpoint)

Error
RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=2452, ip=10.139.64.8, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fc4f840ed60>)
At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: _restore() takes 3 positional arguments but 4 were given
traceback: Traceback (most recent call last):
File "/databricks/python/lib/python3.8/site-packages/ray/serialization.py", line 332, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
File "/databricks/python/lib/python3.8/site-packages/ray/serialization.py", line 235, in _deserialize_object
return self._deserialize_msgpack_data(data, metadata_fields)
File "/databricks/python/lib/python3.8/site-packages/ray/serialization.py", line 190, in _deserialize_msgpack_data
python_objects = self._deserialize_pickle5_data(pickle5_data)
File "/databricks/python/lib/python3.8/site-packages/ray/serialization.py", line 180, in _deserialize_pickle5_data
obj = pickle.loads(in_band)
TypeError: _restore() takes 3 positional arguments but 4 were given

@duncanldavis
Author

OK, it is related to how the Ray cluster is set up: when I do not connect to the cluster via ray.init(), the trainer works. I am still working through why everything else works but PPOTrainer breaks.
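Since the failure appears only when connected to the cluster and is raised inside `pickle.loads` on the worker, one possible cause (an assumption, not confirmed anywhere in this thread) is that the driver and the worker nodes have different package versions installed. A minimal sketch for comparing environments follows; the helper names `environment_report` and `diff_reports` and the package list are hypothetical, chosen only for illustration.

```python
import sys
from importlib import metadata

# Assumed list of packages worth comparing; not taken from the issue.
PACKAGES = ("ray", "numpy", "gym", "cloudpickle")


def environment_report(packages=PACKAGES):
    """Return the Python version and the installed version of each package."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


def diff_reports(driver, worker):
    """Return {name: (driver_value, worker_value)} for every entry that differs."""
    keys = sorted(set(driver) | set(worker))
    return {
        key: (driver.get(key), worker.get(key))
        for key in keys
        if driver.get(key) != worker.get(key)
    }
```

On a live cluster, the report could be collected remotely with something like `ray.get(ray.remote(environment_report).remote())` and passed to `diff_reports` together with the driver's own report. An empty result would suggest looking elsewhere for the cause.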

@duncanldavis
Author

With num_workers: 0, PPOTrainer works, but with 1 or more workers I get the stack trace shown in the attached image.
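The observation above can be turned into a temporary workaround: restore with sampling kept in the driver process, so no remote `RolloutWorker` actors have to be created. This is only a sketch that sidesteps the error rather than fixing it; `with_local_sampling` is a hypothetical helper, and `num_workers` is the config key quoted in the comment above.

```python
import copy


def with_local_sampling(config):
    """Return a copy of an RLlib config dict with remote rollout workers disabled."""
    patched = copy.deepcopy(config)  # leave the caller's config untouched
    patched["num_workers"] = 0       # sample in the driver process only
    return patched
```

It would be used as `PPOTrainer(config=with_local_sampling(config))` before calling `trainer.restore(checkpoint)`, at the cost of losing parallel rollouts.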
