[RLlib] Index Error with GPU #45418

Closed
dkunz49 opened this issue May 17, 2024 · 2 comments
Labels
bug: Something that is supposed to be working; but isn't
rllib: RLlib related issues
rllib-oldstack-cleanup: Issues related to cleaning up classes, utilities on the old API stack

Comments


dkunz49 commented May 17, 2024

What happened + What you expected to happen

When running a PPO training session with a single GPU, an index error occurs in torch_policy_v2.py (see traceback below). The error always seems to occur on the fifth iteration, if that helps. I've seen similar reports of this from several years ago; in those cases the error only seemed to occur when there was just one GPU. If that were still the case, I would think it would have been fixed by now.

Traceback (most recent call last):
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/dkunz/python3/gymnasium/gymCopter/gymCopterPPO.py", line 46, in <module>
    results = algo.train()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 328, in train
    result = self.step()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 873, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 3156, in _run_one_training_iteration
    results = self.training_step()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 428, in training_step
    return self._training_step_old_and_hybrid_api_stacks()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 587, in _training_step_old_and_hybrid_api_stacks
    train_results = multi_gpu_train_one_step(self, train_batch)
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 152, in multi_gpu_train_one_step
    num_loaded_samples[policy_id] = local_worker.policy_map[
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 802, in load_batch_into_buffer
    return len(slices[0])
IndexError: list index out of range
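
Before digging into RLlib internals, it may help to confirm what GPU resources Ray and PyTorch actually see. A minimal diagnostic sketch (generic, not taken from the report above):

    # Diagnostic sketch: confirm the GPU resources visible to PyTorch and Ray.
    import ray
    import torch

    ray.init()
    print("torch.cuda.is_available():", torch.cuda.is_available())
    print("torch.cuda.device_count():", torch.cuda.device_count())
    print("ray.cluster_resources():", ray.cluster_resources())
    ray.shutdown()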

Versions / Dependencies

Ubuntu 22.04 LTS
Python 3.10.14
Ray 2.22.0
PyTorch 2.3.0

Reproduction script

I don't have a short script, but here's the PPO script that I'm using. The environment is quite complex, and that could be the problem, but it mostly seems to run satisfactorily.

"""
gymCopterPPO.py
"""

Third-party imports

import ray
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig

Local application imports

from gymCopter import GymCopter as gymCopterEnv

def env_creator(env_config):
"""
Function required to register gymnasium environments
"""
gc = gymCopterEnv(env_config)
gc.copter.auxdata['EngOper'] = False
return gc

#################################################################

if name == 'main':
ray.init()
register_env('gymCopterEnv', env_creator)
# Configure PPO
ppo_config = PPOConfig()
ppo_config.training(gamma=0.9447)
ppo_config.training(lr=5.0e-5)
ppo_config.training(lambda_=0.9556)
ppo_config.training(train_batch_size=5000)
ppo_config.training(model={'fcnet_hiddens': [256, 256]})
ppo_config.environment(env='gymCopterEnv')
ppo_config.environment(env_config={'hbar0': None, 'vbar0': None, 'EngOper': False, 'render_mode': None})
ppo_config.framework(framework='torch')
ppo_config.rollouts(num_rollout_workers=8)
ppo_config.debugging(log_level='ERROR')
ppo_config.resources(num_gpus=0.1)
ppo_config.resources(num_cpus_per_worker=2)
ppo_config.resources(num_gpus_per_worker=0.1)
# Build an algorithm from from the configuration
algo = ppo_config.build()
# Train for n iterations and report results
max_reward_mean = -1.0
for n in range(10):
results = algo.train()
episode_reward_mean = results['episode_reward_mean']
if episode_reward_mean > max_reward_mean:
max_reward_mean = episode_reward_mean
print(f'n = {n}: Episode Mean Reward = {episode_reward_mean}; Max Mean Reward = {max_reward_mean}')
ray.shutdown()
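
To isolate whether the custom environment is the trigger, a stripped-down sketch along the following lines could be run against the same Ray version. It keeps the fractional-GPU resource settings from the script above but substitutes the built-in CartPole-v1 environment and a smaller worker count; both substitutions are placeholders, not from the original report.

    # Minimal repro sketch (assumption: same old API stack settings on Ray 2.22.0,
    # CartPole-v1 substituted for the custom gymCopter environment).
    import ray
    from ray.rllib.algorithms.ppo import PPOConfig

    if __name__ == '__main__':
        ray.init()
        config = (
            PPOConfig()
            .environment(env='CartPole-v1')
            .framework(framework='torch')
            .training(train_batch_size=5000)
            .rollouts(num_rollout_workers=2)
            .resources(num_gpus=0.1, num_cpus_per_worker=2, num_gpus_per_worker=0.1)
        )
        algo = config.build()
        for n in range(10):
            results = algo.train()
            print(f"n = {n}: episode_reward_mean = {results['episode_reward_mean']}")
        ray.shutdown()

If this sketch runs cleanly for more than five iterations while the original script does not, the environment (or its interaction with the batch settings) is the more likely culprit.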

Issue Severity

High: It blocks me from completing my task.

dkunz49 added the bug and triage labels on May 17, 2024
dkunz49 commented May 17, 2024

I'm pretty new to RLlib, so it's likely that the issue is on my end and that I simply don't know enough to identify it.

anyscalesam added the rllib label on May 20, 2024
simonsays1980 added the rllib-oldstack-cleanup label and removed the triage label on May 21, 2024
dkunz49 closed this as completed on May 22, 2024
dkunz49 commented May 22, 2024

Discovered problems with gymCopter and the conda environment. After fixing those, the problem seems to be solved.
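
Since the root cause turned out to be the environment itself, one quick check that can surface many such problems before training is gymnasium's built-in environment checker. This is a suggested sketch, not part of the original resolution; the env_config mirrors the one in the script above.

    # Sketch: validate the custom env with gymnasium's checker before training.
    from gymnasium.utils.env_checker import check_env
    from gymCopter import GymCopter

    env = GymCopter({'hbar0': None, 'vbar0': None, 'EngOper': False, 'render_mode': None})
    check_env(env)  # raises/warns on API violations (spaces, reset/step signatures, dtypes)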
