[RLlib] Index Error with GPU #45418
Labels
bug
Something that is supposed to be working; but isn't
rllib
RLlib related issues
rllib-oldstack-cleanup
Issues related to cleaning up classes, utilities on the old API stack
What happened + What you expected to happen
When running a PPO training session with a single GPU, an IndexError occurs in torch_policy_v2.py (see traceback below). The error consistently occurs on the fifth iteration, if that helps. I've seen similar reports from several years ago; in those cases the error also only seemed to occur when there was a single GPU. If that were still the cause, I would have expected it to be fixed by now.
```
Traceback (most recent call last):
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/main.py", line 39, in <module>
    cli.main()
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/dkunz/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/dkunz/python3/gymnasium/gymCopter/gymCopterPPO.py", line 46, in <module>
    results = algo.train()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 328, in train
    result = self.step()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 873, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 3156, in _run_one_training_iteration
    results = self.training_step()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 428, in training_step
    return self._training_step_old_and_hybrid_api_stacks()
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 587, in _training_step_old_and_hybrid_api_stacks
    train_results = multi_gpu_train_one_step(self, train_batch)
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 152, in multi_gpu_train_one_step
    num_loaded_samples[policy_id] = local_worker.policy_map[
  File "/home/dkunz/anaconda3/envs/envRL/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 802, in load_batch_into_buffer
    return len(slices[0])
IndexError: list index out of range
```
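For context, the failing line (`return len(slices[0])` in `load_batch_into_buffer`) indexes the first element of a list of batch slices, so it raises IndexError whenever that list comes back empty, e.g. if the train batch being loaded onto the device contains no samples. A minimal pure-Python sketch of that failure mode (the `timeslices` helper here is a hypothetical stand-in for RLlib's batch slicing, not the real implementation):

```python
def timeslices(samples, num_slices):
    """Hypothetical stand-in for RLlib's batch slicing: split `samples`
    into contiguous chunks, roughly one per device."""
    if not samples:
        return []  # an empty batch produces no slices at all
    size = max(1, len(samples) // num_slices)
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# Normal case: a non-empty batch yields at least one slice.
assert len(timeslices(list(range(10)), num_slices=1)[0]) == 10

# Failure case: an empty batch yields an empty slice list, and
# indexing slices[0] raises the IndexError seen in the traceback.
slices = timeslices([], num_slices=1)
try:
    n = len(slices[0])
except IndexError as exc:
    print(f'IndexError: {exc}')
```

Whether an empty batch is actually what happens here on the fifth iteration is speculation on my part; it is just the simplest way to make that exact line fail.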
Versions / Dependencies
Ubuntu 22.04 LTS
Python 3.10.14
Ray 2.22.0
PyTorch 2.3.0
Reproduction script
I don't have a short reproduction script, but here is the PPO script I'm using. The environment is quite complex, and that could be the problem, but it otherwise seems to run satisfactorily.
"""
gymCopterPPO.py
"""
Third-party imports
import ray
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig
Local application imports
from gymCopter import GymCopter as gymCopterEnv
def env_creator(env_config):
"""
Function required to register gymnasium environments
"""
gc = gymCopterEnv(env_config)
gc.copter.auxdata['EngOper'] = False
return gc
#################################################################
if name == 'main':
ray.init()
register_env('gymCopterEnv', env_creator)
# Configure PPO
ppo_config = PPOConfig()
ppo_config.training(gamma=0.9447)
ppo_config.training(lr=5.0e-5)
ppo_config.training(lambda_=0.9556)
ppo_config.training(train_batch_size=5000)
ppo_config.training(model={'fcnet_hiddens': [256, 256]})
ppo_config.environment(env='gymCopterEnv')
ppo_config.environment(env_config={'hbar0': None, 'vbar0': None, 'EngOper': False, 'render_mode': None})
ppo_config.framework(framework='torch')
ppo_config.rollouts(num_rollout_workers=8)
ppo_config.debugging(log_level='ERROR')
ppo_config.resources(num_gpus=0.1)
ppo_config.resources(num_cpus_per_worker=2)
ppo_config.resources(num_gpus_per_worker=0.1)
# Build an algorithm from from the configuration
algo = ppo_config.build()
# Train for n iterations and report results
max_reward_mean = -1.0
for n in range(10):
results = algo.train()
episode_reward_mean = results['episode_reward_mean']
if episode_reward_mean > max_reward_mean:
max_reward_mean = episode_reward_mean
print(f'n = {n}: Episode Mean Reward = {episode_reward_mean}; Max Mean Reward = {max_reward_mean}')
ray.shutdown()
Issue Severity
High: It blocks me from completing my task.