
RLLIB QMIX example does not work #12

Open
josjo80 opened this issue Jul 5, 2019 · 6 comments


josjo80 commented Jul 5, 2019

There appears to be a problem when using a masked action space with the QMIX algorithm. I think qmix_policy_graph expects at least one valid action to be available at every step.

Full traceback is below:

  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 446, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 316, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/worker.py", line 2197, in get
    raise value
ray.exceptions.RayTaskError: ^[[36mray_QMixTrainer:train()^[[39m (pid=25398, host=cassini)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 354, in train
    raise e
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 340, in train
    result = Trainable.train(self)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/dqn/dqn.py", line 242, in _train
    self.optimizer.step()
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/optimizers/sync_batch_replay_optimizer.py", line 84, in step
    return self._optimize()
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/optimizers/sync_batch_replay_optimizer.py", line 108, in _optimize
    info_dict = self.local_evaluator.learn_on_batch(samples)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 581, in learn_on_batch
    info_out[pid] = policy.learn_on_batch(batch)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/qmix/qmix_policy_graph.py", line 296, in learn_on_batch
    next_obs, action_mask, next_action_mask)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/johnson/miniconda3/envs/rlenv/lib/python3.6/site-packages/ray/rllib/agents/qmix/qmix_policy_graph.py", line 108, in forward
    there may be a state with no valid actions."
AssertionError: target_max_qvals contains a masked action;             there may be a state with no valid actions.```
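
The failing check lives in qmix_policy_graph.py's forward(). Roughly (a minimal sketch, not RLlib's actual code): invalid actions' Q-values are pushed to a large negative sentinel before taking the max, so any timestep whose action mask allows no action makes the "max" Q-value itself a masked value and trips the assertion. Running this sketch reproduces the failure mode:

```python
# Minimal sketch of the failure mode (not RLlib's actual code): row 3 of the
# mask allows no valid action, so its masked max is the -1e9 sentinel and the
# assertion fires, just as in the traceback above.
import torch

q_values = torch.randn(4, 5)       # [batch, n_actions]
action_mask = torch.zeros(4, 5)    # 1 = valid, 0 = invalid
action_mask[:3, 0] = 1             # rows 0-2 allow action 0; row 3 allows nothing
masked_q = q_values - (1 - action_mask) * 1e9
target_max_qvals = masked_q.max(dim=1)[0]
assert not (target_max_qvals < -1e8).any(), (
    "target_max_qvals contains a masked action; "
    "there may be a state with no valid actions.")
```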
@richardliaw (Contributor)

cc @ericl


EC2EZ4RD commented Nov 6, 2019

I found this problem mainly comes from enabling double Q-learning (the double_q setting in QMIX's default config). If you set "double_q": False, QMIX can run.
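
For reference, a minimal sketch of that override (the "grouped_twostep" env name is registered by RLlib's twostep_game example; substitute your own grouped multi-agent env):

```python
# Sketch of the suggested workaround, not a full training script: override
# "double_q" in the QMIX config passed to Tune.
import ray
from ray import tune

ray.init()
tune.run(
    "QMIX",
    stop={"training_iteration": 10},
    config={
        "env": "grouped_twostep",  # env name from RLlib's twostep_game example
        "mixer": "qmix",           # default monotonic mixing network
        "double_q": False,         # the workaround: disable double Q-learning
    },
)
```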

@plutonic88

I was able to run RLlib's QMIX in the StarCraft II env. However, the policy does not converge.
Any suggestions are appreciated.

@EC2EZ4RD

I forgot how to make it converge. I recommend using PyMARL instead of RLlib if you want to explore research ideas.

@xiaoToby

> I found this problem mainly comes from enabling double Q-learning (the double_q setting in QMIX's default config). If you set "double_q": False, QMIX can run.

Hi, where do I set "double_q": False?
And when I run this example, the error log is below:
```
(RolloutWorker pid=44372) ray::RolloutWorker.__init__() (pid=44372, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002577D997BB0>)
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 658, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 699, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 665, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 669, in ray._raylet.execute_task
(RolloutWorker pid=44372)   File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task.function_executor
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\_private\function_manager.py", line 675, in actor_method_executor
(RolloutWorker pid=44372)     return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(RolloutWorker pid=44372)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 511, in __init__
(RolloutWorker pid=44372)     check_env(self.env)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 78, in check_env
(RolloutWorker pid=44372)     raise ValueError(
(RolloutWorker pid=44372) ValueError: Traceback (most recent call last):
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 65, in check_env
(RolloutWorker pid=44372)     check_multiagent_environments(env)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\utils\pre_checks\env.py", line 268, in check_multiagent_environments
(RolloutWorker pid=44372)     next_obs, reward, done, info = env.step(sampled_action)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\lib\site-packages\ray\rllib\env\wrappers\group_agents_wrapper.py", line 76, in step
(RolloutWorker pid=44372)     obs, rewards, dones, infos = self.env.step(action_dict)
(RolloutWorker pid=44372)   File "C:\conda\envs\smac\smac\smac\examples\rllib\env.py", line 82, in step
(RolloutWorker pid=44372)     raise ValueError(
(RolloutWorker pid=44372) ValueError: You must supply an action for agent: 0
```

How can I fix this and get the example running? Thanks @EC2EZ4RD
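
The frame that raises here is RLlib's environment pre-checker (ray/rllib/utils/pre_checks/env.py), which steps the env with a sampled action dict that may not cover every agent, while the SMAC example env requires actions for all agents. One possible workaround, assuming your Ray version is recent enough to have the disable_env_checking config key (it does not exist in older releases), is to skip the pre-check:

```python
# Sketch of a possible workaround (assumption: the "disable_env_checking"
# key is available in your Ray version; verify before relying on it).
# "smac_env" is a hypothetical name for your registered SMAC env.
from ray import tune

tune.run(
    "QMIX",
    config={
        "env": "smac_env",
        "disable_env_checking": True,  # bypass check_env() / check_multiagent_environments()
    },
)
```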

@MichaelXCChen

@xiaoToby One key difference between the default PyMARL implementation and the RLlib implementation of QMIX for SMAC is that PyMARL feeds the true overall global state into the monotonic mixing network, whereas RLlib only uses the per-agent observations as the global state. So you'd need to modify the default implementation and extract the true global state from the environment so that the mixing network can use it.
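
On the env side, a hedged sketch of that change (the wrapper class name and the "state" observation key are illustrative choices of mine, not code from this repo): expose SMAC's get_state() alongside each agent's observation so a modified mixer can consume it.

```python
# Sketch: a multi-agent env whose observations also carry SMAC's true
# global state. The class name and "state" key are assumptions.
import numpy as np
from gym.spaces import Box, Dict
from smac.env import StarCraft2Env


class SMACWithGlobalState:
    def __init__(self, **smac_args):
        self._env = StarCraft2Env(**smac_args)
        info = self._env.get_env_info()
        self.n_agents = info["n_agents"]
        self.observation_space = Dict({
            "obs": Box(-1.0, 1.0, shape=(info["obs_shape"],), dtype=np.float32),
            "state": Box(-np.inf, np.inf, shape=(info["state_shape"],), dtype=np.float32),
        })

    def reset(self):
        # SMAC's reset() returns (per-agent obs list, global state).
        obs_list, state = self._env.reset()
        return {i: {"obs": o, "state": state} for i, o in enumerate(obs_list)}

    def step(self, action_dict):
        actions = [action_dict[i] for i in range(self.n_agents)]
        reward, done, info = self._env.step(actions)
        state = self._env.get_state()  # the true global state PyMARL feeds the mixer
        obs = {i: {"obs": o, "state": state}
               for i, o in enumerate(self._env.get_obs())}
        rewards = {i: reward / self.n_agents for i in obs}
        return obs, rewards, {"__all__": done}, {i: info for i in obs}
```

The loss and model would still need to be changed to read the "state" key instead of concatenating agent observations; this sketch only covers the environment side.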
