
[Bug]: producing NAN values during training in MaskablePPO #221

Open
vahidqo opened this issue Dec 14, 2023 · 5 comments
Labels
bug (Something isn't working), custom gym env (Issue related to Custom Gym Env), more information needed (Please fill the issue template completely), No tech support (We do not do tech support)

Comments


vahidqo commented Dec 14, 2023

🐛 Bug

During training, the algorithm produces NaN values inside the neural network. I tried several fixes proposed in other issues, but I still get the error: changing np.float64 to np.float32 (doesn't help), using use_expln=True (not available in MaskablePPO), changing model parameters such as gamma, and decreasing the learning rate.
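
One way to narrow down where the NaNs first appear is to wrap the environment in SB3's VecCheckNan, which raises an error as soon as an observation, reward or action becomes NaN/inf, and to enable PyTorch's anomaly detection. A minimal debugging sketch, assuming env is the wrapped custom environment from the reproduction code below:

import torch as th
from stable_baselines3.common.vec_env import DummyVecEnv, VecCheckNan

# Fail fast when an observation, reward or action contains NaN/inf
vec_env = VecCheckNan(DummyVecEnv([lambda: env]), raise_exception=True)

# Trace the forward call that produced a NaN gradient (slow, debugging only)
th.autograd.set_detect_anomaly(True)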

To Reproduce

from typing import List

import gymnasium as gym  # or "import gym" on older SB3 versions
from sb3_contrib import MaskablePPO
from sb3_contrib.common.maskable.policies import MaskableActorCriticPolicy
from sb3_contrib.common.wrappers import ActionMasker
from stable_baselines3.common.callbacks import CheckpointCallback

class custom(gym.Env):
    ...  # custom environment implementation (omitted)

env = custom()

def mask_fn(env: gym.Env) -> List[bool]:
    return env.valid_action_mask()

env = ActionMasker(env, mask_fn)
model = MaskablePPO(MaskableActorCriticPolicy, env, gamma=0.001, verbose=0)
checkpoint_callback = CheckpointCallback(save_freq=10000, save_path='logs',
                                         name_prefix='rl_model')
model.learn(500000, callback=checkpoint_callback)
model.save("JOM")

Relevant log output / Error message

ValueError                                Traceback (most recent call last)
<ipython-input-9-abee064644f3> in <cell line: 3>()
      1 checkpoint_callback = CheckpointCallback(save_freq=10000, save_path='logs',
      2                                          name_prefix='rl_model')
----> 3 model.learn(500000, callback=checkpoint_callback)
      4 model.save("JOM")

8 frames
/usr/local/lib/python3.10/dist-packages/sb3_contrib/ppo_mask/ppo_mask.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps, use_masking, progress_bar)
    545                 self.logger.dump(step=self.num_timesteps)
    546 
--> 547             self.train()
    548 
    549         callback.on_training_end()

/usr/local/lib/python3.10/dist-packages/sb3_contrib/ppo_mask/ppo_mask.py in train(self)
    410                     actions = rollout_data.actions.long().flatten()
    411 
--> 412                 values, log_prob, entropy = self.policy.evaluate_actions(
    413                     rollout_data.observations,
    414                     actions,

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/policies.py in evaluate_actions(self, obs, actions, action_masks)
    331             latent_vf = self.mlp_extractor.forward_critic(vf_features)
    332 
--> 333         distribution = self._get_action_dist_from_latent(latent_pi)
    334         if action_masks is not None:
    335             distribution.apply_masking(action_masks)

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/policies.py in _get_action_dist_from_latent(self, latent_pi)
    244         """
    245         action_logits = self.action_net(latent_pi)
--> 246         return self.action_dist.proba_distribution(action_logits=action_logits)
    247 
    248     def _predict(

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/distributions.py in proba_distribution(self, action_logits)
    192         reshaped_logits = action_logits.view(-1, sum(self.action_dims))
    193 
--> 194         self.distributions = [
    195             MaskableCategorical(logits=split) for split in th.split(reshaped_logits, tuple(self.action_dims), dim=1)
    196         ]

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/distributions.py in <listcomp>(.0)
    193 
    194         self.distributions = [
--> 195             MaskableCategorical(logits=split) for split in th.split(reshaped_logits, tuple(self.action_dims), dim=1)
    196         ]
    197         return self

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/distributions.py in __init__(self, probs, logits, validate_args, masks)
     40     ):
     41         self.masks: Optional[th.Tensor] = None
---> 42         super().__init__(probs, logits, validate_args)
     43         self._original_logits = self.logits
     44         self.apply_masking(masks)

/usr/local/lib/python3.10/dist-packages/torch/distributions/categorical.py in __init__(self, probs, logits, validate_args)
     68             self._param.size()[:-1] if self._param.ndimension() > 1 else torch.Size()
     69         )
---> 70         super().__init__(batch_shape, validate_args=validate_args)
     71 
     72     def expand(self, batch_shape, _instance=None):

/usr/local/lib/python3.10/dist-packages/torch/distributions/distribution.py in __init__(self, batch_shape, event_shape, validate_args)
     66                 valid = constraint.check(value)
     67                 if not valid.all():
---> 68                     raise ValueError(
     69                         f"Expected parameter {param} "
     70                         f"({type(value).__name__} of shape {tuple(value.shape)}) "

ValueError: Expected parameter logits (Tensor of shape (64, 2)) of distribution MaskableCategorical(logits: torch.Size([64, 2])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        ...
        (59 more [nan, nan] rows omitted)
        [nan, nan]], grad_fn=<SubBackward0>)
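
The error above means the policy logits are already NaN by the time the masked categorical distribution is built, i.e. the network weights or its inputs diverged earlier in training. A more conservative configuration is sketched below; the values are illustrative (lower learning rate, explicit gradient clipping, a typical discount), not a confirmed fix for this environment:

model = MaskablePPO(
    MaskableActorCriticPolicy,
    env,
    learning_rate=1e-4,   # below the 3e-4 default
    max_grad_norm=0.5,    # gradient clipping (SB3 default value)
    gamma=0.99,           # typical discount instead of 0.001
    verbose=0,
)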

System Info

No response


vahidqo added the bug label Dec 14, 2023

vahidqo commented Jan 9, 2024

Hi,

Could you please let me know whether this is a problem with my code or with the package? @araffin

Thank you

araffin added the more information needed, custom gym env, and No tech support labels Jan 10, 2024

vahidqo commented Jan 11, 2024

@araffin Thank you for your response.
Could you please explain what you mean by "more information"? Should I post all the environment code?


vahidqo commented Jan 13, 2024

The detailed error is: @araffin

An error occurred during training: Function 'MseLossBackward0' returned nan values in its 1th output.
C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\torch\autograd\__init__.py:200: UserWarning: Error detected in MseLossBackward0. Traceback of forward call that caused the error:
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\traitlets\config\application.py", line 1046, in launch_instance
    app.start()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelapp.py", line 736, in start
    self.io_loop.start()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\tornado\platform\asyncio.py", line 195, in start
    self.asyncio_loop.run_forever()
  File "C:\Program Files\Python311\Lib\asyncio\base_events.py", line 607, in run_forever
    self._run_once()
  File "C:\Program Files\Python311\Lib\asyncio\base_events.py", line 1922, in _run_once
    handle._run()
  File "C:\Program Files\Python311\Lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelbase.py", line 516, in dispatch_queue
    await self.process_one()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelbase.py", line 505, in process_one
    await dispatch(*args)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelbase.py", line 412, in dispatch_shell
    await result
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelbase.py", line 740, in execute_request
    reply_content = await reply_content
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\ipkernel.py", line 422, in do_execute
    res = shell.run_cell(
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\zmqshell.py", line 546, in run_cell
    return super().run_cell(*args, **kwargs)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3024, in run_cell
    result = self._run_cell(
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3079, in _run_cell
    result = runner(coro)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\async_helpers.py", line 129, in _pseudo_sync_runner
    coro.send(None)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3284, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3466, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3526, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\12368\AppData\Local\Temp\ipykernel_23684\999724894.py", line 2, in <module>
    model.learn(1000000, callback=checkpoint_callback)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 547, in learn
    self.train()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 447, in train
    value_loss = F.mse_loss(rollout_data.returns, values_pred)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\torch\nn\functional.py", line 3295, in mse_loss
    return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
 (Triggered internally at ..\torch\csrc\autograd\python_anomaly_mode.cpp:119.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass


vahidqo commented Jan 15, 2024

@araffin More info if that helps:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[72], line 2
      1 try:
----> 2     model.learn(1000000)
      3 except (AssertionError, ValueError) as e:
      4     print("An error occurred during training:", e)

File ~\AppData\Roaming\Python\Python311\site-packages\sb3_contrib\ppo_mask\ppo_mask.py:547, in MaskablePPO.learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps, use_masking, progress_bar)
    544         self.logger.record("time/total_timesteps", self.num_timesteps, exclude="tensorboard")
    545         self.logger.dump(step=self.num_timesteps)
--> 547     self.train()
    549 callback.on_training_end()
    551 return self

File ~\AppData\Roaming\Python\Python311\site-packages\sb3_contrib\ppo_mask\ppo_mask.py:478, in MaskablePPO.train(self)
    476 # Optimization step
    477 self.policy.optimizer.zero_grad()
--> 478 loss.backward()
    479 # Clip grad norm
    480 th.nn.utils.clip_grad_norm_(self.policy.parameters(), self.max_grad_norm)

File ~\AppData\Roaming\Python\Python311\site-packages\torch\_tensor.py:487, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    477 if has_torch_function_unary(self):
    478     return handle_torch_function(
    479         Tensor.backward,
    480         (self,),
   (...)
    485         inputs=inputs,
    486     )
--> 487 torch.autograd.backward(
    488     self, gradient, retain_graph, create_graph, inputs=inputs
    489 )

File ~\AppData\Roaming\Python\Python311\site-packages\torch\autograd\__init__.py:200, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     retain_graph = create_graph
    197 # The reason we repeat same the comment below is that
    198 # some Python versions print out the first line of a multi-line function
    199 # calls in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    202     allow_unreachable=True, accumulate_grad=True)

RuntimeError: Function 'MseLossBackward0' returned nan values in its 1th output.
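
The failing operation here is the value loss, F.mse_loss(rollout_data.returns, values_pred), so very large or non-finite returns/rewards are a plausible source of the NaNs. One common mitigation, sketched under the assumption that the custom env and mask_fn from the original report are available, is to normalize and clip observations and rewards with VecNormalize:

from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Normalize observations/rewards and clip them to a bounded range
vec_env = DummyVecEnv([lambda: ActionMasker(custom(), mask_fn)])
vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True,
                       clip_obs=10.0, clip_reward=10.0)
model = MaskablePPO(MaskableActorCriticPolicy, vec_env, verbose=0)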


araffin commented Jan 16, 2024

Might be a duplicate of #81 or #195.
Probably a combination of your env and hyperparameters.

Please note that we do not offer tech support, see #81 (comment)
