PPOAgent + MaskSplitterNetwork normalizes Mask when observation normalization is turned on. #922

Open
BaLinuss opened this issue Mar 18, 2024 · 0 comments

Comments

@BaLinuss
Copy link

I found a possible bug/unwanted behaviour when training a PPOAgent on TicTacToe with masking.

In agents/ppo/ppo_policy.py (line 237), time_step is first normalized, and the observation is then fed into _actor_network. However, if the actual ActorDistributionNetwork is wrapped inside a MaskSplitterNetwork, the observation at this point still contains the mask, so the mask gets normalized as well. Because the mask's dtype is int or bool, the normalization and subsequent rounding lead to wrong masks being applied at the ActorDistributionNetwork.

In the case of my TicTacToe environment, the masking was simply wrong as soon as one training step was applied and the normalization statistics updated.
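To make the failure mode concrete, here is a minimal sketch (plain NumPy, with made-up running statistics standing in for the streaming normalizer) showing that a 0/1 mask does not survive mean/std normalization plus rounding:

```python
import numpy as np

# Hypothetical running statistics, as a streaming observation normalizer
# might accumulate after a few training steps (illustrative values only).
mean, std = 0.6, 0.2

mask = np.array([1, 0, 1, 1, 0])           # valid-action mask (int dtype)
normalized = (mask - mean) / std            # the mask is treated as data
recovered = np.round(normalized).astype(int)

print(normalized)  # values are no longer 0/1
print(recovered)   # rounding does not restore the original mask
```

Any action whose mask entry rounds to a nonzero value is then treated as valid (or invalid) incorrectly downstream.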

This should

  • either be properly documented, if it is intended behaviour, i.e. that masks cannot be used when observation normalization is turned on with the PPOAgent (which it is by default),

  • or be fixed so that the mask is excluded from the normalization.

  def _apply_actor_network(self, time_step, policy_state, training=False):
    observation = time_step.observation
    if self._observation_normalizer:
      observation = self._observation_normalizer.normalize(observation)

    return self._actor_network(
        observation,
        time_step.step_type,
        network_state=policy_state,
        training=training,
    )
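One possible shape for such a fix is sketched below: split off the mask before normalizing, and only normalize the plain observation. This is a minimal NumPy sketch, not the actual tf_agents API; the function name and the `splitter_fn` signature (full observation → (plain observation, mask), mirroring what MaskSplitterNetwork is given) are assumptions for illustration.

```python
import numpy as np

def normalize_excluding_mask(observation, normalizer, splitter_fn):
    """Sketch: split off the mask before normalizing (names are hypothetical).

    splitter_fn maps the full observation to (plain_observation, mask),
    analogous to the splitter handed to MaskSplitterNetwork.
    """
    plain_obs, mask = splitter_fn(observation)
    plain_obs = normalizer(plain_obs)  # only the real observation is normalized
    return plain_obs, mask             # the mask passes through untouched

# Usage with a dict observation, as a TicTacToe env with masking might produce:
obs = {"board": np.zeros(9, dtype=np.float32),
       "mask": np.ones(9, dtype=np.int32)}
splitter = lambda o: (o["board"], o["mask"])
normalize = lambda x: (x - 0.5) / 0.5   # stand-in for the running normalizer

board, mask = normalize_excluding_mask(obs, normalize, splitter)
# board is normalized; mask is still the exact int 0/1 array.
```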