Decrease in reward during training with MaskablePPO #207

Open
vahidqo opened this issue Sep 1, 2023 · 0 comments
Labels
custom gym env (Issue related to Custom Gym Env), more information needed (Please fill the issue template completely), question (Further information is requested)

Comments


vahidqo commented Sep 1, 2023

❓ Question

Hi,

During training in a custom environment with MaskablePPO, the reward decreased and then converged. Is there a specific reason for this? Does it mean the algorithm has found a better policy but is outputting a different one?
[figure: training reward curve, decreasing and then converging]

My environment has two normalized rewards that are combined as a weighted sum to form the final reward. Each episode has 19 timesteps, and my gamma was set to 0.001.
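To illustrate what such a small gamma implies over 19 timesteps (a sketch, assuming a hypothetical constant per-step reward of 1.0):

```python
# Sketch: effect of a very small discount factor over 19 timesteps.
# Assumes a hypothetical constant per-step reward of 1.0.
gamma = 0.001
rewards = [1.0] * 19

# Each reward t steps ahead is scaled by gamma**t, i.e. 1000x smaller per step.
discounted_return = sum(gamma**t * r for t, r in enumerate(rewards))
print(round(discounted_return, 6))  # ~1.001: only the first step contributes meaningfully
```

With gamma this small, the agent is effectively myopic: it optimizes almost exclusively for the immediate reward.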

import gymnasium as gym
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker
from sb3_contrib.common.maskable.policies import MaskableActorCriticPolicy

class customenv(gym.Env):
    ...

env = customenv()
env = ActionMasker(env, mask_fn)
model = MaskablePPO(MaskableActorCriticPolicy, env, gamma=0.0001, verbose=0)
model.learn(4_000_000)
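For reference, mask_fn is not shown in the snippet above; with ActionMasker it takes the env and returns a boolean array marking which discrete actions are currently valid. A minimal sketch with a hypothetical stand-in environment (the attribute names n_actions and forbidden_actions are assumptions, not from the issue):

```python
import numpy as np

def mask_fn(env):
    # Hypothetical mask: every action is valid except those the env
    # currently reports as forbidden (assumed attribute).
    mask = np.ones(env.n_actions, dtype=bool)
    mask[list(env.forbidden_actions)] = False
    return mask

# Minimal stand-in to demonstrate the mask's shape; a real env would
# derive n_actions from its action space and forbidden_actions from state.
class DummyEnv:
    n_actions = 4
    forbidden_actions = {2}

print(mask_fn(DummyEnv()))  # [ True  True False  True]
```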

Thank you!

vahidqo added the "question (Further information is requested)" label on Sep 1, 2023
araffin added the "more information needed (Please fill the issue template completely)" and "custom gym env (Issue related to Custom Gym Env)" labels on Sep 1, 2023