
Is this reward function good for competition evaluation? #201

Open
luckeciano opened this issue Sep 10, 2019 · 3 comments

Comments

@luckeciano

Hey guys,

I would like to add a concern regarding the reward function.

After some analysis, I think it can easily be exploited by controllers that do not walk. Basically, the positive reward comes from the alive bonus and from the footstep duration. An agent can simply perform footsteps with no pelvis velocity (keeping its initial position), or even perform one long footstep from the beginning of the episode until the end without changing its position. In this way the penalty is very low: the effort is low, and there is no deviation penalty because at the initial position v_tgt is a null vector.

As the objective of the competition is to learn to walk effectively while following the navigation field, I think the reward function should be modified. My first thought is to add another term that reinforces actual movement. What do you think?
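To make the concern concrete, here is a rough numerical sketch of the exploit. The reward shape (a per-timestep positive term accumulated over the footstep duration, minus per-timestep effort and velocity-deviation penalties) and all constants are illustrative assumptions, not the competition's actual values:

```python
# Illustrative sketch of the exploit described above.
# The reward shape and all constants are assumptions, not the actual competition reward.

def footstep_reward(duration_s, dt=0.01,
                    alive_bonus=0.1,     # hypothetical per-timestep positive term
                    effort_cost=0.005,   # hypothetical per-timestep effort penalty (low activations)
                    vel_dev_cost=0.01):  # hypothetical per-timestep |v_pelvis - v_tgt| penalty
    steps = int(duration_s / dt)
    return steps * (alive_bonus - effort_cost - vel_dev_cost)

# One long "footstep" while standing still keeps both penalties small,
# so the agent still collects most of the positive reward:
print(footstep_reward(duration_s=5.0))  # positive, despite no forward motion
```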

@smsong (Collaborator) commented Sep 11, 2019

@luckeciano Could you elaborate on v_tgt being null at the initial position? How did you get this null vector?

@luckeciano (Author)

Hey @smsong,

Actually, I made a mistake. v_tgt is not null at the initial position (I saw a point on the map, but there is an arrow as well). I'm sorry.

However, I printed the components of the footstep reward, and in this situation the penalty is very low compared with the total reward from just taking one long footstep over the episode. In one of my tests, my agent took a single footstep, obtaining a reward of 47 while losing only ~10 from effort and velocity deviation.

Therefore, it is possible to obtain almost all of the available reward without leaving the initial position. I think the reward should be modified, at least the weights; otherwise, top submissions might contain no walking motion at all.
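One way to reproduce this kind of measurement is sketched below. It assumes the standard osim-rl L2M2019Env gym-like reset()/step() interface and a 22-dimensional muscle-activation action; the constant low-activation policy is only a stand-in for a non-walking controller, not the agent used in the test above:

```python
# Minimal sketch: cumulative reward of a controller that does not walk.
# Assumes the osim-rl L2M2019Env reset()/step() interface and 22 muscle activations.
from osim.env import L2M2019Env

env = L2M2019Env(visualize=False)
obs = env.reset()

total_reward, done = 0.0, False
while not done:
    action = [0.05] * 22  # low, constant activations: no real walking
    obs, reward, done, info = env.step(action)
    total_reward += reward

print("Cumulative reward without walking:", total_reward)
```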

@smsong (Collaborator) commented Sep 12, 2019

@luckeciano Thanks for the clarification and suggestion.
However, if a network exploits the single-footstep solution you've mentioned, it would probably get stuck in a local minimum and would not be able to compete with good solutions. It is also possible that some participants have already worked around this issue by using different rewards to first train a good network and then fine-tuning it on the given reward, so it may be unfair to change the reward at this point. A systematic investigation of rewards that facilitate training could be an interesting study ;)
