
Is this reward function good for competition evaluation? #201

Open
luckeciano opened this issue Sep 10, 2019 · 3 comments

Comments

@luckeciano

Hey guys,

I would like to add a concern regarding the reward function.

After some analysis, I think it can easily be exploited by controllers that do not walk. Basically, the positive reward comes from the alive bonus and from the footstep duration. An agent can simply perform footsteps with no pelvis velocity (keeping its initial position), or even perform one long footstep from the beginning of the episode until the end without changing its position. In this way the penalty is very low: the effort is low, and there is no deviation penalty because at the initial position v_tgt is a null vector.

As the objective of the competition is to learn to walk effectively while following the navigation field, I think the reward function should be modified. My first thought is to add another term that reinforces actual movement. What do you think?
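To make the concern concrete, here is a rough numerical sketch of the exploit. The reward shape (a per-timestep positive term accumulated over the footstep duration, minus per-timestep effort and velocity-deviation penalties) and all constants are illustrative assumptions, not the competition's actual values:

```python
# Illustrative sketch of the exploit described above.
# The reward shape and all constants are assumptions, not the actual competition reward.

def footstep_reward(duration_s, dt=0.01,
                    alive_bonus=0.1,     # hypothetical per-timestep positive term
                    effort_cost=0.005,   # hypothetical per-timestep effort penalty (low activations)
                    vel_dev_cost=0.01):  # hypothetical per-timestep |v_pelvis - v_tgt| penalty
    steps = int(duration_s / dt)
    return steps * (alive_bonus - effort_cost - vel_dev_cost)

# One long "footstep" while standing still keeps both penalties small,
# so the agent still collects most of the positive reward:
print(footstep_reward(duration_s=5.0))  # positive, despite no forward motion
```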

@smsong (Collaborator) commented Sep 11, 2019

@luckeciano Could you elaborate on v_tgt being null at the initial position? How did you get this null vector?

@luckeciano (Author)

Hey @smsong,

Actually, I made a mistake. v_tgt is not null at the initial position (I saw a point on the map, but there is an arrow as well). I'm sorry.

However, I printed the components of the footstep reward, and in this situation the penalty is very low compared with the total reward from just taking one long footstep over the episode. In one of my tests, my agent took a single footstep, obtaining a reward of 47 while losing only ~10 from effort and velocity deviation.

Therefore, it is possible to obtain almost all of the available reward without leaving the initial position. I think the reward should be modified, at least the weights; otherwise, top submissions might contain no walking motion at all.
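One way to reproduce this kind of measurement is sketched below. It assumes the standard osim-rl L2M2019Env gym-like reset()/step() interface and a 22-dimensional muscle-activation action; the constant low-activation policy is only a stand-in for a non-walking controller, not the agent used in the test above:

```python
# Minimal sketch: cumulative reward of a controller that does not walk.
# Assumes the osim-rl L2M2019Env reset()/step() interface and 22 muscle activations.
from osim.env import L2M2019Env

env = L2M2019Env(visualize=False)
obs = env.reset()

total_reward, done = 0.0, False
while not done:
    action = [0.05] * 22  # low, constant activations: no real walking
    obs, reward, done, info = env.step(action)
    total_reward += reward

print("Cumulative reward without walking:", total_reward)
```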

@smsong (Collaborator) commented Sep 12, 2019

@luckeciano Thanks for the clarification and suggestion.
However, if a network exploits the single-footstep solution you've mentioned, it would probably get stuck in a local minimum and would not be able to compete with good solutions. It is also possible that some participants have already worked around this issue by using different rewards to first train a good network and then fine-tuning it on the given reward, so it may be unfair to change the reward at this point. A systematic investigation of rewards that facilitate training could be an interesting study ;)
