MountainCar_DQN_RND

Playing Mountain-Car without reward engineering, by combining DQN and Random Network Distillation (RND).

This project contains a simple PyTorch implementation of DQN [1] for playing Mountain-Car. Mountain-Car is a classic control problem in which an underpowered car must swing back and forth to build momentum and reach the flag at the top of the mountain. Unlike other classic problems such as CartPole, the reward in Mountain-Car is sparse: positive feedback is given only upon reaching the flag, which is very unlikely under random actions. This is usually circumvented by engineering the reward signal to smooth the learning process, for example by giving additional reward for velocity or position increments. For general problems, however, such shaping may not be easy, since it requires some prior knowledge of how to solve the task.
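
For reference, here is a minimal sketch of the DQN side, assuming gym's MountainCar-v0 environment (2-dimensional state of position and velocity, 3 discrete actions). The network sizes and epsilon-greedy exploration shown here are illustrative assumptions, not the exact hyper-parameters used in this repo:

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a MountainCar state [position, velocity] to one Q-value per action."""
    def __init__(self, state_dim=2, n_actions=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def select_action(q_net, state, epsilon, n_actions=3):
    # Epsilon-greedy: random action with probability epsilon, otherwise greedy.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return q_values.argmax().item()
```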

Instead, this project opts for a method of exploration that is not domain specific: Exploration by Random Network Distillation [2], developed by OpenAI researchers for hard-exploration games such as Montezuma's Revenge. RND trains a predictor network to match the outputs of a second, fixed, randomly initialized network, and adds the prediction error to the true reward signal as an exploration bonus. Because the predictor improves on states it sees often, its error acts as a measure of "familiarity" with game states, encouraging the RL algorithm to explore unfamiliar states and eventually discover the sparse positive feedback.
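
The following is a sketch of the RND bonus under the same assumptions as above (2-dimensional states); the network sizes and learning rate are illustrative, not taken from this repo:

```python
import torch
import torch.nn as nn

# Random Network Distillation: a fixed random "target" network and a trained
# "predictor" network. The predictor's squared error on a state serves as an
# intrinsic exploration bonus; it shrinks for frequently visited states, so
# novel states earn larger bonuses.

def make_net(state_dim=2, out_dim=32, hidden=64):
    return nn.Sequential(
        nn.Linear(state_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

target = make_net()
for p in target.parameters():
    p.requires_grad_(False)  # the random target is never trained

predictor = make_net()
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(states, train=True):
    """states: (batch, state_dim) float tensor. Returns a per-state bonus."""
    with torch.no_grad():
        target_out = target(states)
    pred_out = predictor(states)
    error = (pred_out - target_out).pow(2).mean(dim=1)  # per-state MSE
    if train:
        optimizer.zero_grad()
        error.mean().backward()  # distill the fixed random target
        optimizer.step()
    return error.detach()  # added to the environment reward as a bonus
```

In training, the bonus is simply added to the environment reward, so the agent maximizes the augmented return shown in the second plot below.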

This plot shows the accumulated (extrinsic) return per episode:

[Plot: accumulated return vs. episode]

And this one shows the augmented return, after adding the exploration bonus given by RND:

[Plot: augmented return (extrinsic + RND bonus) vs. episode]

I wrote a Medium post about Random Network Distillation that explains the rationale behind this method in more detail.

  1. Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
  2. Burda, Yuri, et al. "Exploration by random network distillation." arXiv preprint arXiv:1810.12894 (2018).
