omkarv/pong-from-pixels
Introduction

This repo trains a neural network with Reinforcement Learning so that it is able to play Pong from raw pixel input.

I've written up a blog post which walks through the code and the basic principles of Reinforcement Learning, with Pong as the guiding example.

It is largely based on a Gist by Andrej Karpathy, which in turn is based on the Playing Atari with Deep Reinforcement Learning paper by Mnih et al.

This script uses the OpenAI Gym environments to run the Atari emulator, and currently uses no external ML framework, only numpy.
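
For reference, here is a condensed sketch of the two-layer policy network, following the architecture of the Karpathy gist this repo is based on (layer sizes and initialisation are the gist's defaults):

```python
import numpy as np

H = 200      # hidden layer size (gist default)
D = 80 * 80  # input size: one cropped, downsampled, flattened frame

# two-layer policy network, initialised as in the gist
model = {
    'W1': np.random.randn(H, D) / np.sqrt(D),
    'W2': np.random.randn(H) / np.sqrt(H),
}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def policy_forward(x):
    """Map a preprocessed frame to P(move UP); also return the hidden
    activations, which the backward pass needs."""
    h = np.dot(model['W1'], x)
    h[h < 0] = 0  # ReLU nonlinearity
    logp = np.dot(model['W2'], h)
    return sigmoid(logp), h
```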

The AI Agent playing Pong

Prior to training (mostly random actions)

After training: base repo + learning rate modification

The agent that played this game was trained for ~12000 episodes (each episode being a game played until one side reaches 21 points) over a period of ~15 hours, on a MacBook Pro 2018 with a 2.6GHz i7 (6 cores). The running mean score per episode, over the trailing 100 episodes, at the point I stopped training was -5, i.e. the CPU would win each episode 21-16 on average.

Hyperparameters:

  • Default except for learning rate 1e-3
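
For context, the full hyperparameter block, as it appears in the Karpathy gist apart from the raised learning rate:

```python
# hyperparameters (gist defaults, except learning_rate raised from 1e-4)
H = 200              # number of hidden layer neurons
batch_size = 10      # run this many episodes before a parameter update
learning_rate = 1e-3
gamma = 0.99         # discount factor for reward
decay_rate = 0.99    # decay factor for RMSProp leaky sum of grad^2
```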

After training: base repo + learning rate modification + a bugfix

A minor fix was added which crops more of the image than the base repo does, removing noisy parts of the frame where the ball's motion can safely be ignored. This boosted the observed performance and the speed at which the AI came to beat the CPU on average (i.e. the point where the average reward per episode exceeded 0).
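
The preprocessing step in question looks roughly like the prepro function from the gist, sketched below; the bugfix tightens the crop bounds further (the exact bounds live in this repo's code, the values here are the gist's):

```python
import numpy as np

def prepro(frame):
    """Crop a 210x160x3 uint8 Atari frame into an 80x80 binary vector."""
    frame = frame[35:195]       # crop off the scoreboard and bottom border
                                # (the bugfix crops further than this)
    frame = frame[::2, ::2, 0]  # downsample by a factor of 2, drop colour
    frame[frame == 144] = 0     # erase background (type 1)
    frame[frame == 109] = 0     # erase background (type 2)
    frame[frame != 0] = 1       # paddles and ball become 1
    return frame.astype(np.float64).ravel()
```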

Hyperparameters:

  • Default except for learning rate 1e-3

The agent that played this game was trained for ~10000 episodes over a period of ~13 hours, on a MacBook Pro 2018 with a 2.6GHz i7 (6 cores). The running mean score per episode, over the trailing 100 episodes, at the point I stopped training was 2.5, i.e. the trained AI Agent would win each episode 21 points to 18.5 on average.

Training for another 10 hours and another 5000 episodes allowed the trained AI Agent to reach a running mean score per episode of 5, i.e. the trained AI Agent would win each episode 21 points to 16.
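
The trailing-100 running mean quoted in these numbers can be tracked with a simple window; a minimal sketch (the helper name is illustrative, not from the repo's code):

```python
from collections import deque

recent_scores = deque(maxlen=100)  # scores of the last 100 episodes

def running_mean(episode_score):
    """Append one episode's total reward and return the trailing mean."""
    recent_scores.append(episode_score)
    return sum(recent_scores) / float(len(recent_scores))
```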

Graph of reward over time - first 10000 episodes of training

Graph of reward over time - 10000 to 15000 episodes of training

Modifications vs Source Gist

  • Records output video of the play (see the sketch after this list)
  • Modified learning rate from 1e-4 to 1e-3
  • Comments for clarity
  • Minor fix which crops more of the image vs the base repo
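
The video recording mentioned in the first item can be done with Gym's Monitor wrapper, available in the gym versions this repo targets; a minimal sketch (the output directory is illustrative):

```python
import gym
from gym import wrappers

env = gym.make('Pong-v0')
# wrap the environment so episodes are written out as video files;
# this is what requires the ffmpeg install mentioned below
env = wrappers.Monitor(env, './pong-videos', force=True)
```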

Installation Requirements

The instructions below are for macOS and assume you have Homebrew installed.

  • You'll need to run the code with Python 2.7 - I recommend using conda to manage Python environments
  • Install OpenAI Gym with pip install gym[atari] (Gym is a Python package, so it is installed via pip rather than Homebrew)
  • Install CMake with brew install cmake
  • Install ffmpeg with brew install ffmpeg - required for monitoring / videos
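
Once everything is installed, a quick sanity check that the environment loads (assuming the classic Pong-v0 environment id):

```python
import gym

env = gym.make('Pong-v0')   # the Atari Pong environment
observation = env.reset()
print(observation.shape)    # raw pixel input: (210, 160, 3)
```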
