Pong with Policy Gradients 🔨👷

Code for an introductory reinforcement learning (RL) workshop. You'll train a simple RL agent to play Pong using vanilla policy gradients 😮💯

Adapted from http://karpathy.github.io/2016/05/31/rl/ and rewritten with PyTorch.

Accompanying slides are here.

Trained RL agent (green paddle) vs ball-tracking AI (tan paddle).

Instructions

👩‍🏫 🗣 There are five ### TODO: statements where you'll need to fill in short pieces of code (no longer than a few lines) defining the policy network and calculating the policy gradients.

Training takes a few hours to converge, but you should see some improvement within a few minutes. If you don't, you probably have a bug. Check the terminal output and use the TensorBoard training graphs 📈

The solution and a trained network are in the solution (spoiler alert!) folder, but try to do it yourself first! You got this 🤠
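For orientation, here is a generic sketch of the quantity that "calculating the policy gradients" refers to, written in plain Python for a two-action policy. It is not the code from pong.py or the solution folder: the function names and the single-logit sigmoid policy are assumptions made for illustration, and in PyTorch you would normally build the loss from tensors and let autograd compute the gradient.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def policy_gradient_loss(logits, actions, advantages):
    """Surrogate loss: -sum_t A_t * log pi(a_t | s_t).

    logits:     one raw policy output per timestep; sigmoid(logit) = P(action == 1)
    actions:    the actions that were actually sampled (0 or 1)
    advantages: one weight per timestep, e.g. discounted rewards
    """
    loss = 0.0
    for logit, action, advantage in zip(logits, actions, advantages):
        p_one = sigmoid(logit)
        prob_taken = p_one if action == 1 else 1.0 - p_one
        loss -= advantage * math.log(prob_taken)
    return loss


def grad_wrt_logits(logits, actions, advantages):
    """Analytic gradient of the loss above with respect to each logit.

    For a sigmoid policy, d log pi(a) / d logit = a - p, so the loss
    gradient is -A_t * (a_t - p_t).
    """
    return [
        -advantage * (action - sigmoid(logit))
        for logit, action, advantage in zip(logits, actions, advantages)
    ]


if __name__ == "__main__":
    logits, actions, advantages = [0.0, 1.2], [1, 0], [1.0, -0.5]
    print(policy_gradient_loss(logits, actions, advantages))
    print(grad_wrt_logits(logits, actions, advantages))
```

Minimizing this loss raises the probability of actions with positive advantage and lowers it for actions with negative advantage.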

Setup

Make sure you have a working 64-bit installation of Python >= 3.5. You can check which version you have by starting Python's interactive prompt, which prints the version in its startup banner.
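If the banner is unclear, a short script using only the standard library can report both the version and whether the interpreter is 64-bit. This is a minimal sketch; the helper name python_info is hypothetical and not part of this repository.

```python
import struct
import sys


def python_info():
    """Return ((major, minor, micro), pointer_width_in_bits) for this interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes:
    # 8 on a 64-bit build of Python, 4 on a 32-bit build.
    bits = struct.calcsize("P") * 8
    return tuple(sys.version_info[:3]), bits


if __name__ == "__main__":
    version, bits = python_info()
    print("Python {}.{}.{} ({}-bit)".format(version[0], version[1], version[2], bits))
    if version < (3, 5) or bits != 64:
        print("Warning: this workshop expects 64-bit Python >= 3.5")
```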

Install virtualenv and create a new virtual environment:

On macOS and Linux:

python3 -m pip install --user virtualenv
python3 -m venv env
source env/bin/activate

On Windows:

python -m pip install --user virtualenv
python -m venv env
.\env\Scripts\activate

(P.S. you can leave the virtual environment by entering deactivate into the terminal when you're done)

Install the dependencies from requirements.txt:

pip install -r requirements.txt

Note: on Windows, PyTorch may fail to install through the above command. In that case, install it manually with

pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

See the PyTorch website for more details.

Running the Code

To run it yourself:

$ python pong.py [--render]

where --render is an optional flag that renders pong games and slows them down to a watchable speed.

To test:

$ python test.py

(The tests are a helpful guide, but they only check the policy network and the discounted-reward calculation, so passing them doesn't guarantee correctness!)
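As a reminder of what "discounted rewards" means here, the sketch below follows the approach in the Karpathy post this workshop is adapted from, including its Pong-specific reset of the running sum whenever a point is scored. It is written in plain Python for illustration; the function name, the list-based interface, and the default gamma of 0.99 are assumptions, and the version in pong.py or test.py may differ.

```python
def discount_rewards(rewards, gamma=0.99):
    """Return the discounted sum of future rewards at every timestep.

    rewards: per-timestep rewards; in Pong these are 0 except for
             +1 or -1 on the frame where a point is scored.
    gamma:   discount factor between 0 and 1.
    """
    discounted = [0.0] * len(rewards)
    running_sum = 0.0
    # Walk backwards so each step can reuse the sum computed for the next step.
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            # Pong-specific assumption: a nonzero reward ends a rally,
            # so rewards from later rallies should not leak into this one.
            running_sum = 0.0
        running_sum = running_sum * gamma + rewards[t]
        discounted[t] = running_sum
    return discounted


if __name__ == "__main__":
    print(discount_rewards([0, 0, 1], gamma=0.5))  # [0.25, 0.5, 1.0]
```

Many implementations also standardize the result (subtract the mean, divide by the standard deviation) before using it to weight the gradients, which tends to reduce variance.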

To view TensorBoard visualizations during training, open a separate terminal, activate the virtual environment, and run

$ tensorboard --logdir tensorboard_logs

and visit http://localhost:6006/.
