Pommerman 💣

A PyTorch-based reinforcement learning solution for the Pommerman competitions, developed as an exam project in course 02456 Deep Learning at DTU - Technical University of Denmark.

Our agent

Our agent always starts in the upper left corner.

Playing against 3 random agents

[GIF: our agent vs. 3 random agents]

Playing against 1 simple and 2 random agents

[GIF: our agent vs. 1 simple and 2 random agents (not perfect)]

Playing against 3 simple agents

[GIF: our agent vs. 3 simple agents]

🔧 Requirements

In addition to PyTorch (https://pytorch.org) and the usual data science packages (numpy, matplotlib), this project requires the Pommerman playground (https://github.com/MultiAgentLearning/playground) to be installed in your Python environment. Additionally, the A2C script depends on the colorama package, which handles rendering of the game in the terminal (useful for running on remote servers).
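A minimal install sketch, assuming a fresh Python environment and that the playground is installed from a local clone (exact commands may differ for your setup):

pip install torch numpy matplotlib colorama
git clone https://github.com/MultiAgentLearning/playground.git
cd playground && pip install -e .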

❗ DISCLAIMER

As this project uses the torch.multiprocessing package, which is not compatible with Jupyter notebooks, the files needed to reproduce our results are supplied as individual Python files. The guide for running these files is given below.

📈 Imitation learning

To start the imitation learning, first place the log_simpleAgents_sequence_observe.py file in playground/pommerman/cli. Then place AA_RUN_LOG_SCRIPT.py in the playground folder and run it.

Observations from 10,000 games will then be collected and logged to three files in the pommerman folder. Once logging is complete, run train_rnn_cnn.py to generate the trained imitation model.
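As a sketch, assuming a local clone of the playground repository and that the scripts sit next to this README (adjust the paths to your layout):

cp log_simpleAgents_sequence_observe.py playground/pommerman/cli/
cp AA_RUN_LOG_SCRIPT.py playground/
(cd playground && python AA_RUN_LOG_SCRIPT.py)
python train_rnn_cnn.py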

Once the actor has been trained, the critic must also observe some games in order to learn to assign rewards correctly before it is allowed to affect the model. We do this by placing A3C_v10_cnn_lstm_train-critic.py and sharedAdam.py in the playground folder and running A3C_v10_cnn_lstm_train-critic.py.
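Again as a sketch, with the same path assumptions as above:

cp A3C_v10_cnn_lstm_train-critic.py sharedAdam.py playground/
(cd playground && python A3C_v10_cnn_lstm_train-critic.py)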

📈 A3C Model

To train the A3C model, place A3C_v10_cnn_lstm.py and sharedAdam.py in the pommerman folder and run A3C_v10_cnn_lstm.py. Inside the file you can specify a filename which will be used to save the checkpoint once training finishes; the same filename is used to load the checkpoint again if you wish to continue training. The parameter MAX_EP specifies how many episodes to run before saving the checkpoint and terminating.
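A run sketch under the same assumptions as above (edit the checkpoint filename and MAX_EP inside A3C_v10_cnn_lstm.py before running):

cp A3C_v10_cnn_lstm.py sharedAdam.py playground/pommerman/
(cd playground/pommerman && python A3C_v10_cnn_lstm.py)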

📉 A2C Model

To generate the convrnn-s.weights file (refreshed every 300 episodes):

python A2C/main.py train

To see how your agent plays (loads the convrnn-s.weights file and can be used while training is running):

python A2C/main.py eval

During training, the current gamma, running reward, action statistics, and loss are printed after each episode. It takes around 48 hours to fully train this model (40,000 episodes) on a modern 10-core CPU with a single 1080 Ti GPU. Additionally, a training.txt file is generated with the main statistics for each training episode.

🔨 Model

The full model used in this project is shown in the image below.

[Image: full model architecture]

📊 Main results

From the following figure, we see that after 40,000 episodes A2C performs better than A3C.

[Figure: A2C vs. A3C training results]

Finally, the reward curve for our architecture is shown below.

[Figure: final training reward]

📄 Paper

See our paper for detailed information about the project.

👤 Credits

  • Special thanks to @dimatter for providing the computational resources ❤️
