Multi-Agent Cooperation in Sequential Social Dilemmas

This project was completed in Spring 2019 as part of the senior requirement for Yale Computer Science.

This project is an implementation and exploration of recent work in Multi-Agent Reinforcement Learning (MARL). It is highly recommended that you read the following two papers before diving in:

  1. Multi-agent reinforcement learning in sequential social dilemmas
  2. Intrinsic Social Motivation via Causal Influence in Multi-Agent RL

Quick Start

  1. Activate your virtual environment
  2. pip install -r requirements.txt
  3. python train.py
  4. python test.py ~/ray_results/prison_A3C/[training_instance]/ [checkpoint_num]

Training results are saved by default to the ray_results directory in your home directory (~/ray_results).

Environments

Pycolab provides the abstraction for creating environments. Although this repository includes three environments, only the PrisonEnvironment has been fully developed and tested.
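
For orientation, here is a minimal sketch of how a pycolab gridworld is typically assembled. The ASCII art, sprite names, and action convention below are illustrative assumptions, not the repository's actual environment code.

```python
# Illustrative pycolab sketch -- not the repository's actual environment code.
from pycolab import ascii_art
from pycolab.prefab_parts import sprites as prefab_sprites

GAME_ART = ['#########',
            '#A     B#',   # two agents on a single corridor
            '#########']

class PlayerSprite(prefab_sprites.MazeWalker):
    """An agent that can move left, move right, or stay still."""

    def __init__(self, corner, position, character):
        super(PlayerSprite, self).__init__(
            corner, position, character, impassable='#')

    def update(self, actions, board, layers, backdrop, things, the_plot):
        # Illustrative convention: `actions` is a dict mapping each sprite's
        # character to its chosen move (0 = left, 1 = right, 2 = stay).
        action = (actions or {}).get(self.character, 2)
        if action == 0:
            self._west(board, the_plot)
        elif action == 1:
            self._east(board, the_plot)
        else:
            self._stay(board, the_plot)

def make_game():
    return ascii_art.ascii_art_to_game(
        GAME_ART, what_lies_beneath=' ',
        sprites={'A': PlayerSprite, 'B': PlayerSprite})
```

Calling make_game().its_showtime() then starts an episode and returns the initial observation, reward, and discount.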

The PrisonEnvironment instantiates a gridworld variant of the classic Prisoner's Dilemma. At each step of the game, both agents independently choose to move left, move right, or stay still. The left side of the board represents full defection and the right side of the board represents full cooperation. Intermediate positions are a linear combination of the extremes. Rewards are distributed every 10 timesteps of the game. The figure below shows the corresponding rewards for four primary states of the game.

[Figure: the four primary game states and their corresponding rewards]
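
To make the payoff interpolation concrete, here is a small sketch using the canonical Prisoner's Dilemma payoffs T > R > P > S. The specific numbers and helper names are assumptions; the repository's actual values are the ones shown in the figure above.

```python
# Sketch of the interpolated Prisoner's Dilemma payoff; the numbers below are
# canonical T > R > P > S values, not necessarily the repository's.
T, R, P, S = 5.0, 3.0, 1.0, 0.0   # temptation, reward, punishment, sucker

def cooperation_level(col, width):
    """Map a board column to [0, 1]: 0 = far left (defect), 1 = far right (cooperate)."""
    return col / (width - 1)

def payoff(c_self, c_other):
    """Bilinearly interpolate the 2x2 payoff matrix at fractional cooperation levels."""
    return (c_self * c_other * R                    # both cooperate
            + c_self * (1 - c_other) * S            # self cooperates, other defects
            + (1 - c_self) * c_other * T            # self defects, other cooperates
            + (1 - c_self) * (1 - c_other) * P)     # both defect

# Example on a 9-column board: agent A at the far right, agent B in the middle.
c_a, c_b = cooperation_level(8, 9), cooperation_level(4, 9)
print(payoff(c_a, c_b), payoff(c_b, c_a))   # rewards handed out every 10 timesteps
```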

python play.py allows you to quickly run a manual version of the game. The script is extremely helpful when debugging the environment in isolation.

Learning

Reinforcement learning is handled by RLlib. Currently, all training is done with the A3C algorithm.
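
As a rough sketch of what the training setup looks like with a 2019-era RLlib API: the environment import path, observation/action spaces, and worker count below are assumptions, and the exact config keys vary across RLlib versions.

```python
# Sketch of multi-agent A3C training; PrisonEnvironment import path, spaces,
# and worker count are assumptions, and config keys vary by RLlib version.
import numpy as np
import ray
from gym.spaces import Box, Discrete
from ray import tune
from ray.tune.registry import register_env

def env_creator(env_config):
    from environments.prison import PrisonEnvironment  # hypothetical import path
    return PrisonEnvironment(env_config)

register_env("prison", env_creator)

# Assumed spaces: a small image-like view of the board and three moves.
obs_space = Box(low=0.0, high=1.0, shape=(5, 9, 3), dtype=np.float32)
act_space = Discrete(3)   # left, right, stay still

ray.init()
tune.run(
    "A3C",
    name="prison_A3C",      # yields the ~/ray_results/prison_A3C path used by test.py
    checkpoint_freq=10,     # periodic checkpoints for test.py to restore
    config={
        "env": "prison",
        "num_workers": 2,
        "multiagent": {
            # One independent A3C policy per agent; None = default policy class.
            "policies": {
                "agent_0": (None, obs_space, act_space, {}),
                "agent_1": (None, obs_space, act_space, {}),
            },
            "policy_mapping_fn": lambda agent_id: agent_id,
        },
    },
)
```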

Unresolved Issues

When initializing A3C agents in test.py, asynchronous changes to the environment interfere with the game visualization. One workaround is to wait until the interactions with the environment have finished before starting the game; this only takes a few seconds. A better solution would be to fix the underlying issue and submit a PR!

Related Work

  1. Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J., & Graepel, T. (2017). Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (pp. 464-473).

  2. Hughes, E., Leibo, J. Z., Phillips, M., Tuyls, K., Dueñez-Guzman, E., Castañeda, A. G., Dunning, I., Zhu, T., McKee, K., Koster, R., Roff, H., & Graepel, T. (2018). Inequity aversion improves cooperation in intertemporal social dilemmas. In Advances in Neural Information Processing Systems (pp. 3330-3340).

  3. Jaques, N., Lazaridou, A., Hughes, E., Gulcehre, C., Ortega, P. A., Strouse, D. J., Leibo, J. Z., & de Freitas, N. (2018). Intrinsic Social Motivation via Causal Influence in Multi-Agent RL. arXiv preprint arXiv:1810.08647.

  4. Credit to Sequential Social Dilemma Games for providing a useful example of RLlib usage.
