Supporting code for the paper DERAIL: Diagnostic Environments for Reward and Imitation Learning.
The environments themselves are available at the HumanCompatibleAI/seals repo. This repo contains the remaining code needed to run the experiments in the paper.
To reproduce the results, clone the repo, install it, run the experiments, and then plot the output:
git clone https://github.com/HumanCompatibleAI/derail
cd derail
pip install .
python -m derail.run -t 500000 -n 15 -p
python -m derail.plot -f results/last.csv
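
Before plotting, it can help to confirm that the run actually wrote a results file. The sketch below is not part of derail; `summarize_csv` is a hypothetical helper that uses only the Python standard library. It assumes the file at `results/last.csv` (the path passed to the plot command above) is an ordinary CSV with a header row, and it makes no assumption about which columns it contains.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV file.

    Hypothetical helper for illustration; not part of the derail package.
    """
    with Path(path).open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows at all
        n_rows = sum(1 for _ in reader)  # count the remaining (data) rows
    return header, n_rows


if __name__ == "__main__":
    # Path taken from the plot command above; assumed to be written by the run.
    results = Path("results/last.csv")
    if results.exists():
        columns, n_rows = summarize_csv(results)
        print(f"{n_rows} rows; columns: {columns}")
    else:
        print(f"{results} not found; run the experiments first")
```

Run it from the repo root after `python -m derail.run` finishes. An empty file or a missing one usually means the run did not complete.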