SSINet is a self-supervised interpretable framework that discovers interpretable features, making RL agents easy to understand even for non-experts. Specifically, SSINet produces fine-grained attention masks that highlight task-relevant information, which constitutes most of the evidence for the agent's decisions. This implementation is based on the paper Self-Supervised Discovering of Interpretable Features for Reinforcement Learning.
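At a high level, the mask network gates the observation before it reaches the policy, so the action is computed from the attended pixels only. Below is a minimal PyTorch sketch of that idea; the module names and architectures are illustrative, not the exact classes used in this repository:

import torch
import torch.nn as nn

class MaskedPolicy(nn.Module):
    """Illustrative sketch: an attention mask gates the observation."""
    def __init__(self, mask_net: nn.Module, actor: nn.Module):
        super().__init__()
        self.mask_net = mask_net  # e.g. a U-Net with one output per pixel
        self.actor = actor        # the pretrained policy network

    def forward(self, obs: torch.Tensor):
        mask = torch.sigmoid(self.mask_net(obs))  # values in (0, 1)
        masked_obs = mask * obs                   # keep task-relevant pixels
        return self.actor(masked_obs), mask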
If you use this codebase for your research, please cite the paper:
@article{shi2020self,
title={Self-Supervised Discovering of Interpretable Features for Reinforcement Learning},
author={Shi, Wenjie and Huang, Gao and Song, Shiji and Wang, Zhuoyuan and Lin, Tingyu and Wu, Cheng},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2020}
}
The recommended way to set up these experiments is via a virtualenv.
sudo apt-get install python-pip
python -m pip install --user virtualenv
python -m virtualenv ~/env
source ~/env/bin/activate
Then install the project dependencies with the following command:
pip install -r requirements.txt
Our experiments are mainly performed on the Gym-Duckietown environment, which is a self-driving car simulator environment for OpenAI Gym. It places your agent, a Duckiebot, inside of an instance of a Duckietown: a loop of roads with turns, intersections, obstacles, Duckie pedestrians, and other Duckiebots.
Download and install all the dependencies with pip3:
git clone https://github.com/duckietown/gym-duckietown.git
cd gym-duckietown
pip3 install -e .
After installation, we suggest replacing the original file <installation_path>/gym_duckietown/simulator.py with our revised simulator file to avoid NaNs and unwanted simulation logs. Please note that if you want to reproduce our results on different maps, you should add our customized map files to the folder <installation_path>/gym_duckietown/maps.
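To verify the installation, a short smoke test along the following lines should step the simulator with random actions (the Duckietown-<map_name>-v0 environment ids are registered by gym-duckietown, though the exact id strings may vary across versions):

import gym
import gym_duckietown  # noqa: F401, registers the Duckietown-* environments

env = gym.make("Duckietown-loop_empty-v0")
obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    env.render()
    if done:
        obs = env.reset()
env.close()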
Our training procedure includes two stages. The first stage pretrains the RL agent that is to be interpreted, and the second stage trains the attention mask to highlight task-relevant information, which constitutes most of the evidence for the agent's decisions.
- To pretrain the agent
This stage pretrains the RL agent to obtain an expert policy, which is used to generate state-action pairs for training SSINet in the second stage.
For Duckietown:
python pretrain_ppo.py \
    --env_name "duckietown" \
    --map_name "loop_empty" \
    --spec_init \
    --batch_size 4 \
    --lr 0.00015 \
    --lr_decay \
    --actor_net_type "unet" \
    --net_layers 4

python pretrain_sac.py \
    --env_name "duckietown" \
    --map_name "loop_empty" \
    --spec_init \
    --batch_size 128 \
    --alpha 0.05 \
    --actor_net_type "unet" \
    --net_layers 4

python pretrain_td3.py \
    --env_name "duckietown" \
    --map_name "loop_corner" \
    --spec_init \
    --batch_size 32 \
    --actor_net_type "unet" \
    --net_layers 4
For Atari:
python pretrain_ppo_atari.py \
    --env_name "atari" \
    --task_name "EnduroNoFrameskip-v0" \
    --seed 0 \
    --spec_init \
    --batch_size 4 \
    --lr 0.00025 \
    --lr_decay \
    --actor_net_type "unet" \
    --net_layers 4
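Conceptually, the pretrained expert is then rolled out to collect the (state, action) pairs used as self-supervision in the second stage. A minimal sketch, assuming a hypothetical expert object with an act method (not this repository's exact API):

def collect_pairs(env, expert, size=50000):  # cf. the --dataset_size flag
    """Roll out the expert and record (observation, action) pairs."""
    states, actions = [], []
    obs = env.reset()
    while len(states) < size:
        action = expert.act(obs)  # hypothetical deterministic expert call
        states.append(obs)
        actions.append(action)
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()
    return states, actions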
- To learn the attention mask
This stage aims to learn interpretable attention masks for explaining the agent’s behavior.
For Duckietown:
python train_mask.py \
    --env_name "duckietown" \
    --map_name "loop_empty" \
    --spec_init \
    --reg_scale 0.02 \
    --dataset_size 50000 \
    --actor_net_type "unet" \
    --net_layers 4 \
    --model_dir "<path to the pretrained model>"
For Atari:
python train_mask_atari.py \
    --env_name "atari" \
    --task_name "EnduroNoFrameskip-v0" \
    --spec_init \
    --reg_scale 0.005 \
    --actor_net_type "unet" \
    --net_layers 4 \
    --model_dir "<path to the pretrained model>"
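Here --reg_scale weights the sparsity regularization on the mask against reproducing the expert's actions. A minimal sketch of such a loss, assuming a continuous action space and an L1-style sparsity term (the repository's actual loss may differ in detail):

import torch.nn.functional as F

def mask_loss(masked_action, expert_action, mask, reg_scale):
    # Behavior matching: acting on the masked observation should
    # reproduce the expert action collected in the first stage.
    match = F.mse_loss(masked_action, expert_action)
    # Sparsity: push mask values toward zero so that only
    # task-relevant pixels survive; weighted by --reg_scale.
    sparsity = mask.abs().mean()
    return match + reg_scale * sparsity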
- To simulate the resulting policies
Finally, you can visualize the attention mask using:
python render.py \
    --env_name "duckietown" \
    --map_name "loop_empty" \
    --seed 0 \
    --spec_init \
    --actor_net_type "unet" \
    --net_layers 4 \
    --model_dir "<path to the trained models>"
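Outside of render.py, an overlay in the spirit of the paper's figures can be drawn with matplotlib; frame and mask below are assumed to be an HxWx3 RGB array and an HxW array with values in [0, 1]:

import matplotlib.pyplot as plt
import numpy as np

def show_attention(frame: np.ndarray, mask: np.ndarray):
    """Show the raw observation next to the mask overlaid as a heat map."""
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    axes[0].imshow(frame)
    axes[0].set_title("observation")
    axes[1].imshow(frame)
    axes[1].imshow(mask, cmap="jet", alpha=0.5)  # semi-transparent heat map
    axes[1].set_title("attention mask")
    for ax in axes:
        ax.axis("off")
    plt.show()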
This repository includes a pretrained model and a trained mask model in the trained_models directory. These models use a U-Net actor architecture with net_layers = 4, and the pretrained model is trained with the PPO algorithm. You can use them directly to reproduce our main experimental results or for further exploration.
In general, RL agents may exhibit different performance even on simple tasks. To explain why an agent performs well or badly, SSINet is implemented with different RL algorithms (PPO, SAC and TD3) and actor architectures (U-Net, RefineNet, FC-DenseNet and DeepLab-v3). Please note that RefineNet-1 and RefineNet-2 are exactly the same except that they use ResNet and MobileNet as the backbone network, respectively.
- Different RL algorithms with the same actor architecture (U-Net)
- Different actor architectures with the same RL algorithm (PPO)
To evaluate the generalization capability, the agent trained on the loop_empty map is tested on all the maps.
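A sketch of that evaluation loop, where evaluate_policy is a hypothetical helper returning the average episode return and the map names follow the gym-duckietown convention:

# Hypothetical evaluation loop; `agent` and `evaluate_policy` are
# illustrative placeholders, not functions defined in this repository.
maps = ["loop_empty", "loop_corner", "udem1", "zigzag_dists"]
for map_name in maps:
    avg_return = evaluate_policy(agent, map_name=map_name, episodes=10)
    print(f"{map_name}: average return = {avg_return:.2f}")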