SSINet is a self-supervised interpretable framework that discovers interpretable features, making RL agents easy to understand even for non-experts. Specifically, SSINet produces fine-grained attention masks that highlight task-relevant information, which constitutes most of the evidence for the agent's decisions. This implementation is based on the paper Self-Supervised Discovering of Interpretable Features for Reinforcement Learning.
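At a high level, the mask network gates the observation before it reaches the policy, so the action is computed from the attended pixels only. Below is a minimal PyTorch sketch of that idea; the module names and architectures are illustrative, not the exact classes used in this repository:

import torch
import torch.nn as nn

class MaskedPolicy(nn.Module):
    """Illustrative sketch: an attention mask gates the observation."""
    def __init__(self, mask_net: nn.Module, actor: nn.Module):
        super().__init__()
        self.mask_net = mask_net  # e.g. a U-Net with one output per pixel
        self.actor = actor        # the pretrained policy network

    def forward(self, obs: torch.Tensor):
        mask = torch.sigmoid(self.mask_net(obs))  # values in (0, 1)
        masked_obs = mask * obs                   # keep task-relevant pixels
        return self.actor(masked_obs), mask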
If you use this codebase for your research, please cite the paper:
@article{shi2020self,
title={Self-Supervised Discovering of Interpretable Features for Reinforcement Learning},
author={Shi, Wenjie and Huang, Gao and Song, Shiji and Wang, Zhuoyuan and Lin, Tingyu and Wu, Cheng},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2020}
}
The recommended way to set up these experiments is via a virtualenv.
sudo apt-get install python-pip
python -m pip install --user virtualenv
python -m virtualenv ~/env
source ~/env/bin/activate
Then install the project dependencies with the following command:
pip install -r requirements.txt
Our experiments are mainly performed on the Gym-Duckietown environment, which is a self-driving car simulator environment for OpenAI Gym. It places your agent, a Duckiebot, inside of an instance of a Duckietown: a loop of roads with turns, intersections, obstacles, Duckie pedestrians, and other Duckiebots.
Download and install all the dependencies with pip3:
git clone https://github.com/duckietown/gym-duckietown.git
cd gym-duckietown
pip3 install -e .
After installation, we suggest replacing the original file <installation_path>/gym_duckietown/simulator.py with our revised simulator file to avoid NaNs and unwanted simulation logs. Please note that if you want to reproduce our results on different maps, you should add our customized map files to the folder <installation_path>/gym_duckietown/maps.
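To verify the installation, a short smoke test along the following lines should step the simulator with random actions (the Duckietown-<map_name>-v0 environment ids are registered by gym-duckietown, though the exact id strings may vary across versions):

import gym
import gym_duckietown  # noqa: F401, registers the Duckietown-* environments

env = gym.make("Duckietown-loop_empty-v0")
obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    env.render()
    if done:
        obs = env.reset()
env.close()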
Our training procedure includes two stages. The first stage pretrains the RL agent that is to be interpreted, and the second stage trains the attention mask to highlight task-relevant information, which constitutes most of the evidence for the agent's decisions.
- To pretrain the agent
This stage pretrains the RL agent to obtain an expert policy, which is used to generate state-action pairs for training SSINet in the second stage.
For Duckietown:
python pretrain_ppo.py \
    --env_name "duckietown" \
    --map_name "loop_empty" \
    --spec_init \
    --batch_size 4 \
    --lr 0.00015 \
    --lr_decay \
    --actor_net_type "unet" \
    --net_layers 4

python pretrain_sac.py \
    --env_name "duckietown" \
    --map_name "loop_empty" \
    --spec_init \
    --batch_size 128 \
    --alpha 0.05 \
    --actor_net_type "unet" \
    --net_layers 4

python pretrain_td3.py \
    --env_name "duckietown" \
    --map_name "loop_corner" \
    --spec_init \
    --batch_size 32 \
    --actor_net_type "unet" \
    --net_layers 4
For Atari:
python pretrain_ppo_atari.py \
    --env_name "atari" \
    --task_name "EnduroNoFrameskip-v0" \
    --seed 0 \
    --spec_init \
    --batch_size 4 \
    --lr 0.00025 \
    --lr_decay \
    --actor_net_type "unet" \
    --net_layers 4
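Conceptually, the pretrained expert is then rolled out to collect the (state, action) pairs used as self-supervision in the second stage. A minimal sketch, assuming a hypothetical expert object with an act method (not this repository's exact API):

def collect_pairs(env, expert, size=50000):  # cf. the --dataset_size flag
    """Roll out the expert and record (observation, action) pairs."""
    states, actions = [], []
    obs = env.reset()
    while len(states) < size:
        action = expert.act(obs)  # hypothetical deterministic expert call
        states.append(obs)
        actions.append(action)
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()
    return states, actions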
- To learn the attention mask
This stage aims to learn interpretable attention masks for explaining the agent’s behavior.
For Duckietown:
python train_mask.py \
    --env_name "duckietown" \
    --map_name "loop_empty" \
    --spec_init \
    --reg_scale 0.02 \
    --dataset_size 50000 \
    --actor_net_type "unet" \
    --net_layers 4 \
    --model_dir "<path to the pretrained model>"
For Atari:
python train_mask_atari.py \
    --env_name "atari" \
    --task_name "EnduroNoFrameskip-v0" \
    --spec_init \
    --reg_scale 0.005 \
    --actor_net_type "unet" \
    --net_layers 4 \
    --model_dir "<path to the pretrained model>"
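Here --reg_scale weights the sparsity regularization on the mask against reproducing the expert's actions. A minimal sketch of such a loss, assuming a continuous action space and an L1-style sparsity term (the repository's actual loss may differ in detail):

import torch.nn.functional as F

def mask_loss(masked_action, expert_action, mask, reg_scale):
    # Behavior matching: acting on the masked observation should
    # reproduce the expert action collected in the first stage.
    match = F.mse_loss(masked_action, expert_action)
    # Sparsity: push mask values toward zero so that only
    # task-relevant pixels survive; weighted by --reg_scale.
    sparsity = mask.abs().mean()
    return match + reg_scale * sparsity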
- To simulate the resulting policies
Finally, you can visualize the attention mask using:
python render.py \
    --env_name "duckietown" \
    --map_name "loop_empty" \
    --seed 0 \
    --spec_init \
    --actor_net_type "unet" \
    --net_layers 4 \
    --model_dir "<path to the trained models>"
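Outside of render.py, an overlay in the spirit of the paper's figures can be drawn with matplotlib; frame and mask below are assumed to be an HxWx3 RGB array and an HxW array with values in [0, 1]:

import matplotlib.pyplot as plt
import numpy as np

def show_attention(frame: np.ndarray, mask: np.ndarray):
    """Show the raw observation next to the mask overlaid as a heat map."""
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    axes[0].imshow(frame)
    axes[0].set_title("observation")
    axes[1].imshow(frame)
    axes[1].imshow(mask, cmap="jet", alpha=0.5)  # semi-transparent heat map
    axes[1].set_title("attention mask")
    for ax in axes:
        ax.axis("off")
    plt.show()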
This repository includes a pretrained model and a trained mask model in the trained_models directory. These models use a U-Net actor architecture with net_layers = 4, and the pretrained model is trained with the PPO algorithm. You can use them directly to reproduce our main experimental results or for further exploration.
In general, RL agents may exhibit different performance even on simple tasks. To explain why an agent performs well or badly, SSINet is implemented with different RL algorithms (PPO, SAC and TD3) and actor architectures (U-Net, RefineNet, FC-DenseNet and DeepLab-v3). Please note that RefineNet-1 and RefineNet-2 are exactly the same except that they use ResNet and MobileNet as the backbone network, respectively.
- Different RL algorithms with the same actor architecture (U-Net)
- Different actor architectures with the same RL algorithm (PPO)
To evaluate the generalization capability, the agent trained on the loop_empty map is tested on all the maps.
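A sketch of that evaluation loop, where evaluate_policy is a hypothetical helper returning the average episode return and the map names follow the gym-duckietown convention:

# Hypothetical evaluation loop; `agent` and `evaluate_policy` are
# illustrative placeholders, not functions defined in this repository.
maps = ["loop_empty", "loop_corner", "udem1", "zigzag_dists"]
for map_name in maps:
    avg_return = evaluate_policy(agent, map_name=map_name, episodes=10)
    print(f"{map_name}: average return = {avg_return:.2f}")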