Hiding Leader's Identity

This is the official repository for the following two papers:

  1. Hiding Leader's Identity in Leader-Follower Navigation through Multi-Agent Reinforcement Learning
    Ankur Deka, Wenhao Luo, Huao Li, Michael Lewis, Katia Sycara
    Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2021
    Paper Link: arXiv, IROS
  2. Human vs. Deep Neural Network Performance at a Leader Identification Task
    Ankur Deka, Michael Lewis, Huao Li, Phillip Walker, Katia Sycara*
    Accepted to Human Factors and Ergonomics Society (HFES) Annual Meeting 2021
    Paper Link: PITT, HFES

Installation

I have tested this repository with Python 3.6 on Ubuntu 18.04. First install Anaconda and then run:

git clone git@github.com:Ankur-Deka/Hiding-Leader-Identity.git
cd Hiding-Leader-Identity
conda create python=3.6 pip --name HidingIdentity
conda activate HidingIdentity
pip install -r requirements.txt

Note: gym_vecenv MUST be installed from the link in requirements.txt. pip install gym_vecenv will NOT give the same results!
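
To double-check that gym_vecenv was installed from the pinned source rather than PyPI (pip show is a standard pip command):

pip show gym_vecenv   # Home-page/Location should match the repo pinned in requirements.txt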

Quick start - Pretrained policies

From left to right: Naive MARL, Our proposed strategy (Paper 1), Co-training (Stage 4 in Paper 2), Scripted PD, Zheng et al.

Download the folder marlsave from this Drive link and store it in the root directory.

  1. Naive MARL
python joint_main.py --mode test --load-mode individual --swarm-load-run 0 --swarm-load-ckpt latest --adversary-load-run 0 --adversary-load-ckpt latest --out-dir naive_marl --plot-trajectories --record --goal-at-top --seed 0
  2. Our proposed leader identity hiding policy in Paper 1
python joint_main.py --mode test --load-mode joint --load-run 1 --load-ckpt latest --out-dir leader_hiding --plot-trajectories --record --goal-at-top --seed 0
  3. Co-training (Stage 4 in Paper 2). I am providing 3 training runs (2, 3 and 4) due to the stochasticity of results (refer to Paper 2).
python joint_main.py --mode test --load-mode joint --load-run 2 --load-ckpt latest --out-dir co_training --plot-trajectories --record --goal-at-top --seed 0
  4. Scripted PD
python joint_main.py --mode test --algo scripted --load-mode individual --adversary-load-run 4 --adversary-load-ckpt latest --out-dir scripted_pd --plot-trajectories --record --goal-at-top --seed 0
  5. Zheng et al.
python -W ignore joint_main_genetic.py --algo genetic --adversary-hidden-dim 512 --num-processes 1 --mode test --load-mode joint --load-run 5 --load-ckpt latest --adversary-version V2 --out-dir zheng --plot-trajectories --record --goal-at-top --seed 0

Full details - Train and test yourself

Stages

There are multiple stages of training, as described in the papers above: Stages 1 to 3 in Paper 1 and Stages 1 to 4 in Paper 2. For each stage, in both train and test mode, we need to run joint_main.py or adversary_training/main.py with the right arguments, as explained below.

Stage 1

Train multi-robot team on goal reaching reward

python joint_main.py --mode train --use-adversary 0

Saves training files in a folder marlsave/run_n1. n1 is generated automatically, starting from 0 and increasing every time training is run. Trained model checkpoints and tensorboard logs are saved here. It is important to note down the run number n1 for use below (a one-liner for finding the latest run is shown below).
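
If you lose track of the latest run number, one way to find it (assuming the marlsave/run_n layout described above; sort -V needs GNU coreutils):

ls -d marlsave/run_* | sort -V | tail -1   # prints the most recent run folder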

Test

Replace 0 in --load-run 0 with n1 from above. --load-ckpt can be latest or a valid checkpoint number. Results are saved in output/stage_1.

python joint_main.py --mode test --use-adversary 0 --load-mode joint --load-run 0 --load-ckpt latest --out-dir stage_1 --record --goal-at-top

Stage 2

Train adversary to identify leader

  1. Generate trajectory data
    Replace 0 in --load-run 0 with the value of n1 from Stage 1 training.
python joint_main.py --mode test --use-adversary 0 --load-mode joint --load-run 0 --load-ckpt latest --out-dir stage_1_train --num-eval-episodes 1000
python joint_main.py --mode test --use-adversary 0 --load-mode joint --load-run 0 --load-ckpt latest --out-dir stage_1_test --num-eval-episodes 100 --goal-at-top

Saves trajectories in output/stage_1_train/trajs and output/stage_1_test/trajs.

  2. Create a dataset folder (the expected layout is shown after these commands)
mkdir -p trajectory_datasets/dataset_1
mv output/stage_1_train/trajs trajectory_datasets/dataset_1/train_dataset
mv output/stage_1_test/trajs trajectory_datasets/dataset_1/test_dataset
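
After these moves, the dataset folder should look like this (layout inferred from the commands above):

trajectory_datasets/dataset_1/
    train_dataset/   # 1000 training episodes from stage_1_train
    test_dataset/    # 100 test episodes from stage_1_test
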
  3. Train adversary
cd adversary_training
python main.py --mode train --dataDir ../trajectory_datasets/dataset_1

This will save training files in runs/run_n2 where n2 is generated automatically. It is important to note down n2 for use below.

  4. Test adversary
    Replace 0 in --swarm-load-run 0 with n1, and 0 in --adversary-load-run 0 with n2.
cd ..
python joint_main.py --mode test --load-mode individual --swarm-load-run 0 --swarm-load-ckpt latest --adversary-load-run 0 --adversary-load-ckpt latest --out-dir stage_2_results --plot-trajectories --record --goal-at-top

Stage 3

Train swarm with goal reaching + identity hiding reward

Replace 0 in --adversary-load-run 0 with n2.

python joint_main.py --mode train --load-mode individual --adversary-load-run 0 --adversary-load-ckpt latest --train-adversary 0

Saves training files in a new folder marlsave/run_n3, where n3 is generated automatically. Note down n3 for testing below.

Test

Replace 1 in --load-run 1 with n3.

python joint_main.py --mode test --load-mode joint --load-run 1 --load-ckpt latest --out-dir stage_3_results --plot-trajectories --record --goal-at-top

This is our proposed policy in Paper 1.

Stage 4

Joint training of swarm and adversary

python joint_main.py --mode train

Test

Replace 1 in --load-run 1 with the run number generated by the training above.

python joint_main.py --mode test --load-mode joint --load-run 1 --load-ckpt latest --out-dir stage_4_results --plot-trajectories --record --goal-at-top

Baselines

Scripted PD

Set --algo scripted, as in the commands below:

python joint_main.py --mode test --use-adversary 0 --algo scripted --out-dir scripted_pd_train --num-eval-episodes 1000
python joint_main.py --mode test --use-adversary 0 --algo scripted --out-dir scripted_pd_test --num-eval-episodes 100 --goal-at-top
mkdir -p trajectory_datasets/dataset_scripted_pd
mv output/scripted_pd_train/trajs trajectory_datasets/dataset_scripted_pd/train_dataset
mv output/scripted_pd_test/trajs trajectory_datasets/dataset_scripted_pd/test_dataset

cd adversary_training
python main.py --mode train --dataDir ../trajectory_datasets/dataset_scripted_pd

Replace 4 in --adversary-load-run 4 with the adversary run number generated by the training above.

cd ..
python joint_main.py --mode test --algo scripted --load-mode individual --adversary-load-run 4 --adversary-load-ckpt latest --out-dir scripted_pd_results --plot-trajectories --record --goal-at-top --seed 0

Zheng et al. - Genetic

Pretraining Adversary

cd adversary_training
python main.py --mode train --dataDir ../trajectory_datasets/dataset_genetic_pretraining --lr 0.025 --version V2 --hiddenDim 512 --optimizer SGD

Co-optimization

cd ..
python -W ignore joint_main_genetic.py --adversary-num-trajs 100 --algo genetic --adversary-load-ckpt latest --num-frames 1000000 --adversary-hidden-dim 512 --num-processes 1 --mode train --env-name simple_flocking --adversary-load-run 8 --load-mode individual --adversary-num-epochs 1 --adversary-version V2

Test

Replace 228 in --load-run 228 with the run number generated by the co-optimization above.

python -W ignore joint_main_genetic.py --algo genetic --adversary-hidden-dim 512 --num-processes 1 --mode test --env-name simple_flocking --load-mode joint --load-run 228 --load-ckpt latest --adversary-version V2

Utilities

Generating plotting data

Replace 32 in --load-run 32 with the run you want to generate plotting data for.

python gen_plot_data.py --load-run 32

Recording and playing videos

Use run_grid_search.py to generate multiple videos together. Open swarm_training/output/video_previewing_tool/video_preview.html in a browser (tested on Firefox 75.0 beta and Chrome 83.0.4103.61 (64-bit)). Then browse and select the videos you wish to play.

Pass --store-video-together to store all videos in a common folder.

Analysing

  1. To check results of different checkpoints of the same run: (1) pass an array for 'load-ckpt', e.g. [10,20,30,40,50,60,70,80,90,100]; (2) pass 'store-video-together': ['']; (3) DON'T pass 'out_dir', since out-dir names are auto-generated. A sketch of such a grid follows.
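
As an illustration, such a grid might look like the following (a minimal sketch; the exact format expected by run_grid_search.py may differ, and the key names simply mirror the command-line arguments):

# Hypothetical grid; one job is launched per combination of values.
grid = {
    'mode': ['test'],
    'load-run': [32],                                         # run to analyse
    'load-ckpt': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],   # one video per ckpt
    'store-video-together': [''],                             # flag without a value
    # don't include 'out_dir': folder names are auto-generated per checkpoint
}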

Useful arguments

Arguments are defined in arguments.py

  1. num_frames: number of environment frames to train on
  2. num_iters: num_frames // num_processes (see the sketch after this list)
  3. update_every: parameters are updated after this many frames
  4. num_updates: number of updates performed in each round of updates
  5. batch_size
  6. buffer_size: should be larger than the maximum possible episode length
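
As a quick illustration of how these quantities relate (values here are hypothetical; only the relationships follow the definitions above):

# Illustrative only; the real values live in arguments.py.
num_frames = 1_000_000                    # total environment frames to train on
num_processes = 8                         # parallel environments
num_iters = num_frames // num_processes   # = 125_000

update_every = 100   # a round of updates happens after this many frames
num_updates = 10     # number of updates performed in each round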

Evaluation

  1. load-mode = {individual, joint}
    1. individual: loads from swarm-load-path and adversary-load-path
    2. joint: loads from load-path

Automate

  1. joint_run.py provides a convenient way to run multiple experiments. Earlier this worked fine, but lately parallel training has issues: the run_ID of one experiment clashes with another's.

User Data

  1. web_form contains the complete web-based UI. In our experiments we hosted it on an Apache web server running on a Google Cloud instance.
  2. Data is saved in user_data.txt. Run cd web_form && python conv2csv.py user_data.txt to convert it to CSV format. If there is any space/newline at the end of user_data.txt, remove it before converting (a one-liner is shown below).
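
One way to strip trailing whitespace/newlines in place (back the file up first; the command substitution drops trailing newlines before the file is rewritten):

cp user_data.txt user_data.txt.bak
printf '%s' "$(cat user_data.txt)" > user_data.txt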

Folders and files

  1. out_files: trajectories of swarm robots, one episode per file
  2. adversary_training: code for training adversary
    1. Prototyping_notebook.ipynb
    2. runs: checkpoints and tensorboard
    3. main.py: training/validating
    4. dataset.py: dataset class
    5. models.py: model classes
  3. swarm_training: code for training swarm
    1. main.py: main file for running
    2. arguments.py: arguments
    3. learner.py: Learner - master object for the swarm
    4. runs: checkpoints and tensorboard
  4. mape: environment

Environment Description

  1. simple_flocking - goal reaching, leader observes goal location
  2. simple_trajectory - trajectory following, leader observes the next location on the trajectory (crude implementation). 'is_success' is always False
  3. Successful if all agents are within a threshold distance of the goal
  4. Reward: difference in distance + additional reward on task completion (disabled right now)
  5. Goal is in the top half of the window during training and at y = 0.9 during testing (to get roughly uniform goal-reaching times for human trials)
  6. Adversary cannot see the goal
  7. Implementation details:
    1. done is a list with True/False repeated num_agents times; all entries should have the same value. The env is reset (by gym_vecenv or PseudoVecEnv) even if only one of them is True
    2. info['env_done'] contains done for the overall team
    3. info['is_success'] contains is_success for the overall team
    4. When the env is done, it is reset by gym_vecenv or PseudoVecEnv and the last observation is stored in info['terminal_observation'] (see the sketch after this list)
    5. mape/multiagent/environment: generates multiagent environment
    6. mape/environment/scenarios/simple_flocking: goal reaching environment
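
A minimal sketch of consuming these signals in an evaluation loop (variable names are hypothetical; the info keys follow the implementation details above):

# envs is the vectorized environment (gym_vecenv or PseudoVecEnv wrapper).
obs, rewards, dones, infos = envs.step(actions)
info = infos[0]                              # single-process case
if info['env_done']:                         # overall team done flag
    success = info['is_success']             # overall team success flag
    last_obs = info['terminal_observation']  # last obs before the auto-reset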

Output files

  1. Stored in output/args.out_dir (set with --out-dir)
  2. adversary_preds has CSV files for different episodes. The first column is the true leader ID in the corresponding video; the second column is the adversary's prediction (a sketch for computing accuracy from these files follows).
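
A minimal sketch for computing the adversary's accuracy from these CSVs (the output path is hypothetical; the two-column layout follows the description above):

import csv, glob

correct = total = 0
# Adjust the path to your own --out-dir.
for path in glob.glob('output/stage_2_results/adversary_preds/*.csv'):
    with open(path) as f:
        for row in csv.reader(f):
            true_id, pred_id = row[0], row[1]
            correct += (true_id == pred_id)
            total += 1
if total:
    print(f'Adversary accuracy: {correct / total:.3f}')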

Common Issues

  1. Pyglet:
File "./mape/multiagent/rendering.py", line 120, in render
    arr = np.fromstring(image_data.data, dtype=np.uint8, sep='')
AttributeError: 'ImageData' object has no attribute 'data'

Solution: pip install pyglet==1.3.2

Acknowledgements

  1. The swarm_training directory is adapted from marl_transfer.
  2. Parts of the code for Zheng et al. are adapted from the corresponding private_flocking repository.