Maze 3D Collaborative Learning on shared task

Description

A human-agent collaborative game in a virtual environment based on the work of Shafti et al. (2020) [1]. Collaborative learning is achieved through Deep Reinforcement Learning (DRL): the Soft Actor-Critic (SAC) algorithm [2] is used, with modifications for a discrete action space [3].
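For orientation, the following is a minimal, illustrative sketch of the discrete-action actor loss described in [3], assuming PyTorch and a policy network that outputs categorical logits; it is not the repository's actual implementation.

```python
import torch

def discrete_sac_actor_loss(logits, q1, q2, alpha):
    """Actor loss for SAC with a discrete action space, following [3].

    logits: (batch, n_actions) unnormalised policy outputs
    q1, q2: (batch, n_actions) critic estimates for every action
    alpha:  entropy temperature
    (Shapes and names are illustrative, not the repository's code.)
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    min_q = torch.min(q1, q2)
    # With a finite action set the expectation over actions is taken in closed form.
    return (probs * (alpha * log_probs - min_q)).sum(dim=-1).mean()
```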

Installation

  • Run source install_dependencies/install.sh. This creates a Python virtual environment, installs the necessary libraries, and adds the repository directory to the PYTHONPATH environment variable.

Run

  • Run python game/maze3d_human_only_test.py game/config/config_human_test.yaml <participant_name> for the human-only game.
  • Run python game/sac_maze3d_train.py game/config/<config_sac> <participant_name> for the human-agent game.
    • Notes before training:
      • Replace <participant_name> with the name of the participant.
      • The program will create a /tmp and a /plot folder (if they do not exist) inside the results/ folder. The /tmp folder contains CSV files with information about the game, and the /plot folder contains figures for the game. See the Experiment Result Output Files section below for more details.
      • The program automatically appends an identification number to <participant_name> in the name of each folder it creates.

Configuration

  • The game/config folder contains several YAML files for configuring the experiment. The main parameters are listed below; a minimal sketch of reading them follows the list.
    • game/discrete: True if the keyboard input is discrete (False for continuous). Details regarding the discrete and continuous human input modes can be found here.
    • SAC/reward_function: Type of reward function. Details about the predefined reward functions and how to define a new one can be found here.
    • Experiment/mode: Chooses how the experiment terminates: either after a given number of games or after a given number of interactions.
    • SAC/discrete: Discrete or standard (continuous) SAC. Currently only the discrete SAC is compatible with the game.
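As an illustration, here is a minimal sketch of reading these values with PyYAML; the exact nesting of the keys is an assumption based on the slash notation above, and config_sac.yaml stands in for whichever configuration file you pass to the script.

```python
import yaml

# Load one of the experiment configuration files (file name is illustrative).
with open("game/config/config_sac.yaml") as f:
    config = yaml.safe_load(f)

# Access the main parameters described above; the nesting assumes
# top-level "game", "SAC", and "Experiment" sections.
discrete_input  = config["game"]["discrete"]         # True for discrete keyboard input
reward_function = config["SAC"]["reward_function"]   # which predefined reward to use
mode            = config["Experiment"]["mode"]       # terminate by games or by interactions
discrete_sac    = config["SAC"]["discrete"]          # discrete SAC (required by the game)

print(discrete_input, reward_function, mode, discrete_sac)
```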

Play

Game

  • Human only: Use the Left and Right arrows to control the tilt of the tray around its y-axis, and the Up and Down arrows to control the tilt of the tray around its x-axis (see the sketch after this list).
  • Human-Agent: Use the Left and Right arrows to control the tilt of the tray around its y-axis; the agent controls the other axis.
  • Press the space key once to pause, and again to resume.
  • Press q to exit the experiment.
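As an illustration only, here is a pygame-style sketch of mapping the keyboard input to discrete tray-tilt commands; the key handling and action encoding are assumptions, not the repository's actual code.

```python
import pygame

# Illustrative mapping of arrow keys to discrete tilt commands.
KEY_TO_ACTION = {
    pygame.K_LEFT:  ("y_axis", -1),   # tilt the tray around its y-axis
    pygame.K_RIGHT: ("y_axis", +1),
    pygame.K_UP:    ("x_axis", +1),   # x-axis tilt is only human-controlled in human-only mode
    pygame.K_DOWN:  ("x_axis", -1),
}

def poll_human_action(paused):
    """Return the discrete action for the pressed key and the updated pause state."""
    for event in pygame.event.get():
        if event.type == pygame.KEYDOWN:
            if event.key == pygame.K_SPACE:
                paused = not paused           # space toggles pause/resume
            elif event.key == pygame.K_q:
                raise SystemExit              # q exits the experiment
            elif event.key in KEY_TO_ACTION and not paused:
                return KEY_TO_ACTION[event.key], paused
    return None, paused
```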

Citation

If you use this repository in your publication, please cite the following:

Fotios Lygerakis, Maria Dagioglou, and Vangelis Karkaletsis. 2021. Accelerating Human-Agent Collaborative Reinforcement Learning. In The 14th PErvasive Technologies Related to Assistive Environments Conference (PETRA 2021), June 29-July 2, 2021, Corfu, Greece. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3453892.3454004

Experiment Result Output Files

Contents of the /tmp folder. The terms "training/testing trial", "game step", and "experiment" are explained in detail in [4]:

  • actions.csv: All the actions performed during the experiment, in the format (a_agent, a_human) [4].
  • avg_length_list.csv: The length of each training trial in terms of game steps.
  • test_length_list.csv: The length of each test trial in terms of game steps.
  • config_sac.yaml: The configuration file used for this experiment. Its purpose is to allow the experiment to be replicated.
  • episode_durations.csv: The total duration of each training trial.
  • test_episode_duration_list.csv: The total duration of each testing trial.
  • grad_updates_durations.csv: The total duration of the offline gradient update session for each trial. In combination with episode_durations.csv, it is used to calculate the cumulative total elapsed time, as shown in Figure 4 of [4].
  • scores.csv: The total score for each training trial.
  • test_score_history.csv: The total score for each testing trial. The mean and standard error of the mean over each session are used in [4] for Figures 2 and 3 (see the sketch after this list).
  • rest_info.csv: The goal position, total experiment duration, best score achieved, the trial that achieved the best score, the best reward achieved, the length of the game trial with the best score, the total number of time steps for the whole experiment, the total number of games played, the FPS the game ran at, and the average offline gradient update duration over all sessions.
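As an illustration, here is a minimal sketch of computing the per-session mean and standard error of the mean from test_score_history.csv; the run folder name, the single-column CSV layout, and the number of test trials per session are assumptions, not guaranteed by the file format.

```python
import numpy as np
import pandas as pd

# Load the per-trial test scores written to the /tmp folder of a run
# (the run folder name and single-column layout are assumptions).
scores = pd.read_csv("results/run_tmp/test_score_history.csv", header=None).squeeze("columns")

trials_per_session = 10  # illustrative value; depends on the experiment configuration

# Group consecutive test trials into sessions and compute mean and SEM per session,
# as used for the learning-curve figures in [4].
sessions = scores.groupby(np.arange(len(scores)) // trials_per_session)
print(sessions.agg(["mean", "sem"]))
```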

Contents of the /plot folder:

  • episode_durations.png
  • grad_updates_durations.png
  • length.png
  • scores.png
  • test_episode_duration.png
  • test_length.png
  • test_scores.png
  • test_scores_mean_std.png
  • training_logs.pkl: A pandas DataFrame saved in pickle format that contains the action and state for each training game step (see the loading sketch below).
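A minimal sketch of inspecting this log with pandas; the run folder name is illustrative.

```python
import pandas as pd

# Load the per-step training log saved by the experiment
# (the run folder name is illustrative).
logs = pd.read_pickle("results/run_plot/training_logs.pkl")

# Each row is expected to hold the action and state of one training game step.
print(logs.head())
print(logs.columns.tolist())
```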

References

[1] Shafti, Ali, et al. "Real-world human-robot collaborative reinforcement learning." arXiv preprint arXiv:2003.01156 (2020).

[2] https://github.com/kengz/SLM-Lab

[3] Christodoulou, Petros. "Soft actor-critic for discrete action settings." arXiv preprint arXiv:1910.07207 (2019).

[4] Fotios Lygerakis, Maria Dagioglou, and Vangelis Karkaletsis. 2021. Accelerating Human-Agent Collaborative Reinforcement Learning. In The 14th PErvasive Technologies Related to Assistive Environments Conference (PETRA 2021), June 29-July 2, 2021, Corfu, Greece. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3453892.3454004
