Inverse Reinforcement Learning in Contextual MDPs

This repository contains the code we used in our work to construct two environments and evaluate three learning methods. The environments are:

  1. Driving simulator
  2. Dynamic treatment regime

How to run the simulations

To recreate our simulations, first clone this repository and then run the scripts listed below, each one from its own directory. A small runner sketch is given after the script list.

Dynamic treatment regime:

Ellipsoid method:

  • Dynamic treatment regime/Linear/Ellipsoid/ellipsoid_medical.py

ES with the 1st loss:

  • Dynamic treatment regime/Linear/Blackbox_loss1/bb_medical.py

ES with the 2nd loss:

  • Dynamic treatment regime/Linear/Blackbox_loss2/bbl2_medical.py

Driving simulator:

Ellipsoid method:

  • Driving simulation/Linear/Ellipsoid/ellipsoid_driving.py

ES with the 1st loss:

  • Driving simulation/Linear/Blackbox_loss1/bb_driving.py

ES with the 2nd loss:

  • Driving simulation/Linear/Blackbox_loss2/bbl2_driving.py

Ellipsoid method on the non-linear model:

  • Driving simulation/non_linear/Ellipsoid/ellipsoid_non_linear.py

ES with the 2nd loss on the non-linear model:

  • Driving simulation/non_linear/Blackbox_loss2/bb_non_linear.py
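
As a convenience, the sketch below shows one way to run a subset of these scripts from a single Python file placed at the repository root. This helper is not part of the repository (the file, the REPO_ROOT and SCRIPTS names, and the chosen subset of paths are illustrative assumptions); it only reproduces the instruction above of running each script from its own directory via the cwd argument.

    # Hypothetical helper (not part of this repository): runs a chosen subset of
    # the simulation scripts, each from its own directory.
    import subprocess
    import sys
    from pathlib import Path

    REPO_ROOT = Path(__file__).resolve().parent  # assumes this file sits at the repo root

    SCRIPTS = [
        "Dynamic treatment regime/Linear/Ellipsoid/ellipsoid_medical.py",
        "Driving simulation/Linear/Ellipsoid/ellipsoid_driving.py",
    ]

    for rel in SCRIPTS:
        script = REPO_ROOT / rel
        # cwd=script.parent runs the script from its own directory, so any
        # relative paths inside it resolve as intended.
        subprocess.run([sys.executable, script.name], cwd=script.parent, check=True)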

Plot the results

  • Use the Jupyter notebooks provided in each environment's directory.

Data required

In this work we use the processed data from the point85AI repository, available at:
https://github.com/point85AI/Policy-Iteration-AI-Clinician

The dataset used to construct the dynamic treatment regime can be found at:

  • Policy-iteration-AI-Clinician/data/normalized_data.mat

It should be placed at:

  • Dynamic treatment regime/data/normalized_data.mat
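
Once the file is in place, a quick sanity check such as the sketch below can confirm it is readable. This is an assumption, not part of the repository: it presumes SciPy is installed and that the .mat file is loadable with scipy.io.loadmat; the DATA_PATH name is illustrative.

    # Hypothetical check (not part of this repository): verifies that
    # normalized_data.mat is in the expected location and can be read.
    from pathlib import Path
    from scipy.io import loadmat

    DATA_PATH = Path("Dynamic treatment regime/data/normalized_data.mat")

    if not DATA_PATH.exists():
        raise FileNotFoundError(
            f"{DATA_PATH} not found; copy normalized_data.mat from the "
            "Policy-Iteration-AI-Clinician repository into this location."
        )

    mat = loadmat(str(DATA_PATH))  # dict of MATLAB variable name -> array
    # Variable names depend on the AI-Clinician preprocessing; list them to inspect.
    print([key for key in mat if not key.startswith("__")])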
