
Robust MDPs

Despite being a fundamental building block of reinforcement learning, Markov decision processes (MDPs) often suffer from ambiguity in their model parameters. Robust MDPs were proposed to overcome this challenge by optimizing worst-case performance under that ambiguity.
The performance of nominal MDPs (NMDPs), robust MDPs (RMDPs), and distributionally robust MDPs (DRMDPs) is evaluated in three applications: river swim, machine replacement, and grid world.
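To make the worst-case objective concrete: at each state-action pair, a robust Bellman backup minimizes the expected next-state value over an ambiguity set of transition kernels. The sketch below solves this inner problem for an L1 ball around an estimated kernel (presumably the kind of set the `L1` tag in the Data section refers to), using the standard greedy mass-shifting solution. It is a generic textbook routine under assumed conventions, not code from this repository.

```python
import numpy as np

def worst_case_l1(p_hat, v, eps):
    """Minimize p @ v over {p in simplex : ||p - p_hat||_1 <= eps}.

    Standard greedy solution: shift up to eps/2 of probability mass
    onto the lowest-value next state, taking it away from the
    highest-value states first.
    """
    p = p_hat.astype(float).copy()
    worst = int(np.argmin(v))
    add = min(eps / 2.0, 1.0 - p[worst])   # mass moved onto the worst state
    p[worst] += add
    remaining = add
    for s in np.argsort(v)[::-1]:          # highest-value states first
        if s == worst:
            continue
        take = min(p[s], remaining)
        p[s] -= take
        remaining -= take
        if remaining <= 0.0:
            break
    return p
```

Robust value iteration plugs this inner minimization into the usual Bellman backup; a full sketch appears after the function-file list below.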

Environments

We use a discount factor $\gamma=0.85$ for all environments, and the objective is always to maximize the total discounted reward.
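Concretely, every method searches for a policy $\pi$ maximizing the standard discounted objective

$$\max_{\pi} \; \mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t)\Big],$$

where rewards depend only on the state in all three environments, and the robust variants replace the expectation with its worst case over an ambiguity set of transition kernels.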

  • Machine Replacement: there are 2 actions, the repair options ["repair", "do nothing"], and 10 states. Rewards depend only on the state and are [20, 20, 20, 20, 20, 20, 20, 0, 18, 10].

  • River Swim: there are 2 actions, the swimming directions ["move left", "move right"], and 10 states. Rewards depend only on the state and are [5, 0, 0, 10, 10, 10, 10, 10, 10, 15].

  • Grid World: the grid has two rows and 12 columns, and rewards depend only on the column index; they are [0, 3, 21, 27, 6, 0, 0, 0, 0, 0, 15, 24]. There are four actions: "move up" and "move down" for vertical moves (which decrease and increase the row index, respectively), and "move left" and "move right" for horizontal moves (which decrease and increase the column index, respectively). Horizontal moves have a chance of failure that depends only on the row index (0.9 for the first row and 0.2 for the second). When a horizontal move fails, or a vertical move is selected, the column index of the next state is drawn according to a Dirichlet distribution. After selecting a horizontal move, the agent additionally goes up, goes down, or stays in its row with probabilities $0.35$, $0.35$, and $0.3$, respectively. (A sketch of these dynamics follows this list.)
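Below is a minimal, runnable sketch of the grid-world dynamics described above, under one plausible reading: the next-column distribution is itself drawn from a Dirichlet. The Dirichlet concentration `ALPHA`, the clipping at the grid boundary, and applying the row drift after both successful and failed horizontal moves are all assumptions not pinned down by this README; the actual dynamics live in `ENV_GW` in ENV.py.

```python
import numpy as np

N_ROWS, N_COLS = 2, 12
COL_REWARDS = np.array([0, 3, 21, 27, 6, 0, 0, 0, 0, 0, 15, 24])
FAIL_PROB = [0.9, 0.2]      # horizontal failure probability, per row
ALPHA = np.ones(N_COLS)     # assumed Dirichlet concentration parameters

rng = np.random.default_rng(0)

def step(row, col, action):
    """One transition; action in {'up', 'down', 'left', 'right'}."""
    if action in ('up', 'down'):
        # Vertical move: the row changes deterministically (clipped at the
        # boundary), and the column is redrawn from a Dirichlet-distributed
        # categorical.
        row = max(row - 1, 0) if action == 'up' else min(row + 1, N_ROWS - 1)
        col = rng.choice(N_COLS, p=rng.dirichlet(ALPHA))
    else:
        if rng.random() < FAIL_PROB[row]:
            # Failed horizontal move: column redrawn as above.
            col = rng.choice(N_COLS, p=rng.dirichlet(ALPHA))
        else:
            col = max(col - 1, 0) if action == 'left' else min(col + 1, N_COLS - 1)
        # After a horizontal move the agent goes up / down / stays
        # with probabilities 0.35 / 0.35 / 0.3 (clipped at the boundary).
        drift = rng.choice([-1, 1, 0], p=[0.35, 0.35, 0.3])
        row = min(max(row + drift, 0), N_ROWS - 1)
    return row, col, COL_REWARDS[col]
```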

Files Intro

Experiments Files

main.py: conducts the experiments on [Improvements on Percentile], saves the estimated transition kernels in "Sampling" and the results in "EXP_Results".
ResultsPlots.ipynb: plots the results produced by main.py.
EXP2.py: conducts the experiments on [Target-oriented feature], plots the results, and saves the plots in "Figure".
The "Backup" sections in these files should be ignored.

Function Files

MOD.py: optimization models: Dual MDP; RSMDP.
AGENT.py: makes the agent take actions according to different requests (e.g., sampling, testing a policy).
ENV.py: experimental environments, including ENV_GW, ENV_MR, and ENV_RiSw.
VI.py: dynamic programming methods, including value iteration and robust value iteration (RMDP); a generic sketch follows this list.
SAMpling.py: estimates transition kernels for different purposes.
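For orientation, here is a minimal sketch of robust value iteration, the method VI.py is described as implementing, reusing the `worst_case_l1` routine sketched earlier. The array shapes, the state-only reward convention, and the default radius `eps` are illustrative assumptions; see VI.py for the repository's actual implementation.

```python
import numpy as np

def robust_value_iteration(P_hat, R, gamma=0.85, eps=0.1, tol=1e-6):
    """Robust VI over L1 ambiguity balls of radius eps around P_hat.

    P_hat: (S, A, S) estimated transition kernel
    R:     (S,) state rewards, as in the environments above
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    while True:
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Inner minimization: adversarial kernel within the L1 ball.
                p = worst_case_l1(P_hat[s, a], V, eps)
                Q[s, a] = R[s] + gamma * p @ V
        V_new = Q.max(axis=1)               # outer maximization over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # robust values and greedy policy
        V = V_new
```

Setting eps=0 recovers standard value iteration on the nominal (estimated) kernel.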

Data

EXP_Results_...: experimental results in [Improvements on Percentile]. The tags in the file names read as follows:

TM-O: transition kernels estimated without prior knowledge [MR, RiSw]; TM-M: transition kernels estimated with prior knowledge [GW]
L1: L1-norm
EVA-5000: test size of 5000
MSK: Mosek solver

Sampling_...: estimated transition kernels in [Improvements on Percentile]
TOF_GenMat: generated transition kernels in [Target-oriented feature]
Figure: plots in [Improvements on Percentile] and [Target-oriented feature]
Table: experimental results in [Target-oriented feature]
