
Robust MDPs

Despite being a fundamental building block of reinforcement learning, Markov decision processes (MDPs) often suffer from ambiguity in their model parameters. Robust MDPs were proposed to overcome this challenge by optimizing worst-case performance under that ambiguity.
The performance of nominal MDPs (NMDPs), robust MDPs (RMDPs), and distributionally robust MDPs (DRMDPs) is evaluated in three applications: river swim, machine replacement, and grid world.
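To make the worst-case objective concrete: at each state-action pair, a robust Bellman backup minimizes the expected next-state value over an ambiguity set of transition kernels. The sketch below solves this inner problem for an L1 ball around an estimated kernel (presumably the kind of set the `L1` tag in the Data section refers to), using the standard greedy mass-shifting solution. It is a generic textbook routine under assumed conventions, not code from this repository.

```python
import numpy as np

def worst_case_l1(p_hat, v, eps):
    """Minimize p @ v over {p in simplex : ||p - p_hat||_1 <= eps}.

    Standard greedy solution: shift up to eps/2 of probability mass
    onto the lowest-value next state, taking it away from the
    highest-value states first.
    """
    p = p_hat.astype(float).copy()
    worst = int(np.argmin(v))
    add = min(eps / 2.0, 1.0 - p[worst])   # mass moved onto the worst state
    p[worst] += add
    remaining = add
    for s in np.argsort(v)[::-1]:          # highest-value states first
        if s == worst:
            continue
        take = min(p[s], remaining)
        p[s] -= take
        remaining -= take
        if remaining <= 0.0:
            break
    return p
```

Robust value iteration plugs this inner minimization into the usual Bellman backup; a full sketch appears after the function-file list below.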

Environments

We use a discount factor $\gamma=0.85$ for all environments, and the objective is always to maximize the total discounted reward.
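Concretely, every method searches for a policy $\pi$ maximizing the standard discounted objective

$$\max_{\pi} \; \mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t)\Big],$$

where rewards depend only on the state in all three environments, and the robust variants replace the expectation with its worst case over an ambiguity set of transition kernels.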

  • Machine Replacement: there are 2 actions, the repair options ["repair", "do nothing"], and 10 states. Rewards depend only on the state and are [20, 20, 20, 20, 20, 20, 20, 0, 18, 10].

  • River Swim: there are 2 actions, the swimming directions ["move left", "move right"], and 10 states. Rewards depend only on the state and are [5, 0, 0, 10, 10, 10, 10, 10, 10, 15].

  • Grid World: the grid has two rows and 12 columns, and rewards depend only on the column index; they are [0, 3, 21, 27, 6, 0, 0, 0, 0, 0, 15, 24]. There are four actions: "move up" and "move down" for vertical moves (which decrease and increase the row index, respectively), and "move left" and "move right" for horizontal moves (which decrease and increase the column index, respectively). Horizontal moves have a chance of failure that depends only on the row index (0.9 for the first row and 0.2 for the second). When a horizontal move fails, or a vertical move is selected, the column index of the next state is drawn according to a Dirichlet distribution. After selecting a horizontal move, the agent additionally goes up, goes down, or stays in its row with probabilities $0.35$, $0.35$, and $0.3$, respectively. (A sketch of these dynamics follows this list.)
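Below is a minimal, runnable sketch of the grid-world dynamics described above, under one plausible reading: the next-column distribution is itself drawn from a Dirichlet. The Dirichlet concentration `ALPHA`, the clipping at the grid boundary, and applying the row drift after both successful and failed horizontal moves are all assumptions not pinned down by this README; the actual dynamics live in `ENV_GW` in ENV.py.

```python
import numpy as np

N_ROWS, N_COLS = 2, 12
COL_REWARDS = np.array([0, 3, 21, 27, 6, 0, 0, 0, 0, 0, 15, 24])
FAIL_PROB = [0.9, 0.2]      # horizontal failure probability, per row
ALPHA = np.ones(N_COLS)     # assumed Dirichlet concentration parameters

rng = np.random.default_rng(0)

def step(row, col, action):
    """One transition; action in {'up', 'down', 'left', 'right'}."""
    if action in ('up', 'down'):
        # Vertical move: the row changes deterministically (clipped at the
        # boundary), and the column is redrawn from a Dirichlet-distributed
        # categorical.
        row = max(row - 1, 0) if action == 'up' else min(row + 1, N_ROWS - 1)
        col = rng.choice(N_COLS, p=rng.dirichlet(ALPHA))
    else:
        if rng.random() < FAIL_PROB[row]:
            # Failed horizontal move: column redrawn as above.
            col = rng.choice(N_COLS, p=rng.dirichlet(ALPHA))
        else:
            col = max(col - 1, 0) if action == 'left' else min(col + 1, N_COLS - 1)
        # After a horizontal move the agent goes up / down / stays
        # with probabilities 0.35 / 0.35 / 0.3 (clipped at the boundary).
        drift = rng.choice([-1, 1, 0], p=[0.35, 0.35, 0.3])
        row = min(max(row + drift, 0), N_ROWS - 1)
    return row, col, COL_REWARDS[col]
```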

Files Intro

Experiments Files

main.py: conducts the experiments on [Improvements on Percentile], saves the estimated transition kernels in "Sampling" and the results in "EXP_Results".
ResultsPlots.ipynb: plots the results produced by main.py.
EXP2.py: conducts the experiments on [Target-oriented feature], plots the results, and saves the plots in "Figure".
The "Backup" sections in these files should be ignored.

Function Files

MOD.py: optimization models: Dual MDP; RSMDP.
AGENT.py: makes the agent take actions according to different requests (e.g., sampling, testing a policy).
ENV.py: experimental environments, including ENV_GW, ENV_MR, and ENV_RiSw.
VI.py: dynamic programming methods, including value iteration and robust value iteration (RMDP); a generic sketch follows this list.
SAMpling.py: estimates transition kernels for different purposes.
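For orientation, here is a minimal sketch of robust value iteration, the method VI.py is described as implementing, reusing the `worst_case_l1` routine sketched earlier. The array shapes, the state-only reward convention, and the default radius `eps` are illustrative assumptions; see VI.py for the repository's actual implementation.

```python
import numpy as np

def robust_value_iteration(P_hat, R, gamma=0.85, eps=0.1, tol=1e-6):
    """Robust VI over L1 ambiguity balls of radius eps around P_hat.

    P_hat: (S, A, S) estimated transition kernel
    R:     (S,) state rewards, as in the environments above
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    while True:
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Inner minimization: adversarial kernel within the L1 ball.
                p = worst_case_l1(P_hat[s, a], V, eps)
                Q[s, a] = R[s] + gamma * p @ V
        V_new = Q.max(axis=1)               # outer maximization over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # robust values and greedy policy
        V = V_new
```

Setting eps=0 recovers standard value iteration on the nominal (estimated) kernel.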

Data

EXP_Results_...: experimental results in [Improvements on Percentile]. The tags in the file names read as follows:

TM-O: transition kernels estimated without prior knowledge [MR, RiSw]; TM-M: transition kernels estimated with prior knowledge [GW]
L1: L1-norm
EVA-5000: test size of 5000
MSK: Mosek solver

Sampling_...: estimated transition kernels in [Improvements on Percentile]
TOF_GenMat: generated transition kernels in [Target-oriented feature]
Figure: plots in [Improvements on Percentile] and [Target-oriented feature]
Table: experimental results in [Target-oriented feature]
