Code for the CompDevLab machine theory of mind project. It is currently capable of inferring reward functions for agents in simple grid environments. Environments are of the form:
- - - - - g
- w - - - -
- w - - - -
- w - - - -
- w w w w -
a - - - - -
where - represents an empty space, g represents a goal state, w represents a wall, and a represents the agent's location.
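For illustration, a map in this format might be parsed as in the sketch below. This is a hypothetical helper (parse_map is not part of the repo), assuming maps are whitespace-separated character grids as shown above:

```python
import numpy as np

# Hypothetical helper for illustration only; the repo's actual map-loading
# code may differ. Assumes whitespace-separated characters as shown above.
def parse_map(map_str):
    """Parse a grid string into a char array plus agent and goal coordinates."""
    grid = np.array([row.split() for row in map_str.strip().splitlines()])
    agent = tuple(map(int, np.argwhere(grid == 'a')[0]))
    goals = [tuple(map(int, pos)) for pos in np.argwhere(grid == 'g')]
    return grid, agent, goals

example_map = """
- - - - - g
- w - - - -
- w - - - -
- w - - - -
- w w w w -
a - - - - -
"""
grid, agent, goals = parse_map(example_map)
print(grid.shape, agent, goals)  # (6, 6) (5, 0) [(0, 5)]
```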
Requirements:
- Python 3.4 or above
- NumPy
- TensorFlow
Given a map, the forward model uses value iteration to find a path from the agent's starting state to a goal state. The agent is stochastic rather than deterministic, and its behavior is controlled by a softmax temperature parameter tau: higher values of tau produce more erratic, closer-to-uniform behavior, while lower values produce more deterministic, near-greedy behavior.
Example command:
python3 forward_example.py --tau=0.005
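As a minimal sketch of the idea (not the repo's actual code), softmax value iteration replaces the hard max over actions with a temperature-weighted log-sum-exp, yielding a Boltzmann policy whose randomness is governed by tau:

```python
import numpy as np

# Minimal sketch of softmax ("soft") value iteration; a toy illustration,
# not the repo's implementation. States and actions are indices, and
# next_state[s, a] gives the (deterministic) successor of action a in s.
def soft_value_iteration(rewards, next_state, tau, gamma=0.95, iters=100):
    """Return state values and a Boltzmann policy of shape (S, A)."""
    v = np.zeros(len(rewards))
    for _ in range(iters):
        q = rewards[:, None] + gamma * v[next_state]    # (S, A) Q-values
        v = tau * np.logaddexp.reduce(q / tau, axis=1)  # soft max over actions
    policy = np.exp((q - v[:, None]) / tau)             # softmax with temp tau
    return v, policy

# Toy corridor: states 0..3, actions 0 = left, 1 = right, reward at state 3.
rewards = np.array([0.0, 0.0, 0.0, 1.0])
next_state = np.array([[0, 1], [0, 2], [1, 3], [3, 3]])
_, pi = soft_value_iteration(rewards, next_state, tau=0.005)
print(pi.round(3))  # low tau: near-deterministic policy heading right
```

With a large tau (say 10.0), the same call returns a policy close to uniform over actions, matching the erratic behavior described above.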
The naive inverse model samples potential reward values from a Poisson prior; these values are then normalized by dividing by the largest sampled value. This model can be run using slow_sampler__example.py. The script runs for a specified number of sampling iterations, saving the most likely reward function given a set of forward-model demonstrations. It produces a graph of log likelihood vs. time, as well as a heat map displaying the most likely reward function.
Example command:
python3 slow_sampler__example.py --tau=0.005 --num_demonstrations=25
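The core sampling loop might look like the hypothetical sketch below, reusing soft_value_iteration from the forward-model sketch above. The function names and the demonstration format are illustrative, and the plotting the script performs is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def demo_log_likelihood(policy, demos):
    """Sum log policy probabilities over (state, action) pairs in the demos."""
    return sum(np.log(policy[s, a] + 1e-12)  # epsilon guards against log(0)
               for demo in demos for s, a in demo)

def naive_inverse(demos, next_state, n_states, tau, num_iters=1000, lam=1.0):
    best_r, best_ll = None, -np.inf
    for _ in range(num_iters):
        r = rng.poisson(lam, size=n_states).astype(float)
        r /= max(r.max(), 1.0)  # normalize by the largest sampled value
        _, policy = soft_value_iteration(r, next_state, tau)
        ll = demo_log_likelihood(policy, demos)
        if ll > best_ll:        # keep the most likely reward function so far
            best_r, best_ll = r, ll
    return best_r, best_ll
```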
Implementation of Maximum Entropy Deep Inverse Reinforcement Learning as introduced by Wulfmeier et al., 2015 (arXiv:1507.04888). The model can be run using inverse_model_example.py.
Running the script will infer a reward function and produce a heat map to display it.
Example command:
python3 inverse_model_example.py --tau=0.005 --num_demonstrations=25
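The core of the MaxEnt IRL update can be sketched as follows. This is a conceptual illustration with a linear reward model standing in for the deep network the paper trains: the gradient of the log likelihood with respect to the rewards is the difference between the demonstrations' state-visitation counts and the expected visitations under the current reward. It again reuses the soft_value_iteration sketch from above; all names here are illustrative:

```python
import numpy as np

def expected_svf(policy, next_state, start_state, horizon):
    """Expected state-visitation frequencies under a stochastic policy."""
    n_states, n_actions = policy.shape
    d = np.zeros(n_states)
    d[start_state] = 1.0
    svf = d.copy()
    for _ in range(horizon - 1):
        d_next = np.zeros(n_states)
        for a in range(n_actions):
            # scatter-add the probability mass pushed through action a
            np.add.at(d_next, next_state[:, a], d * policy[:, a])
        d = d_next
        svf += d
    return svf

def maxent_irl_step(theta, features, demo_svf, next_state, start, tau, lr=0.1):
    """One gradient-ascent step on the MaxEnt IRL log likelihood."""
    r = features @ theta                          # linear reward stand-in
    _, policy = soft_value_iteration(r, next_state, tau)
    grad_r = demo_svf - expected_svf(policy, next_state, start, horizon=20)
    return theta + lr * features.T @ grad_r       # chain rule through r(theta)
```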