Learning Rewards from Linguistic Feedback

This repository contains (1) data, (2) model training code, and (3) model analysis code supporting the paper: https://arxiv.org/abs/2009.14715

Requirements

Install the environment via Conda:

$ conda env create -f environment.yml

Run tests:

$ python -m unittest

Data Exploration

Appendix.pdf contains additional information about models and experiments, including full transcripts from informative teacher-learner pairs.

The iPython notebooks provided under the notebooks/ directory can be used to explore the datasets and re-run model evaluation.

To run them:

$ cd notebooks/
$ jupyter lab

Datasets

The human-human and human-agent datasets can be found in notebooks/data/: human_trial_data.json and agent_trial_data.json, respectively.

The easiest way to get started with them is to use the aaai_experiment_data_exploration.ipynb notebook.
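For quick programmatic access outside the notebooks, the files can also be loaded directly. A minimal sketch, assuming only that each file parses as standard JSON (the per-record schema is documented in the exploration notebook, not assumed here):

import json

# Path is relative to the repository root.
with open("notebooks/data/human_trial_data.json") as f:
    human_trials = json.load(f)

print(type(human_trials))   # inspect the top-level container
print(len(human_trials))    # number of records it holds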

Training

Training code / scripts are in the aaai_inference_network_training.ipynb notebook. The data augmentation step will cache results in the notebooks/data/ subfolder.

Evaluation

Evaluation code / scripts are in the aaai_model_evaluation.ipynb notebook.

These can be run independently of the training notebook and will use the pretrained models. Running them will cache results in the notebooks/data/ subfolder.

Pre-trained Models

Pretrained models are available in the data/model_training_10fold subdirectory. There is one .pt file for each cross-validation split. These models are loaded and used automatically in the aaai_model_evaluation.ipynb notebook.
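To inspect a split outside the notebook, a minimal PyTorch sketch; the exact .pt filenames are not assumed (we just glob for them), and loading a whole pickled model requires this repository's classes to be importable:

import glob
import torch

# One .pt checkpoint per cross-validation split.
for path in sorted(glob.glob("data/model_training_10fold/*.pt")):
    model = torch.load(path, map_location="cpu")  # repo classes must be on sys.path
    print(path, type(model))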