Contrastive Preference Learning: Learning from Human Feedback without RL

This is the official codebase for Contrastive Preference Learning: Learning From Human Feedback without RL by Joey Hejna, Rafael Rafailov*, Harshit Sikchi*, Chelsea Finn, Scott Niekum, W. Bradley Knox, and Dorsa Sadigh.

Below we include instructions for reproducing the results found in the paper. This repository is based on a frozen version of research-lightning; for detailed information on how the codebase is organized and used, refer to that project.

If you find our paper or code insightful, feel free to cite us using the following BibTeX:

@InProceedings{hejna23contrastive,
  title = {Contrastive Preference Learning: Learning From Human Feedback without RL},
  author = {Hejna, Joey and Rafailov, Rafael and Sikchi, Harshit and Finn, Chelsea and Niekum, Scott and Knox, W. Bradley and Sadigh, Dorsa},
  booktitle = {ArXiv preprint},
  year = {2023},
  url = {https://arxiv.org/abs/2310.13639}
}

Installation

Complete the following steps:

  1. Clone the repository to your desired location using git clone https://github.com/jhejna/cpl.
  2. Create the conda environment using conda env create -f environment_<cpu or gpu>.yaml. Note that the correct MetaWorld version must be used.
  3. Install the repository's research package by running pip install -e . from the repository root.
  4. Modify the setup_shell.sh script by updating the appropriate values as needed. The script should load the environment, move the shell to the repository directory, and set up any external dependencies. All the required flags are at the top of the file. This step is necessary for support with the SLURM launcher, which we used to run experiments.
  5. Download the MetaWorld datasets here. Extract the files into a datasets folder in the repository root so the paths match those used in the config files.

When using the repository, you should be able to set up the environment by running . path/to/setup_shell.sh.
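
Putting the steps above together, a typical install looks roughly like the following. This is a minimal sketch assuming a GPU machine; the conda environment name is an assumption, so check the environment yaml for the actual name.

    # Rough end-to-end install on a GPU machine (use environment_cpu.yaml otherwise).
    git clone https://github.com/jhejna/cpl
    cd cpl
    conda env create -f environment_gpu.yaml
    conda activate cpl                 # assumed environment name; see the yaml file
    pip install -e .
    # Edit setup_shell.sh for your machine, then load the environment with:
    . setup_shell.sh
    # Extract the downloaded MetaWorld datasets into ./datasets so the paths
    # match those in the config files.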

Usage

To train a model, simply run python scripts/train.py --config path/to/config --path path/to/save/folder after activating the environment.
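
For instance, a single run might look like the following sketch; the config filename and output path are placeholders rather than files guaranteed to exist in the repository, so substitute a config shipped with the code.

    # Illustrative training run; replace the placeholder paths with real ones.
    . path/to/setup_shell.sh
    python scripts/train.py --config configs/<experiment>.yaml --path outputs/cpl_run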

Multiple experiments can be run at once using a .json sweep file. To run a sweep, first create a sweep file, then launch it with either tools/run_slurm.py or tools/run_local.py, specifying the config and output directory via --arguments config=path/to/config path=path/to/save/folder (see the sketch below). For example sweep files, check out the Inverse Preference Learning repository.
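
As a sketch, a local sweep launch might look like the following. Only the --arguments flag is described above; how the sweep file itself is passed is an assumption here, so check tools/run_local.py and tools/run_slurm.py for the exact interface.

    # Hypothetical sweep launch; the positional sweep-file argument is an assumption.
    python tools/run_local.py path/to/sweep.json \
        --arguments config=path/to/config path=path/to/save/folder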

License

This code is released under the MIT License; see the LICENSE file for details.