First-Explore

This repo reproduces the results from the paper First-Explore, then Exploit: Meta-Learning Intelligent Exploration. First-Explore is a general framework for meta-RL in which two context-conditioned policies are trained: one to explore (gather an informative environment rollout based on the current context) and one to exploit (map the current context to high-reward behaviour). Each time the policies are used in an environment, the context provided to them is all the previous explore rollouts in that environment. By learning two separate policies, First-Explore decouples exploration from exploitation, avoiding the conflict of having to do both simultaneously. This decoupling allows First-Explore to intentionally perform exploration that sacrifices episode reward, e.g., spending a whole episode practicing a skill the agent is bad at, such as an unfamiliar weapon in a fighting game that is difficult to use but effective once mastered.
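To make the explore/exploit split concrete, here is a minimal Python sketch of how the two policies might be used on a single environment, assuming a toy Bernoulli bandit. The policies are hand-coded stand-ins (the hypothetical `make_explore_policy` and `make_exploit_policy`), not the trained context-conditioned models in lte_code. The fixed schedule of `n_explore` explore episodes followed by `n_exploit` exploit episodes is also an assumption made for illustration. What the sketch is meant to show is that only explore rollouts are ever appended to the context that both policies condition on.

```python
import random
from typing import Callable, List, Tuple

# In a bandit, one episode is a single arm pull, so a rollout is an (arm, reward) pair.
Rollout = Tuple[int, float]
Policy = Callable[[List[Rollout]], int]


class BernoulliBandit:
    """Toy stand-in environment: each arm pays 1.0 with a fixed hidden probability."""

    def __init__(self, probs: List[float], rng: random.Random):
        self.probs = probs
        self.rng = rng

    def step(self, arm: int) -> float:
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


def make_explore_policy(n_arms: int) -> Policy:
    # Hypothetical placeholder for the learned explore policy:
    # pull the arm that appears least often in the context.
    def explore(context: List[Rollout]) -> int:
        counts = [0] * n_arms
        for arm, _ in context:
            counts[arm] += 1
        return counts.index(min(counts))
    return explore


def make_exploit_policy(n_arms: int) -> Policy:
    # Hypothetical placeholder for the learned exploit policy:
    # pick the arm with the highest mean reward observed in the context.
    def exploit(context: List[Rollout]) -> int:
        totals = [0.0] * n_arms
        counts = [0] * n_arms
        for arm, reward in context:
            totals[arm] += reward
            counts[arm] += 1
        means = [t / c if c else 0.0 for t, c in zip(totals, counts)]
        return means.index(max(means))
    return exploit


def first_explore_rollouts(
    env: BernoulliBandit, explore: Policy, exploit: Policy, n_explore: int, n_exploit: int
) -> Tuple[List[Rollout], List[float], List[float]]:
    context: List[Rollout] = []  # holds explore rollouts only
    explore_rewards: List[float] = []
    exploit_rewards: List[float] = []
    for _ in range(n_explore):
        arm = explore(context)
        reward = env.step(arm)
        context.append((arm, reward))  # each explore rollout extends the context
        explore_rewards.append(reward)
    for _ in range(n_exploit):
        arm = exploit(context)  # exploit rollouts are not added to the context
        exploit_rewards.append(env.step(arm))
    return context, explore_rewards, exploit_rewards
```

For example, with arm probabilities [0.0, 0.0, 1.0] and six explore episodes, the explore stand-in cycles through all three arms and gives up reward on the two bad ones, while every subsequent exploit episode pulls the rewarding arm.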

As First-Explore is a meta-RL framework, it is trained on a distribution of environments. Training on a distribution lets the policies learn, via weight updates, how best to adapt in context to their task (exploration or exploitation), using as a prior the assumption that any environment encountered is sampled from the training distribution. Once trained, the policies learn about new environments through in-context adaptation, with that adaptation being the analogue of standard-RL training on a new environment.

Note: this repo is just an example instance of First-Explore. First-Explore is a framework and is applicable to general meta-RL.

Repo Structure:

Plots:

plots contains the code for reproducing the plots, as well as saved models.
This is done via the notebooks: running all cells in the notebooks produces the figures in the paper.

Code:

darkroom contains the code for the dark treasure room environment.
lte_code contains the code for First-Explore, as well as the Bandit environment.

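Runs:

The four run folders (bandit_runs, bandit_always-exploit_runs, treasure-room_runs, and treasure-room_always-exploit_runs) contain the code to replicate the experiments: training the First-Explore models on the two environments, as well as the always-exploit controls.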

Each folder contains:

the .sh script used in a Slurm environment to launch the Python training script on a server.
the .py script that performs the training runs when passed the appropriate arguments (see the .sh script).
folders with all the trained models (saved as run_data.pkl).
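The saved runs are pickle files, so they can be located and loaded with Python's standard pickle module. The sketch below assumes only that the files are named run_data.pkl; the helper names `find_run_data` and `load_run_data` are hypothetical, and the structure of the unpickled object is not documented here, so inspect it after loading. If the pickles contain objects defined in the repo's own modules (an assumption), those modules will likely need to be importable, e.g. by running from the repo root. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Any, List, Union


def find_run_data(root: Union[str, Path]) -> List[Path]:
    """Recursively collect every run_data.pkl file under `root`, in sorted order."""
    return sorted(Path(root).rglob("run_data.pkl"))


def load_run_data(path: Union[str, Path]) -> Any:
    """Unpickle one saved run. Only use on trusted files: pickle can run arbitrary code."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For instance, `find_run_data("bandit_runs")` would list the saved bandit runs, and passing one of those paths to `load_run_data` returns the stored object for inspection.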

Setup:

The Python environment used (called 'hf' in the .sh scripts) is specified by the requirements.txt file. This environment should also be set as the Python kernel of the notebooks. Note that it uses JAX with GPU support, which can be tricky to install, e.g., locally on a Mac.

Example Installation for Linux:

python3 -m venv [env_name]
source [env_name]/bin/activate
pip install --upgrade pip
pip install jaxlib==0.3.25+cuda11.cudnn82 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html --only-binary=jaxlib
pip install -r requirements.txt
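Because the GPU build of JAX can be tricky to install, it may help to confirm which backend JAX actually picked up after installation. `jax.default_backend()` and `jax.devices()` are standard JAX functions; the helper name `describe_jax_backend` is just for illustration. With a working CUDA build of jaxlib, the backend should be reported as a GPU rather than "cpu".

```python
def describe_jax_backend() -> str:
    """Report which backend JAX will use, without crashing if JAX is missing."""
    try:
        import jax
    except ImportError:
        return "jax is not installed in this environment"
    # default_backend() names the platform (e.g. "cpu"); devices() lists the visible devices.
    return f"default backend: {jax.default_backend()}, devices: {jax.devices()}"


print(describe_jax_backend())
```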

Repository: btnorman/First-Explore

Folders and files

Latest commit

History

Repository files navigation

First-Explore

Repo Structure:

Setup:

About

Topics

Resources

Stars

Watchers

Forks

Languages