
BBRL (Benchmarking tools for Bayesian Reinforcement Learning) is an open-source C++ library for Bayesian reinforcement learning in discrete state and action spaces. We developed these tools by gathering as many agents and experiments as possible, in order to provide a unified benchmarking framework for Bayesian reinforcement learning.

Problem Statement

Reinforcement Learning is a field of Machine Learning concerned with learning how to behave in different circumstances based on previous experiences. We consider an agent interacting with an environment: at each time-step, the agent is in a given state and has to perform an action. In return, the environment sends a reward and moves the agent to another state. The goal of the agent is to maximise the rewards it collects by making good decisions, based on its current state and on what happened during the previous interactions.
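
To make this interaction loop concrete, here is a minimal, self-contained C++ sketch. The names and layout used here (TabularMDP, the P/R representation, the random policy) are illustrative assumptions for this page, not BBRL's actual classes.

```cpp
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

// Toy tabular MDP; names and layout are illustrative, not BBRL's actual classes.
struct TabularMDP {
    std::size_t nStates, nActions;
    std::vector<std::vector<std::vector<double>>> P; // P[s][a][s'] transition probs
    std::vector<std::vector<double>> R;              // R[s][a] reward

    std::size_t step(std::size_t s, std::size_t a, std::mt19937& rng) const {
        std::discrete_distribution<std::size_t> next(P[s][a].begin(), P[s][a].end());
        return next(rng);
    }
};

int main() {
    std::mt19937 rng(42);
    TabularMDP mdp{
        2, 2,
        {{{0.9, 0.1}, {0.1, 0.9}},   // transitions from state 0, per action
         {{0.5, 0.5}, {0.2, 0.8}}},  // transitions from state 1, per action
        {{0.0, 1.0}, {1.0, 0.0}}     // rewards R[s][a]
    };

    std::uniform_int_distribution<std::size_t> randomAction(0, mdp.nActions - 1);
    std::size_t state = 0;
    double totalReward = 0.0;
    for (int t = 0; t < 100; ++t) {             // one trajectory of 100 time-steps
        std::size_t action = randomAction(rng); // a deliberately naive random policy
        totalReward += mdp.R[state][action];    // the environment sends a reward...
        state = mdp.step(state, action, rng);   // ...and moves the agent to a new state
    }
    std::cout << "Return collected over 100 steps: " << totalReward << "\n";
}
```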

The environment is generally represented by a Markov Decision Process (MDP), which is defined as follows:

  • The state space, which is the set of all possible states in which the agent can be.
  • The action space, which is the set of all possible actions the agent can perform.
  • The transition function, which defines how the agent moves from one state to another, depending on its current state and the action it performs.
  • The reward function, which defines the reward sent to the agent, depending on its current state and the action it performs.
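
Formally, an MDP can be written as a tuple (S, A, T, R), where T(s' | s, a) is the probability of reaching state s' after performing action a in state s, and R(s, a) is the associated reward.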

We consider the transition function to be unknown, while the state and action spaces are discrete. In addition, we assume the existence of some prior knowledge on the transition function of the MDP to be played. This prior knowledge is encoded in the form of a distribution over MDPs, accessible before interacting with the real MDP. The phase during which the agent can exploit this prior, before the first interaction, is called the offline-learning phase, and the time consumed by the agent during it is monitored.
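
As a rough illustration of what such an MDP distribution can look like, the sketch below draws a transition function from an independent Dirichlet prior over each state-action pair, a common choice in Bayesian RL. The function and type names are assumptions made for this example, not BBRL's actual interface.

```cpp
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

// Sketch: drawing a transition function from a Dirichlet prior, one common way
// to encode an MDP distribution in Bayesian RL. A Dirichlet sample is obtained
// by normalising independent Gamma draws. Illustrative only; BBRL's own
// MDP-distribution classes are organised differently.

using TransitionModel = std::vector<std::vector<std::vector<double>>>; // T[s][a][s']

TransitionModel sampleTransitions(const TransitionModel& dirichletCounts,
                                  std::mt19937& rng) {
    TransitionModel T = dirichletCounts;       // same shape as the count parameters
    for (auto& perState : T)
        for (auto& perAction : perState) {
            double sum = 0.0;
            for (double& p : perAction) {
                std::gamma_distribution<double> gamma(p, 1.0); // Gamma(alpha_i, 1)
                p = gamma(rng);
                sum += p;
            }
            for (double& p : perAction) p /= sum; // normalise -> Dirichlet sample
        }
    return T;
}

int main() {
    std::mt19937 rng(7);
    // "Flat" prior: one pseudo-count per (s, a, s') triple, for 2 states, 2 actions.
    TransitionModel counts(2,
        std::vector<std::vector<double>>(2, std::vector<double>(2, 1.0)));
    TransitionModel T = sampleTransitions(counts, rng); // one MDP drawn from the prior
    std::cout << "T(s'=0 | s=0, a=0) = " << T[0][0][0] << "\n";
}
```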

In order to provide a fair comparison between the different agents, several experiments are defined. Running an agent on a given experiment tests it on MDPs drawn from a test distribution, defined by the experiment. Each test is independent: the agent cannot learn from one test to improve its performance on the others. We monitor the time consumed by the agents during both the offline and online phases.

The agents are then compared with respect to their performance and the time consumed during the offline and online phases.
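
The skeleton of such a protocol might look as follows: draw a number of MDPs from the test distribution, run a fresh agent on each one so that nothing leaks between tests, and time the online phase. The Agent type and its runEpisode method are placeholders standing in for a real agent, not BBRL's API.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

// Dummy agent standing in for a real one; the names (Agent, runEpisode) are
// placeholders, not BBRL's API.
struct Agent {
    double runEpisode(std::mt19937& rng) {       // returns the collected return
        std::uniform_real_distribution<double> reward(0.0, 1.0);
        double total = 0.0;
        for (int t = 0; t < 1000; ++t) total += reward(rng); // fake 1000-step run
        return total;
    }
};

int main() {
    std::mt19937 rng(123);
    const std::size_t nTests = 100;              // MDPs drawn from the test distribution
    std::vector<double> returns;

    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < nTests; ++i) {
        Agent agent;                             // fresh agent: tests are independent,
                                                 // nothing is carried over between them
        returns.push_back(agent.runEpisode(rng));
    }
    std::chrono::duration<double> online = std::chrono::steady_clock::now() - start;

    double mean = 0.0;
    for (double r : returns) mean += r;
    mean /= returns.size();
    std::cout << "Mean return: " << mean
              << ", online time: " << online.count() << " s\n";
}
```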

Main Advantages

  • Comprehensive command-line options, allowing you to set up your experiments easily.
  • Agents, MDPs, MDP distributions and experiments can be saved in files.
  • During very long experiments, backups can be created automatically, allowing you to restart from where you stopped.
  • Every single trajectory encountered during an experiment is saved.
  • Multi-threading is supported for some implemented algorithms.
  • Your results are automatically stored in a neat LaTeX report, containing Gnuplot graphs and LaTeX tables.
  • BBRL source code is well documented, allowing anyone to add their own agents/experiments.