AgileRL v0.1.21 introduces contextual multi-armed bandit algorithms to the framework. Train agents to solve complex optimisation problems with our two new evolvable bandit algorithms!
This release includes the following updates:
- Two new evolvable contextual bandit algorithms: Neural Contextual Bandits with UCB-based Exploration and Neural Thompson Sampling
- A new contextual bandits training function for fast, straightforward training of bandit agents
- A new BanditEnv class for converting any labelled dataset into a bandit learning environment
- Tutorials on using AgileRL bandit algorithms with evolvable hyperparameter optimisation for SOTA results
- New demo and benchmarking scripts for bandit algorithms
- And more!
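Conceptually, a class like BanditEnv turns supervised data into a bandit problem: each step serves one example's feature vector as the context, the agent pulls an arm (a class label), and the reward is 1 if the arm matches the true label. The sketch below illustrates that idea only; the class name and interface are hypothetical, not AgileRL's actual API.

```python
import random


class LabelledDatasetBanditEnv:
    """Illustrative sketch: wrap a labelled dataset (X, y) as a contextual bandit.

    Hypothetical interface, not AgileRL's actual BanditEnv API.
    """

    def __init__(self, features, labels, seed=0):
        self.features = features
        self.labels = labels
        self.arms = sorted(set(labels))  # one arm per class label
        self.rng = random.Random(seed)
        self._idx = None

    def reset(self):
        # Sample a random example; its feature vector is the context
        self._idx = self.rng.randrange(len(self.features))
        return self.features[self._idx]

    def step(self, arm):
        # Reward 1.0 for choosing the correct label, 0.0 otherwise,
        # then serve a fresh context for the next round
        reward = 1.0 if self.arms[arm] == self.labels[self._idx] else 0.0
        next_context = self.reset()
        return next_context, reward


# Tiny toy dataset: context [x] is labelled 'pos' if x > 0, else 'neg'
X = [[-1.0], [2.0], [0.5], [-3.0]]
y = ["neg", "pos", "pos", "neg"]
env = LabelledDatasetBanditEnv(X, y)
context = env.reset()
context, reward = env.step(0)  # pull arm 0 ('neg')
```

A bandit agent such as NeuralUCB or Neural Thompson Sampling would then loop over `reset`/`step`, using the context to choose an arm and the reward to update its value network.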
More updates will be coming soon!