RL Benchmarks: A Historic Recollection of Reinforcement Learning Benchmarks

Pre-Deep Learning Era (2009-2014)

Tanner, Brian, and Adam White. 2009. “RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments.” Journal of Machine Learning Research: JMLR 10 (74): 2133–36. [paper]
Laird, John E., and Robert E. Wray III. 2010. “Cognitive Architecture Requirements for Achieving AGI.” In Proceedings of the 3d Conference on Artificial General Intelligence (AGI-10). Paris, France: Atlantis Press. [paper]
Whiteson, Shimon, Brian Tanner, and Adam White. 2010. “The Reinforcement Learning Competitions.” In AI Magazine. [paper]
Whiteson, Shimon, Brian Tanner, Matthew E. Taylor, and Peter Stone. 2011. “Protecting against Evaluation Overfitting in Empirical Reinforcement Learning.” In 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE. [paper]
Schaul, Tom, Julian Togelius, and Jürgen Schmidhuber. 2011. “Measuring Intelligence through Games.” arXiv. [paper]
Bellemare, Marc G., Yavar Naddaf, Joel Veness, and Michael Bowling. 2012. “The Arcade Learning Environment: An Evaluation Platform for General Agents.” arXiv [cs.AI]. arXiv. [paper]
Adams, Sam, Itmar Arel, Joscha Bach, Robert Coop, Rod Furlan, Ben Goertzel, J. Storrs Hall, et al. 2012. “Mapping the Landscape of Human-Level Artificial General Intelligence.” AI Magazine 33 (1): 25–42. [paper]
Riedmiller, Martin, Manuel Blum, Thomas Lampe, Roland Hafner, Sascha Lange, and Stephan Timmer. 2013. CLSquare: Closed Loop Simulation System. [paper]
Schaul, Tom. 2013. “A Video Game Description Language for Model-Based or Interactive Learning.” In 2013 IEEE Conference on Computational Intelligence in Games (CIG). IEEE. [paper]
Coleman, Oliver J., Alan D. Blair, and Jeff Clune. 2014. “Automated Generation of Environments to Test the General Learning Capabilities of AI Agents.” In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, 161–68. GECCO ’14. New York, NY, USA: Association for Computing Machinery. [paper]

The Deep Reinforcement Learning Era (2015-*)

Agarwal, Rishabh, Dale Schuurmans, and Mohammad Norouzi. 2019. “An Optimistic Perspective on Offline Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Ahn, Michael, Henry Zhu, Kristian Hartikainen, Hugo Ponte, Abhishek Gupta, Sergey Levine, and Vikash Kumar. 2019. “ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots.” arXiv [cs.RO]. arXiv. [paper].
Beattie, Charles, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, et al. 2016. “DeepMind Lab.” arXiv [cs.AI]. arXiv. [paper].
Brockman, Greg, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. “OpenAI Gym.” arXiv [cs.LG]. arXiv. [paper].
Chollet, François. 2019. “On the Measure of Intelligence.” arXiv [cs.AI]. arXiv. [paper].
Cobbe, Karl, Christopher Hesse, Jacob Hilton, and John Schulman. 2019. “Leveraging Procedural Generation to Benchmark Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Cobbe, Karl, Oleg Klimov, Chris Hesse, Taehoon Kim, and John Schulman. 2018. “Quantifying Generalization in Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Collins, Jack, Jessie McVicar, David Wedlock, Ross Brown, David Howard, and Jürgen Leitner. 2019. “Benchmarking Simulated Robotic Manipulation through a Real World Dataset.” arXiv [cs.RO]. arXiv. [paper].
Côté, Marc-Alexandre, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, et al. 2019. “TextWorld: A Learning Environment for Text-Based Games.” In Computer Games, 41–75. Springer International Publishing. [paper].
Crosby, Matthew, Benjamin Beyret, Murray Shanahan, José Hernández-Orallo, Lucy Cheke, and Marta Halina. 2020. “The Animal-AI Testbed and Competition.” In Proceedings of the NeurIPS 2019 Competition and Demonstration Track, edited by Hugo Jair Escalante and Raia Hadsell, 123:164–76. Proceedings of Machine Learning Research. PMLR. [paper].
Freeman, C. Daniel, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem. 2021. “Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation.” arXiv [cs.RO]. arXiv. [paper].
Duan, Yan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. 2016. “Benchmarking Deep Reinforcement Learning for Continuous Control.” arXiv [cs.LG]. arXiv. [paper].
Dulac-Arnold, Gabriel, Nir Levine, Daniel J. Mankowitz, Jerry Li, Cosmin Paduraru, Sven Gowal, and Todd Hester. 2020. “An Empirical Investigation of the Challenges of Real-World Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Fan, Linxi, and Yuke Zhu. 2018. “SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark.” [paper].
Fortunato, Meire, Melissa Tan, Ryan Faulkner, Steven Hansen, Adrià Puigdomènech Badia, Gavin Buttimore, Charles Deck, Joel Z. Leibo, and Charles Blundell. 2019. “Generalization of Reinforcement Learners with Working and Episodic Memory.” In Advances in Neural Information Processing Systems, 12448–57.
Fu, Justin, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. 2020. “D4RL: Datasets for Deep Data-Driven Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Fujimoto, Scott, Edoardo Conti, Mohammad Ghavamzadeh, and Joelle Pineau. 2019. “Benchmarking Batch Deep Reinforcement Learning Algorithms.” arXiv [cs.LG]. arXiv. [paper].
Gulcehre, Caglar, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, et al. 2020. “RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Guss, William H., Cayden Codel, Katja Hofmann, Brandon Houghton, Noboru Kuno, Stephanie Milani, Sharada Mohanty, et al. 2019. “The MineRL 2019 Competition on Sample Efficient Reinforcement Learning Using Human Priors.” arXiv [cs.LG]. arXiv. [paper].
Guss, William H., Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, and Ruslan Salakhutdinov. 2019. “MineRL: A Large-Scale Dataset of Minecraft Demonstrations.” arXiv [cs.LG]. arXiv. [paper].
James, Stephen, Zicong Ma, David Rovick Arrojo, and Andrew J. Davison. 2019. “RLBench: The Robot Learning Benchmark & Learning Environment.” arXiv [cs.RO]. arXiv. [paper].
Johnson, Matthew, Katja Hofmann, Tim Hutton, and David Bignell. 2016. “The Malmo Platform for Artificial Intelligence Experimentation.” In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 4246–47. IJCAI’16. AAAI Press. [paper].
Juliani, Arthur, Vincent-Pierre Berges, Ervin Teng, Andrew Cohen, Jonathan Harper, Chris Elion, Chris Goy, et al. 2018. “Unity: A General Platform for Intelligent Agents.” arXiv [cs.LG]. arXiv. [paper].
Juliani, Arthur, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, and Danny Lange. 2019. “Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning.” arXiv [cs.AI]. arXiv. [paper].
Kannan, Harini, Danijar Hafner, Chelsea Finn, and Dumitru Erhan. 2021. “RoboDesk Environment v0.” 2021. [paper].
Kempka, Michał, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaśkowski. 2016. “ViZDoom: A Doom-Based AI Research Platform for Visual Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Kurach, Karol, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, et al. 2019. “Google Research Football: A Novel Reinforcement Learning Environment.” arXiv [cs.LG]. arXiv. [paper].
Küttler, Heinrich, Nantas Nardelli, Alexander H. Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, and Tim Rocktäschel. 2020. “The NetHack Learning Environment.” arXiv [cs.LG]. arXiv. [paper].
Le Paine, Tom, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, et al. 2019. “Making Efficient Use of Demonstrations to Solve Hard Exploration Problems.” arXiv [cs.LG]. arXiv. [paper].
Lee, Youngwoon, Edward S. Hu, Zhengyu Yang, Alex Yin, and Joseph J. Lim. 2019. “IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks.” arXiv [cs.RO]. arXiv. [paper].
Leibo, Joel Z., Cyprien de Masson d’Autume, Daniel Zoran, David Amos, Charles Beattie, Keith Anderson, Antonio García Castañeda, et al. 2018. “Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents.” arXiv [cs.AI]. arXiv. [paper].
Machado, Marlos C., Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, and Michael Bowling. 2018. “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents.” The Journal of Artificial Intelligence Research 61 (1): 523–62. [paper].
Mbuwir, Brida V., Carlo Manna, Fred Spiessens, and Geert Deconinck. 2020. “Benchmarking Reinforcement Learning Algorithms for Demand Response Applications.” In 2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe), 289–93. [paper].
Nichol, Alex, Vicki Pfau, Christopher Hesse, Oleg Klimov, and John Schulman. 2018. “Gotta Learn Fast: A New Benchmark for Generalization in RL.” arXiv [cs.LG]. arXiv. [paper].
Osband, Ian, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, et al. 2019. “Behaviour Suite for Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Perez-Liebana, Diego, Spyridon Samothrakis, Julian Togelius, Tom Schaul, Simon M. Lucas, Adrien Couëtoux, Jerry Lee, Chong-U Lim, and Tommy Thompson. 2016. “The 2014 General Video Game Playing Competition.” IEEE Transactions on Computational Intelligence in AI and Games 8 (3): 229–43. [paper].
Platanios, Emmanouil Antonios, Abulhair Saparov, and Tom Mitchell. 2020. “Jelly Bean World: A Testbed for Never-Ending Learning.” [paper].
Rajeswaran, Aravind, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, and Sergey Levine. 2017. “Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations.” arXiv [cs.LG]. arXiv. [paper].
Ray, Alex, Joshua Achiam, and Dario Amodei. 2019. “Benchmarking Safe Exploration in Deep Reinforcement Learning.” [paper].
Samvelyan, Mikayel, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, and Tim Rocktäschel. 2021. “MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research.” [paper].
Savva, Manolis, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, et al. 2019. “Habitat: A Platform for Embodied AI Research.” arXiv [cs.CV]. arXiv. [paper].
Szot, Andrew, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, et al. 2021. “Habitat 2.0: Training Home Assistants to Rearrange Their Habitat.” arXiv [cs.LG]. arXiv. [paper].
Tassa, Yuval, Saran Tunyasuvunakool, Alistair Muldal, Yotam Doron, Piotr Trochim, Siqi Liu, Steven Bohez, et al. 2020. “dm_control: Software and Tasks for Continuous Control.” arXiv [cs.RO]. arXiv. [paper].
Wang, Jane X., Michael King, Nicolas Porcel, Zeb Kurth-Nelson, Tina Zhu, Charlie Deck, Peter Choy, et al. 2021. “Alchemy: A Structured Task Distribution for Meta-Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Wang, Tingwu, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, and Jimmy Ba. 2019. “Benchmarking Model-Based Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Yu, Tianhe, Deirdre Quillen, Zhanpeng He, Ryan Julian, Avnish Narayan, Hayden Shively, Adithya Bellathur, Karol Hausman, Chelsea Finn, and Sergey Levine. 2019. “Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].
Zhang, Amy, Yuxin Wu, and Joelle Pineau. 2018. “Natural Environment Benchmarks for Reinforcement Learning.” arXiv [cs.LG]. arXiv. [paper].


manfreddiaz/rl-benchmarks
