
ryanrudes/rc2020


RC2020

ML Reproducibility Challenge 2020 is a community challenge for machine learning enthusiasts, students, and researchers. Participants select a paper from one of the year's prestigious ML conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, or ECCV). By attempting to replicate the paper, participants provide additional evidence either for or against the claims and results of the work. Alongside evaluating the validity and legitimacy of recent research, the project primarily serves to assess the reproducibility and replicability of machine learning research.

Here is a collection of my work-in-progress code for the challenge. I am working on DeepMind's paper Discovering Reinforcement Learning Algorithms.

This code is not expected to be well organized until near the end of the project. For now, I am simply dropping the latest source code into this repo each time I update it.

Progress Log

Thursday, November 5, 2020

  • Implemented all Grid World environments, including Tabular Grid World and Random Grid World, and the five maps for each type of grid world.

[Image: GridWorld]
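For readers unfamiliar with these environments, here is a minimal sketch of a grid world exposing a classic Gym-style `reset()`/`step()` interface (where `step` returns observation, reward, done, info) without importing gym. The class name `TinyGridWorld`, the layout, the reward values, and the rules that collected objects vanish and that the episode ends once none remain are all assumptions for illustration, not the maps implemented in this repo.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]

# Hypothetical 5x7 map: '#' wall, '.' floor, 'A' agent start, digits are reward objects.
# Assumes the map is fully enclosed by walls, so a move can never leave the grid.
LAYOUT = [
    "#######",
    "#A..1.#",
    "#.##..#",
    "#..2..#",
    "#######",
]
REWARDS = {"1": 1.0, "2": -1.0}  # hypothetical reward per object type
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


class TinyGridWorld:
    """Minimal gym-style (reset/step) grid world; collected objects disappear."""

    def __init__(self, layout: List[str], rewards: Dict[str, float], max_steps: int = 50):
        self.layout = layout
        self.rewards = rewards
        self.max_steps = max_steps

    def reset(self) -> Pos:
        # Scan the map for the agent start and every reward object.
        self.objects: Dict[Pos, str] = {}
        for r, row in enumerate(self.layout):
            for c, ch in enumerate(row):
                if ch == "A":
                    self.pos = (r, c)
                elif ch in self.rewards:
                    self.objects[(r, c)] = ch
        self.t = 0
        return self.pos

    def step(self, action: int):
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if self.layout[r][c] != "#":  # walls block movement
            self.pos = (r, c)
        reward = 0.0
        if self.pos in self.objects:
            # Remove the object once collected so it cannot be collected again.
            reward = self.rewards[self.objects.pop(self.pos)]
        self.t += 1
        done = self.t >= self.max_steps or not self.objects
        return self.pos, reward, done, {}
```

Stepping right three times from the start lands on the `1` object for a reward of 1.0; walking back onto that square afterwards yields 0.0 because the object was removed.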

Sunday, November 8, 2020

  • Implemented all Delayed Chain MDP environments, which include 4 standard maps and 1 unique mode.

[Image: DelayedChainMDP]
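A simplified sketch of a delayed-reward chain, assuming two actions, a one-hot position observation, and a terminal reward of +1 or -1 that depends only on the action taken in the first state. The class name `DelayedChain` and every parameter here are hypothetical; the sketch does not reproduce the 4 standard maps or the extra mode mentioned above.

```python
import random
from typing import List, Optional, Tuple


class DelayedChain:
    """Hypothetical delayed-reward chain: only the first action decides the final reward.

    After `length` steps the episode ends with +1 if the first action matched a
    hidden rewarding action, else -1. Every intermediate reward is 0.
    """

    def __init__(self, length: int = 5, seed: Optional[int] = None):
        self.length = length
        self.rng = random.Random(seed)

    def _obs(self) -> List[int]:
        # One-hot encoding of the position along the chain, terminal state included.
        obs = [0] * (self.length + 1)
        obs[self.t] = 1
        return obs

    def reset(self) -> List[int]:
        self.correct = self.rng.randrange(2)  # hidden rewarding first action (assumption)
        self.first: Optional[int] = None
        self.t = 0
        return self._obs()

    def step(self, action: int) -> Tuple[List[int], float, bool, dict]:
        if self.t == 0:
            self.first = action  # remember the only action that matters
        self.t += 1
        done = self.t == self.length
        reward = 0.0
        if done:
            reward = 1.0 if self.first == self.correct else -1.0
        return self._obs(), reward, done, {}
```

Because the reward arrives `length` steps after the decisive action, an agent must propagate credit back through the whole chain, which is what makes longer chains harder.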

Saturday, November 14, 2020

  • Rewrote the Grid World environments, doubling the simulation speed and improving the rendering graphics.

[Image: GridWorld]

Tuesday, November 17, 2020

  • Wrote the Agent abstract class and all of its concrete subclasses: TabularAgent (for the Tabular Grid World environment), FunctionalAgent (for environments that require function approximation, i.e., Random Grid World and Delayed Chain MDP + State Distraction), and BinaryAgent (for the standard Delayed Chain MDP environments without state distraction).
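A minimal sketch of how such an agent hierarchy could be laid out, assuming each agent exposes a policy over actions and a categorical prediction vector per state. The method names `policy`, `prediction`, and `act`, and the `prediction_size` parameter, are hypothetical, and only a tabular subclass is shown. It stores one row of logits per discrete state and is written in plain Python rather than TensorFlow so it runs standalone.

```python
import math
import random
from abc import ABC, abstractmethod
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Agent(ABC):
    """Abstract agent: a policy and a categorical prediction vector per state (hypothetical API)."""

    def __init__(self, num_actions: int, prediction_size: int):
        self.num_actions = num_actions
        self.prediction_size = prediction_size

    @abstractmethod
    def policy(self, state) -> List[float]: ...

    @abstractmethod
    def prediction(self, state) -> List[float]: ...

    def act(self, state, rng: random.Random) -> int:
        # Sample an action index from the policy distribution.
        probs = self.policy(state)
        return rng.choices(range(self.num_actions), weights=probs)[0]


class TabularAgent(Agent):
    """One row of policy logits and one row of prediction logits per discrete state."""

    def __init__(self, num_states: int, num_actions: int, prediction_size: int):
        super().__init__(num_actions, prediction_size)
        self.pi_logits = [[0.0] * num_actions for _ in range(num_states)]
        self.y_logits = [[0.0] * prediction_size for _ in range(num_states)]

    def policy(self, state: int) -> List[float]:
        return softmax(self.pi_logits[state])

    def prediction(self, state: int) -> List[float]:
        return softmax(self.y_logits[state])
```

A function-approximation subclass would keep the same interface but compute the logits from observation features instead of looking them up by state index.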

Wednesday, November 18, 2020

  • Wrote the LPG (Learned Policy Gradient) Model class and the Embedding layer it uses to encode the categorical prediction vector.
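A rough sketch of the kind of embedding layer described here, assuming a single dense layer with a tanh activation that maps an m-dimensional categorical prediction vector to a k-dimensional embedding. The class name `DenseEmbedding`, the activation, the uniform initialization, and the sizes used in the test (m=30, k=16) are assumptions for illustration, not values taken from this repo; plain Python is used instead of TensorFlow so the example runs without dependencies.

```python
import math
import random
from typing import List


class DenseEmbedding:
    """Hypothetical embedding: one dense layer + tanh, from an m-dim prediction to k dims."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        # Weight matrix stored as out_dim rows of in_dim weights each.
        self.w = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, y: List[float]) -> List[float]:
        if len(y) != len(self.w[0]):
            raise ValueError("prediction vector has the wrong size")
        # tanh(W y + b), computed row by row.
        return [
            math.tanh(sum(wi * yi for wi, yi in zip(row, y)) + bi)
            for row, bi in zip(self.w, self.b)
        ]
```

Reusing one instance for the predictions at consecutive states (for example `phi(y_t)` and `phi(y_next)`) keeps the two embeddings weight-tied, so the model sees both through the same learned projection.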

Sunday, November 22, 2020

  • Began the first tests of a simple implementation. This is still a work in progress, but the overall code is starting to take shape.

Saturday, November 28, 2020

  • Began writing the final code implementation.

Task Log

  • Read DeepMind's paper, Discovering Reinforcement Learning Algorithms

  • Write the Gym class for the custom Grid World environments

  • Write the Gym class for the custom MDP environments

  • Fix a rendering bug in the Random Grid World environment that causes some squares to remain lit even though the reward on that square was already collected.

  • Write a custom TensorFlow model for the Learned Policy Gradient architecture

  • Write classes for the agent structures used in each training environment

    • Agent abstract class
    • TabularAgent for Tabular Grid World
    • FunctionalAgent for Random Grid World and Delayed Chain MDP + State Distraction
    • BinaryAgent for Delayed Chain MDP
  • Implement the agent update

  • Implement the Learned Policy Gradient algorithm

  • Train on each environment

  • Test the learned update rule on each Atari environment

    • Write a model class for the network architecture specified by the authors: C(32)-C(64)-C(64)-D(512)
    • Test performance over 20 million frames on each of Montezuma's Revenge, Ms. Pac-Man, Riverraid, Pitfall, and Tutankham (the environments in which LPG outperformed or matched A2C)
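The C(32)-C(64)-C(64)-D(512) notation fixes only the filter and unit counts. This small sketch works out the layer output shapes and parameter counts under stated assumptions: the common Nature-DQN kernel sizes and strides (8x8 stride 4, 4x4 stride 2, 3x3 stride 1), unpadded convolutions, and an 84x84 input with 4 stacked frames. None of these choices are confirmed by the paper or this repo, and the names `CONV_LAYERS`, `conv_out`, and `describe` are hypothetical.

```python
from typing import List, Tuple

# Assumed conv specs as (filters, kernel, stride); the C(n) notation alone
# does not fix kernel sizes or strides.
CONV_LAYERS: List[Tuple[int, int, int]] = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
DENSE_UNITS = 512


def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output side length of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def describe(height: int = 84, width: int = 84, channels: int = 4
             ) -> List[Tuple[str, Tuple[int, ...], int]]:
    """Return (layer name, output shape, parameter count) for each layer."""
    rows = []
    h, w, c = height, width, channels
    for filters, k, s in CONV_LAYERS:
        params = k * k * c * filters + filters  # weights + biases
        h, w, c = conv_out(h, k, s), conv_out(w, k, s), filters
        rows.append((f"C({filters})", (h, w, c), params))
    flat = h * w * c  # flattened conv features fed to the dense layer
    rows.append((f"D({DENSE_UNITS})", (DENSE_UNITS,), flat * DENSE_UNITS + DENSE_UNITS))
    return rows


for name, shape, params in describe():
    print(name, shape, params)
# C(32) (20, 20, 32) 8224
# C(64) (9, 9, 64) 32832
# C(64) (7, 7, 64) 36928
# D(512) (512,) 1606144
```

Under these assumptions the dense layer holds about 95% of the roughly 1.68 million parameters, so the input resolution and conv strides largely determine the model's size.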