
Master's thesis project on learning stateful simulations with deep differentiable models. The focus is to train a neural network to simulate a game (PONG) end-to-end.


ichko/forward-model


Forward model

Previous notebooks and experiments can be found here.

Experiments and models for my master's thesis on learning environment dynamics from observations.


The model in action

Example visualization of assets

Diagram of the Asset Spatial Transformer RNN

Asset Spatial RNN

Structure of the project

src
  data
  loggers           (implementation of the wandb logger)
  models            (implementations of models)
  pipelines
    train           (training module)
    eval            (evaluation module)
    train_eval_all  (script for running train and eval on all the configurations)
    config          (configuration of all the runs)
  scripts           (scripts for visualization)
  utils             (generic utilities and modules)

How to run

You will need Python >= 3.6

  1. Install the requirements:
pip install -r requirements.txt
  2. Log in to wandb so that you can view the training results:
wandb login
  3. Create the following folders in the root directory:
.reports
.models
.results
  4. Run train and eval for all of the configurations set up in src/pipelines/config (all of the models described in the thesis):
python -m src.pipeline.train_eval_all

Observe the results in the generated wandb project.
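
The three output folders listed above can also be created from Python instead of by hand. This is only a convenience sketch: the helper name `create_output_dirs` is hypothetical and not part of the repository, and it assumes it is run from (or pointed at) the project root.

```python
from pathlib import Path

# Folder names taken from the list above; the helper name is hypothetical.
OUTPUT_DIRS = (".reports", ".models", ".results")


def create_output_dirs(root="."):
    """Create the output folders under `root`, skipping any that already exist."""
    created = []
    for name in OUTPUT_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_output_dirs()` from the root directory is safe to repeat, since existing folders are left untouched.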

Visualizations

Visualize live model reconstruction with:

python -m src.scripts.play_model

Set the desired model configuration at the top of the file, as the get_hparams parameter.

Visualize live Asset Spatial RNN model:

python -m src.scripts.animate_asset_model

Example visualization of assets

Visualization of a sweep of 16 train runs

Sweep of training runs

Examples of reconstructed PONG episodes

Rollout 1 Rollout 2 Rollout 3

Notes and tasks

  • Profiling code

    • pip install profiling
    • profiling live-profile -m src.pipeline.train -- --debug
  • General stuff

    • Mask out empty (padded) frames after rollout has finished. See here.
    • Label smoothing. Do I actually want that?
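
The padded-frame masking idea can be sketched as follows. This is a dependency-free illustration, not the project's implementation: the function name and the list-based data layout are assumptions, and in the real pipeline the same logic would presumably operate on batched tensors.

```python
def masked_mean_loss(frame_losses, rollout_lengths):
    """Average per-frame losses, ignoring frames past the end of each rollout.

    frame_losses:    list of rollouts, each a list of per-frame loss values,
                     all padded to the same number of time steps.
    rollout_lengths: number of real (non-padded) frames in each rollout.
    """
    total, count = 0.0, 0
    for losses, length in zip(frame_losses, rollout_lengths):
        for t, loss in enumerate(losses):
            if t < length:  # only real frames contribute to the loss
                total += loss
                count += 1
    if count == 0:
        raise ValueError("no unpadded frames to average over")
    return total / count
```

Dividing by the number of real frames, rather than by the padded length, keeps short rollouts from being diluted by frames the model was never meant to predict.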
  • Models

    • RNN Deconvolution Baseline
    • Learn frame transformations
      • Instead of compressing the state like the RNN does
      • Action + Precondition (last few frames) -> transformation vector T
      • Use T to transform the current frame to the future frame
      • Play rollout of frame transformations - results in wandb look promising
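
The frame-transformation idea can be illustrated with a toy example. Everything here is an assumption made for illustration: T is reduced to an integer (dy, dx) shift, the learned network is replaced by a hand-written `estimate_shift`, frames are plain nested lists with a single non-zero pixel, and the action input is left out to keep the sketch short.

```python
def estimate_shift(prev_frame, curr_frame):
    """Stand-in for the learned mapping (precondition frames -> T).

    Assumes each frame contains exactly one non-zero pixel and returns its
    displacement (dy, dx) between the two frames.
    """
    def locate(frame):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                if value:
                    return y, x
        raise ValueError("frame has no non-zero pixel")

    (py, px), (cy, cx) = locate(prev_frame), locate(curr_frame)
    return cy - py, cx - px


def apply_shift(frame, shift):
    """Transform `frame` by T = (dy, dx); pixels moved out of bounds are dropped."""
    dy, dx = shift
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if frame[y][x] and 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = frame[y][x]
    return out


def rollout(prev_frame, curr_frame, steps):
    """Predict `steps` future frames by repeatedly estimating and applying T."""
    frames = []
    for _ in range(steps):
        shift = estimate_shift(prev_frame, curr_frame)
        prev_frame, curr_frame = curr_frame, apply_shift(curr_frame, shift)
        frames.append(curr_frame)
    return frames
```

Note that the rollout needs two frames before it can produce anything, since a single frame carries no motion information.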
  • Notes

    • 12.06.2020

      • Update implementation of RNN Deconv
      • Focus on making RNN deconv work on PONG
        • WHY RNN Deconv - it is the only model that can model PONG with the current setup of the data pipeline.
        • Frame transforming models need two frames as context
        • TODO:
          • [ ] Train and save working RNN Deconv model
          • [ ] Write playing script
          • [ ] Write script for manipulating the latent RNN state and viewing the result?
    • 06.06.2020

      • Implement pong agent class + action mappings ([3, 3] => 9)
      • Make RNN Playable (interface like a gym)
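
One way to read the `[3, 3] => 9` action mapping is that each of the two paddles has three actions and every pair is flattened into a single index. The sketch below follows that reading; it is an assumption, and the action names and function names are illustrative rather than taken from the project.

```python
from itertools import product

# Assumed per-paddle actions; the names are illustrative, not from the project.
PADDLE_ACTIONS = ("stay", "up", "down")
NUM_PADDLE_ACTIONS = len(PADDLE_ACTIONS)


def encode_joint_action(left, right):
    """Map a pair of per-paddle action indices (each 0..2) to one index in 0..8."""
    if not (0 <= left < NUM_PADDLE_ACTIONS and 0 <= right < NUM_PADDLE_ACTIONS):
        raise ValueError("per-paddle action index out of range")
    return left * NUM_PADDLE_ACTIONS + right


def decode_joint_action(index):
    """Inverse of encode_joint_action: split 0..8 back into (left, right)."""
    if not 0 <= index < NUM_PADDLE_ACTIONS ** 2:
        raise ValueError("joint action index out of range")
    return divmod(index, NUM_PADDLE_ACTIONS)


# Every (left, right) combination, flattened to its joint index.
ALL_JOINT_ACTIONS = [
    encode_joint_action(left, right)
    for left, right in product(range(NUM_PADDLE_ACTIONS), repeat=2)
]
```

A flat index like this lets the model take a single one-hot action vector of size 9 instead of two separate inputs.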
    • 04.06.2020

      • [BUGFIX] Found a major bug in the RNN models: the predicted frames and true frames were not aligned, so the model was trying to predict the present from the present
      • [BUGFIX] The TimeDistributed (decorator) module was not holding the wrapped module in its state, so the parameters of the wrapped module were not part of the overall model and the model could not be trained. (Took quite some time)
      • [FEATURE] Implemented generic multiprocessing function spawner and random agent rollout generator that leads to newer rollouts in the training buffer faster. Hopefully this can reduce over-fitting.
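
The frame-alignment fix from the 04.06.2020 entry amounts to shifting the targets by one time step relative to the inputs. A minimal sketch, with a hypothetical function name and frames treated as opaque items in a list:

```python
def make_next_frame_pairs(frames):
    """Pair each input frame with the frame that follows it.

    Inputs are frames[0..T-2] and targets are frames[1..T-1], so the model is
    asked to predict the future rather than reproduce its own input.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a prediction pair")
    inputs = frames[:-1]
    targets = frames[1:]
    return list(zip(inputs, targets))
```

With the unshifted pairing (input t, target t), a model can reach a low loss by copying its input, which is why this kind of bug can go unnoticed for a while.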
