PyTorch implementation of the DARLA reinforcement learning pipeline, using PPO to learn a policy from the β-VAE's latent state
https://arxiv.org/pdf/1707.08475.pdf
1. Learn disentangled features of the environment, without supervision, from observations gathered by a random agent
2. Learn a policy for the source domain (in this case with PPO) using the learned state representation from step 1
3. Test the policy from step 2 on the target domain
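
As a rough illustration of how the first two steps connect, here is a minimal sketch of a β-VAE objective and of a frozen encoder producing the latent state a policy would consume. Everything in it is an assumption made for illustration: `TinyBetaVAE`, `beta_vae_loss`, `latent_state`, the flat 16-dimensional observations, the MSE reconstruction term, and `beta=4.0` are hypothetical and are not taken from this repository or from the paper's exact loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBetaVAE(nn.Module):
    """Hypothetical MLP beta-VAE over flat observations (not this repo's model)."""

    def __init__(self, obs_dim: int = 16, latent_dim: int = 4):
        super().__init__()
        self.enc = nn.Linear(obs_dim, 2 * latent_dim)  # outputs [mu, logvar]
        self.dec = nn.Linear(latent_dim, obs_dim)

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu, logvar

    def forward(self, x):
        mu, logvar = self.encode(x)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar


def beta_vae_loss(recon, x, mu, logvar, beta=4.0):
    """Per-sample reconstruction error plus beta-weighted KL to N(0, I)."""
    batch = x.shape[0]
    recon_term = F.mse_loss(recon, x, reduction="sum") / batch
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch
    return recon_term + beta * kl


def latent_state(vae, obs):
    """Step 2 input: encode observations with the encoder frozen."""
    with torch.no_grad():
        mu, _ = vae.encode(obs)
    return mu


torch.manual_seed(0)
vae = TinyBetaVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)

# Step 1 (sketch): one gradient step on a batch standing in for
# observations collected by a random agent.
obs = torch.rand(32, 16)
recon, mu, logvar = vae(obs)
loss = beta_vae_loss(recon, obs, mu, logvar)
opt.zero_grad()
loss.backward()
opt.step()

# Step 2 (sketch): the policy (e.g. PPO) would be trained on this tensor
# instead of the raw observations.
state = latent_state(vae, obs)
```

In this sketch the policy never sees raw observations, only the posterior mean `mu`, and the encoder receives no gradient from the policy loss. Step 3 would reuse the same frozen encoder and the trained policy on target-domain observations.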