Reinforcement learning homework IBIO4615
Dependencies
- Python 3.5+
- PyTorch 1.0.1
- TensorFlow 1.2
- gym, matplotlib, numpy, tensorboardx
pip install gym
pip install tensorboardx
pip install tensorflow==1.2
- Original DQN paper: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
Tasks:
- Play with the hyperparameters and show their corresponding graphs (see the logging sketch after this list). Which parameter caused the most change? Which one had the least effect? Briefly discuss your results.
- Anneal the ε hyperparameter so that it decays linearly instead of staying fixed (see the decay sketch after this list). Did it help at all? Why?
- Try two different network architectures and report your results.
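For plotting the training curve of each hyperparameter setting, a minimal tensorboardX logging sketch is given below; the helper name `log_run` and the run/tag names are illustrative and not part of the starter code:

```python
from tensorboardX import SummaryWriter

# Hypothetical helper: use one log directory per hyperparameter setting so the
# runs show up as separate curves in TensorBoard.
def log_run(run_name, episode_rewards):
    writer = SummaryWriter(log_dir="runs/" + run_name)
    for episode, reward in enumerate(episode_rewards):
        writer.add_scalar("reward/episode", reward, episode)
    writer.close()

# Example: log_run("dqn_lr1e-3_eps0.1", rewards_from_training)
```

Launch `tensorboard --logdir runs` to compare the curves of the different runs.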
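One way to implement the annealing, sketched under the assumption of a standard ε-greedy policy (function and parameter names such as `linear_epsilon`, `decay_steps`, and `select_action` are illustrative):

```python
import random

def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10000):
    """Decay epsilon linearly from eps_start to eps_end over decay_steps, then hold."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)

def select_action(q_values, action_space, step):
    """Epsilon-greedy selection using the annealed schedule."""
    if random.random() < linear_epsilon(step):
        return action_space.sample()      # explore
    return int(q_values.argmax())         # exploit the current Q-value estimates
```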
- Original DDPG paper: https://arxiv.org/abs/1509.02971
- OpenAI Baselines post: https://blog.openai.com/better-exploration-with-parameter-noise/
Tasks:
- Adapt DDPG to Mountain Car (you may need to tune the hyperparameters a bit, since the time constants of the two systems are different). Compare with DQN (number of episodes until convergence).
- (Optional) As you can see, the reward/cost penalizes the control law/actions. Change it so that it penalizes the control energy used more heavily, and plot u(t) for different initial positions of the pendulum (see the wrapper sketch after this list).
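For the optional energy-penalty task, one possible approach is a gym wrapper that adds an extra quadratic penalty on the action. This is only a sketch: `ControlEnergyPenalty` and `energy_weight` are assumed names, and Pendulum-v0 already includes a small 0.001·u² term in its cost.

```python
import gym

class ControlEnergyPenalty(gym.Wrapper):
    """Subtracts an extra penalty proportional to the squared control input u(t)."""

    def __init__(self, env, energy_weight=0.1):
        super().__init__(env)
        self.energy_weight = energy_weight

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        reward -= self.energy_weight * float(action[0]) ** 2  # heavier energy penalty
        return obs, reward, done, info

# Example: env = ControlEnergyPenalty(gym.make("Pendulum-v0"), energy_weight=0.1)
# Record action[0] at every step to plot u(t) for different initial pendulum positions.
```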
Note that DDPG is sensitive to hyperparameters; you should fine-tune them if you switch to another environment.
Episode reward in Pendulum-v0: