Reinforcement learning homework IBIO4615
Dependencies
- Python 3.5+
- PyTorch 1.0.1
- TensorFlow 1.2
- gym, matplotlib, numpy, tensorboardx
pip install gym
pip install tensorboardx
pip install tensorflow==1.2
- Original DQN paper: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
Tasks:
- Play with the hyperparameters and show their corresponding graphs (see the logging sketch after this list). Which parameter caused the most change? Which one had the least effect? Briefly discuss your results.
- Anneal the ε hyperparameter so that it decays linearly instead of staying fixed (see the decay sketch after this list). Did it help at all? Why?
- Try two different network architectures and report your results.
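For plotting the training curve of each hyperparameter setting, a minimal tensorboardX logging sketch is given below; the helper name `log_run` and the run/tag names are illustrative and not part of the starter code:

```python
from tensorboardX import SummaryWriter

# Hypothetical helper: use one log directory per hyperparameter setting so the
# runs show up as separate curves in TensorBoard.
def log_run(run_name, episode_rewards):
    writer = SummaryWriter(log_dir="runs/" + run_name)
    for episode, reward in enumerate(episode_rewards):
        writer.add_scalar("reward/episode", reward, episode)
    writer.close()

# Example: log_run("dqn_lr1e-3_eps0.1", rewards_from_training)
```

Launch `tensorboard --logdir runs` to compare the curves of the different runs.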
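One way to implement the annealing, sketched under the assumption of a standard ε-greedy policy (function and parameter names such as `linear_epsilon`, `decay_steps`, and `select_action` are illustrative):

```python
import random

def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10000):
    """Decay epsilon linearly from eps_start to eps_end over decay_steps, then hold."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)

def select_action(q_values, action_space, step):
    """Epsilon-greedy selection using the annealed schedule."""
    if random.random() < linear_epsilon(step):
        return action_space.sample()      # explore
    return int(q_values.argmax())         # exploit the current Q-value estimates
```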
- Original DDPG paper: https://arxiv.org/abs/1509.02971
- OpenAI Baselines post: https://blog.openai.com/better-exploration-with-parameter-noise/
Tasks:
- Adapt DDPG to Mountain Car (you may need to tune the hyperparameters a bit, since the time constants of the two systems are different). Compare with DQN (number of episodes until convergence).
- (Optional) As you can see, the reward/cost penalizes the control law/actions. Change it so that it penalizes the control energy used more heavily, and plot u(t) for different initial positions of the pendulum (see the wrapper sketch after this list).
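For the optional energy-penalty task, one possible approach is a gym wrapper that adds an extra quadratic penalty on the action. This is only a sketch: `ControlEnergyPenalty` and `energy_weight` are assumed names, and Pendulum-v0 already includes a small 0.001·u² term in its cost.

```python
import gym

class ControlEnergyPenalty(gym.Wrapper):
    """Subtracts an extra penalty proportional to the squared control input u(t)."""

    def __init__(self, env, energy_weight=0.1):
        super().__init__(env)
        self.energy_weight = energy_weight

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        reward -= self.energy_weight * float(action[0]) ** 2  # heavier energy penalty
        return obs, reward, done, info

# Example: env = ControlEnergyPenalty(gym.make("Pendulum-v0"), energy_weight=0.1)
# Record action[0] at every step to plot u(t) for different initial pendulum positions.
```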
Note that DDPG is sensitive to hyperparameters; you should fine-tune them if you switch to another environment.
Episode reward in Pendulum-v0: