SamYuen101234/chrome_dino_RL

Tasks to be implemented

  • Deep Q Network Agent
  • Double Deep Q Network Agent
  • Prioritized Experience Replay
  • Grad-CAM Visualization (Double DQN)
  • Rainbow
  • Policy Gradient
  • Actor-Critic Algorithms
  • Deployment (onnx, opencv4nodejs, nodejs, target FPS for the agent in browser: 50 fps)

Testing at 50 FPS. A higher-scoring version is available on OneDrive.

We use Grad-CAM, an explainable/interpretable AI technique for deep learning, to examine whether the agent attends to the game the way a human would. See Grad-CAM Visualization in Double DQN for details.

Quick Start

  1. Create two directories manually (or let main.py create them automatically):
mkdir result
mkdir weights
  2. Run learning or testing:
python3 main.py -c config1

The most detailed experiments and explanation of the Chrome dinosaur game in deep reinforcement learning on GitHub.

The Chrome dinosaur game is very suitable for beginners in deep reinforcement learning because of its simple rules and environment setting. Although the game is easy for humans, it is difficult for a computer agent to learn. Through this project, we not only show the results of a baseline DQN, but also compare them with double DQN, Rainbow, policy gradient, and Actor-Critic algorithms.

We have also implemented a real-time browser demo here. If you are not familiar with Q-learning, you can visit a more fundamental project, Q-learning for Tic-Tac-Toe (GitHub Repo), and its real-time interactive Streamlit demo.

The following is a detailed explanation of each approach and its environment setting for Chrome Dinosaur.

Game Environment in Learning

  • No acceleration and no birds in the game, for simplicity. If you want them, you can enable acceleration in game.py.
  • Two actions only: up (jump) and do nothing. The game actually has three actions; down (duck) is needed to evade the birds when acceleration is enabled.
  • Reward:
    • Hit an obstacle: -1
    • Otherwise: 0.1
  • Using Selenium in Python to capture images from the game (a minimal capture sketch follows this list).
  • Using the version of Chrome Dinosaur linked here:
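
The following is a minimal sketch of how frames might be captured with Selenium; it is not the repo's actual game.py code. The game URL, the canvas selector, and the 84x84 resize are assumptions for illustration.

```python
# Sketch only: capture a grayscale frame from the dino canvas via Selenium.
import base64
import io

import numpy as np
from PIL import Image
from selenium import webdriver

GAME_URL = "https://example.com/chrome-dino"  # placeholder for the dino page linked above
CANVAS_JS = "return document.getElementsByClassName('runner-canvas')[0].toDataURL()"

def capture_frame(driver, size=(84, 84)):
    """Grab the current canvas as a base64 PNG and return a grayscale numpy array."""
    data_url = driver.execute_script(CANVAS_JS)           # "data:image/png;base64,..."
    png_bytes = base64.b64decode(data_url.split(",", 1)[1])
    image = Image.open(io.BytesIO(png_bytes)).convert("L")
    return np.asarray(image.resize(size), dtype=np.uint8)

if __name__ == "__main__":
    driver = webdriver.Chrome()
    driver.get(GAME_URL)
    print(capture_frame(driver).shape)  # (84, 84)
```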

Baseline DQN

Double DQN

Important Settings and Hyper-parameters

  1. Training GPU: Nvidia RTX 3080 (12GB)
  2. CPU:
  3. Memory: 64 GB (at least 45 GB of RAM is needed when using the prioritized replay buffer)
  4. Batch size: 32 (overfitting occurs if it is too large; see the config sketch after this list)
  5. Buffer size: 100,000
  6. Final epsilon: 0.1
  7. FPS:
    • Slow mode: 14.xx - 18.xx fps (with prioritized replay buffer)
    • Fast mode: 50 fps (without prioritized replay buffer)
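
A sketch of how these settings might be grouped in config1.py. Only the "cam_visualization" key is taken from the repo; the other key names are illustrative assumptions.

```python
config1 = {
    "batch_size": 32,             # larger batches tended to overfit
    "buffer_size": 100_000,       # replay-buffer capacity
    "final_epsilon": 0.1,         # minimum exploration rate after decay
    "prioritized_replay": False,  # True drops learning to ~15 FPS and needs ~45 GB RAM
    "cam_visualization": False,   # set True to dump test states for Grad-CAM
}
```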

Result:

Epsilon Decay

We believe the final epsilon is the most important hyper-parameter affecting the learning process. We tried 0.03, 0.01, and 0.0001 before, but the agent was not stable: the scores achieved during learning were highly volatile, and the agent performed very poorly in testing when epsilon was too small. Giving the agent more exploration seems to work better in this game. We first tried to follow the hyperparameters in this report, but ran into the problem described above. The training score (epsilon = 0.0001) is shown in the figure. The average and median scores of this agent over 20 test episodes are only 50.xx.

Later, we tried a final epsilon of 0.1. Although the maximum score during learning is below 1,000, the test score is much higher when we test the agent for 20 episodes after every 100 training episodes.
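
For reference, a minimal linear epsilon-annealing sketch. Only final_epsilon = 0.1 comes from the text above; the initial value and decay length are assumptions, and the actual schedule in main.py may differ.

```python
def epsilon_at(step, initial_epsilon=1.0, final_epsilon=0.1, decay_steps=100_000):
    """Linearly anneal epsilon from initial_epsilon down to final_epsilon."""
    fraction = min(step / decay_steps, 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)

# epsilon_at(0) == 1.0, epsilon_at(50_000) == 0.55, epsilon_at(200_000) == 0.1
```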

FPS

FPS here refers to the number of frames per second at which the agent predicts an action, not the FPS of the game rendered by JavaScript in the browser.

Since the computation of the prioritized replay buffer is much more expensive, our PC in this experiment can only achieve $\approx 15$ FPS during learning. If we use the normal replay buffer only, the learning FPS is faster, $\approx 50$ FPS. An agent trained at a higher FPS seems to generalize to lower FPS as well, but not the reverse. However, to obtain similar performance, keeping the FPS the same in training and testing is preferred. Without the learning process, the FPS in testing is much faster, $\approx 90$ FPS on our PC, which is even faster than the game rendered by JavaScript. Thus, we add a sleep() in the test function to slow the FPS down so it roughly matches learning.

Fig.4 - Testing scores of a stable agent over 100 episodes (53.7 avg FPS)
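
Below is a sketch of the sleep()-based throttling described above. TARGET_FPS, agent_act, and env_step are illustrative names rather than the repo's actual API.

```python
import time

TARGET_FPS = 50  # roughly match the fast-mode learning FPS

def throttled_step(agent_act, env_step, state):
    """Predict and act once, then sleep so the test loop stays near TARGET_FPS."""
    start = time.perf_counter()
    action = agent_act(state)                     # unthrottled, this runs at ~90 FPS
    next_state, reward, done = env_step(action)
    elapsed = time.perf_counter() - start
    time.sleep(max(0.0, 1.0 / TARGET_FPS - elapsed))
    return next_state, reward, done
```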

Rainbow

Policy Gradient

Actor-Critic Algorithms

Paper: Actor-Critic Algorithms

Grad-CAM Visualization in Double DQN

Since the computation time of Grad-CAM is long, we can only produce about one heat map per second even with GPU acceleration. Therefore, it is hard to produce real-time Grad-CAM images during testing. Instead of visualizing the heat maps in real time, we save the states during testing and produce the Grad-CAM heat maps afterwards.

  1. Set cam_visualization in config1.py to True. The states from the game will be saved to the test_states folder for each testing episode.
"cam_visualization": True
  2. Create the heat maps as a GIF:
python3 visualize_CAM.py -c config1

You can choose which testing episode to generate the heat maps for in visualize_CAM.py:

with open('./test_states/dino_states7.pickle', 'rb') as f:   # the 7th test episode

The warmer areas in the visualization are those with higher weights, i.e. the pixels in these areas have a larger impact on the agent's final output. From the heat-map visualization, it is very clear that the regions around the dinosaur and the obstacles consistently have the greatest impact on the agent's action.
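
For reference, Grad-CAM weights a convolutional layer's activation maps by the gradients of the selected Q-value and sums them into a heat map. Below is a minimal sketch assuming a PyTorch model with an accessible last convolutional layer; the actual visualize_CAM.py implementation may differ.

```python
import numpy as np
import torch
import torch.nn.functional as F

def grad_cam(model, conv_layer, state, action=None):
    """Heat map of conv_layer activations weighted by gradients of the chosen Q-value."""
    activations, gradients = {}, {}
    fwd = conv_layer.register_forward_hook(lambda m, i, o: activations.update(a=o))
    bwd = conv_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

    q_values = model(state)                      # state: (1, C, H, W) float tensor
    if action is None:
        action = int(q_values.argmax(dim=1))     # explain the greedy action by default
    model.zero_grad()
    q_values[0, action].backward()

    weights = gradients["g"].mean(dim=(2, 3), keepdim=True)   # global-average-pooled grads
    cam = F.relu((weights * activations["a"]).sum(dim=1))     # weighted sum over channels
    cam = F.interpolate(cam.unsqueeze(1), size=state.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]

    fwd.remove()
    bwd.remove()
    return cam.detach().cpu().numpy()
```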