
Using deep reinforcement learning to play the Snake game. The algorithm used is PPO for discrete action spaces, which performs as well in discrete action spaces as it does in continuous ones. About half an hour of training is enough for the snake to play about as well as a human player.


MuGeminorum/Snake-AI


Snake-AI


This project uses deep reinforcement learning (DRL) to play the Snake game automatically. The core DRL method is PPO for discrete action spaces, which performs as well in discrete action spaces as it does in continuous ones. About half an hour of training is enough to obtain a working snake agent.
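For readers new to PPO on discrete actions, the core of the method is the clipped surrogate objective computed from categorical (softmax) action probabilities. The sketch below illustrates that objective in plain Python. It is not taken from this repository's code: the function names, the four-action example and the clip range `eps=0.2` are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert action logits to a categorical distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_clip_objective(
    new_logits: Sequence[Sequence[float]],
    old_log_probs: Sequence[float],
    actions: Sequence[int],
    advantages: Sequence[float],
    eps: float = 0.2,  # clip range (assumed value)
) -> float:
    """Mean clipped surrogate objective over a batch of discrete actions.

    PPO maximizes this value; the training loss is its negative.
    """
    total = 0.0
    for logits, old_lp, a, adv in zip(new_logits, old_log_probs, actions, advantages):
        new_lp = math.log(softmax(logits)[a])        # log pi_new(a|s)
        ratio = math.exp(new_lp - old_lp)            # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)     # pessimistic (clipped) bound
    return total / len(actions)


# Hypothetical 4-action policy; the policy is unchanged, so the ratio is exactly 1.
logits = [[0.1, 0.2, 0.3, 0.4]]
old_lp = [math.log(softmax(logits[0])[2])]
print(ppo_clip_objective(logits, old_lp, actions=[2], advantages=[1.5]))  # 1.5
```

The `min` of the unclipped and clipped terms removes any incentive to push the probability ratio outside `[1 - eps, 1 + eps]` in a single update, which is what keeps PPO updates conservative.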

Requirements

conda create -n ppo --yes --file conda.txt
conda activate ppo
pip install -r requirements.txt

Usage

Train

python train.py # after training, the training curve of the current round is shown automatically
python snake.py # evaluate latest saved model

Evaluate assigned model

python evaluate.py --weight ./model/act-weight_round3_472_82.5.pkl

Plot assigned reward log

python plotter.py --history ./logs/reward_round3_82.5.csv

Experiments

| Round | 1 | 2 | 3 |
| --- | --- | --- | --- |
| Training curve | round1 | round2 | round3 |
| Evaluation | round1 | round2 | round3 |
| Reward_eat | +2.0 | +2.0 | +2.0 |
| Reward_hit | -0.5 | -1.0 | -1.5 |
| Reward_bit | -0.8 | -1.5 | -2.0 |
| Avg record | ≈19 | ≈23 | ≈28 |
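The three rounds differ only in the three reward terms. Below is a minimal sketch of how such an event-based reward could be expressed. The event names, the `RewardConfig` class and the assumption that every other step yields zero reward are hypothetical and not taken from the repository; only the numeric values come from the Experiments table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    eat: float  # reward for eating food (Reward_eat)
    hit: float  # penalty for hitting a wall (Reward_hit)
    bit: float  # penalty for biting its own body (Reward_bit)


# Values from the Experiments table.
ROUNDS = {
    1: RewardConfig(eat=2.0, hit=-0.5, bit=-0.8),
    2: RewardConfig(eat=2.0, hit=-1.0, bit=-1.5),
    3: RewardConfig(eat=2.0, hit=-1.5, bit=-2.0),
}


def step_reward(event: str, cfg: RewardConfig) -> float:
    """Map a (hypothetical) game event name to a scalar reward."""
    if event == "eat":
        return cfg.eat
    if event == "hit_wall":
        return cfg.hit
    if event == "bite_self":
        return cfg.bit
    return 0.0  # assumption: ordinary moves are neither rewarded nor penalized


# Episode under round 3 settings: eat twice, then crash into a wall.
episode = ["move", "eat", "move", "eat", "hit_wall"]
print(sum(step_reward(e, ROUNDS[3]) for e in episode))  # 2.5
```

Under this sketch, the same episode would score 3.5 in round 1 and 2.5 in round 3, which shows how the growing death penalties shift the balance between eating and survival.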

Conclusions

  1. Increasing the death penalty leads to higher average records
  2. With a low death penalty, training produces a lower reward curve, but the agent still performs well in the demo
  3. A particularly high reward for eating food can make the agent chase quick gains while ignoring long-term safety

Future work

  1. The training time is too short to show the advantages of DRL over non-DRL methods (e.g. Snaqe)
  2. The snake's body zigzags in an unsightly way; one option is to add a penalty to the reward for excessive zigzagging
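One way the proposed zigzag penalty might be computed is to count direction changes within a sliding window of recent moves and subtract a small amount for each turn beyond a threshold. This is a hypothetical sketch, not something the repository implements; the `ZigzagPenalty` class, the window size, the turn threshold and the coefficient are all illustrative assumptions.

```python
from collections import deque
from typing import Deque


class ZigzagPenalty:
    """Penalize frequent direction changes within a sliding window of moves."""

    def __init__(self, window: int = 8, max_turns: int = 3, coef: float = 0.05) -> None:
        self.moves: Deque[str] = deque(maxlen=window)  # most recent directions only
        self.max_turns = max_turns  # turns allowed in the window without penalty
        self.coef = coef            # penalty per extra turn (hypothetical value)

    def __call__(self, direction: str) -> float:
        """Record the new move and return the (non-positive) penalty for this step."""
        self.moves.append(direction)
        history = list(self.moves)
        turns = sum(1 for a, b in zip(history, history[1:]) if a != b)
        extra = max(0, turns - self.max_turns)
        return -self.coef * extra


penalty = ZigzagPenalty()
for d in ["up", "left", "up", "left", "up", "left"]:
    r = penalty(d)
print(r)  # 5 turns in the window, 2 over the limit -> -0.1
```

The returned value would be added to the per-step reward. Keeping the coefficient small relative to Reward_eat avoids teaching the snake to avoid turning toward food altogether.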
