car_RL_AI

A simple Reinforcement Learning Car AI using Proximal Policy Optimization (PPO).
Implemented with Unity ML-Agents.

Contents

Popular Environment Setup

There are many different ways to set up the training environment, and most of them are heuristic: they can be adapted depending on the behaviour you want the agent to learn during training.

My implementation is mainly based on the Blocks Method, which is explained in the following section.

Checkpoint Method Environment

Place many checkpoints along the road; the agent learns to reach the next checkpoint as quickly as possible.
The car usually shoots rays to measure the distance between itself and the next checkpoint plane.
Advantages : Easy to set up, and it fits any kind of race track.
Disadvantages : You have to place the checkpoint planes for the track manually. This can be automated, but automation may produce failure cases that you have to fix by hand.
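
As a rough illustration (not code from this repository), a checkpoint reward in Unity ML-Agents is often implemented with trigger colliders on the checkpoint planes, rewarding the agent only when it crosses the checkpoint it is supposed to reach next. The class and field names below are assumptions.

    using UnityEngine;
    using Unity.MLAgents;

    public class CheckpointAgentSketch : Agent
    {
        public Transform[] checkpoints;   // ordered checkpoint planes placed along the track
        int nextCheckpointIndex;

        public override void OnEpisodeBegin()
        {
            nextCheckpointIndex = 0;
        }

        // Each checkpoint plane has a trigger collider; reward only the one the car
        // should reach next, so it cannot farm reward by driving back and forth.
        void OnTriggerEnter(Collider other)
        {
            if (other.transform == checkpoints[nextCheckpointIndex])
            {
                AddReward(1.0f);
                nextCheckpointIndex = (nextCheckpointIndex + 1) % checkpoints.Length;
            }
        }
    }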

Blocks Method Environment (My Method)

Split the race track into blocks; each block gives the agent a different desired driving direction. I used 3 different blocks: straight, 90-degree left turn, and 90-degree right turn.
Advantages : If the car is trained to handle all of the different blocks, then in theory any race track assembled from those default blocks should work well.
Disadvantages : You can only use race tracks built from the pre-defined blocks.
(Illustrations of the three block types.)
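
A minimal sketch, not the repository's actual component, of the idea behind this method: each road block carries the direction the car is supposed to drive while it is inside that block, and the agent later compares its own heading against this direction (see the angle observation under Method). The TrackBlockSketch component and its exit field are assumptions.

    using UnityEngine;

    public class TrackBlockSketch : MonoBehaviour
    {
        public enum BlockType { Straight, Left90, Right90 }
        public BlockType type;
        public Transform exit;   // assumed: an empty object at the block's exit, used by the turn blocks

        // Desired driving direction while a car is inside this block: a straight block
        // uses its own forward axis, a turn block points the car toward its exit.
        public Vector3 GetDesiredDirection(Vector3 carPosition)
        {
            return type == BlockType.Straight
                ? transform.forward
                : (exit.position - carPosition).normalized;
        }
    }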

Training

Failed Training

As you can see, the cars always fail at the 180-degree left turn.
The reason is that I did not include a 180-degree left turn in the beginning part of the track, so the agents spent too many iterations learning how to take 90-degree turns and drive straight.
This makes it hard for them to adapt to a 180-degree left turn, because the network's weights have already been trained for too many iterations on 90-degree turns.
The solution is to build a new track and make sure it includes all possible turns in its beginning section, so the cars can fully learn every turn while their initial randomly sampled actions are still exploring.

Successful Training

The new track now contains a left 90, a right 90, a left 180 (two consecutive left 90s), a right 180 (two consecutive right 90s), and a straight road in its beginning section.
The agents quickly adapt to all possible situations, and the weights are optimized to handle all kinds of turns.

    Step: 200000. Time Elapsed: 628.180 s. Mean Reward: 48.482. Std of Reward: 59.216.

Training recording at 3x speed

Trained Result

Run the trained agent on different race tracks to test its performance.
On the original race track

On the cyclic race track

On a random race track

Third-person view

Method

Agent

Number of observations : 7, consisting of 6 rays that scan the surroundings and return the distance between the car and the walls, plus one extra state that stores the angle between the car's heading and the desired moving direction.
Number of actions : 2, forward speed and turning speed.
Reward : the displacement between two frames. If the car moves toward the desired direction, it receives a positive reward proportional to the displacement; it receives 0 if it moves backward, and -1 if it hits a wall, which also ends the episode immediately.
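
A minimal sketch of how this observation, action, and reward setup can be expressed with the Unity ML-Agents C# API. It follows the description above but is not the repository's actual script; the ray angles, field names, and the "wall" tag are assumptions.

    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using Unity.MLAgents.Actuators;

    public class CarAgentSketch : Agent
    {
        public float maxRayDistance = 20f;   // assumed sensing range of the 6 rays
        public Transform desiredDirection;   // desired moving direction of the current block (assumed)
        public float moveSpeed = 10f;
        public float turnSpeed = 100f;
        Vector3 lastPosition;

        public override void OnEpisodeBegin()
        {
            lastPosition = transform.position;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // States 1-6: rays spread around the car, each returning the distance to the nearest wall
            float[] rayAngles = { -90f, -45f, -15f, 15f, 45f, 90f };
            foreach (float angle in rayAngles)
            {
                Vector3 dir = Quaternion.Euler(0f, angle, 0f) * transform.forward;
                float distance = maxRayDistance;
                if (Physics.Raycast(transform.position, dir, out RaycastHit hit, maxRayDistance))
                    distance = hit.distance;
                sensor.AddObservation(distance / maxRayDistance);
            }
            // State 7: angle between the car's heading and the desired moving direction
            float heading = Vector3.SignedAngle(transform.forward, desiredDirection.forward, Vector3.up);
            sensor.AddObservation(heading / 180f);
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            float forward = actions.ContinuousActions[0];   // action 1: forward speed
            float turn = actions.ContinuousActions[1];      // action 2: turning speed
            transform.Rotate(0f, turn * turnSpeed * Time.deltaTime, 0f);
            transform.position += transform.forward * Mathf.Max(forward, 0f) * moveSpeed * Time.deltaTime;

            // Reward: displacement between two frames, projected onto the desired direction;
            // positive when moving along it, 0 when moving backward
            Vector3 displacement = transform.position - lastPosition;
            float progress = Vector3.Dot(displacement, desiredDirection.forward);
            AddReward(Mathf.Max(progress, 0f));
            lastPosition = transform.position;
        }

        void OnCollisionEnter(Collision collision)
        {
            if (collision.gameObject.CompareTag("wall"))   // assumed wall tag
            {
                SetReward(-1f);   // -1 on hitting a wall, then end the episode immediately
                EndEpisode();
            }
        }
    }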

A higher number of hidden units or layers caused overfitting.

Training Parameters

    trainer_type: ppo
    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
      beta_schedule: constant
      epsilon_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
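
With this configuration saved in a trainer config file (nested under a behaviors: entry with the agent's behavior name, as the ML-Agents trainer expects; car_config.yaml and the run id below are placeholder names), training is started with the standard ML-Agents CLI and begins once the Unity scene is played:

    mlagents-learn car_config.yaml --run-id=car_ppo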
