Mixed_Input_PPO_CNN_LSTM_Car_Navigation

A car agent navigates complex traffic conditions using the Mixed_Input_PPO_CNN_LSTM model.

Accepted by AROB 2021

See my paper, "Generation of Traffic Flows in Multi-Agent Traffic Simulation with Agent Behavior Model based on Deep Reinforcement Learning".

Feature extraction and image inverse generation

Partially Observable Markov Games

In this work, I consider a multi-agent extension of Markov decision processes (MDPs) called partially observable Markov games.
Every cycle the agent obtains an observation image that is centered on the agent itself.
The inverse-generated images are built from the extracted features that the agent should pay attention to, for example the cars in front of it and the cars behind it.

Mixed input architecture

Sequential data && LSTM

Input: [real_speed/10, target_speed/10, elapsed_time_ratio,reward,done,time_pass,over]
State representation: [real_speed/10, target_speed/10, elapsed_time_ratio,]
Note that the data elements are related to one another rather than randomly distributed.
The target_speed is a constant value, while elapsed_time_ratio and distance_to_goal change monotonically (one increasing, the other decreasing).
So we can use an LSTM, a kind of recurrent neural network (RNN), which can find temporal relationships in the data.
To confirm this, I feed the three most recent steps [t-2, t] to the network as one batch. The same applies to the images.
The LSTM layers use the hidden state (h_t-1, c_t-1) from the previous step at time t.
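The three-step input described above can be sketched with a small rolling buffer. This is only an illustration and not the repository's code: the class name `StateWindow` and the choice to pad the first step with copies of the first state are assumptions.

```python
from collections import deque


class StateWindow:
    """Keep the most recent `length` state vectors (t-2, t-1, t)."""

    def __init__(self, length=3):
        self.length = length
        self.buffer = deque(maxlen=length)

    def push(self, state):
        """Add the newest state and return the window, oldest first."""
        state = list(state)
        if not self.buffer:
            # Assumption: at the very first step there is no history,
            # so the window is padded with copies of the first state.
            for _ in range(self.length - 1):
                self.buffer.append(state)
        self.buffer.append(state)
        return [list(s) for s in self.buffer]


window = StateWindow(length=3)
first = window.push([0.5, 1.0, 0.0])
second = window.push([0.6, 1.0, 0.1])
third = window.push([0.7, 1.0, 0.2])
```

Each call to `push` returns a list of three state vectors that could be passed to a recurrent layer as one sequence.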

Traffic conditions && Collision Detection

When a car agent navigates on the road, it may encounter other cars.
In some conditions, the acceleration chosen by the car agent will cause a jam or a collision.
Because the conditions can become very complex and the GAMA simulator has no notion of collision, I have to implement collision detection and jam detection myself.
The 10 cars closest to the agent are selected and their distances to the agent are calculated.
The following equations are necessary. Euclidean distance is used for safe driving.
$S = v_{0}*t + \frac{1}{2}at^{2}$
$v_{n+1} = v_{n}+a_{n}t_{n}$
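As a rough sketch of the two steps above (picking the nearest cars by Euclidean distance, then applying the kinematic equations), the following Python uses only the standard library. The function names and the use of (x, y) tuples for positions are assumptions for illustration; the actual selection is done in the gaml model.

```python
import heapq
import math


def closest_cars(agent_xy, cars_xy, k=10):
    """Return up to k (distance, index) pairs, nearest car first."""
    distances = ((math.dist(agent_xy, xy), i) for i, xy in enumerate(cars_xy))
    return heapq.nsmallest(k, distances)


def displacement(v0, a, t=1.0):
    """Distance travelled in time t: S = v0*t + 0.5*a*t^2."""
    return v0 * t + 0.5 * a * t ** 2


def next_speed(v, a, t=1.0):
    """Speed after time t: v_{n+1} = v_n + a_n*t_n."""
    return v + a * t


nearest = closest_cars((0, 0), [(3, 4), (1, 0), (0, 2)], k=2)
```

Here `nearest` holds the two closest cars as (distance, index) pairs, so the caller can look up each car's speed by index.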

On the same road

First, the agent computes the relevant distances (the distance to the car behind and the distance to the car in front).
Then, after the agent chooses an acceleration, detection is executed to check whether that acceleration will cause a jam or a collision.
The unit of time is one cycle.

Collision Detection

When another car is in front of the car agent and the two cars are on the same road, if
$EuclideanDistance + v_{car}*t \leq v_{agent}*t+\frac{1}{2}*a*t^{2}$
the acceleration is assumed to cause a collision with the car in front. (There may be more than one car in front.)

Jam Detection

When another car is behind the car agent and the two cars are on the same road, if
$EuclideanDistance + v_{agent}*t+\frac{1}{2}*a*t^{2} \leq v_{car}*t$
the acceleration is assumed to cause a jam with the car behind. (There may be more than one car behind.)
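The collision and jam inequalities translate almost directly into code. The sketch below is a minimal Python version for a single neighbouring car; the function and parameter names are hypothetical, and `t` defaults to one cycle as stated earlier.

```python
def causes_collision(gap, v_agent, v_front, a, t=1.0):
    """True if the chosen acceleration closes the gap to the car in front.

    gap is the Euclidean distance between the agent and the front car.
    """
    return gap + v_front * t <= v_agent * t + 0.5 * a * t ** 2


def causes_jam(gap, v_agent, v_behind, a, t=1.0):
    """True if the car behind would catch up with the agent.

    gap is the Euclidean distance between the agent and the car behind.
    """
    return gap + v_agent * t + 0.5 * a * t ** 2 <= v_behind * t


# Agent at 10 m/s accelerating hard toward a slow car 5 m ahead.
hit = causes_collision(gap=5, v_agent=10, v_front=2, a=6)
# Agent braking hard while a fast car follows 2 m behind.
blocked = causes_jam(gap=2, v_agent=3, v_behind=10, a=-6)
```

With several cars in front or behind, the same check would be repeated for each of them.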

On different roads

The calculation is the same as in the same-road case, but the conditions become much more complex.
Are the 10 closest cars on the same road as the agent?
If so, are they in front of the agent or behind it?
These conditions are resolved in the gaml file.

State representation

[real_speed/10, target_speed/10, elapsed_time_ratio, distance_to_goal/100, distance_front_car/10, distance_behind_car/10]
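The scaling in the vector above can be written as a small helper. This is a sketch only; the function name `build_state` is hypothetical, while the divisors are taken from the vector itself.

```python
def build_state(real_speed, target_speed, elapsed_time_ratio,
                distance_to_goal, distance_front_car, distance_behind_car):
    """Scale the raw simulator values into the six-element input vector."""
    return [
        real_speed / 10,
        target_speed / 10,
        elapsed_time_ratio,        # already a ratio, left unscaled
        distance_to_goal / 100,
        distance_front_car / 10,
        distance_behind_car / 10,
    ]


state = build_state(5, 10, 0.25, 200, 30, 20)
```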

Action representation

The network outputs an acceleration, which is constrained to [-6, 6] m/s^2 to stay close to real driving conditions.
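The text does not say how the range is enforced, so the sketch below shows two common options as assumptions: hard clipping of the raw output, and squashing with tanh followed by scaling. Both function names are hypothetical.

```python
import math

MAX_ACCEL = 6.0  # m/s^2, the bound stated above


def clip_acceleration(raw):
    """Hard-limit a raw network output to [-MAX_ACCEL, MAX_ACCEL]."""
    return max(-MAX_ACCEL, min(MAX_ACCEL, raw))


def squash_acceleration(raw):
    """Map any real-valued output smoothly into [-MAX_ACCEL, MAX_ACCEL]."""
    return MAX_ACCEL * math.tanh(raw)
```

Clipping keeps in-range values unchanged, while the tanh variant keeps the mapping differentiable everywhere.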

Reward shaping

Output: acceleration. Action representation: [acceleration]. The car learns to control its acceleration under the constraints shown below.

$r_{t} = r_{terminal} + r_{danger} + r_{speed}$
r_terminal: -0.013 (target_speed > real_speed) or -0.1 (target_speed < real_speed), given on a crash or when time expires
r_speed: depends on the target speed, where sa is the agent's Instantaneous_speed and st is the target_speed
if sa ≤ st: 0.001 - 0.004*((target_speed - Instantaneous_speed)/target_speed);
    if distance_front_car_before <= safe_interval or time_after_safe_interval > 0: 0.001*(Instantaneous_speed/target_speed);
    time_after_safe_interval can be extended while the front cars are within safe_interval.
if sa > st: 0.001 - 0.006*((Instantaneous_speed - target_speed)/target_speed);
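The speed term can be sketched as a single function. This is one reading of the rules above, not the repository's code: it assumes the safe-interval rule is a sub-case of the "at or below target speed" branch (as the indentation suggests), and it collapses the two safe-interval conditions into one hypothetical boolean `front_car_close`.

```python
def speed_reward(speed, target_speed, front_car_close=False):
    """Per-step speed reward for the agent.

    front_car_close stands in for
    `distance_front_car_before <= safe_interval or time_after_safe_interval > 0`.
    """
    if speed <= target_speed:
        if front_car_close:
            # Assumption: near a front car, slowing down is penalised less.
            return 0.001 * (speed / target_speed)
        return 0.001 - 0.004 * ((target_speed - speed) / target_speed)
    # Overspeeding is penalised more steeply than underspeeding.
    return 0.001 - 0.006 * ((speed - target_speed) / target_speed)
```

The reward peaks at 0.001 when the speed equals the target and falls off on both sides, faster above the target than below it.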

In my experiment, I want the agent to learn to keep its speed around the target speed.

Result

The models with LSTM clearly train much better than the models without LSTM.

Actor-Critic 2 LSTM

Actor-Critic 0 LSTM

PPO2

$J^{\theta'}(\theta) = \sum \min\left(\frac{p_{\theta'}}{p_{\theta}}*A_{\theta}(s_{t},a_{t}),\ \mathrm{clip}\left(\frac{p_{\theta'}}{p_{\theta}},1-\varepsilon,1+\varepsilon\right)*A_{\theta}(s_{t},a_{t})\right)$
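The clipped objective can be illustrated in plain Python. In this sketch, `ratios` stands for the probability ratio in the formula and `advantages` for the advantage estimates, both assumed to be precomputed lists; the default `epsilon=0.2` is an assumed value, not one taken from this repository.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Sum over samples of min(ratio * A, clip(ratio, 1-eps, 1+eps) * A)."""
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        total += min(ratio * adv, clipped * adv)
    return total


# A ratio of 1.5 with positive advantage is capped at 1.2;
# a ratio of 0.5 with negative advantage is raised to 0.8.
objective = clipped_surrogate([1.5, 0.5], [1.0, -1.0])
```

Taking the minimum means the objective never rewards moving the ratio further outside the interval [1-ε, 1+ε].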

About GAMA

GAMA is a platform for running simulations.
I have a GAMA model named "PPO_Mixedinput_Navigation.gaml", which contains a car and some traffic lights. The model sends the data
[real_speed, target_speed, elapsed_time_ratio, distance_to_goal,reward,done,time_pass,over]
as a matrix to the Python environment, which calculates the car's acceleration with A2C. Following the Markov decision process framework, the car in GAMA applies the acceleration and sends the latest data to Python, over and over again, until it reaches the destination.
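On the Python side, one row of that data could be unpacked into named fields as sketched below. The transport between GAMA and Python is not described here, so this assumes a single comma-separated line; the names `GamaStep` and `parse_gama_row`, and the treatment of `done` and `over` as 0/1 flags, are assumptions as well.

```python
from dataclasses import dataclass

FIELDS = ("real_speed", "target_speed", "elapsed_time_ratio",
          "distance_to_goal", "reward", "done", "time_pass", "over")


@dataclass
class GamaStep:
    real_speed: float
    target_speed: float
    elapsed_time_ratio: float
    distance_to_goal: float
    reward: float
    done: bool
    time_pass: float
    over: bool


def parse_gama_row(line):
    """Parse one comma-separated row (assumed format) into a GamaStep."""
    values = [float(x) for x in line.strip().split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    row = dict(zip(FIELDS, values))
    # Assumption: done and over arrive as 0/1 flags.
    row["done"] = bool(row["done"])
    row["over"] = bool(row["over"])
    return GamaStep(**row)


step = parse_gama_row("8.0,10.0,0.3,120.0,0.0005,0,3,0")
```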

ZHONGJunjie86/Mixed_Input_PPO_CNN_LSTM_Car_Navigation
