This is the repository for the Traffic Sign Recognition project of the Udacity Self-Driving Car Nanodegree. The original repository can be found here.
Build a Traffic Sign Recognition Project
The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
Here is a link to my project code
- The size of training set is 34799
- The size of the validation set is 4410
- The size of test set is 12630
- The shape of a traffic sign image is (32,32,3)
- The number of unique classes/labels in the data set is 43
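The summary above can be computed directly from the image and label arrays. The following is a minimal sketch, assuming the data is already loaded as NumPy arrays shaped `(N, H, W, C)`; the function name `summarize` and the synthetic arrays are hypothetical stand-ins, not the project's actual code or data.

```python
import numpy as np

def summarize(X_train, y_train, X_valid, X_test):
    """Return basic data set statistics for image arrays shaped (N, H, W, C)."""
    return {
        "n_train": X_train.shape[0],
        "n_valid": X_valid.shape[0],
        "n_test": X_test.shape[0],
        "image_shape": X_train.shape[1:],
        "n_classes": len(np.unique(y_train)),
    }

# Small synthetic stand-in for the real arrays (hypothetical data).
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
y_train = np.arange(100) % 43
X_valid = X_train[:20]
X_test = X_train[:30]

stats = summarize(X_train, y_train, X_valid, X_test)
print(stats)
```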
Here is an exploratory visualization of the data set. It is a bar chart showing the number of images of each class in the training dataset. I noticed that the class distribution is quite unbalanced. Several classes have more than 700 examples (e.g. Speed limit (50km/h), Speed limit (30km/h), Yield), whereas several have fewer than 100 examples (e.g. Dangerous curve to the left, End of no passing, Speed limit (20km/h)).
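The per-class counts behind such a bar chart can be obtained with `np.bincount`. This is a small sketch on toy labels; the helper name `class_distribution` and the threshold used to flag rare classes are illustrative assumptions.

```python
import numpy as np

def class_distribution(labels, n_classes=43):
    """Count how many samples belong to each class id in [0, n_classes)."""
    return np.bincount(labels, minlength=n_classes)

# Toy labels (hypothetical): class 2 dominates, class 3 has no samples.
labels = np.array([0, 1, 1, 2, 2, 2, 2])
counts = class_distribution(labels, n_classes=4)

# Classes with fewer samples than an arbitrary threshold.
rare = np.flatnonzero(counts < 2)
print(counts, rare)
```

The `counts` array can be passed straight to a bar plot, with the class id on the x axis.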
As a first step, I decided to convert the images to grayscale so that the network focuses on the signs' patterns and training runs faster.
Here is an example of a traffic sign image before and after grayscaling.
As a last step, I standardized the image data by scaling pixel values to have a zero mean and unit variance.
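Both steps can be sketched in a few lines of NumPy. The luma weights below are the common ITU-R BT.601 coefficients, and computing a single mean and standard deviation over the whole training set (then reusing them for validation and test data) is an assumption; the notebook may weight channels or normalize differently.

```python
import numpy as np

def to_grayscale(images):
    """Convert (N, H, W, 3) RGB images to (N, H, W, 1) grayscale (BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = images.astype(np.float32) @ weights
    return gray[..., np.newaxis]

def standardize(images, mean=None, std=None):
    """Scale pixel values to zero mean and unit variance.

    Pass the training-set mean/std when transforming validation or test data.
    """
    mean = images.mean() if mean is None else mean
    std = images.std() if std is None else std
    return (images - mean) / (std + 1e-8), mean, std

# Synthetic RGB batch standing in for the real images (hypothetical data).
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
gray = to_grayscale(rgb)
norm, train_mean, train_std = standardize(gray)
print(gray.shape, float(norm.mean()), float(norm.std()))
```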
As the training dataset is very unbalanced between classes, I decided to artificially augment the image data so that the training dataset gets a balanced class distribution.
To add more data to the data set, I used random image transformations combining horizontal/vertical shift, zoom, shear and rotation.
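The idea can be sketched with plain NumPy: draw a random affine transform per image and top up every class until it matches the largest one. The parameter ranges, the nearest-neighbour sampling, and the function names `random_affine` and `balance_with_augmentation` are all assumptions made for illustration; the notebook may rely on a library image generator with different settings.

```python
import numpy as np

def random_affine(image, rng, max_shift=2, max_rot_deg=15,
                  max_shear=0.1, zoom_range=(0.9, 1.1)):
    """Random shift/zoom/shear/rotation of an (H, W) or (H, W, C) image."""
    h, w = image.shape[:2]
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    shear = rng.uniform(-max_shear, max_shear)
    zoom = rng.uniform(*zoom_range)
    ty, tx = rng.uniform(-max_shift, max_shift, size=2)

    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    shr = np.array([[1.0, shear], [0.0, 1.0]])
    forward = (zoom * rot) @ shr  # acts on coordinates centred on the image

    # Inverse mapping: for every output pixel, look up its source pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    centre = np.array([[(h - 1) / 2.0], [(w - 1) / 2.0]])
    out = np.stack([ys.ravel(), xs.ravel()]) - centre - np.array([[ty], [tx]])
    src = np.rint(np.linalg.inv(forward) @ out + centre).astype(int)
    sy = np.clip(src[0], 0, h - 1)  # clipping repeats the border pixels
    sx = np.clip(src[1], 0, w - 1)
    return image[sy, sx].reshape(image.shape)

def balance_with_augmentation(X, y, rng):
    """Add augmented copies until every class has as many samples as the largest."""
    counts = np.bincount(y)
    target = counts.max()
    new_X, new_y = [X], [y]
    for cls in np.flatnonzero(counts):
        n_extra = target - counts[cls]
        if n_extra == 0:
            continue
        picks = rng.choice(np.flatnonzero(y == cls), size=n_extra)
        new_X.append(np.stack([random_affine(X[i], rng) for i in picks]))
        new_y.append(np.full(n_extra, cls, dtype=y.dtype))
    return np.concatenate(new_X), np.concatenate(new_y)

rng = np.random.default_rng(0)
X = rng.random((6, 32, 32, 1))
y = np.array([0, 0, 0, 0, 1, 2])
X_bal, y_bal = balance_with_augmentation(X, y, rng)
print(X_bal.shape, np.bincount(y_bal))
```

Augmentation should only be applied to the training split, so that validation and test accuracy keep measuring performance on unmodified images.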
Here is an example of original image samples:
And for augmented image samples:
Finally, some images in the training dataset were taken in low-contrast conditions, which prevents the network from seeing the pixel information hidden in the dark regions. I therefore applied adaptive histogram equalization to the images suffering from low contrast. The following is an example of the effect of histogram equalization:
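To illustrate the principle, here is a simplified sketch that uses plain global histogram equalization in NumPy rather than the adaptive variant described above. Adaptive equalization applies the same idea on local tiles with a clip limit, for example via `skimage.exposure.equalize_adapthist`. The low-contrast test and its threshold are assumptions; how low-contrast images were selected in the notebook is not stated here.

```python
import numpy as np

def is_low_contrast(gray, threshold=0.35):
    """True when the 1st-99th percentile range spans less than `threshold` of 0-255."""
    lo, hi = np.percentile(gray, [1, 99])
    return (hi - lo) / 255.0 < threshold

def equalize_hist(gray):
    """Global histogram equalization of a uint8 (H, W) image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant image, nothing to stretch
        return gray.copy()
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255.0), 0, 255)
    return lut.astype(np.uint8)[gray]

# A dark synthetic image whose values only span 10..40 (hypothetical data).
rng = np.random.default_rng(0)
dark = rng.integers(10, 41, size=(32, 32), dtype=np.uint8)
fixed = equalize_hist(dark) if is_low_contrast(dark) else dark
print(dark.min(), dark.max(), fixed.min(), fixed.max())
```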
I used the LeNet architecture and my final model consisted of the following layers:
Layer | Description |
---|---|
Input | 32x32x1 grayscale image |
Convolution 5x5 | 1x1 stride, valid padding, outputs 28x28x32 |
Batch Normalization | |
RELU | |
Max pooling | 2x2 stride, outputs 14x14x32 |
Dropout | 0.2 |
Convolution 5x5 | 1x1 stride, valid padding, outputs 10x10x64 |
RELU | |
Batch Normalization | |
Max pooling | 2x2 stride, valid padding, outputs 5x5x64 |
Dropout | 0.2 |
Fully connected | output 120 |
RELU | |
Fully connected | output 84 |
RELU | |
Fully connected | output 43 |
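
The spatial sizes in the table can be checked with the usual valid-padding formula, `out = (in - kernel) // stride + 1`. The sketch below assumes 5x5 convolution kernels, as in the original LeNet, since that is what reproduces the listed 28x28 and 10x10 outputs (a 3x3 kernel with valid padding would give 30x30), and 2x2 pooling windows with stride 2.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a 'valid' convolution or pooling window."""
    return (size - kernel) // stride + 1

shapes = []
size = 32                                   # input is 32x32x1
size = conv_out(size, kernel=5)             # first convolution
shapes.append(size)
size = conv_out(size, kernel=2, stride=2)   # first max pooling
shapes.append(size)
size = conv_out(size, kernel=5)             # second convolution
shapes.append(size)
size = conv_out(size, kernel=2, stride=2)   # second max pooling
shapes.append(size)

flat_features = size * size * 64            # input width of the first dense layer
print(shapes, flat_features)
```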
To train the model, I used the Adam optimizer to minimize the cross-entropy cost function, with the following hyperparameter settings:
- batch size = 512
- epochs = 20
- learning rate = 0.002
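For reference, the two ingredients named above can be written out in NumPy: the softmax cross-entropy loss and a single Adam update. This is a sketch of the textbook formulas, not the training code itself; the `beta1`, `beta2` and `eps` values are the commonly used Adam defaults and are an assumption, since only the learning rate is specified above.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def adam_step(param, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `t` is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# An untrained model with uniform outputs has a loss of ln(43).
uniform_loss = softmax_cross_entropy(np.zeros((4, 43)), np.array([0, 1, 2, 3]))

# On the first step Adam moves each parameter by roughly the learning rate.
param, m, v = adam_step(np.array([1.0]), np.array([0.5]),
                        m=np.zeros(1), v=np.zeros(1), t=1)
print(uniform_loss, param)
```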
My final model results were:
- training set accuracy of 99.6%
- validation set accuracy of 98.5%
- test set accuracy of 95.4%
The model selection was done through an iterative approach:
The first architecture I tried was the original LeNet-5 model from the class, as it is a well-established and efficient architecture for image classification.
This architecture trained quickly but did not reach a satisfactory accuracy after several rounds of hyperparameter tuning. The results showed overfitting: the model had low bias, but there was a large gap between training loss and validation loss.
Because the distribution of a layer's inputs shifts as the weights of earlier layers are updated, I first added batch normalization right after the convolution layer to standardize the inputs fed to the activation unit. This technique helps stabilize and accelerate training, and it has a regularizing side effect. I also added dropout to help the model generalize better by breaking potential dependencies between the training data and specific nodes. At the same time, I tried not to degrade the bias too much, so I also increased the filter depth to capture more features. My hyperparameter tuning focused mainly on the dropout probability and the depth of the convolution filters to manage the overfitting/underfitting trade-off. I then slightly decreased the learning rate and increased the number of epochs to reach the final model results.
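The two regularization techniques can be sketched in NumPy as they behave during training. This is a minimal illustration, not the model code: the running statistics that batch normalization uses at inference time are omitted, and treating the 0.2 in the layer table as the fraction of units dropped (rather than kept) is an assumption.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize conv activations (N, H, W, C) per channel, then scale and shift."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def dropout_train(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)

# Activations with an arbitrary offset and scale (hypothetical data).
acts = 3.0 * rng.standard_normal((16, 28, 28, 32)) + 5.0
normed = batch_norm_train(acts, gamma=np.ones(32), beta=np.zeros(32))

dropped = dropout_train(np.ones((1000, 100)), rate=0.2, rng=rng)
print(normed.mean(), normed.std(), (dropped == 0).mean())
```

Dropout is only active during training; at evaluation time the activations pass through unchanged, which is why the kept units are rescaled by `1 / (1 - rate)`.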
The final model achieved low bias and kept a reasonable gap between training and validation loss.
Here are ten German traffic signs that I found on the web:
The images were all taken in good light conditions with a normal contrast level, and they are reasonably sharp. The 5th image, a "Dangerous curve to the right" sign, is slightly rotated. The 7th image, "Slippery road", carries a watermark, which may make it harder for the model to predict correctly.
Here are the results of the prediction:
The model was able to correctly guess 10 of the 10 traffic signs, which gives an accuracy of 100%. This compares favorably to the accuracy on the test set of 95.4%.
The code for making predictions on my final model is located at line #208 of the IPython notebook.
For each of the images, the top five softmax probabilities were:
The model is quite sure (probability close to 100%) about what it predicts for most of the images. For the "Slippery road" and "No entry" images, the model is less certain but still predicts correctly.
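Extracting the top five probabilities from the network's raw outputs takes only a softmax and a sort. Here is a minimal NumPy sketch on made-up logits; the helper names `softmax` and `top_k` are illustrative, and the notebook may use a framework function for the same purpose.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax of a (N, n_classes) array."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def top_k(probs, k=5):
    """Return class ids and probabilities of the k most likely classes per row."""
    idx = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    return idx, np.take_along_axis(probs, idx, axis=1)

# Made-up logits for a single image over 7 classes (hypothetical values).
logits = np.array([[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]])
probs = softmax(logits)
top_idx, top_probs = top_k(probs, k=5)
print(top_idx, top_probs)
```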
Visualizing the confusion matrix of the test dataset gives a clearer idea of how reliably the model predicts each class:
The model often mistakes the "Speed limit (60km/h)" sign for "Speed limit (80km/h)" and the "Speed limit (100km/h)" sign for "Speed limit (120km/h)". It also has trouble recognizing "Pedestrians", "Beware of ice/snow" and "Bumpy road":
ClassId | SignName | Precision | Recall |
---|---|---|---|
27 | Pedestrians | 0.882353 | 0.500000 |
30 | Beware of ice/snow | 0.840909 | 0.740000 |
22 | Bumpy road | 0.959184 | 0.783333 |
3 | Speed limit (60km/h) | 0.975069 | 0.782222 |
7 | Speed limit (100km/h) | 0.990000 | 0.880000 |
The model's precision on the "Double curve" and "Speed limit (20km/h)" signs is also limited, meaning that other signs are often misclassified as these two:
ClassId | SignName | Precision | Recall |
---|---|---|---|
21 | Double curve | 0.566434 | 0.900000 |
0 | Speed limit (20km/h) | 0.674699 | 0.933333 |
Further work on image preprocessing and model architecture is still needed to improve the model's ability to generalize on those classes.
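The per-class precision and recall in the tables above follow directly from the confusion matrix: precision divides the diagonal by the column sums (everything predicted as that class), and recall divides it by the row sums (everything that truly is that class). A minimal NumPy sketch on toy labels, with illustrative function names:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def precision_recall(cm):
    """Per-class precision and recall; classes with no samples get 0."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall

# Toy labels for three classes (hypothetical data).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall = precision_recall(cm)
print(cm, precision, recall)
```

Read this way, a class with high precision but low recall (such as "Pedestrians") is frequently missed, while a class with low precision but high recall (such as "Double curve") attracts predictions that belong to other signs.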
The model has high precision and recall for the "Speed limit (30km/h)" sign. Here are feature map visualizations of the first convolution layer's output for two "Speed limit (30km/h)" samples:
First image:
1st conv layer output:
Second image:
1st conv layer output:
Filters 3, 4, 5, 8, 22, 26, 27 and 31 mainly focus on the shape of the number, filter 20 picks up the outline of the sign, and the red circle is a common feature for the majority of filters.