Deep Learning Assisted Computer Vision System For Traffic Sign Classification and Detection

Synopsis (get the full documentation here)

Computers, as we have them today, are GIGO (Garbage-In-Garbage-Out) devices: they can only produce results based on what is fed into them and how they have been programmed to respond to such inputs. As such, challenges exist with problem categories that cannot be formulated as algorithms, especially problems which depend on many subtle factors, such as knowledge and understanding of previous scenes and the corresponding reactions to them. As an example, when asked to recognize the Queen of England's image among a cluster of 100 other images, the human brain may be able to provide an informed guess, probably based on past knowledge and various other experiences combined. A computer, however, cannot accurately derive this without a pre-written algorithm. In light of this, there has been growing interest in research geared toward developing Artificial Intelligence (AI) models which are capable of learning and carrying out classification tasks without reference to any pre-written algorithm. One such research area is the field of Neural Networks (NN), a biologically inspired family of computation architectures built as extremely simplified models of the human brain.

It is safe to say that despite the ever-increasing popularity of transferring (human) tasks to computers, there are still many human tasks that computers perform poorly, such as those involving visual perception and intelligence. This is because the largest part of the human brain works continuously on data analysis and interpretation, while the largest part of the computer is only available for passive data storage. The brain therefore performs closer to its theoretical maximum. Although the computer is fast, reliable, unbiased, tireless, consistent and can sometimes carry out much more complex computations than the human brain is known to muster, it is still unable to synthesize new rules and, it is safe to say, has no common sense. It rather has a group of arithmetic processing units and storage, carefully interconnected to perform complex numerical calculations in a very short time, but it is not adaptive.

On the other hand, the human brain possesses what we know as common sense, a bigger knowledge base, and the ability to synthesize new rules and spontaneously detect trends in data without being pre-taught. This is so even though, based on raw capacity, the computer should be more powerful than the human brain: it typically comprises over 10^9 transistors with a switching time of 10^-9 seconds, while the brain consists of over 10^11 neurons but with a switching time of only about 10^-3 seconds. On closer analysis, we note that although the human brain is easily tired, bored, biased, inconsistent and cannot be fully trusted, it still outperforms the computer in some application areas due to its perceptive nature of operation (the interpretation of sensory information in order to understand the environment). This explains why there is still major reliance on the human brain for classification tasks.

Juxtaposing the computer's strengths and weaknesses against the human brain's makes us realize that, as much as the human brain is better at perceptive tasks, it has endurance, bias and inconsistency issues. Researchers are therefore working to develop systems which fuse the advantages of both the brain and the computer into one near-perfect outfit: a system which takes on the perceptive learning, out-of-the-box synthesis, self-organizing and self-learning characteristics of the human brain, while maintaining the massive computational capability, speed and endurance of the computer. This motive has led to increased research on neural networks.

In summary, this project seeks to explore the science behind Neural Networks (NN), their various flavours (especially Convolutional Neural Networks, CNNs) and their application areas, and then to narrow down by applying them in the design and development of a computer vision system which can be used for traffic sign recognition and detection in autonomous vehicles.

Project Implementation and Results

IMDB Creation (Dataset)

A dataset of traffic sign images from the German Traffic Sign Recognition Benchmark (GTSRB) website is compiled and used to create the image database (IMDB) used to train and test the deep neural network. The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. The dataset consists of over 39,000 images in total, grouped into 43 different classes of road traffic sign.

Through a written IMDB creation script, the dataset is split into 70% training, 20% validation and 10% test images, which are used in training, validating and testing the created AlexNet model. Validation images are used to check the performance of the network during the training process, while the test images are reserved in this project for personal tests on the network after training. The validation set can be regarded as part of the training set, but it is usually used for parameter selection and to avoid overfitting. If a model is trained on a training set only, it is very likely to get close to 100% accuracy on that set and overfit, and so perform very poorly on a test set which the network has never seen before. The test set is used only to test the performance of a trained model and is the best means of detecting overfitting in the network.
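The 70/20/10 split described above can be sketched in a few lines of base MATLAB. This is only an illustration under stated assumptions, not the contents of the project's IMDB creation script: the image count is a round placeholder, and the 1/2/3 coding for training/validation/test follows the convention seen in MatConvNet's example scripts, which this project may or may not use.

```matlab
% Hypothetical sketch: label every image as training (1), validation (2) or test (3).
numImages = 39000;                        % placeholder count, not the exact dataset size
order = randperm(numImages);              % random shuffle of the image indices

numTrain = round(0.70 * numImages);       % 70% for training
numVal   = round(0.20 * numImages);       % 20% for validation

setLabel = zeros(1, numImages);
setLabel(order(1:numTrain)) = 1;                        % training images
setLabel(order(numTrain+1:numTrain+numVal)) = 2;        % validation images
setLabel(order(numTrain+numVal+1:end)) = 3;             % remaining images form the test set

fprintf('train=%d val=%d test=%d\n', ...
    sum(setLabel == 1), sum(setLabel == 2), sum(setLabel == 3));
```

Shuffling before slicing keeps the class mix of each subset close to that of the whole dataset, although a per-class (stratified) split would guarantee it.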

CNN Training

For this project, the AlexNet CNN model is used. The MatConvNet Toolbox is used to create, modify and train the model. MatConvNet includes a variety of layer models in its MATLAB library directory, such as convolution, deconvolution, max and average pooling, ReLU activation, sigmoid activation and many other pre-written functions. There are enough elements available to implement many interesting state-of-the-art networks out of the box, or even to import them from other toolboxes such as Caffe. After the creation of the AlexNet model, the network undergoes training and is then tested to ensure performance and accuracy. Two levels of operation are cascaded, one for the traffic sign (object) detection and the other for the actual classification of the traffic sign, and it is important to note that this section only covers the development, training and testing of the image classifier. Detailed testing and verification is carried out to ensure optimal performance of the system.

In order to develop a Convolutional Neural Network which is able to classify images fed into it, the network has to be trained over multiple epochs in a specialized manner. Batches of training images fed into the CNN first have to be pre-processed to the network's standard input size and, in most cases, normalized to have zero mean. This pre-processing affects the rate of convergence of the network during training to a great extent. It is important to remember that the convolutional layers of the network serve as the feature extractors, while the fully connected layers and the softmax serve as the processing and classifier elements.
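The pre-processing step described above (resizing to the network input size, then zero-mean normalization) can be sketched as follows. This is a minimal illustration and not the project's code: the batch is random data, the nearest-neighbour index lookup is a stand-in for a proper resize routine, and the mean is taken over the batch itself, whereas in practice it would usually be computed once over the training set.

```matlab
% Illustrative sketch: resize a batch to the network input size and remove the mean.
inputSize = [227 227];                      % input size assumed for the AlexNet-style model
batch = rand(48, 60, 3, 8, 'single');       % stand-in for 8 RGB images of 48x60 pixels

% Nearest-neighbour resize by index lookup (stand-in for a real resize function).
rowIdx = round(linspace(1, size(batch, 1), inputSize(1)));
colIdx = round(linspace(1, size(batch, 2), inputSize(2)));
resized = batch(rowIdx, colIdx, :, :);

% Subtract the mean image so that every pixel position averages to zero over the batch.
meanImage  = mean(resized, 4);
normalized = bsxfun(@minus, resized, meanImage);

disp(size(normalized));                                  % height, width, channels, images
disp(max(abs(reshape(mean(normalized, 4), [], 1))));     % residual mean, close to zero
```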

The diagrams below show the results of the performance analysis and testing carried out: the descent in the error rate of the classifier/CNN over the course of the 58 epochs (rounds) of training, a classification example, and a bar graph showing performance improvement as the training progressed. Training took about 47 hours using over 39,000 training images on a quad-core processor with 16 gigabytes of RAM.

Figure Showing Error Descent During 58 Epochs of Training (left) and Sample Classification Result After Training (right)

Figure Showing Improvement Bar Graph Per Epoch During Training

As at Epoch 58, the achieved accuracy level was 98.464%. Continuing the training for a few more epochs is expected to result in accuracy levels above 99%.
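For readers new to the terminology, the sketch below shows how a softmax output and a classification accuracy of this kind are typically computed from raw class scores. It assumes the reported figure is a top-1 accuracy; the scores and labels are made-up values for three images and three classes, not project data.

```matlab
% Illustrative sketch: softmax over class scores, then top-1 accuracy.
scores = [2.0 0.1 0.3;      % each column holds the raw scores for one image,
          0.5 1.5 0.2;      % each row corresponds to one class
          0.1 0.4 3.0];
trueLabels = [1 2 1];       % hypothetical ground-truth class indices

% Numerically stable softmax: subtract the column maximum before exponentiating.
shifted   = bsxfun(@minus, scores, max(scores, [], 1));
expScores = exp(shifted);
probs     = bsxfun(@rdivide, expScores, sum(expScores, 1));

[~, predicted] = max(probs, [], 1);                % most probable class per image
accuracy = 100 * mean(predicted == trueLabels);    % percentage of correct predictions

fprintf('Top-1 accuracy: %.3f%%\n', accuracy);
```

In this toy case two of the three images are classified correctly, so the printed accuracy is about 66.7%.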

Sign Detection & Integration

The images below show the integration plan for the system (detection/classification) as well as the different schemes used for traffic sign detection (Harris corner detection, Sobel edge filtering, Hough line/circular transforms, connected component analysis etc.).

Figure Showing Detection and Classification Integration Methodology
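As a concrete example of one of the detection schemes listed above, the sketch below applies Sobel edge filtering to a small synthetic image using only base MATLAB. It is an illustration of the technique, not the contents of detectEdge.m; the test image and the threshold rule are assumptions.

```matlab
% Illustrative sketch: Sobel gradient magnitude using conv2 from base MATLAB.
img = zeros(8, 8);
img(:, 5:end) = 1;                    % synthetic image with one vertical step edge

sobelX = [-1 0 1; -2 0 2; -1 0 1];    % responds to horizontal intensity changes
sobelY = sobelX';                     % responds to vertical intensity changes

% 'valid' keeps only positions where the kernel fits entirely inside the image,
% so the 8x8 input gives a 6x6 output without border artefacts.
gx = conv2(img, sobelX, 'valid');
gy = conv2(img, sobelY, 'valid');
magnitude = sqrt(gx.^2 + gy.^2);

threshold = 0.5 * max(magnitude(:));  % hypothetical threshold choice
edgeMap = magnitude >= threshold;     % logical map, true on the step edge

disp(edgeMap);
```

The edge map is true only in the two output columns that straddle the step, which is the expected response of a 3x3 Sobel kernel to an ideal edge.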

For more details, read Project Documentation (Chapter 5.0)

Optimization

The key optimization schemes used in the improvement of the performance of this system include:

MATLAB Vectorization
Use of C/C++/FORTRAN for some subroutines
MATLAB Parallel Computing
Heterogeneous Computing
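To illustrate the first item in the list, the sketch below performs the same grayscale conversion twice, once with nested loops and once as a vectorized expression, and times both. It is a generic example rather than code from this project; the image is random data and the luma weights are the commonly used values, assumed here for illustration.

```matlab
% Illustrative sketch: the same computation as a loop and as a vectorized expression.
rgb = rand(120, 160, 3);
weights = [0.299 0.587 0.114];        % common luma weights (assumed for this example)

% Loop version: one pixel at a time.
tic;
grayLoop = zeros(size(rgb, 1), size(rgb, 2));
for r = 1:size(rgb, 1)
    for c = 1:size(rgb, 2)
        grayLoop(r, c) = weights(1) * rgb(r, c, 1) + ...
                         weights(2) * rgb(r, c, 2) + ...
                         weights(3) * rgb(r, c, 3);
    end
end
loopTime = toc;

% Vectorized version: whole-array operations, no explicit loops.
tic;
grayVec = weights(1) * rgb(:, :, 1) + weights(2) * rgb(:, :, 2) + weights(3) * rgb(:, :, 3);
vecTime = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', loopTime, vecTime);
fprintf('max difference: %g\n', max(abs(grayLoop(:) - grayVec(:))));
```

The two results are identical; only the running time differs, and the gap tends to grow with image size.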

Figure Showing Speed Improvement After Some Form of Optimization

Results

By engaging different methods of optimization (software/hardware), the speed of computationally expensive neural network operations can be improved. In this project, I was able to push my computer vision system's performance from about 1.5 seconds per frame (i.e. roughly half a frame per second) to between 25 and 30 frames per second using my C/C++ solution running across multiple cores. The pure MATLAB solution, optimized through vectorization and multi-core dispatch, reaches a maximum of about 8 frames per second, which is still well ahead of the un-optimized MATLAB version of the vision system. In a nutshell, the results achieved by this project are as listed below:

Design and implementation of a deep learning model (AlexNet) for traffic sign classification which attained a classification accuracy of over 98%.
Design and implementation of multiple layers of traffic sign detection mechanisms which attained a detection accuracy of over 99%.
Optimization of the detection algorithm's performance from about half a frame per second to around 25 to 30 frames per second (classification operation included). This amounts to an improvement in speed of over 50 times (C/C++ version).
Practical demonstration that further improvement in speed can be achieved through heterogeneous computing, i.e. by dedicating some parts of the computationally expensive detection functions to suitable devices such as GPUs and FPGAs.
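The speed-up ratios implied by the figures in this section depend on which baseline is used. The short sketch below works through the arithmetic with the numbers quoted above; the variable names are illustrative only. The "over 50 times" figure corresponds to rounding the baseline to half a frame per second, while the 1.5 seconds per frame figure gives a somewhat smaller ratio.

```matlab
% Worked arithmetic using the frame rates quoted in this section.
baselineSecondsPerFrame = 1.5;
baselineFps  = 1 / baselineSecondsPerFrame;   % about 0.67 frames per second
optimizedFps = [25 30];                       % C/C++ multi-core version
matlabFps    = 8;                             % optimized pure MATLAB version

speedupCpp     = optimizedFps / baselineFps;  % 37.5 and 45 times
speedupMatlab  = matlabFps / baselineFps;     % 12 times
speedupRounded = optimizedFps / 0.5;          % 50 and 60 times with a 0.5 fps baseline

fprintf('C/C++ speed-up (1.5 s/frame baseline): %.1f to %.1f times\n', speedupCpp);
fprintf('C/C++ speed-up (0.5 fps baseline): %.0f to %.0f times\n', speedupRounded);
fprintf('MATLAB speed-up (1.5 s/frame baseline): %.0f times\n', speedupMatlab);
```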

Figure Showing Vision System in Operation

Repository Info

This repository contains only the project documentation and code. Large supplementary files such as:
- The IMDB file
- Trained ConvNet
- Test Images
- Test Videos

...can be downloaded from the links given below

Remember to set the project folder location in the file 'setGlobalVariables.m' (src>Others). The current setting is ProjectFolder = ('C:\Temp\final_project'), which may differ from where the project is located on your computer.
Also remember to unzip the 'TestImages' and 'TrainingImages' folders.
Once all code files and supplementary files/folders have been downloaded, add the project folder to the MATLAB path.

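Paste the lines below into MATLAB and run them to install the MatConvNet library (the snippet assumes the matconvnet-1.0-beta23 folder is already present in the current directory)

```matlab
% Move into the MatConvNet folder and compile the library (needed once).
cd matconvnet-1.0-beta23
run matlab/vl_compilenn;

% Setup MatConvNet.
run matlab/vl_setupnn;
```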

Note once again: the various function files interact with supporting files such as video clips, images and the webcam. Make sure that all of these resources are correctly referenced/linked before you run any of the functions, so as to avoid minor errors.

Download Links

Test Images: Included in the 'Training Dependencies' folder
IMDB File (32x32), Trained ConvNet (227x227), TestVideoClips: Here
Project Documentation: Here

Functions Breakdown

The list below gives a short explanation of each function.

AlexNetNN.m: Used to create alexnet model, link to the IMDB and train
callAlex.m: Used to call AlexNetNN and also set some post training parameters
classifyImg.m: Used to run classification on the detected ROI
createIMDB.m: Used to create an image database for the NN training
sceneClassifier.m: Used to validate input image to system, set detection algorithm parameters and call parallel/nonparallelDetection
CCA.m: Used to perform connected component analysis on input image
detectCircle.m: For circle detection
detectCorner.m: For corner detection (Harris method)
detectEdge.m: For edge detection (Sobel’s method)
getROI.m: Used to call CCA and also set some pre-CCA call conditions
myDetectCircle.m: Customized circle detection algorithm (Circular Hough Transform)
networkTest.m: Used to test individual CNN epoch outputs
testDetection.m: Used to test individual detection functions
getGrayScale.m: To get image grayscale
MapRegion.m: Region class used for CCA
viewCircularHough.m: Used to have a look into how the circular Hough transform operates
shapeAnalyser.m: Analyses and searches for triangles, rectangles, octagon and diamond shapes
cornerTestBench.m: Testbench for corner detection function
setGlobalVariables.m: Used to set global variables
getGlobalVariables.m: Used to get global variables
drawDetectedCircle.m: Takes in an image, circle radius and centre co-ordinates and draws the circle on top of the received image
passVideoToVision.m: Used to feed a video stream to the vision system, so as to evaluate it
parallelDetection.m: Used to execute all detection algorithms in a parallel way
multiEpochAnalyser.m: Used to test a wide range of CNN epoch outputs so as to detect overfitting
detectCircleTestBench.m: Test bench for circle detection function performance
analyseCriticalAreas.m: Used to analyse critical areas as explained in 5.2.2
nonParallelDetection.m: Used to execute all detection algorithms in a non-parallel way
evaluateDetectionSystem.m: Used to run evaluation on all the individual detection functions
evaluateOpenCLPerformance.m: Used to evaluate the performance of the OpenCL based parts of the vision system application
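Several of the functions above rely on the circular Hough transform. For readers unfamiliar with it, the sketch below shows the voting idea for a single, known radius on synthetic edge points. It is a simplified illustration in base MATLAB, not the implementation in myDetectCircle.m; the image size, circle parameters and number of angle samples are all assumptions.

```matlab
% Illustrative sketch: circular Hough transform for one known radius.
imgSize = [60 60];
trueCentre = [30 25];                 % [row col] of a synthetic circle
radius = 12;

% Build synthetic edge points lying on the circle.
theta = linspace(0, 2*pi, 200);
edgeRows = round(trueCentre(1) + radius * sin(theta));
edgeCols = round(trueCentre(2) + radius * cos(theta));

% Each edge point votes for every candidate centre one radius away from it.
accumulator = zeros(imgSize);
for k = 1:numel(edgeRows)
    voteRows = round(edgeRows(k) - radius * sin(theta));
    voteCols = round(edgeCols(k) - radius * cos(theta));
    valid = voteRows >= 1 & voteRows <= imgSize(1) & ...
            voteCols >= 1 & voteCols <= imgSize(2);
    % unique() ensures an edge point gives at most one vote to any cell.
    idx = unique(sub2ind(imgSize, voteRows(valid), voteCols(valid)));
    accumulator(idx) = accumulator(idx) + 1;
end

% The cell with the most votes is the estimated circle centre.
[~, peak] = max(accumulator(:));
[centreRow, centreCol] = ind2sub(imgSize, peak);
fprintf('Estimated centre: (%d, %d)\n', centreRow, centreCol);
```

When the radius is unknown, the same voting is repeated over a range of radii, giving a three-dimensional accumulator, which is where most of the computational cost comes from.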

Author(s)

Oluwole Oyetoke - Project work - LinkedIn Profile, Website
Dr. David Cowell - Initial work - University Profile

Licence

This project is free to use and open to contributions

Acknowledgments

Family
Well wishers
Supervisor
All the nice people who published helpful and easy to grasp journal articles in this area of study
Most importantly, future contributors to this project
