The image HDR dataset contains 652 normal and HDR readings of cropped license plates with a csv containing annotation as well. You can download the dataset from here : https://medusa.fit.vutbr.cz/traffic/research-topics/general-traffic-analysis/holistic-recognition-of-low-quality-license-plates-by-cnn-using-track-annotated-data-iwt4s-avss-2017/
- track_id - ID of specific track based on tracker
- image_path - path to image in archive structure
- lp - ground truth text for license plate
- train - Train/test split. 0 - test, 1 - train
- After reading the csv file as a pandas dataframe, shuffle the dataframe before splitting the data into Train and Test.
- Preprocess the images to grayscale, resize every image to size(32,128) and then normalize images by dividing it by 255.
- To preprocess the labels, read the 'lp' column values from the dataframe and compute some numerical value for each character/number in the labels by using an encoding function.
The network architecture is inspired by https://arxiv.org/pdf/1507.05717.pdf. Source model consists of three main components:
- CNN for extraction of spatial feayures from the image
- RNN to predict sequential output per time-step
- CTC loss function which is transcription layer used to predict output for each time step.
We use the CTC(Connectionist Temporal Classification) loss function which is widely used for text recognition problems. CTC loss function requires four arguments to compute the loss : predicted labels, ground truth labels, input sequence length to the LSTM and ground truth label length. Since keras does not have a CTC implementation, a custom loss function is created and passed into the model. This model is used for training and for testing we will use the model that we have created earlier “act_model”.
The Dataset has a total of only 652 cropped license plate images which is not enough to train the entrire network from scratch with random weight initializations. Therefore, we are going to apply Transfer Learning and use a pre-trained model trained on a large dataset with pre-trained weights.
The Pre-trained model and weights used here is available at https://github.com/TheAILearner/A-CRNN-model-for-Text-Recognition-in-Keras with full source code and documentation.
Since our HDR dataset is small and very similar to the dataset on which the Pre-trained model was trained on, we will initialize our model weights with the pre-trained model's weights and then fine-tune the entire network by training on 80% of the images and testing on the remaining 20%. Overfitting concerns due to fine-tuning were handled by using Dropout layers in the Network and also by using early stopping if necessary.
After fine-tuning the network for 10 epochs with a batch size of 16, we save the final weights as 'best_model.hdf5' based on the value of Validation loss.
Our model is now trained and fine-tuned on around 520 HRD license plate images and we can now evaluate the model to predict text from the 130 Validation/Test images using the model.predict() function and a CTC decoder. Some of the validation text results are displayed alongside the original text. We actually get decent results on our test set.
To test how well our model performs on the test set , we use the "Accuracy" metric and calculate the accuracy on the test set. Test accuracy is somewhere between 95-97% depending on our shuffled data split.
We now have our OCR model ready that can read the text on the license plates with decent accuracy. To test the OCR model on your own custom image, use the 'test_ocr' method which takes a cropped image as an input and outputs the text on the license plate.