Blood Cell Type Prediction
Apr 2023
This project aims to classify blood cell types using deep learning and transfer learning techniques. The MobileNetV2 architecture is employed as a pre-trained model for feature extraction, and a custom classification model is built on top of it.
The goal of this project is to develop an automated method for detecting and classifying blood cell subtypes using machine learning techniques. The project utilizes a dataset containing 12,500 augmented images of blood cells, each labeled with the corresponding cell type. The dataset consists of four different cell types: Eosinophil, Lymphocyte, Monocyte, and Neutrophil. In this project we will build a deep learning based image classification model using a pretrained model MobilenetV2 and applying the concept of transfer learning
Transfer learning is a technique that uses knowledge gained from a pre-trained model on one task to boost performance on a different but related task. It takes advantage of the learned features and representations from the pre-trained model, which can be applied to a new task. This approach helps overcome data limitations, speeds up training, and improves the performance of models on new tasks by leveraging the knowledge encoded in the pre-trained model.
Transfer learning offers several benefits:
-
Improved Performance: By leveraging pre-trained models that have been trained on large-scale datasets, transfer learning can help achieve higher performance on new tasks with limited available data.
-
Faster Training: Transfer learning reduces the training time and computational resources required to build models from scratch, as the pre-trained model has already learned generic features that can be used as a starting point.
-
Generalization: Transfer learning allows models to generalize better to new tasks by learning abstract and generic representations from a diverse range of data. This helps in capturing higher-level features that can be relevant across different tasks.
-
Data Efficiency: With transfer learning, models can achieve good performance even with smaller datasets, as they can leverage the knowledge learned from large-scale datasets.
Overall, transfer learning is a powerful technique that enables the transfer of knowledge and representations learned from one task to another, leading to improved performance, faster training, and better generalization capabilities.
The project consists of the following key steps:
You can download the data from kaggle.
The image data is loaded and preprocessed using the ImageDataGenerator
class from TensorFlow. Data is split into training and validation sets.
The MobileNetV2 model is loaded with pre-trained weights and frozen to retain its learned features. Additional layers are added to the model for classification purposes.
The model is trained using the training images and validated using the validation images. Training is performed for a specified number of epochs, and early stopping is applied to prevent overfitting.
The trained model is evaluated on the test data. Metrics such as loss, accuracy, and a classification report are computed. Additionally, a confusion matrix is plotted to analyze the model's performance.
To reproduce this project, follow these steps:
-
Dataset: Prepare a dataset of blood cell images, divided into training and testing directories. The
train_dir
andtest_dir
variables in the code should be updated with the respective directory paths. -
Environment Setup: Set up a Python environment with TensorFlow and the necessary dependencies. Use
pip
orconda
to install the required packages. -
Code Execution: Execute the code cells in the provided Jupyter Notebook model in sequential order.
-
Analyze Results: Review the training and validation loss/accuracy plots to assess the model's performance. Evaluate the model on the test data and examine the confusion matrix and classification report.
The project yields the following conclusions based on the validation set:
- The model achieves a high accuracy of 94.5% on the validation images.
- There are two possible explanations for the subpar performance on the test set:
- The test data may contain mislabeled samples.
- The test set might have been generated differently from the original training set, leading to inconsistencies in image quality or class representation.
The confusion matrix provides additional insights, suggesting that the neutrophil class in the test set could contain mixed samples from other classes.
This project demonstrates the application of transfer learning using the MobileNetV2 architecture for blood cell type classification. Despite achieving high accuracy on the validation set, further investigation is required to understand the discrepancies in performance on the test set. The project can be extended and optimized by implementing various strategies to improve the model's accuracy and generalization capability.
To enhance the project, consider the following steps:
-
Data Augmentation: Apply data augmentation techniques to increase the diversity of the training data and improve the model's generalization.
-
Hyperparameter Tuning: Explore different hyperparameter configurations, such as learning rate, batch size, and number of epochs, to optimize the model's performance.
-
Model Selection: Experiment with other pre-trained models, such as Xception or ResNet, to compare their performance with MobileNetV2.
-
Ensemble Methods: Combine multiple models or use ensemble techniques to enhance classification accuracy and robustness.
-
Fine-tuning: Gradually unfreeze and fine-tune the pre-trained layers of the model to adapt to the specific blood cell classification task.