Cassava Leaf Disease Classification

Top-1% solution to the Cassava Leaf Disease Classification Kaggle competition on plant image classification.

Summary
Repo structure
Working with the repo

Summary

Cassava is one of the key food crops grown in Africa. Plant diseases are major sources of poor yields. To diagnose plant diseases, farmers require the help of agricultural experts to visually inspect the plants, which is labor-intensive and costly. Deep learning helps to automate this process.

This project works with a dataset of 21,367 cassava images. The pictures are taken by farmers on mobile phones and labeled as healthy or as having one of four common leaf disease types. The main data-related challenges are poor image quality, inconsistent background conditions, and label noise.

We develop a stacking ensemble of CNNs and Vision Transformers implemented in PyTorch. Our solution reaches a test accuracy of 91.06% and places 14th out of 3,900 competing teams. The diagram below gives an overview of the ensemble. A detailed summary of the solution is provided in this writeup.
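To make the idea of stacking concrete, here is a minimal, dependency-free sketch. It is not the code from this repo: the probability values are made up, the function names are hypothetical, and a toy nearest-centroid classifier stands in for the meta-learner (the ensembling notebook name suggests LightGBM is used in the actual solution). The sketch only shows the general pattern: class probabilities from several base models are concatenated into meta-features, and a second-stage model is fit on them.

```python
from collections import defaultdict


def build_meta_features(model_probs):
    """Concatenate per-model class probabilities into one row per sample.

    model_probs: list of models, each a list of per-sample probability lists.
    """
    n_samples = len(model_probs[0])
    return [
        [p for probs in model_probs for p in probs[i]]
        for i in range(n_samples)
    ]


def fit_centroids(features, labels):
    """Toy meta-learner (assumption): mean meta-feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for row, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(row)
        sums[y] = [s + v for s, v in zip(sums[y], row)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, features):
    """Assign each row to the class with the nearest centroid (squared L2)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(centroids, key=lambda y: dist(centroids[y], row))
        for row in features
    ]


# Hypothetical out-of-fold probabilities from two base models (3 classes).
cnn_oof = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.6, 0.3, 0.1]]
vit_oof = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
labels = [0, 1, 2, 0]

meta_X = build_meta_features([cnn_oof, vit_oof])  # 4 rows, 6 columns
centroids = fit_centroids(meta_X, labels)
preds = predict(centroids, meta_X)
```

In practice the meta-learner should be fit on out-of-fold predictions, so that it never sees base model outputs for images those base models were trained on.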

Repo structure

The project has the following structure:

codes/: .py scripts with training, inference and image processing functions
notebooks/: .ipynb notebooks for data exploration, training CNN/ViT models and ensembling
data/: input data (images are not included due to size constraints and can be downloaded here)
output/: model weights, configurations and diagrams exported from notebooks
pretraining/: weights and configurations of models pretrained on external datasets
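As a quick sanity check after cloning, the layout above can be verified with a few lines of Python. This is only a sketch: the helper name `missing_dirs` is hypothetical and not part of the repo, and it checks only for the five top-level folders listed above.

```python
from pathlib import Path

# Top-level folders listed in the repo structure above.
EXPECTED_DIRS = ("codes", "notebooks", "data", "output", "pretraining")


def missing_dirs(repo_root):
    """Return the expected top-level folders that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Run from the repository root; an empty list means the layout is complete.
print("Missing folders:", missing_dirs("."))
```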

Working with the repo

Environment

To run the code, you can create a Conda virtual environment from the environment.yml file:

conda env create --name cassava --file environment.yml
conda activate cassava

Reproducing solution

The solution can be reproduced in the following steps:

Downloading competition data and adding it into the data/ folder.
Running the pytorch-model training notebooks to obtain base model weights.
Running the ensembling notebook lightgbm-stacking to get final predictions.

All pytorch-model notebooks have the same structure and differ only in model and data parameters. Different versions are included to ensure reproducibility. If you only wish to get familiar with our solution, it is enough to inspect one of the modeling notebooks and go through codes/ to understand the training process. The stacking ensemble that reproduces our submission is also provided in this Kaggle notebook.

The notebooks are designed to run on Google Colab. More details are provided in the documentation within the notebooks.
