MADAIN

Mole Analysis with Deep Adam-optimized Inception Network

By: Amanda Derdiger, Andrew Koller, Mustafa Can Ayter, and Natalia Mitchell

Introduction

We have built a convolutional neural network (CNN) that analyzes images of skin lesions and classifies each into one of seven classes, three of which are cancerous and four of which are benign. We have also developed a web page, currently hosted on GitHub Pages, and plan to embed a web app that serves our model.

Data

Our dataset is from Kaggle and can be accessed at the link below. It contains 10,015 images of skin lesions across the seven classes detailed below.

Demographics:

Class Definitions:

https://www.kaggle.com/datasets/farjanakabirsamanta/skin-cancer-dataset

Process

Benchmarking

We started by benchmarking three CNN architectures detailed in Aurélien Géron's book "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" (2022): InceptionV3, ResNet50, and VGG16. For each architecture we tested three optimizers: Adam, RMSprop, and SGD. In total, nine models were benchmarked, and from those we chose InceptionV3 with the Adam optimizer as our primary model.*

*For benchmarking metrics, see "run1/visualizations/".
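The nine benchmarked combinations can be enumerated as a simple grid. The sketch below only builds the run labels in the `Architecture.Optimizer` form used in this README; the function name `benchmark_grid` is hypothetical and no model is trained here.

```python
from itertools import product

# Architectures and optimizers named in the benchmarking section.
ARCHITECTURES = ["InceptionV3", "ResNet50", "VGG16"]
OPTIMIZERS = ["Adam", "RMSprop", "SGD"]


def benchmark_grid(architectures, optimizers):
    """Return one 'Architecture.Optimizer' label per combination."""
    return [f"{arch}.{opt}" for arch, opt in product(architectures, optimizers)]


runs = benchmark_grid(ARCHITECTURES, OPTIMIZERS)
print(len(runs))  # 9
for label in runs:
    print(label)
```

Each label would correspond to one training run whose metrics are then compared side by side.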

Although InceptionV3.Adam slightly underperformed ResNet50.Adam in the classification reports, we chose it for its superior performance on AUC metrics (see run1/visualizations/roc_curve/roc_curve_InceptionV3_Adam.png).
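To make the AUC comparison concrete: for one class treated as "positive" against all others, AUC is the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one. The following is a minimal pure-Python sketch of that definition, not the code used for the ROC curves in run1; the function name and the example scores are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for a single class in a one-vs-rest setting.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(binary_auc(labels, scores))  # 0.75
```

Because AUC depends only on how scores are ranked, a model can score well on AUC even when its thresholded predictions in a classification report look weaker.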

Fine-Tuning the Model

Once we chose our primary model, we continued to fine-tune it to maximize AUC, precision, and recall, prioritizing recall on the three cancerous classes. In the precision/recall trade-off, favoring recall reduces false negatives, and in a cancer identification model a false negative on a cancerous class is the most harmful outcome, so it should be minimized as far as possible. Note the 'Cancer Catcher' model in run4, which reached our highest recall for melanoma at 0.70.
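The trade-off can be illustrated with the standard definitions of precision and recall. The counts below are hypothetical and chosen only to show the effect; they are not taken from any of our runs.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical cautious classifier: few positive predictions, many missed cases.
cautious = precision_recall(tp=40, fp=10, fn=60)

# Hypothetical aggressive classifier: flags many more lesions, misses few.
aggressive = precision_recall(tp=90, fp=210, fn=10)

print(cautious)    # (0.8, 0.4)
print(aggressive)  # (0.3, 0.9)
```

The aggressive classifier produces far more false alarms but misses only 10 of 100 true cases, which is the direction we favor for the cancerous classes.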

Our fine-tuning steps, along with their corresponding run folders in our repo, are detailed below.

Running InceptionV3.Adam at 150 epochs (run3; v6)
a. Removed image augmentation. The original benchmarking involved preliminary image augmentation:

 rotation_range=20,
 width_shift_range=0.2,
 height_shift_range=0.2,
 horizontal_flip=True,

b. Results from testing do not show improvement with higher epochs.

        precision    recall  f1-score   support

akiec       0.02      0.05      0.03        65
  bcc       0.05      0.09      0.07       103
  bkl       0.08      0.07      0.08       220
   df       0.00      0.00      0.00        23
  mel       0.11      0.17      0.13       223
   nv       0.68      0.50      0.57      1341
 vasc       0.01      0.04      0.02        28

accuracy                               0.37      2003
macro avg          0.14      0.13      0.13      2003
weighted avg       0.48      0.37      0.41      2003

c. Resolution: test at lower epochs; weights need to be adjusted

Weighting scheme testing 1 - The Cancer Catcher (run4; v7)
a. All testing to this point used TensorFlow's 'balanced' weighting system to account for the large imbalance between classes.
b. 4x on 'bcc' and 'akiec', 20x on 'mel'
c. Tested effectiveness of different weights
d. Increased weighting for underrepresented classes by a factor of 4x
e. Note the recall for 'mel' at 0.70

                 precision    recall  f1-score   support

       akiec       0.04      0.06      0.05        65
         bcc       0.06      0.07      0.06       103
         bkl       0.00      0.00      0.00       220
         df        0.00      0.00      0.00        23
         mel       0.11      0.70      0.19       223
         nv        0.63      0.12      0.20      1341
         vasc      0.02      0.04      0.03        28

accuracy                               0.16      2003
macro avg          0.12      0.14      0.08      2003
weighted avg       0.44      0.16      0.16      2003
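A weighting scheme of this kind can be sketched as follows. Two assumptions are made here: the 'balanced' baseline is taken to be the common heuristic `n_samples / (n_classes * class_count)`, and the 4x and 20x factors are applied as multipliers on top of that baseline. The class counts are the test-set support values from the reports above, used only as a stand-in for the training class frequencies. The function names are hypothetical.

```python
# Test-set support per class, copied from the classification reports above.
# Assumption: used here as a stand-in for the training class frequencies.
SUPPORT = {"akiec": 65, "bcc": 103, "bkl": 220, "df": 23,
           "mel": 223, "nv": 1341, "vasc": 28}

# Multipliers described for the Cancer Catcher run.
MULTIPLIERS = {"akiec": 4, "bcc": 4, "mel": 20}


def balanced_weights(counts):
    """'Balanced' heuristic: n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * count) for name, count in counts.items()}


def boosted_weights(counts, multipliers):
    """Scale the balanced weight of selected classes by an extra factor."""
    base = balanced_weights(counts)
    return {name: weight * multipliers.get(name, 1) for name, weight in base.items()}


weights = boosted_weights(SUPPORT, MULTIPLIERS)
for name, weight in sorted(weights.items()):
    print(f"{name:>5}: {weight:.3f}")
```

In Keras, a dictionary like this (keyed by integer class index rather than class name) can be passed to `model.fit` through its `class_weight` argument. The very large weight on 'mel' is consistent with the pattern in the report above: high melanoma recall at the cost of precision and of recall on 'nv'.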

Binary classification testing (run5; v8)
a. Testing conducted at the same time as weighting scheme testing
b. All testing to this point involved a multiclass classifier.
c. Tested the effectiveness of a binary classifier as opposed to a multiclass classifier.
d. Results unremarkable

                precision    recall  f1-score   support

       benign      0.79      0.58      0.67      1612
    cancerous      0.17      0.37      0.24       391

accuracy                               0.54      2003
macro avg          0.48      0.47      0.45      2003
weighted avg       0.67      0.54      0.59      2003
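The binary labels can be derived from the seven classes by grouping 'akiec', 'bcc', and 'mel' as cancerous and the rest as benign. This grouping is inferred from the support counts in the reports (65 + 103 + 223 = 391 cancerous; 220 + 23 + 1341 + 28 = 1612 benign); the helper names below are hypothetical.

```python
# Test-set support per class, copied from the multiclass reports above.
SUPPORT = {"akiec": 65, "bcc": 103, "bkl": 220, "df": 23,
           "mel": 223, "nv": 1341, "vasc": 28}

# Grouping inferred from the binary report's support counts.
CANCEROUS = {"akiec", "bcc", "mel"}


def to_binary(label):
    """Map a seven-class label to 'cancerous' or 'benign'."""
    return "cancerous" if label in CANCEROUS else "benign"


def binary_support(support):
    """Sum per-class support into the two binary classes."""
    totals = {"benign": 0, "cancerous": 0}
    for label, count in support.items():
        totals[to_binary(label)] += count
    return totals


print(binary_support(SUPPORT))  # {'benign': 1612, 'cancerous': 391}
```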

Inverse proportional weighting (run8)
a. Weighted classes based on the inverse of their frequency

Class-balanced loss approach weighting (run9)
a. Attempted to implement class-balanced loss weighting; the model performed poorly
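Both weighting ideas can be sketched side by side. The inverse-frequency version follows directly from the description above. For the class-balanced version, we assume the "effective number of samples" weighting from Cui et al. (2019), where each class weight is proportional to `(1 - beta) / (1 - beta ** n)`; whether run9 used exactly this formula, and which `beta`, is an assumption. The counts are again the test-set support values, used as a stand-in for training frequencies, and both results are rescaled so the weights sum to the number of classes.

```python
# Test-set support per class (stand-in for training class frequencies).
SUPPORT = {"akiec": 65, "bcc": 103, "bkl": 220, "df": 23,
           "mel": 223, "nv": 1341, "vasc": 28}


def normalize(weights):
    """Rescale weights so they sum to the number of classes."""
    scale = len(weights) / sum(weights.values())
    return {name: weight * scale for name, weight in weights.items()}


def inverse_frequency_weights(counts):
    """Weight each class by the inverse of its relative frequency."""
    total = sum(counts.values())
    return normalize({name: total / count for name, count in counts.items()})


def class_balanced_weights(counts, beta=0.999):
    """Effective-number weighting: (1 - beta) / (1 - beta ** n) per class.

    beta=0.999 is an illustrative choice, not a value taken from run9.
    """
    raw = {name: (1.0 - beta) / (1.0 - beta ** count)
           for name, count in counts.items()}
    return normalize(raw)


inv = inverse_frequency_weights(SUPPORT)
cb = class_balanced_weights(SUPPORT)
for name in SUPPORT:
    print(f"{name:>5}: inverse={inv[name]:.3f}  class_balanced={cb[name]:.3f}")
```

As `beta` approaches 1 the class-balanced weights approach the inverse-frequency weights, and as `beta` approaches 0 all classes are weighted equally.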

Adding generated augmented images to training data (run10)
a. Added a random image augmentor and image generator
b. Added randomly generated images back into the training data
c. Wanted to normalize the percentage representation of underrepresented classes in the data set

             precision    recall  f1-score   support

     akiec       0.01      0.02      0.01        65
       bcc       0.04      0.06      0.05       103
       bkl       0.07      0.03      0.04       220
        df       0.03      0.04      0.04        23
       mel       0.12      0.22      0.15       223
        nv       0.68      0.61      0.64      1341
      vasc       0.00      0.00      0.00        28

  accuracy                           0.44      2003
 macro avg       0.14      0.14      0.13      2003
weighted avg     0.48      0.44      0.45      2003
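One way to plan this kind of top-up is to compute, per class, how many generated images are needed to reach a target count, and then check the resulting class shares. This is a sketch only: the counts and the target below are hypothetical, and the function names are not from our notebooks.

```python
def augmentation_plan(counts, target):
    """Number of generated images needed per class to reach `target` images."""
    return {name: max(0, target - count) for name, count in counts.items()}


def class_shares(counts):
    """Fraction of the data set that each class represents."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical training counts for three classes.
train_counts = {"nv": 600, "mel": 100, "df": 20}

plan = augmentation_plan(train_counts, target=300)
print(plan)  # {'nv': 0, 'mel': 200, 'df': 280}

# Class shares after adding the generated images back into the training data.
augmented = {name: train_counts[name] + plan[name] for name in train_counts}
print(class_shares(augmented))  # {'nv': 0.5, 'mel': 0.25, 'df': 0.25}
```

Generated images should be added to the training split only, so that validation and test metrics still reflect real images.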

Increasing custom layer neuron density from 512 to 1024 and rerunning promising models (run11; v12)
a. Tested multiple models with the increased neuron count
b. Top performers are as follows: InceptionV3.Adam, ResNet50.Adam, VGG16.SGD
c. Ultimately, InceptionV3.Adam remained the highest performer

InceptionV3.Adam

             precision    recall  f1-score   support

     akiec       0.03      0.06      0.04        65
       bcc       0.04      0.06      0.05       103
       bkl       0.12      0.14      0.13       220
        df       0.01      0.04      0.02        23
       mel       0.09      0.15      0.12       223
        nv       0.67      0.50      0.57      1341
      vasc       0.03      0.07      0.04        28

  accuracy                           0.37      2003
 macro avg       0.14      0.15      0.14      2003
weighted avg     0.48      0.37      0.41      2003

ResNet50.Adam

VGG16.SGD
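To give a sense of what doubling the custom layer means in trainable parameters, the sketch below counts the weights of a dense head. It assumes a head made of one hidden dense layer followed by a 7-way output layer, fed by 2048 pooled features (the size of InceptionV3's global-average-pooled output). The exact layout of our custom layers may differ, so treat the numbers as illustrative.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a dense layer: weights plus biases."""
    return n_inputs * n_units + n_units


def head_params(n_features, hidden_units, n_classes):
    """Parameters of a hidden dense layer followed by an output layer."""
    return dense_params(n_features, hidden_units) + dense_params(hidden_units, n_classes)


POOLED_FEATURES = 2048  # assumption: pooled InceptionV3 feature vector
N_CLASSES = 7           # seven lesion classes

for units in (512, 1024):
    print(units, head_params(POOLED_FEATURES, units, N_CLASSES))
# 512 1052679
# 1024 2105351
```

Under these assumptions the head roughly doubles in size, which adds capacity but also more parameters to fit on the same 10,015 images.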

Augmented image generation with 1000 images for underrepresented classes (run12)
a. Results not impressive

Resources:

https://towardsdatascience.com/review-inception-v4-evolved-from-googlenet-merged-with-resnet-idea-image-classification-5e8c339d18bc

https://stackoverflow.com/questions/51798784/keras-transfer-learning-on-inception-resnetv2-training-stops-in-between-becaus
