COVID-19 Response Logo

COVID-19 Classification Through Chest X-Rays and Coughing Audio

Part 1: Open In Colab

Part 2: Open In Colab

Author: Christopher Lewis

Description

The contents of this repository describe methods of COVID-19 classification through two different testing approaches. The analysis below is detailed in the hope of making the work accessible and reproducible by others wishing to explore and analyze this project themselves.

Since the beginning of 2020, COVID-19 has run rampant across the globe, resulting in over 100 million cases and 2 million deaths. While vaccines are beginning to be rolled out as our new line of defense, my project's goal is to identify other ways to detect COVID-19 in a patient, helping to diagnose those who are infected and slow the spread. In this project, I focus on two different ways to identify COVID-19: multi-class classification via chest x-rays and binary classification through coughing audio. Both use Sequential Convolutional Neural Networks, and the code for this project can be found in the Google Colab links at the top of the page. To run the Google Colab code, you must allow Google Colab to access an account. I provide links for the data in the Google Colab notebooks (links above). Please be sure to save the zipped folders for the audio datasets to your Google Drive so you can access them from Google Colab. The chest x-ray database was obtained via Kaggle's API.

In this project we are going to be using the OSEMN Process:

Obtain: Our datasets came from a variety of places, including: Kaggle via its API, Stanford University's Virufy app via GitHub, and the CoughVid app via Zenodo's website.

Inspecting, Scrubbing, & Exploring:

PART 1: When dealing with the chest x-rays, we will make sure that our train, test, and validation sets are stocked with an appropriate number of images, examine class proportions, and use CLAHE as our key preprocessing technique.

PART 2: We will be implementing scrubbing techniques on the CoughVid dataset according to certain thresholds and custom filters. Other preprocessing and preparation techniques include converting .webm video files to .ogg files, setting our audio files to a desired length by zero padding shorter files, and creating mel-spectrogram images via the librosa library.

Modeling & Exploring:

PART 1: We will then create Sequential models to train on our unpreprocessed and CLAHE preprocessed images to see if implementing CLAHE as a preprocessing technique proves to be effective.

PART 2: We will create and train Sequential models for this section as well, to see whether our models can correctly classify each image. We will also save augmented spectrogram images of the minority class in our training set in an attempt to help the model identify the proper class.

Interpretation: Here we present our models' results, conclusions, recommendations, and further research.

PART 1 - Chest X-ray Multi Classification

In Part 1, we explore a database retrieved via Kaggle's API containing 3800+ high-quality images that were diagnosed in a professional setting and labeled into three classes: COVID, Healthy, and Viral Pneumonia. We create and evaluate a baseline model, evaluate a basic CNN model, and then use a preprocessing technique called CLAHE to generate new images, on which a new model is trained. We evaluate that model's results and discuss further thoughts.

Obtaining COVID CXR dataset via Kaggle API

We start by importing the chest x-ray dataset by making a call to the Kaggle API. In order to make a call to the API, you must request a key from Kaggle. I've saved my key to my local storage, so I use the upload() function to access my key from my local files. Once we've uploaded our API key, we make a new hidden folder to store the key in and make sure that only we can access and view it. After the dataset is downloaded into our current working directory, we unzip the file and extract all files within the folder to the root directory.

Note that if you would like to choose a different place to extract the folder contents, simply pass the path where you would like the contents to be saved to the zipf.extractall() function. Since the run time for this portion of the project is relatively fast, we save it to the root directory. Note that saving it to the root directory while in Google Colab makes it a temporary folder - so if you disconnect the runtime, you will have to feed in your key and extract the files again.
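For reference, here is a minimal sketch of this download-and-extract step as it might look in Colab. The dataset slug, key handling, and paths are placeholders, not the notebook's exact calls.

# Sketch of the Kaggle API download step (run inside Google Colab).
# The dataset slug below is a placeholder - substitute the actual one.
import os
import shutil
import zipfile
from google.colab import files

files.upload()                                   # upload kaggle.json from local storage
os.makedirs('/root/.kaggle', exist_ok=True)      # hidden folder for the key
shutil.move('kaggle.json', '/root/.kaggle/kaggle.json')
os.chmod('/root/.kaggle/kaggle.json', 0o600)     # only we can read the key

os.system('kaggle datasets download -d <owner>/<dataset-name>')
with zipfile.ZipFile('<dataset-name>.zip', 'r') as zipf:
    zipf.extractall('/')                         # extract to the root directory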

Identifying the number of images in our dataset

From here, we view the extracted contents and count the number of images per class along with the total image count. We see that we are working with 3800+ images. We can also see that our classes are relatively balanced.

Splitting our images into train, test, and validation sets

The splitfolders library proved to be extremely effective and very easy to use. It allowed us to feed in a directory address (base_folder variable) and give it a place to put the new train, test, and validation folders (output variable). Note that the output directory must already exist. We also set a seed in order to help with model score reproducibility when training our models. The ratio parameter takes floats that determine the percentage of data going to each folder.

If you would like more information about splitfolders library, please visit https://github.com/jfilter/split-folders
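As a reference, a minimal sketch of that call is shown below; the paths and the exact split ratios are placeholders and may differ from the notebook's values.

# Sketch of splitting class folders into train/val/test with splitfolders
import splitfolders

base_folder = 'data/cxr_images'    # folder containing one subfolder per class (placeholder)
output = 'data/cxr_split'          # must already exist (placeholder)

splitfolders.ratio(base_folder, output=output, seed=42, ratio=(0.8, 0.1, 0.1))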

Recommendation:

If you prefer working with folder structures when dealing with training and testing data, I would highly recommend using the splitfolders library. Through the use of this library, I was able to create train, test, and validation folders, create image data generators, and have those generators access the data by flowing from my directories.

Baseline Model

We trained a baseline model that received a 34% accuracy score on classifying chest x-rays. In order to use the DummyClassifier, we had to manipulate image data generators to create train, test, and validation sets by specifying a batch size equal to the total number of images for each set (80% of images in train, 20% in test, and 10% in validation). The Dummy Classifier tended to classify the majority of x-rays as COVID.
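Below is a sketch of how the full-batch generators and DummyClassifier might be wired up ahead of the report cell that follows; the generator names, dummy strategy, and one-hot label handling are assumptions.

# Sketch of the baseline setup: each generator is configured with a batch size
# equal to its whole split, so one call to next() returns the entire split.
import numpy as np
from sklearn.dummy import DummyClassifier

X_train, y_train = next(train_generator)   # entire training split in one batch
X_test, y_test = next(test_generator)      # entire test split in one batch

# Collapse one-hot labels to integer class labels for scikit-learn
y_train_ohe = np.argmax(y_train, axis=1)
y_test_ohe = np.argmax(y_test, axis=1)

dummy = DummyClassifier(strategy='stratified', random_state=42)
dummy.fit(X_train.reshape(len(X_train), -1), y_train_ohe)
y_pred = dummy.predict(X_test.reshape(len(X_test), -1))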

# Printing classification report and plotting confusion matrix
from sklearn import metrics
import matplotlib.pyplot as plt
import seaborn as sns

print(metrics.classification_report(y_test_ohe, y_pred))

plt.figure(figsize=(7, 6))
cm = metrics.confusion_matrix(y_test_ohe, y_pred, labels=[0, 1, 2],
                              normalize='true')
sns.heatmap(cm, cmap="Greens", annot=True, square=True)
plt.show()
              precision    recall  f1-score   support

           0       0.32      0.64      0.42       240
           1       0.36      0.26      0.30       269
           2       0.38      0.14      0.21       270

    accuracy                           0.34       779
   macro avg       0.35      0.35      0.31       779
weighted avg       0.35      0.34      0.31       779

(figure: normalized confusion matrix heatmap for the baseline model)

From here, we created new image data generators using the chest x-ray images, setting the batch size to 32 and fed the generators into a CNN model. The model received a 94% accuracy score, approximately 180% better than our baseline model.

Preprocessing Techniques

CLAHE Information

Contrast Limited Adaptive Histogram Equalization (CLAHE) is used to equalize pixel intensity in images. It is very similar to Adaptive Histogram Equalization (AHE), except it doesn't over-amplify the contrast of the image. This is controlled by the clipLimit parameter. The way CLAHE works on an image is that it focuses on small portions of the image (tileGridSize parameter) and then combines these portions together through bilinear interpolation to help remove any artificial boundaries, which means that it enhances the local contrast of the total image. This essentially helps with the pixel intensity distribution, allowing us to see more "depth" in an image.

link for info on cv2.createCLAHE():
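Below is a minimal sketch of applying CLAHE to a grayscale chest x-ray with OpenCV; the clipLimit and tileGridSize values are illustrative, not the notebook's exact settings.

# Sketch of CLAHE preprocessing on a single grayscale x-ray
import cv2

img = cv2.imread('example_cxr.png', cv2.IMREAD_GRAYSCALE)   # placeholder path

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
img_clahe = clahe.apply(img)     # equalizes contrast tile by tile, then blends tiles

cv2.imwrite('example_cxr_clahe.png', img_clahe)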

Through the use of CLAHE, we can definitely see more of the infiltrate in the lung areas of the chest x-ray. These infiltrate areas of the lung can determine whether or not a person has pneumonia. According to Hosseiny et al., radiographic appearances of multifocal ground glass opacity, linear opacities, and consolidation are usually seen in cases of coronavirus-type infections, including COVID-19, SARS, and MERS.

Now that we've preprocessed our CXR images and split them into train, test, and validation folders, we create our generators, define class weights and the model, and train the model on the preprocessed images. We will be using the same model structure as before to measure the effectiveness of CLAHE by comparing the recall rate of the COVID class and the overall accuracy of the two models.

best_model = load_model('.../cxr_models/model-05-0.101.hdf5')
class_report_gen(best_model, cl_test_generator, 
                 class_indices=test_generator.class_indices, cmap='Greens')
best_model.evaluate(cl_test_generator, verbose=1)
---------------------------------------------------------
                  Classification Report

              precision    recall  f1-score   support

           0       0.98      0.98      0.98       240
           1       0.95      0.98      0.96       269
           2       0.98      0.94      0.96       270

    accuracy                           0.97       779
   macro avg       0.97      0.97      0.97       779
weighted avg       0.97      0.97      0.97       779

---------------------------------------------------------

(figure: normalized confusion matrix for the CLAHE-preprocessed model)

25/25 [==============================] - 7s 271ms/step - loss: 0.1188 - acc: 0.9679 - precision: 0.9704 - recall: 0.9679 - auc: 0.9941





[0.11884351074695587,
 0.9679075479507446,
 0.9703989624977112,
 0.9679075479507446,
 0.994138777256012]

By using our CLAHE CXR images, we are able to improve our model's accuracy to around 97%, which is 3% better than the model that used the unpreprocessed CXR images. We should also note that this model's recall score for the COVID class is 98%, with only a 2% false negative rate. Using computer vision is a great way to diagnose certain infections and diseases through the use of chest x-rays.

Recommendation:

When working with images like x-rays or MRI scans, I highly recommend using CLAHE as a preprocessing technique to create new images that give the model more to learn from. CLAHE also is able to provide enough contrast to the image without overamplifying the intensity of the pixels. It is a great tool if the goal of your project involves detection and/or recognition with images.

Lime Image Explainer

We create a function that implements the library called lime, which allows us to see what the model determines as most important when classifying an image. Below we can see in green what the model identifies as positive correlations with the COVID class, and red as negative correlation with the class.

explain_image(basic_prep_cnn, cl_train_generator, num_samples=2000, num_feats=3, 
              class_label=0)

(figure: LIME explanation overlay on a chest x-ray, green for positive and red for negative correlation with the COVID class)

Interpretation & Further Thoughts

While using x-rays to diagnose patients with COVID has proven to be successful with the model we've created, we should consider the cost and risk of a COVID patient getting a chest x-ray. It would not be ideal for someone with COVID to come into a medical facility and expose other people to the virus. Not only would they be exposing COVID-19 to the medical staff, but also to someone with a potentially lowered immune system or someone who could be at a greater risk of hospitalization if they were to get COVID-19. We should also consider the price of getting an x-ray. A person who does not have health insurance can spend, on average, around $370.00 for a chest x-ray. Furthermore, those who are asymptomatic would not think to get an x-ray if they are not displaying any symptoms.

PART 2: Mel-Spectrogram Binary Classification

In Part 2, we explore the possibilities of using cough audio from healthy and COVID-infected individuals and see if we can create a model that can accurately diagnose those with COVID. Using datasets obtained from Stanford's Virufy app and the CoughVid app, we combine the audio files and create spectrograms from each audio file. Then we train a model on the created spectrogram images. The end goal is a model that can classify our spectrogram images with a high degree of accuracy and recall for the COVID class, followed by an application built around the model that people would be able to interact with. Ideally, the application would gather audio input from people who allow it to record their voice while they cough into the microphone. From there, the program would create a spectrogram image of the inputted audio file, and the model would evaluate the spectrogram and attempt to classify whether the audio was COVID positive or not. This would be a free app, accessible to everyone with a working phone or computer. It would also allow people to be tested on a daily basis and in quick succession when compared to other current testing methods such as viral testing or antibody testing.

Obtaining Virufy audio data

The Virufy data came from Stanford University. While it does not contain many samples, the 16 patients in this dataset are laboratory-confirmed as either having COVID-19 or being healthy at the time their audio was recorded. We will be focusing on the segmented audio in our project, which gives us approximately 121 audio files to train a model. To access the data through Google Drive, we must unzip the folder that is currently stored in our drive.

We see that the Virufy audio data is all the same length (approx. 1.6 secs). Spectrograms take into account the dimension of time, so having our audio files the same length is important. Think of having different time lengths as distorting images by either stretching or shrinking them in width. In order for our model to perform its best, we must make sure that our images' dimensions are the same.

Working with Audio Using the Librosa Library

Visualizing our audio example's waveform using librosa.display (ldp)

plt.figure(figsize=(12, 5))
ldp.waveplot(signal, sr=sr)
plt.title('Waveplot')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()

(figure: waveplot of an example audio file)

Visualizing the Fast Fourier Transform

#Fast Fourier Transformation
fft = np.fft.fft(signal)

# These magnitudes indicate the contribution of each frequency in the sound
magnitude = np.abs(fft)

# mapping the magnitude to the relative frequency bins using np.linspace()
frequency = np.linspace(0, sr, len(magnitude))

# We only need the first half of the magnitude and frequency to visualize the FFT
left_mag = magnitude[:int(len(magnitude)/2)]
left_freq = frequency[:int(len(frequency)/2)]

plt.plot(left_freq, left_mag)
plt.title('Fast Fourier Transform')
plt.xlabel('Frequency')
plt.ylabel('Magnitude')
plt.show()

(figure: FFT magnitude spectrum of the example audio)

Here we have a Fast Fourier Transform plotted. The magnitudes indicate the contribution of each frequency in the sound. The larger the magnitude, the heavier the contribution of the frequency. Here we can see that the majority of the energy resides in the lower frequencies. The only issue with the FFT is the fact that it is static; there is no time associated with this plot. So in order to incorporate time into our audio to see what frequencies impact at what time, we should use the Short-Time Fourier Transformation.

Visualizing a Spectrogram in amplitude

# number of samples per fft
# this is the number of samples in a window per fast fourier transform
n_fft = 2048

# The amount we are shifting each fourier transform (to the right)
hop_length = 512

#Trying out Short-time Fourier Transformation on our audio data
audio_stft = librosa.core.stft(signal, hop_length=hop_length, n_fft=n_fft)

# gathering the absolute values for all values in our audio_stft variable
spectrogram = np.abs(audio_stft)

# Plotting the short-time Fourier Transformation
plt.figure(figsize=(8, 7))
ldp.specshow(spectrogram, sr=sr, x_axis='time', y_axis='hz', hop_length=hop_length)
plt.colorbar(label='Amplitude')
plt.title('Spectrogram (amp)')
plt.show()

(figure: spectrogram with linear amplitude scale)

As we can see (or not see), most of the frequencies in our audio contribute very little amplitude to the overall sound. Because loudness is not perceived linearly, we are going to take the log of our sound's amplitude and convert it to decibels. Humans experience loudness and frequency logarithmically, not linearly.

to_spectrogram(signal, sr=48000, hop_length=128, n_fft=1024, vmin=-40, vmax=30)

(figure: spectrogram converted to the decibel scale)

A spectrogram is basically composed of multiple Fast Fourier Transforms, where each FFT is calculated on overlapping windowed portions of the signal. In order for us to visualize "loudness" in our signal, we must convert from amplitude to decibels. This allows us to view the loudness of frequencies over time. By switching from a scale in amplitude to decibels, we create an image with more information to give to our model.
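A helper along the lines of the to_spectrogram() call above might look like the sketch below; the parameter defaults are illustrative.

# Sketch of a decibel-scaled spectrogram plot via librosa
import numpy as np
import librosa
import librosa.display as ldp
import matplotlib.pyplot as plt

def to_db_spectrogram(signal, sr, n_fft=1024, hop_length=128, vmin=-40, vmax=30):
    stft = librosa.stft(signal, n_fft=n_fft, hop_length=hop_length)
    db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)   # log-amplitude in decibels
    plt.figure(figsize=(8, 7))
    ldp.specshow(db, sr=sr, hop_length=hop_length, x_axis='time', y_axis='log',
                 vmin=vmin, vmax=vmax)
    plt.colorbar(label='Decibels (dB)')
    plt.title('Spectrogram (dB)')
    plt.show()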

Recommendation:

When working with audio data, I highly recommend using the librosa library. It is full of functions that are easy to use and come with great explanations of how to use them. Librosa also focuses on being user friendly and relies on numpy datatypes, which allows interoperability between librosa and other libraries. Furthermore, librosa has functions that let us extract different features from an audio file, which could also be used to classify it.

# Creating a mel-spectrogram
to_mel_spectro(signal, sr, hop_length=128, n_fft=1024, figsize=(10,8), vmin=-40, 
               vmax=20, ref=1, n_mels=128)

(figure: mel-spectrogram of the example audio)

A mel-spectrogram is a spectrogram where the frequencies are converted to the mel-scale. According to the University of California, the mel-scale is “a perceptual scale of pitches judged by listeners to be equal in distance from one another”. We can picture this as notes on a musical scale:

From C to D is one whole step, and from D to E is another whole step. Perceptually to the human ear, the step sizes are equal. However, if we were to compare these steps in hertz, they would not be equal steps. A C is around 261.63 Hz, a D is 293.66 Hz, and an E is 329.63 Hz.

  • C to D difference = 32.03 Hz
  • D to E difference = 35.97 Hz

As the notes go higher in octave, the difference between the steps dramatically increases. Mel-spectrograms provide a perceptually relevant amplitude and frequency representation.

Recommendation:

When working with spectrograms created from human audio, taking the log of the amplitude and converting it to decibels will give your model more to look at, and allow it to learn more from each image. Since we are working with coughing audio, converting the frequency to the mel-scale allows us to peer more into the tonal relationship of the frequencies for each audio file.

Creating & Saving Mel-Spectrograms for Virufy Dataset

In order for us to train a Sequential Convolutional Neural Network on mel-spectrograms, we create and save mel-spectrograms for each audio clip in our Virufy dataset and save them into a new folder under their respective class.
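A sketch of that create-and-save loop is shown below, assuming the audio files are already sorted into one folder per class; the paths and spectrogram parameters are placeholders.

# Sketch: save one mel-spectrogram image per audio file, grouped by class
import os
import numpy as np
import librosa
import librosa.display as ldp
import matplotlib.pyplot as plt

def save_mel_spectrogram(audio_path, out_path, n_fft=1024, hop_length=128, n_mels=128):
    signal, sr = librosa.load(audio_path, sr=None)
    mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    fig = plt.figure(figsize=(3, 3))
    ldp.specshow(mel_db, sr=sr, hop_length=hop_length)
    plt.axis('off')
    fig.savefig(out_path, bbox_inches='tight', pad_inches=0)
    plt.close(fig)

for label in ['pos', 'neg']:                       # placeholder class folder names
    os.makedirs(f'spectrograms/{label}', exist_ok=True)
    for fname in os.listdir(f'audio/{label}'):
        out_name = os.path.splitext(fname)[0] + '.png'
        save_mel_spectrogram(f'audio/{label}/{fname}',
                             f'spectrograms/{label}/{out_name}')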

# Using np.hstack() to show our images side by side
res = np.hstack((neg_img, pos_img))
# Creating a figure and adding a subplot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(111)
# plotting the horizontal stack using plt.imshow()
plt.imshow(res, cmap='gray')
plt.title('Healthy Image                                 COVID Image')

# Hiding our axes and frame
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
ax.set_frame_on(False)
plt.show()

(figure: healthy and COVID mel-spectrograms side by side)

When looking at the spectrograms, there's not much we can take away just by looking at them. We notice the coughs are roughly the same length in duration, and the prominence of some frequencies of sound may be slightly different when comparing patients (male vs female, older vs younger, etc).

Modeling off of the virufy spectrogram images

---------------------------------------------------------
                  Classification Report

              precision    recall  f1-score   support

           0       1.00      0.73      0.85        15
           1       0.73      1.00      0.85        11

    accuracy                           0.85        26
   macro avg       0.87      0.87      0.85        26
weighted avg       0.89      0.85      0.85        26

---------------------------------------------------------

(model training history and evaluation plots)

------------------------------------------------------------
1/1 [==============================] - 3s 3s/step - loss: 1.2990 - acc: 0.8462 - precision: 0.7333 - recall: 1.0000 - auc: 0.9152
loss score: 1.2989661693572998
accuracy score: 0.8461538553237915
precision score: 0.7333333492279053
recall score: 1.0
auc score: 0.9151515364646912

Time to run cell: 310 seconds

We seem to be getting an 85% accuracy with our model and a recall for the COVID class of 100%. However, when it comes to classifying our spectrogram images with a basic model we've created, the dataset we are working with is too small - only containing around 16 different patients with 121 segmented audio samples. We can also see that our model begins to overfit after the 8th epoch. Let's get more audio data from the coughvid dataset and combine it with the virufy dataset.

Obtaining CoughVid audio data

We understand that in order to create a model that we can rely on to give us trustworthy results, we must include more data for our model to look at - or run the risk of having our model overfit to the data. We will begin exploring the CoughVid dataset. Again, we must unzip the saved file stored in our Google Drive in order to access the CoughVid audio data. Note that CoughVid is a crowdsourced dataset, gathered from the CoughVid app, which can be found at the link below. This dataset contains over 20,000 different cough audio samples.

link to CoughVid app: https://coughvid.epfl.ch/

# Viewing the distribution of age in coughvid dataset
fig, ax = plt.subplots(figsize=(6, 5))
sns.histplot(data=coughvid_df, x='age', bins=10, kde=True)
plt.axvline(x=coughvid_df['age'].mean(), c = 'r', label='Mean Age')
plt.title('Distribution of Age')
plt.legend()
plt.show()

(figure: distribution of age in the CoughVid dataset)

Column information

  • uuid: The address of the associated audio and json file for a patient.
  • datetime: Timestamp of the received recording in ISO 8601 format.
  • cough_detected: Probability that the recording contains cough sounds, according to the automatic detection algorithm that was used by Orlandic et al.
  • latitude: Self-reported latitude geolocation coordinate with reduced precision.
  • longitude: Self-reported longitude geolocation coordinate with reduced precision.
  • age: Self-reported age value.
  • gender: Self-reported gender.
  • respiratory_condition: If the patient has other respiratory conditions (self-reported).
  • fever_muscle_pain: If the patient has a fever or muscle pain (self-reported).
  • status: The patient self-reports that they have been diagnosed with COVID-19 (COVID), that they have symptoms but no diagnosis (symptomatic), or that they are healthy (healthy).

Within the next set of columns, it is important to know that 3 expert pulmonologists were each assigned to revise 1000 recordings to enhance the quality of the dataset with clinically validated information. They selected one of the predefined options for each of the following 10 items:

Categorical Columns:

  • quality: quality of the recorded cough sound.
    • values: {good, ok, poor, no_cough}
  • cough_type: Type of the cough.
    • values: {wet, dry, unknown}
  • diagnosis: Impression of the expert about the condition of the patient. It can be an upper or lower respiratory tract infection, an obstructive disease (Asthma, COPD, etc), COVID-19, or a healthy cough.
    • values: {upper_infection, lower_infection, obstructive_disease, COVID-19, healthy_cough}
  • severity: Impression of the expert about the severity of the cough. It can be a pseudocough from a healthy patient, a mild or severe cough from a sick patient, or unknown if the expert can’t tell.
    • values: {pseudocough, mild, severe, unknown}

Boolean Columns:

  • dyspnea: Presence of any audible dyspnea.
  • wheezing: Presence of any audible wheezing.
  • stridor: Presence of any audible stridor.
  • choking: Presence of any audible choking.
  • congestion: Presence of any audible nasal congestion.
  • nothing: Nothing specific is audible.

We see that the majority of our data have missing values in the expert columns. This is expected because they each reviewed only 1000 audio files, therefore the majority of these values should be missing. Also note that about 15% of the recordings were labeled by all three reviewers, so that Orlandic et al. could assess the level of agreement among the pulmonologists.

Our Reasoning for setting the threshold of cough_detection >= 0.8:

According to Orlandic et al., "the ROC curve of the cough classifier is displayed below, which users of the COUGHVID database can consult to set a cough detection threshold that suits their specifications. As this figure shows, only 10.4% of recordings with a cough_detected value less than 0.8 actually contain cough sounds. Therefore, they should be used only for robustness assessment, and not as valid cough examples."

Inspecting, Scrubbing, & Exploring

(figure: ROC curve of the cough detection classifier)

Figure from: Lara Orlandic, Tomas Teijeiro, & David Atienza. (2021). The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms (Version 2.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4498364

This dataset required a lot of scrubbing to retrieve the data we needed. Along with using the 'cough_detected' column >= 0.8 as a filter for choosing the correct audio to model on, we also had to remove rows where the labels were null or labeled as 'symptomatic'. The values within the 'status' column are self-reported by the patient and include: COVID, symptomatic, and healthy. In order for us to properly train a model, we cannot use the symptomatic audio files. People who report their status as symptomatic indicate that they have symptoms similar to COVID-19 symptoms but have not been diagnosed as COVID-19 positive. So while some of the patients who report as symptomatic may in fact have COVID-19, we must discard this class because it is not definitive.
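A minimal sketch of that scrubbing step is shown below, assuming the metadata has been loaded into coughvid_df; the file name and label strings are placeholders to check against the actual dataset.

# Sketch: keep confident cough recordings and drop unusable labels
import pandas as pd

coughvid_df = pd.read_csv('coughvid_metadata.csv')                # placeholder file name

filtered_df = coughvid_df[coughvid_df['cough_detected'] >= 0.8]   # confident coughs only
filtered_df = filtered_df.dropna(subset=['status'])               # drop unlabeled rows
filtered_df = filtered_df[filtered_df['status'] != 'symptomatic'] # drop ambiguous class

print(filtered_df['status'].value_counts())   # remaining classes: healthy and COVID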

Moving all audio files into new directory

Since the unzipped public dataset not only contains our desired audio files, but also related .json files and a .csv file containing the metadata for the entire dataset, we are going to move the audio files into a new directory so we don't have to worry about any other files in the directory while working with the target audio files.

Separating healthy and covid audio via separate directories

We filter our audio files by placing the audio file into a new folder, either 'pos' or 'neg', by targeting the uuid column and moving the audio from its address into the proper folder.
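A sketch of that sorting step follows, assuming filtered_df holds the scrubbed metadata and the audio lives in a single folder; the paths and the exact COVID label string are assumptions.

# Sketch: move each recording into a 'pos' or 'neg' folder based on its status
import os
import shutil

audio_dir = 'coughvid_audio'                      # placeholder path
for folder in ['pos', 'neg']:
    os.makedirs(os.path.join(audio_dir, folder), exist_ok=True)

for _, row in filtered_df.iterrows():
    folder = 'pos' if row['status'] == 'COVID-19' else 'neg'   # label string assumed
    for ext in ('.ogg', '.webm'):                 # recordings may still be either format
        src = os.path.join(audio_dir, row['uuid'] + ext)
        if os.path.exists(src):
            shutil.move(src, os.path.join(audio_dir, folder, row['uuid'] + ext))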

Finding Duration for audio files

Like we mentioned earlier, it's important to make sure the audio clips are all the same length. Below, we create a dataframe that contains the duration of each audio file in our folder that contains the audio files that match our 'uuid' column in our scrubbed dataset.

sns.histplot(duration_df, x='duration')
plt.title('Distribution of Duration of Audio Files');

(figure: distribution of audio file durations)

Above, we see that the majority of our audio in the target coughvid data is around 10 seconds long. However, there is a lot of data that is less than 10 seconds. Our next step is to combine the virufy data with the coughvid data, and then extend the audio files that are less than 10 seconds. Below, we find that the 99th percentile is approximately 10.02 seconds. We will set 10.02 as the maximum duration for our audio files.

Converting .webm audio to .ogg audio

Many of the files from the CoughVid dataset are .webm files - which are video files. In order for librosa to be able to extend the duration of these files, we must first convert them into .ogg files so they are compatible with the librosa library. These steps took a considerable amount of time to run, so we also created a copy of these folders and saved them in Google Drive so we would not have to run the cells below every time we wanted to access these files.
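One way to perform that conversion is sketched below using pydub (which wraps ffmpeg); the notebook's exact approach and paths may differ.

# Sketch: convert every .webm recording in a folder to .ogg
import os
from pydub import AudioSegment

src_dir = 'coughvid_audio/pos'                    # placeholder path

for fname in os.listdir(src_dir):
    if fname.endswith('.webm'):
        src = os.path.join(src_dir, fname)
        dst = os.path.join(src_dir, fname.replace('.webm', '.ogg'))
        AudioSegment.from_file(src, format='webm').export(dst, format='ogg')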

Adding Silent length to Target audio folders

Now that we've converted the .webm files to .ogg files, our next step is to extend the length of these files and combine extended virufy audio into our new folders. Similar to how we want to make sure CNN models are fed images of the same image shape, we want to make sure our audio files are also the same shape. Time is an axis on a spectrogram, therefore we must make sure the time in each audio file is equal.

Extending length of virufy audio

Since we are combining the CoughVid dataset with the Virufy dataset, we must make sure that the Virufy set is the same length in terms of time as the rest of the CoughVid dataset. We use the librosa library and zero pad the audio files until their duration is equal to 10.02 seconds - the same as our extended CoughVid audio.
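A sketch of the zero-padding step is shown below; the sample rate and paths are assumptions, and 10.02 seconds is the target duration chosen above.

# Sketch: pad (or trim) a recording to exactly 10.02 seconds of audio
import numpy as np
import librosa
import soundfile as sf

SR = 48000                                  # assumed sample rate
TARGET_LEN = int(SR * 10.02)                # 10.02-second target from the 99th percentile

def pad_to_length(in_path, out_path):
    signal, _ = librosa.load(in_path, sr=SR)
    if len(signal) < TARGET_LEN:
        signal = np.pad(signal, (0, TARGET_LEN - len(signal)))   # append silence
    else:
        signal = signal[:TARGET_LEN]                             # trim the rare longer file
    sf.write(out_path, signal, SR)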

Creating and saving mel-spectrograms

Now that we've extended all target audio files and combined the virufy data with the Coughvid data, it is time to create and save mel-spectrogram images that our model will be training on. For more information on how to create mel-spectrograms via the librosa library, visit this link: https://librosa.org/doc/main/generated/librosa.feature.melspectrogram.html

Modeling

Preparing Image Data Generators

Now that we've gone through the preprocessing steps to make our audio the same length and created mel-spectrograms, we now must create train, test, and validation folders to split the spectrogram files according to class. Again, we use the splitfolders library to help us. We should also note that we have manually zipped the folder containing all the spectrograms before continuing. This way, we can extract the contents of the zipped folder within our Google Drive, allowing for much quicker access to the spectrogram images.

We then create train, test, and validation folder sets, and have our newly created Image Data Generator flow the data from each of these directories into respective iterators. We implement some basic augmentation on the training set to try to protect against the heavy class imbalance. We also apply our class weights function and pass the result to our fit_plot_report_gen() function.

Note here that unlike our Virufy dataset, the combined dataset is heavily imbalanced, with most images belonging to the "Healthy" class. Notice that when we created the training image data generator, we added augmentation along with pixel normalization as an effort to help the model see "more images" because of the class imbalance.
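The generator setup might look like the sketch below; the augmentation values, image size, and folder paths are illustrative rather than the notebook's exact settings.

# Sketch: augmented training generator plus plain test/validation generators
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   zoom_range=0.1)
plain_datagen = ImageDataGenerator(rescale=1./255)   # no augmentation for test/val

train_spectro_gen = train_datagen.flow_from_directory(
    'spectro_split/train', target_size=(128, 128), class_mode='binary',
    batch_size=32, seed=42)
test_spectro_gen = plain_datagen.flow_from_directory(
    'spectro_split/test', target_size=(128, 128), class_mode='binary',
    batch_size=32, shuffle=False)
val_spectro_gen = plain_datagen.flow_from_directory(
    'spectro_split/val', target_size=(128, 128), class_mode='binary',
    batch_size=32, shuffle=False)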

# Our classes are extremely unbalanced
class_weights_dict = make_class_weights(train_spectro_gen)
{0: 0.5515752032520326, 1: 5.347290640394089}
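One way a helper like make_class_weights() might compute those weights is sketched below, using scikit-learn's balanced weighting on the generator's labels.

# Sketch: inverse-frequency class weights from a directory iterator
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

def make_class_weights(generator):
    labels = generator.classes                  # integer label for every image
    weights = compute_class_weight(class_weight='balanced',
                                   classes=np.unique(labels), y=labels)
    return dict(zip(np.unique(labels), weights))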

Creating a model function for spectrograms

spectro_fpr = fit_plot_report_gen(spectro_model1, train_spectro_gen, 
                                  test_spectro_gen, val_spectro_gen, epochs=5, 
                                  class_weights=class_weights_dict)
Epoch 1/5
136/136 [==============================] - 433s 3s/step - loss: 0.1075 - acc: 0.8786 - precision: 0.0000e+00 - recall: 0.0000e+00 - auc: 0.4752 - val_loss: 0.0674 - val_acc: 0.9079 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_auc: 0.5492
Epoch 2/5
136/136 [==============================] - 421s 3s/step - loss: 0.0741 - acc: 0.9030 - precision: 0.0000e+00 - recall: 0.0000e+00 - auc: 0.4876 - val_loss: 0.0517 - val_acc: 0.9079 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_auc: 0.5493
Epoch 3/5
136/136 [==============================] - 427s 3s/step - loss: 0.0706 - acc: 0.9110 - precision: 0.0000e+00 - recall: 0.0000e+00 - auc: 0.5100 - val_loss: 0.0636 - val_acc: 0.9079 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_auc: 0.5325
Epoch 4/5
136/136 [==============================] - 423s 3s/step - loss: 0.0725 - acc: 0.9046 - precision: 0.0000e+00 - recall: 0.0000e+00 - auc: 0.5530 - val_loss: 0.0721 - val_acc: 0.9079 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_auc: 0.4843
Epoch 5/5
136/136 [==============================] - 420s 3s/step - loss: 0.0736 - acc: 0.9020 - precision: 0.0000e+00 - recall: 0.0000e+00 - auc: 0.5067 - val_loss: 0.0634 - val_acc: 0.9079 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_auc: 0.5759
---------------------------------------------------------
                  Classification Report

              precision    recall  f1-score   support

           0       0.91      1.00      0.95      1126
           1       0.00      0.00      0.00       118

    accuracy                           0.91      1244
   macro avg       0.45      0.50      0.48      1244
weighted avg       0.82      0.91      0.86      1244

---------------------------------------------------------


/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

(model training history and evaluation plots)

------------------------------------------------------------
39/39 [==============================] - 78s 2s/step - loss: 0.0635 - acc: 0.9051 - precision: 0.0000e+00 - recall: 0.0000e+00 - auc: 0.5751
loss score: 0.0634758397936821
accuracy score: 0.9051446914672852
precision score: 0.0
recall score: 0.0
auc score: 0.5750632882118225

Time to run cell: 2285 seconds

Our model is having a very difficult time differentiating between our classes. Even though we used class weights, our model is only guessing every label as the majority label (healthy). So while our model is giving us a 91% accuracy, it is basically useless. Let's try to give our model more COVID images to look at. In our next step, we will create augmented images of the minority class from the training set and balance the training set with those augmented images.

Oversampling with Image Augmentation Manipulation

We are going to attempt to address the class imbalance by creating augmented images of the minority class spectrograms and combining these images with our training folder to create a faux-balanced dataset. To do this, we will (a sketch of the augmentation loop follows the list):

  • create a copy of our training set audio folder containing both classes
  • remove all majority class images
  • create a new ImageDataGenerator with augmentation
  • manipulate the generator by stating the batch size as the amount in our minority class (406)
  • create a new folder to store iterations of the augmented data until the minority folder number of images is equal to the number of images in the majority folder
  • create another copy of our audio folder containing both classes
  • add the augmented images into the minority folder of the training set
  • create a new image data generator and iterators for our new data to flow through
  • Fit a model to the new training set
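A sketch of that augmentation loop is shown below; the folder names, augmentation values, image size, and number of passes are placeholders.

# Sketch: generate augmented copies of the minority-class spectrograms
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator

minority_only_dir = 'train_minority_only'   # copy of train/ with majority images removed
augmented_dir = 'augmented_covid'           # where augmented images are written
os.makedirs(augmented_dir, exist_ok=True)

aug_datagen = ImageDataGenerator(width_shift_range=0.1,
                                 height_shift_range=0.1,
                                 zoom_range=0.1)

# Batch size equal to the number of minority-class images (406), so each pass
# over the iterator writes one augmented copy of every minority image.
aug_iter = aug_datagen.flow_from_directory(minority_only_dir,
                                           target_size=(128, 128),
                                           class_mode='binary',
                                           batch_size=406,
                                           save_to_dir=augmented_dir,
                                           save_prefix='aug',
                                           save_format='png')

n_passes = 4   # repeat until the two classes are roughly balanced
for _ in range(n_passes):
    next(aug_iter)

# The saved images are then added to the minority folder of a fresh copy of the
# training set before building new generators and fitting spectro_model3.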
spectro_oversamp_fpr = fit_plot_report_gen(spectro_model3, train_spec_gen, test_spec_gen, val_spec_gen, 
                                           epochs=10)
Epoch 1/10
246/246 [==============================] - 335s 1s/step - loss: 0.0718 - acc: 0.5370 - precision: 0.6727 - recall: 0.1092 - auc: 0.6404 - val_loss: 0.0843 - val_acc: 0.9128 - val_precision: 0.8000 - val_recall: 0.0702 - val_auc: 0.4806
Epoch 2/10
246/246 [==============================] - 331s 1s/step - loss: 0.0485 - acc: 0.7717 - precision: 0.9553 - recall: 0.5679 - auc: 0.8598 - val_loss: 0.0331 - val_acc: 0.9095 - val_precision: 0.6667 - val_recall: 0.0351 - val_auc: 0.5297
Epoch 3/10
246/246 [==============================] - 330s 1s/step - loss: 0.0289 - acc: 0.8845 - precision: 0.9844 - recall: 0.7825 - auc: 0.9318 - val_loss: 0.0318 - val_acc: 0.9079 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_auc: 0.6028
Epoch 4/10
246/246 [==============================] - 330s 1s/step - loss: 0.0275 - acc: 0.8805 - precision: 0.9809 - recall: 0.7754 - auc: 0.9468 - val_loss: 0.0334 - val_acc: 0.9079 - val_precision: 0.5000 - val_recall: 0.0526 - val_auc: 0.5786
Epoch 5/10
246/246 [==============================] - 329s 1s/step - loss: 0.0214 - acc: 0.9052 - precision: 0.9859 - recall: 0.8229 - auc: 0.9693 - val_loss: 0.0406 - val_acc: 0.8853 - val_precision: 0.0625 - val_recall: 0.0175 - val_auc: 0.6374
Epoch 6/10
246/246 [==============================] - 327s 1s/step - loss: 0.0132 - acc: 0.9416 - precision: 0.9921 - recall: 0.8880 - auc: 0.9882 - val_loss: 0.0500 - val_acc: 0.8643 - val_precision: 0.2000 - val_recall: 0.1579 - val_auc: 0.6153
Epoch 7/10
246/246 [==============================] - 327s 1s/step - loss: 0.0088 - acc: 0.9671 - precision: 0.9928 - recall: 0.9408 - auc: 0.9945 - val_loss: 0.0534 - val_acc: 0.8627 - val_precision: 0.2083 - val_recall: 0.1754 - val_auc: 0.6336
Epoch 8/10
246/246 [==============================] - 327s 1s/step - loss: 0.0052 - acc: 0.9814 - precision: 0.9974 - recall: 0.9657 - auc: 0.9980 - val_loss: 0.0574 - val_acc: 0.8788 - val_precision: 0.0909 - val_recall: 0.0351 - val_auc: 0.6200
Epoch 9/10
246/246 [==============================] - 326s 1s/step - loss: 0.0024 - acc: 0.9902 - precision: 0.9983 - recall: 0.9821 - auc: 0.9997 - val_loss: 0.0931 - val_acc: 0.8772 - val_precision: 0.2564 - val_recall: 0.1754 - val_auc: 0.6208
Epoch 10/10
246/246 [==============================] - 324s 1s/step - loss: 0.0021 - acc: 0.9949 - precision: 0.9990 - recall: 0.9905 - auc: 0.9994 - val_loss: 0.0813 - val_acc: 0.8805 - val_precision: 0.2258 - val_recall: 0.1228 - val_auc: 0.6396
---------------------------------------------------------
                  Classification Report

              precision    recall  f1-score   support

           0       0.91      0.96      0.94      1126
           1       0.27      0.14      0.18       118

    accuracy                           0.88      1244
   macro avg       0.59      0.55      0.56      1244
weighted avg       0.85      0.88      0.87      1244

---------------------------------------------------------

(model training history and evaluation plots)

------------------------------------------------------------
39/39 [==============================] - 77s 2s/step - loss: 0.0879 - acc: 0.8834 - precision: 0.2712 - recall: 0.1356 - auc: 0.6228
loss score: 0.08785425126552582
accuracy score: 0.8834404945373535
precision score: 0.2711864411830902
recall score: 0.1355932205915451
auc score: 0.622836172580719

Time to run cell: 3448 seconds

Interpretation

Our model had an accuracy of 88% with a recall of around 13.5%. In terms of recall, this model performs better than our first model, which used the spectrograms without augmentation (the first model had a recall rate of 0). This model is still having a difficult time differentiating the correct class based on the spectrogram images alone. Our AUC improved slightly; however, it is still too low a score to trust this model's ability to classify COVID-19 from a patient's cough audio.

Possible reasons our models are struggling to differentiate between classes:

  1. The silence in audio files may be introducing ambiguity into the models, which could be interfering with our model's accuracy and ability to differentiate between the classes.
  2. Our model could be struggling with identifying the different classes due to the heavy class imbalance.
  3. The labels that were marked in the coughvid dataset 'status' column were self-reported, so there could be noise in the labels.
  4. The model may not be able to find any patterns in the spectrograms we created from the audio files.
  5. Our model may not be complex enough to find any patterns in our spectrograms.
  6. The difference in duration between audio files may be too great; some are 2 second audio files with 8 seconds of zero padding, which could be affecting the model's ability to correctly classify.
  7. More data may be required.

Recommendations Section

Taking advantage of the splitfolders library is a great and easy way to create train, test, and validation folders for any type of classification data - as long as the classes are predefined in their own folders.

I highly recommend using CLAHE as a preprocessing technique if you're working with images like x-rays or MRI scans. CLAHE is able to provide enough contrast to the image without overamplifying the intensity of the pixels, providing more 'depth' within each image. It is a great tool if the goal of your project involves detection and/or recognition between classes.

The librosa library is filled with tools to help assist with manipulating audio, extracting features, and creating different plots of different audio transformations. If you end up working with audio data, I recommend implementing the librosa library to help explore and create valuable features and plots from the audio data.

When working with spectrograms created from human audio, taking the log of the amplitude and converting it to decibels will give your model more to look at, and allow it to learn more from each image. Since we are working with coughing audio, converting the frequency to the mel-scale allows us to peer more into the tonal relationship of the frequencies.

While my audio model still needs more work and further exploration, I recommend that health companies invest in obtaining high quality COVID-19-positive audio data with expert-confirmed labels. Having publicly accessible, high quality data would be key to helping prevent the further spread of COVID-19 and possible future variants/strains of coronavirus.

Conclusion

The cough data we retrieved is currently (early 2021) sparse and hard to come by. COVID-19 audio datasets of high quality with laboratory-diagnosed labels are rare, and many institutions that are working on cleaning their data for their own models have yet to make their data available to the public (such as the University of Cambridge, the Massachusetts Institute of Technology, the NIH, etc.). Other researchers have used Convolutional Neural Networks with cough audio in different ways, such as spectrograms, Mel-Frequency cepstral coefficients, audio feature extraction, and mel-spectrograms, when trying to classify COVID-19 cough audio. While I was unable to get any real traction when it came to classifying COVID through the use of spectrograms, I learned a lot about manipulating auditory data. Even though the models did not perform to my expectations, I'm sure the knowledge I've gained from this project could be useful when identifying other health-related events, such as detecting heart disease from electrocardiograms or heartbeat audio. While I am nowhere near done with this project, I can say I've fully enjoyed the entire process.

Future Research

While I have only been able to explore a few different methods when creating spectrograms, I have a lot of different options in front of me in terms of identifying further ways to tackle this problem. Finding other coughing audio through other sources could prove to be beneficial, especially if the audio is high quality with laboratory confirmed labels. Using different parameters when creating the spectrograms could also help the model's recognition ability. We could also try other audio imaging techniques like Mel-Frequency Cepstral Coefficients (MFCCs), or even try feature extraction to try and find key features that impact the detection efficiency when using the model to diagnose the audio.

Another tactic I will try involves setting the audio files to a shorter duration before creating the spectrograms. I've realized that some of the audio (like the Virufy audio) is less than 2 seconds in duration. By zero padding 8 seconds onto such an audio file, there may not be much information the model can really work with. So my idea is to clip the silence off the ends of the audio files, then cut in half each audio file that is greater than 6 seconds in duration. Another similar option would be to create 2-second segments for each audio file, and zero pad audio files that are less than 2 seconds in duration. This way, I'd be reducing the width of the spectrograms and giving the models more to look at.

References

  1. M.E.H. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M.A. Kadir, Z.B. Mahbub, K.R. Islam, M.S. Khan, A. Iqbal, N. Al-Emadi, M.B.I. Reaz, M. T. Islam, “Can AI help in screening Viral and COVID-19 pneumonia?” IEEE Access, Vol. 8, 2020, pp. 132665 - 132676.

  2. Lara Orlandic, Tomas Teijeiro, & David Atienza. (2020). The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4048312

  3. Chaudhari, Gunvant, et al. "Virufy: Global Applicability of Crowdsourced and Clinical Datasets for AI Detection of COVID-19 from Cough." arXiv preprint arXiv:2011.13320 (2020).

  4. Hosseiny M, Kooraki S, Gholamrezanezhad A, Reddy S, Myers L. Radiology Perspective of Coronavirus Disease 2019 (COVID-19): Lessons From Severe Acute Respiratory Syndrome and Middle East Respiratory Syndrome. AJR Am J Roentgenol 2020;214:1078-82. doi:10.2214/AJR.20.22969 pmid:32108495
