
Covid-19-Detector using Cough and Sound Recordings - Android App

Why is this relevant?

With the emergence of the Omicron variant we will soon (models project a peak around mid-January 2022, roughly 15.01.2022) see a spike in cases, including increased hospitalizations and a greater need for rapid antigen tests. New studies also show that current tests are less accurate at detecting Omicron, and the recommendation is to use two tests instead of one, which further increases demand for tests.

However, with cases doubling every 3.5 days, it is hard to see how the supply of tests can keep up with demand.

Research over the last year has shown that detecting Covid-19 from cough sounds alone is possible, but so far no app making use of this technology has been made available to the public.

The latest research has also shown that Omicron affects the upper airways more than the lungs, which might explain its lower severity and might reduce accuracy when relying on cough data alone. Therefore, a focus should be put on other sounds such as voice and breathing as well!

The development of a publicly available Covid test using only the microphone in widely available Android and iOS phones could help reduce the impact of the Omicron wave by detecting infections earlier.

Current state of the project

A simple model (not yet trained on cough data) was trained and imported into an Android app. The model used in the app is just a proof of concept and needs to be replaced with a model able to detect real Covid cough data. The app needs to be extended to record cough sounds and run them through the model, returning a positive/negative result.

Prerequisites

Available public code projects:

Available public sound Datasets:

Paper List:

Implementation details (Work in progress)

  1. We start with a new Android Studio project, using the "Basic Activity" template, API level 23 (Marshmallow, for >95% device coverage), and Java as the programming language.
  2. Prepare the data. Often this step is the hardest, since building the model is easy when using tools like Keras, which provide all the parts needed to train a model.
  3. Train the model with Keras in Google Colab, resulting in a .h5 and a .tflite file. See SimpleExampleOfTrainigATesnsorflowModel.ipynb for details; a minimal sketch follows after this list.
  4. Add the functionality of running pretrained models on Android, following this guide and this GitHub repo.
  5. Create an assets folder and add the .tflite file you trained with Google Colab and downloaded in the previous steps. https://stackoverflow.com/questions/18302603/where-to-place-the-assets-folder-in-android-studio
  6. ...
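
As a minimal sketch of step 3 (assuming a toy architecture, feature shape, and file names that are placeholders, not the project's actual model), the Colab notebook could train a small Keras classifier and export both a .h5 and a .tflite file, which can then be copied into the Android assets folder:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Placeholder data standing in for preprocessed audio features (assumption).
x_train = np.random.rand(200, 64).astype("float32")
y_train = np.random.randint(0, 2, size=(200,))

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(64,)),
    keras.layers.Dense(1, activation="sigmoid"),  # sigmoid output for the 0/1 label
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32)

# Save the Keras model and convert it to TensorFlow Lite for the Android assets folder.
model.save("cough_model.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("cough_model.tflite", "wb") as f:
    f.write(tflite_model)
```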

Some thoughts for later implementation

  • Transfer learning looks like a must.
  • The cough sounds must be cropped to the same length for training and detection! (A crop/pad helper is included in the sketch after this list.)
  • 'Selective training': ideally we collect personalized cough data from the user before they get Covid to reduce the false positive rate of the app. Gender, age, ... or just use user recordings to classify the user and train a better personalized model with training data similar to the user.
  • Put disclaimers stating the accuracy of the test, using graphics that compare its accuracy with rapid antigen and PCR tests.
  • Inform the user which sound is best for detection, and discourage users with background noise or other respiratory diseases from using the app, since it is not clear whether it works well for them.
  • Avoid text as much as possible and only use GIFs/animations so the test can be used by everyone without a language barrier.
  • Output should include the confidence of the model and a disclaimer that the results can be wrong even if confidence is high. It should also be very simple, presenting a probability of having Covid and giving the user the option to see more detailed data from the analysis of their recording.
  • A combination of cloud-based analysis when an internet connection is available and on-device analysis for offline use would be ideal.
  • According to Andrew Ng's famous ML lectures:
    • CNNs are good for image detection, but RNNs are better suited for sounds.
    • A larger network and more data are the two main factors for improving the network.
    • ReLU speeds up training compared to the sigmoid activation function, but sigmoid should be used for the last (output) layer since we only have 0 or 1 as an output.
    • Hyperparameters include alpha (the learning rate), the number of gradient descent iterations, the number of hidden layers, the number of hidden units (nodes per layer), which activation function to use in which layer, momentum, mini-batch size, regularization parameters, ... -> use trial and error and iterate to find the optimum.
    • The train/dev/test sets should have a ratio of 60%/20%/20% when dealing with a limited amount of data, as in our case of Covid sounds.
    • Make sure that the dev and test sets come from the same distribution; it is OK if the training set comes from another distribution, e.g. for the sake of more data.
    • If the result has high bias (underfitting) and/or high variance (overfitting), try: a bigger network (until the bias shrinks), training longer (never hurts), a different neural network architecture, more data, and regularization (in the case of high variance).
  • Data preparation has 3 main steps (a sketch of these steps follows after this list):
    • Data cleaning: remove missing data, noise, ...
    • Data transformation and normalization: normally we standardize so that the standard deviation is 1. Attention! If you later add new data, the normalization and standardization must use the same statistics as for the previous data!
    • Data reduction: remove duplicates, remove data you don't need for your analysis, correlation analysis (removes data that is so similar that removing it doesn't change the result we want), forward-backward attribute selection (train the ML model with and without the data and check whether prediction quality is affected; if it has no effect, the data can be removed), forward attribute selection (start with one attribute and add more until learning doesn't get better).
      • To make a more informed decision on what to remove, use Principal Component Analysis (PCA). Remember to standardize the data first to avoid vastly different variances between the dimensions; the variance should be 1 for all dimensions. The cutoff is normally set so that only the principal components that explain 99% of the variance are used.
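
The following sketch illustrates several of the ideas above: cropping/padding recordings to a fixed length, standardizing with statistics that are reused for later data, PCA with a 99% variance cutoff, and a 60%/20%/20% train/dev/test split. The lengths, shapes, and feature values are placeholder assumptions, not the project's real pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

FIXED_LEN = 16000  # e.g. 1 second at 16 kHz (assumption)

def crop_or_pad(signal, length=FIXED_LEN):
    """Force every recording to the same length for training and detection."""
    if len(signal) >= length:
        return signal[:length]
    return np.pad(signal, (0, length - len(signal)))

# Toy feature matrix standing in for extracted audio features (assumption).
features = np.random.rand(100, 40)

# Standardize to zero mean / unit variance; keep the fitted scaler so any new data
# is transformed with exactly the same statistics as the original data.
scaler = StandardScaler().fit(features)
features_std = scaler.transform(features)

# PCA on the standardized features, keeping only the components that explain 99% of the variance.
pca = PCA(n_components=0.99).fit(features_std)
features_reduced = pca.transform(features_std)

# 60%/20%/20% train/dev/test split.
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(features_reduced))
train_idx, dev_idx, test_idx = np.split(idx, [int(0.6 * len(idx)), int(0.8 * len(idx))])
```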
