
Web Tool for Disease Incidence Estimation with Shogun

Giovanni De Toni edited this page Jan 22, 2020 · 11 revisions

Predicting how diseases spread among the population of a country is a useful application of machine learning techniques to a real-world problem. Several studies have shown that the incidence of certain diseases can be estimated from social networks (e.g. Twitter) and/or other sources of information.

This project aims to replicate the work of McIver and Brownstein [1] using Shogun and to create a tool that can monitor influenza-like illnesses (ILI) in near real-time. In that paper, the authors show that the incidence of influenza-like illness in the USA can be estimated from the page views of certain Wikipedia articles.

Several hints suggest that this model can work for other countries as well.

Description

This GSoC project can be divided into two parts:

  • Develop a machine learning model by using Wikipedia's data with Shogun;
  • Expose the model to the internet through a REST API and provide a web interface to access the results.

The first part aims to replicate the results of the original paper using Shogun. The output of this first part will be a complete script/notebook that presents the results, with a data analysis/visualization section and a description of the models used, including their strengths and weaknesses. For an idea of what it should look like, have a look at several good Kaggle Notebooks (for instance, this one here about predicting house prices). Moreover, the first part must also produce a serialized version of the final Shogun model so that it can be reused and updated.
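To make the "fit, serialize, reload" workflow concrete, here is a minimal sketch. It deliberately does not use Shogun: it is a plain-Python, single-feature least-squares fit with JSON serialization, standing in for whatever Shogun model and serialization mechanism the project ends up using. The function names (`fit_line`, `predict`, `save_model`, `load_model`) and the numbers are hypothetical and only illustrate the shape of the deliverable.

```python
import json
from statistics import mean


def fit_line(views, ili_rates):
    """Ordinary least squares with one feature: ili ~ intercept + slope * views."""
    mx, my = mean(views), mean(ili_rates)
    sxx = sum((x - mx) ** 2 for x in views)
    sxy = sum((x - mx) * (y - my) for x, y in zip(views, ili_rates))
    slope = sxy / sxx
    return {"intercept": my - slope * mx, "slope": slope}


def predict(model, views):
    """Apply the fitted line to new page-view counts."""
    return [model["intercept"] + model["slope"] * x for x in views]


def save_model(model):
    """Serialize the fitted parameters to a JSON string (stand-in for Shogun serialization)."""
    return json.dumps(model)


def load_model(text):
    """Restore the parameters from the JSON string."""
    return json.loads(text)


# Made-up numbers for illustration only: weekly page views and ILI rate (%).
weekly_views = [1000, 2000, 3000, 4000]
weekly_ili = [1.0, 2.0, 3.0, 4.0]

model = fit_line(weekly_views, weekly_ili)
restored = load_model(save_model(model))
```

In the real project the features would be the page views of many articles and the model would be trained with Shogun; the point here is only that the trained parameters survive a save/load round trip and can then be used for prediction.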

The second part of the project aims to expose this model to the internet through an API. The ideal outcome would be a Docker container that deploys the model. A web interface must be available to show the current estimated ILI levels by fetching data from the Wikipedia Pageview API [3].
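As a rough illustration of the data-fetching step, the sketch below builds a per-article request URL for the Wikimedia Pageviews REST API and turns a response payload into a date-to-views mapping. The helper names (`pageviews_url`, `views_by_day`), the default parameters, and the sample payload are assumptions made for this example; the payload is hand-written, not real data, and no network request is made here.

```python
from urllib.parse import quote

# Root of the per-article endpoint of the Wikimedia Pageviews REST API.
PAGEVIEWS_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia",
                  access="all-access", agent="user", granularity="daily"):
    """Build the request URL; start and end are dates in YYYYMMDD form."""
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([PAGEVIEWS_ROOT, project, access, agent,
                     title, granularity, start, end])


def views_by_day(payload):
    """Map each day (YYYYMMDD) to its view count from a decoded JSON response."""
    return {item["timestamp"][:8]: item["views"]
            for item in payload.get("items", [])}


# Hand-written payload with invented counts, mimicking the response layout.
sample_payload = {
    "items": [
        {"article": "Influenza", "timestamp": "2020010100", "views": 120},
        {"article": "Influenza", "timestamp": "2020010200", "views": 150},
    ]
}

url = pageviews_url("Influenza", "20200101", "20200107")
daily = views_by_day(sample_payload)
```

A deployed service would fetch `url` with an HTTP client, decode the JSON, pass the resulting counts to the trained model, and return the estimated ILI level to the web interface.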

Mentors

Requirements

You need to know:

  • C++
  • Python
  • Shogun (just a little bit 😉 )
  • Machine Learning Basics (understanding of regression models)
  • Docker and Flask (basic level)
  • HTML/CSS/Javascript (basic level)

If you already have experience in working on machine learning projects (e.g., previous open-source contributions, coursework, etc.) then it would be a plus, but it is not mandatory.

Bear in mind that the focus of this project will be on the machine learning application of Shogun. Therefore, you do not necessarily need to possess phantasmagorical frontend/backend skills.

Why this is cool

This project will give you the opportunity to apply machine learning techniques to a real-world project. Moreover, it will let you advance your skills in several areas (programming, data analysis, data visualization, web servers, etc.). You will be able to develop a full-fledged system that automatically exposes a trained machine learning model online. This project could also be used in the future to showcase your abilities.

First Steps

The first step is to read up on the topic and do a little research on the solutions already available. You should then produce a plan describing which techniques you intend to use (this can be discussed directly with the mentors) and what you imagine the web application will look like. Be creative! Don't be afraid to propose new ideas. In the end, this will be YOUR project and the mentor(s) will just help you to make it a reality 😉

This project requires close collaboration between you and your future mentor(s), so to increase your chances of being selected, start interacting with them as soon as possible to discuss your ideas and the various details.

Useful Resources

  1. Wikipedia Usage Estimates Prevalence of Influenza-Like Illness in the United States in Near Real-Time
  2. Wikipedia's Pageview + Influenza Incidence in Europe Dataset
  3. Wikipedia Pageview API