This repository has been archived by the owner on May 2, 2022. It is now read-only.

Enhancement: Analytics that predict future spreading zone before outbreak occurs. #245

Open
jelbazi opened this issue Mar 23, 2020 · 15 comments
Labels
enhancement New feature or request

Comments

@jelbazi
Contributor

jelbazi commented Mar 23, 2020

This research (see PDF and images below) analyzes questionnaires filled out by Israeli citizens. The authors argue that they can predict "...future spreading zones a few days before an [COVID19] outbreak occurs."

Since we plan to survey citizens and (hopefully) expect an enormous amount of data, this could be a very significant next step in helping authorities (and individuals) take action BEFORE an exponential outbreak occurs in a specific township/county/zip-code area/region.

  • The first steps on our end are to see if we have similar questionnaires (and if not, if we can adapt/change them before going live).
  • Then bring in knowledge/skills of data scientists/researchers/epidemiologists to evaluate the setup. Maybe RIVM/NHS connected people can help here?
  • After that, we should start implementing the actual app features that use predictive models to provide everyone with relevant information.
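
To make the idea a bit more concrete, here is a minimal sketch of the kind of regional signal such a feature might start from: the share of reports with symptoms per postal code per day, and the regions where that share rose between two days. This is not the model from the paper. The report shape `(postal_code, report_date, has_symptoms)`, the function names, and the `min_increase` threshold are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import date


def symptom_share_by_region(reports):
    """Return {(postal_code, day): share of that day's reports with symptoms}."""
    totals = defaultdict(int)
    symptomatic = defaultdict(int)
    for postal_code, day, has_symptoms in reports:
        key = (postal_code, day)
        totals[key] += 1
        if has_symptoms:
            symptomatic[key] += 1
    return {key: symptomatic[key] / totals[key] for key in totals}


def rising_regions(shares, day_before, day_after, min_increase=0.1):
    """List postal codes whose symptom share rose by at least min_increase."""
    rising = []
    for (postal_code, day), share in shares.items():
        if day != day_after:
            continue
        previous = shares.get((postal_code, day_before))
        if previous is not None and share - previous >= min_increase:
            rising.append(postal_code)
    return sorted(rising)


# Hypothetical sample data: (postal_code, report_date, has_symptoms)
reports = [
    ("0150", date(2020, 3, 22), False),
    ("0150", date(2020, 3, 22), False),
    ("0150", date(2020, 3, 23), True),
    ("0150", date(2020, 3, 23), False),
    ("5003", date(2020, 3, 22), True),
    ("5003", date(2020, 3, 23), True),
]
shares = symptom_share_by_region(reports)
flagged = rising_regions(shares, date(2020, 3, 22), date(2020, 3, 23))
```

A real model would need far more than this (sample sizes per region, smoothing, epidemiologist input), which is why the second step above matters.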

For NOW, I will at least compare which variables/symptoms we ask of people now vs what's in the research. Then we can more easily decide on the next steps (of going this route/enhancement or not).

Ideas on this?

PDF: A framework for identifying regional outbreak and spread of COVID-19 from one-minute population-wide surveys.

researchimg

Survey form
researchqs

@jelbazi
Contributor Author

jelbazi commented Mar 23, 2020

I did a comparison in detail 👇

For detailed notes go to the Google sheet.

It seems we tick a lot of the variables already. Maybe it's not too much work to add the other ones?

One important factor I am not sure about: do we keep a time-series record of the data? E.g., do we track (or at least save in the DB) changes over time?

Comparison:
Variablescoronastatus

@adriaandotcom
Contributor

I think it would be great to be able to predict the outbreak based on these questions.

We already have some data, so I’m not sure how much code needs to change, or how the new data will differ from what we have already collected in Norway.

@michaelmcmillan
Member

This would be extremely cool to check against the numbers we've collected in Norway!

@michaelmcmillan
Member

And yes, we track changes over time – so it is in fact a time series, but what we've seen is that the number of people who come back to the site to update their health status is pretty low:

From left to right: Reported at least 1 time, reported at least 2 times, reported at least 3 times, reported at least 4 times, and so on
image
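
For anyone who wants to reproduce this kind of breakdown, a minimal sketch follows. It assumes each saved report carries some per-profile identifier (the `report_user_ids` list here is hypothetical and not taken from the actual database schema) and counts how many profiles reported at least 1 time, at least 2 times, and so on.

```python
from collections import Counter


def reported_at_least(report_user_ids):
    """Return a list where index n-1 is the number of users with at least n reports."""
    per_user = Counter(report_user_ids)  # user id -> number of reports
    if not per_user:
        return []
    max_reports = max(per_user.values())
    return [
        sum(1 for count in per_user.values() if count >= n)
        for n in range(1, max_reports + 1)
    ]


# Hypothetical data: one entry per submitted report, holding the reporter's id.
report_user_ids = ["a", "b", "a", "c", "a", "b"]
counts = reported_at_least(report_user_ids)
```

In this made-up sample, three profiles reported at least once, two at least twice, and one at least three times, so the bars fall off from left to right just like in the chart.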

@jelbazi
Contributor Author

jelbazi commented Mar 23, 2020

And yes, we track changes over time – so it is in fact a time series, but what we've seen is that the number of people who come back to the site to update their health status is pretty low:

From left to right: Reported at least 1 time, reported at least 2 times, reported at least 3 times, reported at least 4 times, and so on
image

Interesting numbers!

Maybe return visits will go up if this is talked about in the news constantly. If not, I guess they'll stay low indeed.

But good that we have a time-series.

I think implementing the below variables will be useful though, so we can reference the paper and get other governments/OS communities to adopt this repo/model.
And how awesome would it be if we can actually predict outbreaks!

So here is what I think needs to be added if we want the same set (plus the extras we already have) as in the research.

These are like the other symptoms in the list, just yes/no:

  • NAUSEA OR VOMITING

  • DIAGNOSED OTHER CONDITIONS (Diabetes mellitus, Hypertension, Ischemic heart disease, Asthma, chronic lung disease, Chronic kidney disease)

This one takes a number (whether it's in C or F will depend on the language, I guess?):

  • BODY TEMPERATURE

These take a single choice (1 of 3):

  • IN ISOLATION (1. Not in isolation, 2. In isolation - due to a recent international travel, 3. In isolation - due to a contact with an individual who was infected with coronavirus or an individual who recently returned from any destination abroad.)

  • SMOKING HISTORY (1. I currently smoke, 2. I used to smoke, 3. I have never smoked)
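
As a rough illustration of how these extra answers could be represented, here is a minimal sketch. The class names, field names, and the `to_celsius` helper are assumptions for illustration and are not taken from the existing codebase. The option lists mirror the ones above, and the temperature is normalized to Celsius so that C and F inputs end up comparable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Isolation(Enum):
    NOT_IN_ISOLATION = 1
    DUE_TO_TRAVEL = 2   # recent international travel
    DUE_TO_CONTACT = 3  # contact with an infected person or a recent traveller


class Smoking(Enum):
    CURRENT = 1
    FORMER = 2
    NEVER = 3


@dataclass
class ExtraAnswers:
    nausea_or_vomiting: bool
    diagnosed_other_conditions: bool
    body_temperature_celsius: Optional[float]  # None if not measured (assumption)
    isolation: Isolation
    smoking: Smoking


def to_celsius(value: float, unit: str) -> float:
    """Normalize a temperature entered in 'C' or 'F' to Celsius."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown temperature unit: {unit!r}")


# Hypothetical submission from a form that collected Fahrenheit.
answers = ExtraAnswers(
    nausea_or_vomiting=False,
    diagnosed_other_conditions=True,
    body_temperature_celsius=to_celsius(100.4, "F"),
    isolation=Isolation(2),
    smoking=Smoking.NEVER,
)
```

Storing one unit and converting at the form boundary keeps the saved data consistent across languages.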

@fossecode
Member

I will make the required changes to the form

@kaared
Contributor

kaared commented Mar 23, 2020

This is a great idea! Which questions are asked on the form is critical to what kind of answers one can extract from the dataset. I think it would be worthwhile to contact the authors of the paper and see if we can get them interested. They in turn can reach out to their community. I tried to get these kinds of answers from Norwegian authorities but got no response, and the recently published official form is quite lacking, I think -- but I'm not an epidemiologist, don't even play one on TV.

The questionnaire in the paper is a great start and may even be all they need. Getting the codebase to the point where that questionnaire is supported is a very good thing, even though it means that already collected data may have to be thrown away or massaged. But for new countries or areas this doesn't matter.

Another example of a form: http://test.koronaegenmelding.no:9000/survey/take/1. Google translate doesn't like the port number so copy+paste text to translate.

@kaared
Contributor

kaared commented Mar 23, 2020

I think issues #172, #173 and #184 can help boost the repeat visits.

@jelbazi
Contributor Author

jelbazi commented Mar 23, 2020

I'll see if we can look up the authors online.

@fossecode
Member

It would be nice if someone could test the newly added fields on https://cac3bd65.ngrok.io/

The URL will work for a couple of hours. Translations are not completed yet.

@kaared
Contributor

kaared commented Mar 23, 2020

Quick feedback:

  1. Swap geography and demography. The field order dictates that.
  2. Biological gender is sufficient. Avoid sex. Male and female is context enough if any doubt.
  3. Zip code should be postal code / zip code. Zip code is primarily USA/Canada.
  4. Zip code is limited to size 4. Should be way more. E.g. England has really long ones.
  5. Change 'someone who was tested' to 'someone who tested'.
  6. 'Select only the symptoms you have experienced'. Should this be 'currently have or have experienced'? Since we're time-based I guess 'currently have' is what we want?
  7. I wonder if 'loss of' should be 'reduced'. Is loss absolute or relative? Not sure.
  8. I personally think it would be useful to know each of the preexisting conditions the person has, i.e. avoid yes/no and accept multiple choices. Epidemiologist feedback would be great here!
  9. Some labels have a colon while others do not. Nitpick.
  10. About smoking: I wonder if current smoker, recently quit, quit long time ago, never smoked would be good. Verbose? Yes, definitely. Useful to know? Maybe.
  11. Change 'temperature? (Celcius)' to 'temperature in Celsius'.
  12. Suggestion: 'You are completely anonymous. The data you submit cannot be used to identify you.'
  13. Privacy policy is typically internal. We want to use privacy statement which is external facing.
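
On points 3 and 4, a minimal sketch of a more lenient check is below. The pattern (letters, digits, spaces, and hyphens, 3 to 10 characters in total) is an assumption for illustration and not a rule for any specific country, and `normalize_postal_code` is a hypothetical helper name.

```python
import re

# Assumption: 3 to 10 characters, starting and ending with a letter or digit,
# with spaces or hyphens allowed in between.
POSTAL_CODE_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9 -]{1,8}[A-Za-z0-9]")


def normalize_postal_code(raw):
    """Return a trimmed, upper-cased postal code, or None if it looks invalid."""
    candidate = " ".join(raw.strip().upper().split())  # collapse repeated spaces
    if POSTAL_CODE_PATTERN.fullmatch(candidate):
        return candidate
    return None


norwegian = normalize_postal_code("0150")
british = normalize_postal_code(" sw1a  1aa ")
too_long = normalize_postal_code("123456789012")
```

Per-country rules could replace the single pattern later if stricter validation turns out to be needed.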

The additional field when selecting something that needs more information is a nice touch, but I actually missed it the first time. Would it be better to add it below so that I will catch it next? We tend to scroll top to bottom so if something shows up on my right I may not catch it depending on where my eyeballs are. On mobile it's not a problem of course.

Good work -- big improvement!

@fossecode
Member

Thanks @kaared, I fixed most of the points, but held off on some of them:
2. We have gotten quite a lot of negative feedback when using only "Biological gender/Biologisk kjønn".
4. Will fix this in a separate task
6. Yeah, we should probably discuss what we actually want before changing this.
7. This was initially a suggestion from a doctor. The research paper has also used "loss of".
8. Should be discussed first
10. Should be discussed first

Can be tested with translations on https://cac3bd65.ngrok.io/ now.

@ian-starts
Contributor

Especially with the edits, it looks really good and understandable. It also works as expected.

@adriaandotcom
Contributor

2: Indeed, let's not use gender: #202

@adriaandotcom added the enhancement (New feature or request) label Mar 23, 2020

@michaelmcmillan
Member

I've reached out to the researchers of the paper to hear if we can be of any help to them.


6 participants