Bad data quality South America #291

Open
co-miko opened this issue Jul 20, 2020 · 5 comments

Comments

@co-miko

co-miko commented Jul 20, 2020

Hi Opencovid Team

Thanks again for your efforts on gathering all the data.
While looking through the data I observed some strange behaviour in various country districts and municipalities.

Especially in these countries:
Argentina: La Rioja, La Pampa...
There the case counts increase at the beginning and then decrease again in the middle.

Bolivia: La Paz,...
Brazil: Acre,...
Chile: Antofagasta,...
Peru: Ancash
There the values at the beginning are consistently very wrong.

Mexico: Tlaxcala
The values jump a lot at the start of the tracking.

Poland: Greater Poland,... 13.6
There the values drop from 2000-something to 24, then rise to about 2000 again the next day.

Czechia: Prague, 13.7: no more data for deaths or recovered is available.

We use your data for our website to show some statistics and developments. You can have a look at one example here:
https://covid.lanthaler.com/BO/cochabamba/

I hope you keep up your great work.
Thank you

@owahltinez
Contributor

owahltinez commented Jul 20, 2020

Thank you for the kind words and for reporting these issues. I can confirm that I see some of the problems that you reported, for example Tlaxcala's numbers:
image

I'm guessing it's some date-parsing error. I'll look into it and get back to you.

@owahltinez
Contributor

We narrowed it down to a particularly careless data source, and we now heavily filter their data to only take what looks reasonable. I visually inspected all the examples you provided, and they look fine to me now. Can you verify?
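The thread does not say how the filter works. As a purely illustrative sketch, one simple way to "only take what looks reasonable" from a cumulative series is to discard any point that falls below the running maximum. The function name, the sample dates, and the sample values below are hypothetical and are not taken from the project's code or data.

```python
from typing import List, Optional, Tuple


def drop_implausible_dips(
    series: List[Tuple[str, Optional[int]]]
) -> List[Tuple[str, int]]:
    """Keep only points of a cumulative series that do not fall below
    the highest value seen so far. Missing values are skipped."""
    kept: List[Tuple[str, int]] = []
    running_max: Optional[int] = None
    for date, value in series:
        if value is None:
            continue  # no data for this day
        if running_max is not None and value < running_max:
            continue  # a cumulative total should not decrease
        running_max = value
        kept.append((date, value))
    return kept


# Hypothetical values shaped like the reported dip (about 2000, then 24, then about 2000).
sample = [("2020-06-12", 2000), ("2020-06-13", 24), ("2020-06-14", 2010)]
print(drop_implausible_dips(sample))
```

A filter like this removes downward dips only. A single erroneous upward spike would raise the running maximum and hide the valid points after it, so a real pipeline would need an additional check for that case.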

Also, can I add your page to the grid of data users at the top of the page?

@co-miko
Author

co-miko commented Jul 22, 2020

I will check them. And of course you can add us to the grid of data users.

@co-miko
Author

co-miko commented Jul 22, 2020

The data for Bolivia, Brazil, Chile looks very good.

There are still some minor data anomalies:
Argentina:

  • Chubut (has 84 total cases on April 14; on April 15 the total drops to 1)
  • La Pampa (has more total deaths than total infected)
  • La Rioja (same as Chubut)

Peru:

  • Lima (total deaths are the same as total cases)
  • nearly all provinces show the same behaviour as Lima

Mexico:

  • Baja California: the current day shows only a fraction of the previous day's count (looks like an incomplete count for that day)
  • Campeche (same as Baja)
  • Chiapas (same as Baja)
  • Morelos (same as Baja)
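
The anomalies listed above (a cumulative total that drops from one day to the next, and deaths equal to or higher than cases) can be checked mechanically. The following is a minimal sketch, assuming each region's data is a date-sorted list of rows with cumulative `confirmed` and `deceased` totals. The field names and the `deceased` values in the sample are assumptions; only the Chubut case counts (84, then 1) come from the report.

```python
from typing import Dict, List, Tuple


def find_anomalies(rows: List[Dict]) -> List[Tuple[str, str]]:
    """Return (date, reason) pairs for rows that look implausible.

    rows must be sorted by date and hold cumulative totals under the
    keys 'date', 'confirmed' and 'deceased' (assumed field names).
    """
    problems: List[Tuple[str, str]] = []
    previous_confirmed = None
    for row in rows:
        confirmed, deceased = row["confirmed"], row["deceased"]
        if previous_confirmed is not None and confirmed < previous_confirmed:
            problems.append((row["date"], "confirmed total decreased"))
        if deceased > 0 and deceased >= confirmed:
            problems.append((row["date"], "deceased >= confirmed"))
        previous_confirmed = confirmed
    return problems


# Case counts as reported for Chubut; deceased values are placeholders.
chubut = [
    {"date": "2020-04-14", "confirmed": 84, "deceased": 0},
    {"date": "2020-04-15", "confirmed": 1, "deceased": 0},
]
print(find_anomalies(chubut))
```

Equal deaths and cases can be legitimate for very small counts, so the second check is better treated as a prompt for manual review than as proof of an error.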

@owahltinez
Contributor

Thank you for the detailed feedback!

Argentina

We just switched to a new data source via #301, so all of these should be resolved.

Peru

I made a silly mistake and used the same URL for confirmed and deceased cases. Fix via #302.

Mexico

I will double check, but I think this is just the nature of our data source, which outputs incomplete data for the latest day. If it's frequent enough (i.e. it's happening for all subregions), I would consider tossing out the latest day, but I would strongly prefer not to filter the data since it's coming directly from an authoritative source.
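
As a rough sketch of the "toss out the latest day only if it happens for all subregions" idea: the code below assumes each subregion maps to a date-ordered list of cumulative totals, and treats the latest day as incomplete when its value is below the previous day's in at least `min_share` of subregions. The function names, the `min_share` parameter, and all numbers are hypothetical; nothing here reflects the project's actual pipeline.

```python
from typing import Dict, List


def latest_day_looks_incomplete(
    totals_by_region: Dict[str, List[int]], min_share: float = 1.0
) -> bool:
    """True if the last value is lower than the one before it in at
    least min_share of the regions that have two or more data points."""
    comparable = [v for v in totals_by_region.values() if len(v) >= 2]
    if not comparable:
        return False
    dropped = sum(1 for values in comparable if values[-1] < values[-2])
    return dropped / len(comparable) >= min_share


def drop_latest_day(totals_by_region: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return a copy of the data without the last entry of each region."""
    return {region: values[:-1] for region, values in totals_by_region.items()}


# Made-up totals in which every region's latest day is only a fraction
# of the previous day.
mexico = {
    "Baja California": [9000, 9150, 300],
    "Campeche": [2000, 2040, 55],
    "Chiapas": [4100, 4160, 120],
    "Morelos": [3000, 3050, 80],
}
if latest_day_looks_incomplete(mexico):
    mexico = drop_latest_day(mexico)
print(mexico["Campeche"])
```

Requiring the drop in every subregion (`min_share=1.0`) keeps the check conservative, in line with the stated preference not to filter data from an authoritative source unless the pattern is systematic.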


2 participants