Totals mismatch between UK and countries breakdown #52

Open
boogheta opened this issue May 4, 2020 · 9 comments

Comments

@boogheta

boogheta commented May 4, 2020

Hello,

I've been reusing your great work on UK data within my dashboard here: https://boogheta.github.io/coronavirus-countries/#country=UK

While adding the Tests data, now that you have completed it across all countries, I came across something that looks like an error, though I may be misreading things:
Comparing confirmed cases for the whole UK (https://github.com/tomwhite/covid-19-uk-data/blob/master/data/covid-19-totals-uk.csv) with those for England alone (https://github.com/tomwhite/covid-19-uk-data/blob/master/data/covid-19-totals-england.csv), the England values are greater than the UK values until April 10th.

I noticed this because I fill in England figures by computing UK - Wales - Scotland - Northern Ireland whenever an England figure is missing, since the other series are all complete. Am I misunderstanding something?
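A minimal sketch of that subtraction fallback, for readers who want to reproduce it. It assumes each series has already been loaded into a dict keyed by ISO date; the function name `fill_england` and the sample numbers are hypothetical and are not taken from the repository.

```python
from typing import Dict, Optional


def fill_england(
    uk: Dict[str, int],
    wales: Dict[str, int],
    scotland: Dict[str, int],
    ni: Dict[str, int],
    england: Dict[str, int],
) -> Dict[str, Optional[int]]:
    """Return the England series, filling gaps with UK - Wales - Scotland - NI."""
    filled: Dict[str, Optional[int]] = {}
    for date, uk_value in uk.items():
        value = england.get(date)
        if value is None:
            parts = (wales.get(date), scotland.get(date), ni.get(date))
            # Only derive a value when all three other nations are present.
            if all(p is not None for p in parts):
                value = uk_value - sum(parts)
        filled[date] = value
    return filled


# Made-up numbers for illustration only (not real data).
uk = {"2020-04-01": 100, "2020-04-02": 150}
wales = {"2020-04-01": 10, "2020-04-02": 15}
scotland = {"2020-04-01": 20, "2020-04-02": 25}
ni = {"2020-04-01": 5, "2020-04-02": 10}
england = {"2020-04-01": 70}  # the 2020-04-02 figure is missing

filled = fill_england(uk, wales, scotland, ni, england)
print(filled)
```

The derived value is only meaningful when the UK and nation series are on the same revision basis; if one side has been revised historically and the other has not, the subtraction can give misleading or even negative results.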

@tomwhite
Owner

tomwhite commented May 4, 2020

Ah, this is because the number of confirmed cases for England is being updated with historic data from https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv, whereas there are no revised historic figures for the UK in that feed.

However, the England confirmed cases figures should be complete now, so there shouldn't be a reason to compute (UK - Wales - Scotland - NI).

(I wonder if we should set the UK confirmed cases total to be the sum of the four nations values.)

@boogheta
Author

boogheta commented May 4, 2020

OK, this is what I guessed: I have stopped using the subtraction, except for Tests, which are not included in the England totals file.
Thanks for the quick reply. Feel free to close this issue, or leave it open if you want to change a few related things.

@gfaggio

gfaggio commented May 6, 2020

Hello Tom,

Many thanks for making the UK data on Covid-19 tests, confirmed cases and deaths publicly available.

Following up on the issue raised above, I have compared the data series (tests, confirmedcases and deaths) reported in the file covid-19-totals-uk with those obtained by aggregating the corresponding files across the four countries (Eng, Wal, Sct and Nir). I have found the following:

  1. Over the whole time period, the number of tests is always higher in totals-uk. This is understandable since totals-eng does not report the number of tests.
  2. The number of deaths is always higher when aggregating the four countries together. This is probably due to the fact that the series for the four countries have been updated but the ones for UK totals have not (as you explained here above).
  3. What is puzzling me is the following: the number of confirmed cases is higher when aggregating the four countries only up to 18 April 2020. From 19 April 2020, the number reported by totals-uk is higher. Why is this the case?

Many thanks for your help.
Giulia
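The comparison described above can be sketched roughly as follows. This is an illustration under assumptions: each series is a dict keyed by ISO date, the helper names `compare_uk_to_nations` and `first_date_uk_exceeds` are hypothetical, and the numbers are made up to show the sign of the difference flipping.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int, int, int]  # (date, uk_value, nations_sum, uk_minus_sum)


def compare_uk_to_nations(
    uk: Dict[str, int], nations: Dict[str, Dict[str, int]]
) -> List[Row]:
    """Compare the UK series with the four-nation sum on dates present in all series."""
    rows: List[Row] = []
    for date in sorted(uk):  # ISO dates sort chronologically as strings
        if all(date in series for series in nations.values()):
            total = sum(series[date] for series in nations.values())
            rows.append((date, uk[date], total, uk[date] - total))
    return rows


def first_date_uk_exceeds(rows: List[Row]) -> Optional[str]:
    """Return the first date on which the UK figure exceeds the four-nation sum."""
    for date, _, _, diff in rows:
        if diff > 0:
            return date
    return None


# Made-up numbers for illustration only (not real data).
uk = {"2020-04-17": 1000, "2020-04-18": 1100, "2020-04-19": 1300}
nations = {
    "england": {"2020-04-17": 900, "2020-04-18": 1000, "2020-04-19": 1050},
    "wales": {"2020-04-17": 60, "2020-04-18": 70, "2020-04-19": 80},
    "scotland": {"2020-04-17": 50, "2020-04-18": 60, "2020-04-19": 70},
    "ni": {"2020-04-17": 20, "2020-04-18": 30, "2020-04-19": 40},
}

rows = compare_uk_to_nations(uk, nations)
crossover = first_date_uk_exceeds(rows)
for row in rows:
    print(row)
print("UK first exceeds the sum on:", crossover)
```

Printing the signed difference per date, rather than only a match or mismatch flag, makes it easier to see whether a gap looks like a one-off revision or a persistent offset.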

@tomwhite
Owner

tomwhite commented May 7, 2020

For 3: the UK confirmed cases figure is not being revised, as I mentioned above, so it doesn't equal the sum of the totals for the four nations. I don't know what happened on 18 April.

@gfaggio

gfaggio commented May 7, 2020

Thanks for your reply.

@tomwhite
Owner

I looked into this more, and it looks like the UK totals (for confirmed cases) being higher than the sum of the four nations (from April 11 onwards) is due to the "pillar 2" tests, for which no location is reported. I wrote about this more here: http://tom-e-white.com/datavision/20-where-are-the-coronavirus-cases.html

@gfaggio

gfaggio commented May 15, 2020

Thanks very much.

@LoryPack

I noticed that there is also a mismatch on 25 April between the cumulative total number of tests in the UK (in the covid-19-totals-uk.csv file) and "DailyPeopleTested" in the covid-19-tests-uk.csv file. More precisely, the cumulative total of tests increased by around 70000 on that day, while the value of "DailyPeopleTested" for the same day was around 20000. On other days the two numbers mostly agree, although I also found some smaller mismatches on 1 and 7 May.

I believe these jumps in the cumulative figure are due to the total being revised to account for late reporting, as other people mentioned above. I just wanted to report it in case somebody else notices it in the future.
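A check of this kind can be sketched as below: difference the cumulative series and compare each increment with the reported daily figure. This assumes both series are dicts keyed by ISO date with no missing days; the helper names and the numbers are hypothetical, chosen only to resemble the magnitudes described above.

```python
from typing import Dict, Tuple


def daily_increments(cumulative: Dict[str, int]) -> Dict[str, int]:
    """Day-over-day change of a cumulative series, keyed by the later date."""
    dates = sorted(cumulative)
    return {
        cur: cumulative[cur] - cumulative[prev]
        for prev, cur in zip(dates, dates[1:])
    }


def flag_mismatches(
    cumulative: Dict[str, int], daily: Dict[str, int], tolerance: int = 0
) -> Dict[str, Tuple[int, int]]:
    """Return {date: (increment, reported_daily)} where the two differ by more than tolerance."""
    return {
        date: (inc, daily[date])
        for date, inc in daily_increments(cumulative).items()
        if date in daily and abs(inc - daily[date]) > tolerance
    }


# Made-up numbers for illustration only (not real data).
cumulative = {"2020-04-24": 500000, "2020-04-25": 570000, "2020-04-26": 590000}
daily = {"2020-04-24": 18000, "2020-04-25": 20000, "2020-04-26": 20000}

mismatches = flag_mismatches(cumulative, daily)
print(mismatches)
```

A non-zero `tolerance` can be passed to ignore small differences and surface only the larger jumps.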


@gfaggio

gfaggio commented May 16, 2020

Hi Tom,

I have just seen the great job you have done in creating the file covid-19-tests-uk.csv.
Thanks a lot! Really appreciated.

Since I am working with the files you have created right now, I have found two discrepancies between the totals reported in covid-19-tests-uk.csv and covid-19-totals-uk.csv. I am pointing this out only because it may help someone else, not to be picky!

When comparing the series 'tests' in covid-19-totals-uk.csv with the series 'TotalPeopleTested' in covid-19-tests-uk.csv, all figures match except those on April 12. In covid-19-totals-uk.csv, the figure is 95 lower.

When comparing the series 'confirmed' in covid-19-totals-uk.csv with the series 'TotalPositive' in covid-19-tests-uk.csv, all figures match except those on April 10. In covid-19-totals-uk.csv, the figure is 3486 lower. This might explain the big spike in confirmed cases that we see on April 11 (pointed out before).
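For anyone wanting to repeat this kind of cross-file check, here is a minimal sketch using only the standard library. The `Date` and `Tests` column headers are assumptions about the file layout, the helper names are hypothetical, and the inline CSV text holds made-up values rather than the real data.

```python
import csv
import io
from typing import Dict


def read_series(csv_text: str, value_column: str, date_column: str = "Date") -> Dict[str, int]:
    """Parse CSV text into {date: value}, skipping rows with an empty value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[date_column]: int(row[value_column])
        for row in reader
        if row[value_column]
    }


def series_differences(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Return {date: a - b} for shared dates on which the two series disagree."""
    return {d: a[d] - b[d] for d in sorted(a.keys() & b.keys()) if a[d] != b[d]}


# Made-up values for illustration only (not real data).
totals_csv = "Date,Tests\n2020-04-11,1000\n2020-04-12,1105\n"
tests_csv = "Date,TotalPeopleTested\n2020-04-11,1000\n2020-04-12,1200\n"

totals = read_series(totals_csv, "Tests")
tested = read_series(tests_csv, "TotalPeopleTested")

differences = series_differences(totals, tested)
print(differences)
```

To run it against the actual files, replace the inline strings with the file contents and adjust the column names to whatever the headers turn out to be.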

I assure you that we are doing interesting things with all the data you have created.
Thanks again,
Giulia
