Deprecate the covid-19-uk-data repo #68

Open
tomwhite opened this issue Jul 2, 2020 · 9 comments

Comments

@tomwhite
Owner

tomwhite commented Jul 2, 2020

I would like to deprecate this repo and encourage consumers to move to official upstream data sources. I'd like to stop updates in a month's time (1 August 2020).

When I started curating UK COVID-19 data in early March, numbers for people tested, confirmed cases, and deaths were only available on web pages, and did not provide a historical timeseries. That has now changed, with all the UK health agencies (except Northern Ireland, see below) providing machine-readable historical datasets. In fact, most of the datasets are now much richer than the data provided in this repository, including data such as number of hospitalizations and calls to helplines. For that reason, people who are working with COVID-19 data will typically be using the upstream sources anyway, to access this richer data.

As a case in point, the debate over Pillar 2 data has meant that the confirmed case numbers for England have become potentially misleading, so I have stopped providing them from this repository (#67). The data is still available from https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv, and in the last few days PHE have published week-level case numbers for England that contain Pillar 2 data (see the spreadsheet on this page: https://www.gov.uk/government/publications/national-covid-19-surveillance-reports). The hope is that they will publish this information at daily granularity. Until they do, this illustrates that working with COVID data is messy and necessarily involves multiple sources of data, even with efforts like this one.

The lack of machine-readable data for Northern Ireland is another unfortunate reality. I have been able to work around this problem in the past by using an undocumented backend API to get the case numbers for LGDs, but it recently broke in a way that made it report incorrect data. I feel it is wrong to rely on this undocumented API, given how it can silently break, and that people who want machine-readable data should make the case to the NI Department for Health (I was not successful in my request to them, see #63).

The data sources that this repo relies on are documented here: https://github.com/tomwhite/covid-19-uk-data#data-sources. Most consumers of the data should be able to move to these sources fairly easily. Most of them are in CSV or JSON format, at known locations, and with stable formats. There may be some challenges though: URLs that change every day and parsing XLSX (for Wales) on some platforms spring to mind. These are the kind of things that I hope can be fixed by the community or the official providers.
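
As a rough illustration of what moving to an upstream CSV source can look like, here is a minimal Python sketch using only the standard library. The function names `parse_csv` and `load_csv` are made up for this example, and the sketch makes no assumption about which columns a given source provides: each row comes back as a dictionary keyed by whatever header the file has.

```python
import csv
import io
import urllib.request
from typing import Dict, List


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Turn CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def load_csv(url: str) -> List[Dict[str, str]]:
    """Download a CSV file and parse it; values are left as strings."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        # utf-8-sig drops a byte-order mark if the file starts with one
        return parse_csv(resp.read().decode("utf-8-sig"))
```

Calling `load_csv` with one of the CSV URLs from the data sources list would return the rows of that file; converting values to numbers or dates is left to the caller because the formats differ between sources.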

@nickcotter

Many thanks to you and everyone else who contributed to this repo.

@gfaggio

gfaggio commented Jul 3, 2020

Many thanks for all the help! Much appreciated.
Without this repo, it would have been very hard for me to understand COVID-19 data.
Best,
Giulia

@robchallen

Hi Tom.

It makes sense, although it's sad to see it stop, as it's been an island of sanity in the lunacy of our four nations' approaches to reporting data streams.

One thing this repo offers (which the "official" sources don't) is the commit history of the time series. This will be useful in investigating issues in delays in reporting and recreating the data set as it was at particular points in time. For example, I think that delays reporting cases in the early days of the outbreak may have significantly affected the interpretation of the situation, and hence decisions around timing of the lockdown.

Obviously the main use case for the evolution of the historical time series is the early stage, which wouldn't be impacted by winding this up now, but my point is that the official sources do not provide the commit history in the same way and this makes your repository unique in the UK. We may find that the historical data around local outbreaks are similarly interesting in the future.
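
To make the point about commit history concrete, here is a minimal Python sketch of recovering a file as it stood on a given date from a local clone. It relies on two standard git commands, `git rev-list -1 --before=<date> <branch>` and `git show <rev>:<path>`. The helper names, the `master` default branch, and the example path are assumptions for illustration, not something taken from this repo.

```python
import subprocess
from typing import List


def rev_list_cmd(as_of: str, branch: str = "master") -> List[str]:
    """argv that finds the newest commit on `branch` dated before `as_of`."""
    return ["git", "rev-list", "-1", f"--before={as_of}", branch]


def show_cmd(rev: str, path: str) -> List[str]:
    """argv that prints `path` as it existed at commit `rev`."""
    return ["git", "show", f"{rev}:{path}"]


def file_as_of(repo_dir: str, path: str, as_of: str, branch: str = "master") -> str:
    """Return the contents of `path` as last committed before `as_of`."""
    rev = subprocess.run(
        rev_list_cmd(as_of, branch), cwd=repo_dir,
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if not rev:
        raise ValueError(f"no commit on {branch} before {as_of}")
    return subprocess.run(
        show_cmd(rev, path), cwd=repo_dir,
        capture_output=True, text=True, check=True,
    ).stdout
```

For example, `file_as_of("covid-19-uk-data", "data/example.csv", "2020-03-23")` would return that (hypothetical) file as it was recorded before 23 March. Note that `--before` filters on commit dates, so the result reflects when the data was committed rather than when it was officially published.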

It's your call, and it will continue to be a useful resource either way.

Cheers,
Rob.

@tomwhite
Owner Author

Hi Rob,

Thanks for your comments. I agree that having a history of changes so people can look back and see how things were reported at the time is valuable. As you said it's especially interesting at the beginning of the pandemic.

I thought about this as a reason for continuing, but the change history is now being published for England, and for Scotland (on GitHub!) at least. Wales publishes a new spreadsheet every day, which may have revised historical figures in it (so it doesn't retain the change history), and NI doesn't publish its data in machine-readable form.

I think it would be fairly easy for someone to write a GH action (or similar) that downloads and archives the Wales data every day. It could also translate it into a set of CSVs to make it easier to consume.
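
A minimal sketch of such an archiver in Python, which a scheduled job (a GitHub Actions workflow or cron) could run once a day. The `archive_snapshot` and `fetch` names, the archive directory, and the `wales-YYYY-MM-DD.xlsx` naming are assumptions for illustration, and the caller has to supply the current spreadsheet URL. It saves a new dated copy only when the bytes differ from the most recent snapshot, so revisions to historical figures are kept. The CSV conversion step is not shown.

```python
import urllib.request
from datetime import date
from pathlib import Path
from typing import Optional


def fetch(url: str) -> bytes:
    """Download the spreadsheet; the caller supplies the current URL."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def archive_snapshot(content: bytes, archive_dir: Path, day: date) -> Optional[Path]:
    """Save `content` as wales-<day>.xlsx unless it matches the latest snapshot.

    Returns the path of the new file, or None when nothing changed.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    # ISO dates in the file names sort chronologically
    snapshots = sorted(archive_dir.glob("wales-*.xlsx"))
    if snapshots and snapshots[-1].read_bytes() == content:
        return None
    target = archive_dir / f"wales-{day.isoformat()}.xlsx"
    target.write_bytes(content)
    return target
```

A daily job would call something like `archive_snapshot(fetch(url), Path("archive"), date.today())` and then commit the archive directory, which gives the Wales data the same kind of change history that git provides for the other sources.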

Cheers,
Tom

@Jcamain

Jcamain commented Jul 21, 2020

Thanks so much for all your help and assistance. The ever-changing goalposts in the ways the different countries chose to deal with their data, make it available, and change it every five minutes were a nightmare, and your repository has been a godsend!

@gbugmann

Hello Tom,
thanks for the good work. It gave me the feeling I knew what COVID-19 was doing. Good for my mental health.
I looked at the new official data https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv but they only cover England...
I attach my latest visualization.
Good luck.

Guido
Case_history.zip

@tomwhite
Owner Author

Thanks Guido. BTW you can get data for the other nations (except NI) at the links listed here: https://github.com/tomwhite/covid-19-uk-data#data-sources

@geeogi

geeogi commented Jul 30, 2020

Thanks for all your work Tom. Your data enabled us to build our application https://covidlive.co.uk. We'll be maintaining a limited fork of this repo at https://github.com/geeogi/covid-19-uk-data while we migrate to a new service.

@Amol-Soneji

Hello Tom,

I think it is still possible to keep this project's code relevant by slightly changing its purpose. If, instead of dealing only with the UK, the project covered global statistics, it could still be useful. Many countries still do not provide data in an easy machine-readable form.

8 participants