Archive data through the API #3

Open
radka-j opened this issue Aug 4, 2020 · 8 comments

@radka-j

radka-j commented Aug 4, 2020

Hello 👋

The API documentation says that data previously published in the dashboard (currently on the archive: https://coronavirus.data.gov.uk/archive ) is available for download through the API as well. But I haven't been able to figure out how to structure the query to get this. Is there an example of what such a query looks like?

For example, I want to know the number of cases in Leicester for the date 2020-06-01 as reported on 2020-06-02, 2020-06-03, .....

Any help with this would be much appreciated, thank you!

@xenatisch
Contributor

Hi @radka-j

Thanks for getting in touch.

So what you are looking for is previous states of the data (previous reports). We do have the data in the database and are planning to provide the means to retrieve it via the API in the future. However, there is very little demand for it, and we have some high-priority work in our development pipeline.

We will get it done as soon as possible. Keep an eye out for the feature in the API docs and our SDKs.

@xenatisch xenatisch self-assigned this Aug 4, 2020
@xenatisch xenatisch added the enhancement New feature or request label Aug 4, 2020
@lewissmit

Hi @xenatisch,

Just hoping to clarify - could you confirm that the API does not currently support provision of data for previous dates - so as an example the number of positive tests in a geography is only available as at the most recent data release?

If this is the case it causes us some difficulties: the daily positive tests data is frequently amended retrospectively, so a day-to-day record of positive cases will not provide the full picture. Specifically, users will not be able to track the final tally of positive tests over the time series. To be clear: I'd like to have access to a day-to-day total of positive tests at all geographies, which when combined would be equal to the cumulative total currently available on the front end of your site.
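To make the requirement concrete, here is a minimal sketch of the consistency check described above: daily totals by date should add up to the cumulative total. The figures are hypothetical and the function name `cumulative_by_date` is made up for illustration; nothing here calls the API.

```python
from itertools import accumulate

# Hypothetical daily positive-test counts keyed by date; not real figures.
daily_cases = {"2020-06-01": 4, "2020-06-02": 7, "2020-06-03": 2}


def cumulative_by_date(daily):
    """Return the running total of daily counts, in chronological order."""
    dates = sorted(daily)  # ISO dates sort chronologically as strings
    totals = accumulate(daily[d] for d in dates)
    return dict(zip(dates, totals))


cumulative = cumulative_by_date(daily_cases)

# The last running total should match the headline cumulative figure.
assert cumulative[max(cumulative)] == sum(daily_cases.values())
```

If the daily figures are revised retrospectively, this check only holds when the daily series and the cumulative figure come from the same release.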

If this is the case please could this be escalated in terms of priority? If I'm misinterpreting somehow apologies in advance.

@geeogi

geeogi commented Aug 5, 2020

Hi @xenatisch,

We are also very eager to regain access to the time series data for UTLA positive tests. As it stands the newCasesByPublishDate field from this API returns the latest numbers only which is difficult to interpret for the reasons described by @lewissmit.

We used to have access to this data via this link: https://coronavirus.data.gov.uk/downloads/data/data_latest.json but it doesn't seem to have been updated since the 3rd.

@bhavesh0009

I was able to find a workaround for this issue.

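    from uk_covid19 import Cov19API  # the project's Python SDK

    # Restrict the query to lower-tier local authorities (LTLA).
    ltla_filter = ['areaType=ltla']

    # Keys are the output field names; values are the API metric names.
    cases_and_deaths = {
        "areaType": "areaType",
        "areaName": "areaName",
        "areaCode": "areaCode",
        "specimenDate": "date",
        "dailyLabConfirmedCases": "newCasesBySpecimenDate",
        "totalLabConfirmedCases": "cumCasesBySpecimenDate",
    }

    api = Cov19API(filters=ltla_filter, structure=cases_and_deaths)
    data = api.get_json()  # Returns a dictionary
    lastUpdate = data['lastUpdate']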

The above code gives historical information for LTLAs. I don't know why, but it is probably due to some changes in the structure.

@geeogi

geeogi commented Aug 5, 2020

Thanks! This works for me using UTLAs too, e.g. link. The Python SDK retrieves all the pages, which is handy.
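For readers not using the SDK, the UTLA variant can be sketched as a plain request URL. This is a minimal sketch under assumptions: the endpoint URL, the `;` separator between filters, and the JSON-encoded `structure` parameter follow my reading of the API docs and should be checked there, and `build_query_url` is a hypothetical helper, not part of any SDK. The code only builds the URL and does not send a request.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; confirm against the API documentation before use.
API_ENDPOINT = "https://api.coronavirus.data.gov.uk/v1/data"


def build_query_url(area_type, area_name=None):
    """Build a query URL for cases by specimen date (hypothetical helper)."""
    filters = [f"areaType={area_type}"]
    if area_name is not None:
        filters.append(f"areaName={area_name}")

    # Keys are output field names; values are API metric names.
    structure = {
        "date": "date",
        "areaName": "areaName",
        "areaCode": "areaCode",
        "newCasesBySpecimenDate": "newCasesBySpecimenDate",
        "cumCasesBySpecimenDate": "cumCasesBySpecimenDate",
    }

    params = {
        "filters": ";".join(filters),  # assumed filter separator
        "structure": json.dumps(structure, separators=(",", ":")),
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = build_query_url("utla", "Leicester")
```

Unlike the SDK, a plain request like this returns one page at a time, so pagination would need to be handled separately.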

@xenatisch
Contributor

xenatisch commented Aug 5, 2020

Hi @lewissmit and @geeogi ... sorry for my late response. It's been a long day.

There is a difference between archive data and historical data. We release historical data every day, but it may contain revised figures. This is because we receive new data every day and also deduplicate the data every day (sometimes going months back). So "archive data" provides the data as released on day X, whereas historical data provides the latest figures for every day since the beginning, whenever that may be for a specific area. Hope that makes sense? It's explained better in the About the Data page on the website.
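The distinction can be illustrated with a small sketch. It assumes you keep your own copy of each daily release; the `snapshots` structure, the figures, and the function names are all hypothetical and are not part of the API.

```python
# Hypothetical figures for illustration only; not real data.
# Outer key: release date. Inner mapping: specimen date -> new cases.
snapshots = {
    "2020-06-02": {"2020-05-31": 10, "2020-06-01": 4},
    "2020-06-03": {"2020-05-31": 10, "2020-06-01": 7, "2020-06-02": 3},
    "2020-06-04": {"2020-05-31": 11, "2020-06-01": 8, "2020-06-02": 6},
}


def historical(snapshots):
    """Historical data: the full series as of the latest release."""
    latest_release = max(snapshots)  # ISO dates sort chronologically
    return snapshots[latest_release]


def as_reported(snapshots, specimen_date):
    """Archive view: one specimen date's figure as published in each release."""
    return {
        release: series[specimen_date]
        for release, series in sorted(snapshots.items())
        if specimen_date in series
    }


latest_series = historical(snapshots)
revisions = as_reported(snapshots, "2020-06-01")
```

Here `latest_series` holds only the most recently revised figures, while `revisions` shows how the figure for a single date changed from one release to the next, which is what the original question asks for.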

The issue you have raised was addressed in #1 (which I have now pinned to the issues page because it seems to be very popular). I see that @bhavesh0009 has kindly shared his solution with you too.

The reason we have two types of data for cases / deaths is that some DAs release the data only by reporting date, so for consistency on the website, we also produce England data by reporting date, but only for the latest day. We are working with other DAs to get the data by specimen date, in which case we will have consistent data for everything.

Let me know if you need more info, or are still experiencing any difficulties.

@radka-j
Author

radka-j commented Sep 14, 2020

@xenatisch I hope you are well! 🙂 Is there any update on this (e.g., when might we expect the archive to be available through the API)? Or do you know if there is another way to access this data?

@theosanderson

For anyone stumbling upon this -- this data for cases is (unofficially) available for recent days here
