Historical data using the API #1

Closed
themonk911 opened this issue Jul 31, 2020 · 6 comments
Labels: good first issue, question

@themonk911

Thanks very much for publishing this API! I couldn't work out from the docs what the best way to get all historical records for a given resource is. Could you provide some guidance?

@xenatisch
Contributor

@themonk911 Thank you for getting in touch.

Could you please elaborate on the metrics you were trying to extract? It might be helpful if you share your filters and structure parameters here.

xenatisch self-assigned this Aug 1, 2020
xenatisch added the help wanted label Aug 1, 2020
@themonk911
Author

Hi,

Yeah, I'm trying to get granular data for all ltla and utla areas, for all recorded data points (I'm part of the https://github.com/GoogleCloudPlatform/covid-19-open-data project). It seems like by default I only get a couple of days' data.

My example below includes only one area for brevity.

from uk_covid19 import Cov19API

ltla_only = ['areaType=ltla', 'areaName=Adur']
cases_and_deaths = {
    "date": "date",
    "areaName": "areaName",
    "areaCode": "areaCode",
    "newCasesByPublishDate": "newCasesByPublishDate",
    "cumCasesByPublishDate": "cumCasesByPublishDate",
    "newDeathsByDeathDate": "newDeathsByDeathDate",
    "cumDeathsByDeathDate": "cumDeathsByDeathDate"
}
>>> api = Cov19API(filters=ltla_only, structure=cases_and_deaths)
>>> api.get_json()
{'data': [{'date': '2020-08-01', 'areaName': 'Adur', 'areaCode': 'E07000223',
           'newCasesByPublishDate': None, 'cumCasesByPublishDate': 190,
           'newDeathsByDeathDate': None, 'cumDeathsByDeathDate': None},
          {'date': '2020-07-31', 'areaName': 'Adur', 'areaCode': 'E07000223',
           'newCasesByPublishDate': None, 'cumCasesByPublishDate': None,
           'newDeathsByDeathDate': None, 'cumDeathsByDeathDate': 28}],
 'lastUpdate': '2020-08-01T14:25:26.000000Z', 'length': 2, 'totalPages': 1}

@xenatisch
Contributor

xenatisch commented Aug 1, 2020

Hi @themonk911

So, you can get the data for either ltla or utla in one request, not both. The issue here is that not all metrics are available for all area types. For instance, we only have ...CasesByPublishDate available for nation, and ...CasesBySpecimenDate for everything else.

So what you need is as follows (for Adur):

from uk_covid19 import Cov19API

# Restrict the query to a single lower-tier local authority.
adur_data = [
    'areaType=ltla',
    'areaName=Adur'
]

# Keys are the names used in the response; values are the API metric names.
cases_and_deaths = {
    "date": "date",
    "areaName": "areaName",
    "areaCode": "areaCode",
    "cases": {
        "new": "newCasesBySpecimenDate",
        "total": "cumCasesBySpecimenDate"
    },
    "deaths": {
        "new": "newDeathsByDeathDate",
        "total": "cumDeathsByDeathDate"
    }
}

api = Cov19API(filters=adur_data, structure=cases_and_deaths)
data = api.get_json()

print(data)

This would return 136 records.

If you need all the data for ltla (regardless of areaName), then all you need to do is omit the areaName filter:

from uk_covid19 import Cov19API

# No areaName filter: every ltla area is returned.
all_ltla = [
    'areaType=ltla'
]

cases_and_deaths = {
    "date": "date",
    "areaName": "areaName",
    "areaCode": "areaCode",
    "cases": {
        "new": "newCasesBySpecimenDate",
        "total": "cumCasesBySpecimenDate"
    },
    "deaths": {
        "new": "newDeathsByDeathDate",
        "total": "cumDeathsByDeathDate"
    }
}

api = Cov19API(filters=all_ltla, structure=cases_and_deaths)
data = api.get_json()

print(data)

This one would return 50,000+ records (50+ pages), so it might take a while if the data isn't cached.
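
Since a single request covers only one area type, collecting both ltla and utla would mean one request per type. A minimal sketch of building the filters for each, assuming the 'key=value' filter strings used in this thread (the helper name `build_filters` is hypothetical, not part of the SDK):

```python
from typing import List, Optional


def build_filters(area_type: str, area_name: Optional[str] = None) -> List[str]:
    """Return a filters list in the 'key=value' form used in this thread.

    Hypothetical helper for illustration; not part of the SDK.
    """
    filters = [f"areaType={area_type}"]
    if area_name is not None:
        filters.append(f"areaName={area_name}")
    return filters


# One filters list per area type, because a request covers a single area type.
filters_by_area_type = {
    area_type: build_filters(area_type) for area_type in ("ltla", "utla")
}

for area_type, filters in filters_by_area_type.items():
    print(area_type, filters)
```

Each list could then be passed as `filters=` to its own `Cov19API` instance, with the results combined afterwards.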

Feel free to change the structure hierarchy or key names to whatever you need. For instance, it can be something like this:

{
    "date": "date",
    "areaName": "areaName",
    "areaCode": "areaCode",
    "newCasesBySpecimenDate": "newCasesBySpecimenDate",
    "cumCasesBySpecimenDate": "cumCasesBySpecimenDate",
    "newDeathsByDeathDate": "newDeathsByDeathDate",
    "cumDeathsByDeathDate": "cumDeathsByDeathDate"
}
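
A flat structure like this could also be generated per area type. The sketch below assumes only the rule mentioned earlier in this comment (...CasesByPublishDate for nation, ...CasesBySpecimenDate for everything else); the function name `flat_structure` is hypothetical, and since metric availability changes over time the result should be treated as a starting point rather than a guarantee:

```python
from typing import Dict


def flat_structure(area_type: str) -> Dict[str, str]:
    """Build a flat structure whose keys equal the metric names.

    Assumption taken from this thread: publish-date case metrics are
    available for 'nation', specimen-date case metrics for other area types.
    """
    suffix = "CasesByPublishDate" if area_type == "nation" else "CasesBySpecimenDate"
    metrics = [
        "date",
        "areaName",
        "areaCode",
        f"new{suffix}",
        f"cum{suffix}",
        "newDeathsByDeathDate",
        "cumDeathsByDeathDate",
    ]
    # Each metric is returned under its own name.
    return {metric: metric for metric in metrics}


print(flat_structure("ltla"))
```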

Hope this helps.

Re your involvement with the Open Data project, keep an eye out for our R and JavaScript SDKs. You might find them useful too. They'll be released soon.

@xenatisch xenatisch added good first issue Good for newcomers question Further information is requested and removed help wanted Extra attention is needed labels Aug 1, 2020
@themonk911
Author

Thanks, much appreciated :)

@themonk911
Author

It might be worth listing the metrics compatible with each areaType in the documentation.

@xenatisch
Contributor

xenatisch commented Aug 1, 2020

@themonk911 I would have if it were static; even then, we have over 100 metrics for 619 area names in 7 area types.

They tend to change regularly as we add new metrics or change existing ones. We use some metrics to calculate others, and those would be useless to end users because some of them may not even be available for the constituents of a specific area type. Just to give you an idea, we uploaded over 124k records to the database today (the number increases every day), which consisted of over 58 million lines of data. This was produced in a massive pipeline with >20 sources and 100s of data bricks. We QA this whole dataset every single day before we release it.

Rule of thumb: we use the API to populate the website, so your best reference is the website. If a specific metric isn't displayed in there, it's probably because we don't have data for that metric in the area type / name.
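
As a rough complement to checking the website, a response can be scanned for metrics that never carry a value. This is only a sketch: the helper name `metrics_without_data` is hypothetical, it assumes a flat structure (no nested keys), and the sample records are the two returned in the original question above.

```python
from typing import Any, Dict, List


def metrics_without_data(records: List[Dict[str, Any]]) -> List[str]:
    """Return the keys whose value is None in every record, sorted by name.

    Hypothetical helper; assumes flat records (no nested dictionaries).
    """
    keys = set()
    for record in records:
        keys.update(record)
    return sorted(
        key for key in keys if all(record.get(key) is None for record in records)
    )


# The two records from the response posted in the original question.
sample = [
    {"date": "2020-08-01", "areaName": "Adur", "areaCode": "E07000223",
     "newCasesByPublishDate": None, "cumCasesByPublishDate": 190,
     "newDeathsByDeathDate": None, "cumDeathsByDeathDate": None},
    {"date": "2020-07-31", "areaName": "Adur", "areaCode": "E07000223",
     "newCasesByPublishDate": None, "cumCasesByPublishDate": None,
     "newDeathsByDeathDate": None, "cumDeathsByDeathDate": 28},
]

print(metrics_without_data(sample))
# → ['newCasesByPublishDate', 'newDeathsByDeathDate']
```

A metric that is empty in a small sample may still have data on other dates, so a result like this is a hint, not proof that the metric is unavailable for the area type.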

Having said that, I'll try to draft something that covers the most important metrics - as soon as I find a bit of time.
