This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

Adding total_vaccinations and population field at a national level #27

Open
sanyam-git opened this issue Feb 12, 2021 · 4 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@sanyam-git
Contributor

Currently, the country-wise latest and all API files have the following structure:

{
    "country": "India",
    "country_iso": "IN",
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

Two more fields could be added, total_vaccinations and population:

  • total_vaccinations:

    • One can loop over the data for all the regions of a country and compute the cumulative value, but I think it would be better to provide it pre-calculated.
    • Another reason is that some countries (I'm currently only aware of India, but it may well be the case elsewhere too) report some vaccinations under the heading Miscellaneous. These can't be attributed to any region, but they should still be reflected in the national total.
  • population:
    This would be helpful for normalizing data per capita. (I'm not sure which source should be used here, maybe https://www.worldometers.info/world-population/)

The updated structure would be:

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations": 7017114,
    "population": 1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}
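
For reference, a rough sketch of the client-side workaround mentioned above (looping over the regions and normalizing per capita). The keys inside each data entry (region_iso, total_vaccinations) and all numbers are assumptions made up for illustration, since the regional schema is not shown in this issue:

```python
from typing import Any


def national_total(country: dict[str, Any]) -> int:
    """Sum total_vaccinations over all regional entries of one country payload."""
    return sum(region["total_vaccinations"] for region in country["data"])


def per_capita(count: int, population: int) -> float:
    """Normalize a count by population (doses per person)."""
    return count / population


# Hypothetical payload: region keys and numbers are invented for this example.
example = {
    "country": "Exampleland",
    "country_iso": "XX",
    "last_update": "2021-02-10",
    "data": [
        {"region_iso": "XX-A", "total_vaccinations": 1200},
        {"region_iso": "XX-B", "total_vaccinations": 800},
    ],
}

total = national_total(example)
print(total)                       # 2000
print(per_capita(total, 100_000))  # 0.02
```

Every API consumer would have to repeat this, and it would still miss any doses that are not attributed to a region.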
@lucasrodes
Member

lucasrodes commented Feb 12, 2021

Hi @sanyam-git,
Thanks for your proposal! It could be a nice-to-have feature.

These fields have not been added so far because the https://github.com/owid/covid-19-data project already provides them. Still, we could give it a try so that all of this information is available in one API.

Your points on how to obtain the aggregated national values are quite relevant, as simply iterating over the available regional JSON files would not work. Some countries report "Misc" or "Others" entries, which are removed in the process of generating the API.

Data update process

To give you an overview, the data update is performed with the script update_all, which sequentially executes the following steps:

  1. Update country regional data. For each country do:
    1.1. Scrape each country's source link and get the raw data.
    1.2. Process the raw data (change column names, standardize region names & ISO codes, etc.)
    1.3. Export the processed data as a CSV file to data/countries directory.
  2. Merge all country generated CSV files into a single vaccinations.csv file.
  3. Add population-related metrics to vaccinations.csv file (e.g. total_vaccinations_per_100, etc.).
  4. Generate API files using each country's CSV file.
  5. Update documentation with changes (e.g. update README.md).

Note that in step 1.2 all special regions like "Misc" and "Others" are discarded. Hence, recovering these at step 4 would be quite complex at the moment.
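
To illustrate why the regional sum falls short, here is a simplified sketch of the filtering in step 1.2. It is not the project's actual code: the function name, the row layout, and the numbers are hypothetical, and only the "Misc" and "Others" labels come from this thread.

```python
# Assumed labels for special regions, taken from the examples in this thread.
SPECIAL_REGIONS = {"Misc", "Others"}


def process(raw_rows: list[dict]) -> list[dict]:
    """Simplified step 1.2: keep only rows that map to a real region."""
    return [row for row in raw_rows if row["region"] not in SPECIAL_REGIONS]


# Hypothetical raw data for one country (numbers are made up).
raw = [
    {"region": "Region A", "total_vaccinations": 500},
    {"region": "Region B", "total_vaccinations": 300},
    {"region": "Misc", "total_vaccinations": 50},
]

processed = process(raw)
raw_total = sum(row["total_vaccinations"] for row in raw)
regional_total = sum(row["total_vaccinations"] for row in processed)

print(raw_total)                   # 850
print(regional_total)              # 800
print(raw_total - regional_total)  # 50 doses are lost before step 4
```

Once the CSV is exported in step 1.3, the 50 "Misc" doses in this example are no longer available to the API generation step.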

Some ideas:

API proposals

Proposal 1 (yours)

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations": 7017114,
    "population": 1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

Proposal 2

Use total_vaccinations_per_100 instead of population.

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations": 7017114,
    "total_vaccinations_per_100": 0.5117,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

Proposal 3

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations": 7017114,
    "total_vaccinations_per_100": 0.5117,
    "population": 1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

I would probably go for proposal 2 and leave the population field out. My reasoning is that:

  • Having all three fields would be redundant.
  • total_vaccinations_per_100 would probably be more interesting than population in the context of covid19 vaccinations.
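
The redundancy can be made concrete with a small sketch: given any two of the three fields, the third follows. The function names are hypothetical, and rounding to four decimals is an assumption inferred from the 0.5117 value in the proposals above.

```python
def total_vaccinations_per_100(total_vaccinations: int, population: int) -> float:
    """Doses administered per 100 people, rounded to 4 decimals (assumed precision)."""
    return round(100 * total_vaccinations / population, 4)


def implied_population(total_vaccinations: int, per_100: float) -> float:
    """Approximate population recovered from the other two fields."""
    return 100 * total_vaccinations / per_100


# Values from the proposals above.
print(total_vaccinations_per_100(7017114, 1371360350))  # 0.5117
```

The population recovered with implied_population is only approximate, because the per-100 value has already been rounded.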

Please let me know what you think! 😄

@lucasrodes added the enhancement label (New feature or request) on Feb 12, 2021
@lucasrodes
Member

I'll be adding total_vaccinations_per_100 to individual region JSON files.

@sanyam-git
Contributor Author

@lucasrodes, thanks for the detailed overview of the project's inner workings. Here's my take:

  • others/misc data: I agree with you, it seems cumbersome to retrieve this data after the initial two steps. The other option you suggested seems more feasible to me for now: retrieve the national total from some other reliable source (like https://github.com/owid/covid-19-data).

    Another way to add the misc data (I don't know whether it's the best way) is to calculate the other/misc vaccination numbers as the difference between the national total (from some other source) and the sum of the region-wise totals.

  • Population: I saw that you have already added the per capita field to all individual regions, and it looks really great!
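
The difference idea from the first point could look roughly like this. It is only a sketch with made-up numbers and a hypothetical function name; it also assumes both sources refer to the same date, otherwise the difference can come out negative.

```python
def misc_vaccinations(national_total: int, regional_totals: list[int]) -> int:
    """Doses not attributed to any region: national total minus the regional sum."""
    diff = national_total - sum(regional_totals)
    if diff < 0:
        # The sources likely disagree (e.g. different update dates).
        raise ValueError("regional totals exceed the national total")
    return diff


# Hypothetical numbers for illustration only.
print(misc_vaccinations(1000, [400, 350, 200]))  # 50
```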

I think that adding total_vaccinations and total_vaccinations_per_100 at the national level as well would be quite helpful. What do you think? (As you mentioned above, the data is available at owid, but it would be better to get it all in one place.)

Keep up the good work :) 👍

@lucasrodes
Member

Hi @sanyam-git,
Yes, I recently added per-100-people metrics to the region files. I will think about how to add such info at the national level; it shouldn't be difficult. I will get back to this thread once I have something more concrete.

Thanks for your contribution! 😄

@lucasrodes added the help wanted label (Extra attention is needed) on Feb 20, 2021

2 participants