This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

Not adding dates with zero vaccinations #33

Open
sanyam-git opened this issue Feb 17, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@sanyam-git
Contributor

Comparing India.csv with state_timeline.csv, I noticed that the script leaves out dates with zero vaccinations (I've checked, and it seems this is the case for all other countries as well).
For example: on 20 January 2021, the union territory of AN in India had zero vaccination doses administered, so that date is not present in India.csv.

I'm relatively new to this, so please don't mind if I'm wrong here. Won't this create issues when using the API to plot visualizations directly, or when using the data directly for analysis?

@lucasrodes
Member

The decision behind this was to only add entries whenever there are new values. In particular, the call to keep_min_date is responsible for this:

def keep_min_date(df):
    df = df.copy()
    cols = df.columns
    # Remove NaNs (groupby drops rows whose keys contain NaN)
    count_cols = [col for col in COLUMNS_INT if col in cols]
    df.loc[:, count_cols] = df.loc[:, count_cols].fillna(-1).astype(int)
    # Group by every column except "date" and keep the earliest date
    df = df.groupby(
        by=[col for col in df.columns if col != "date"]
    ).min().reset_index()
    # Bring NaNs back
    df.loc[:, count_cols] = df.loc[:, count_cols].astype("Int64").replace({-1: pd.NA})
    return df.loc[:, cols]

In the CSV files, I think this behavior makes sense. However, in the API files, I agree that this may cause some issues.
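
To illustrate why a zero-vaccination day disappears, here is a minimal sketch of the grouping step on its own. The numbers are made up for illustration, and the snippet reproduces only the groupby logic, not the full keep_min_date function: when the cumulative total does not change between two days, both rows share the same group key, so only the earliest date survives.

```python
import pandas as pd

# Made-up cumulative totals for one location; no new doses on 2021-01-20.
df = pd.DataFrame({
    "location": ["AN", "AN", "AN"],
    "date": ["2021-01-19", "2021-01-20", "2021-01-21"],
    "total_vaccinations": [100, 100, 150],
})

# Group by every column except "date" and keep the minimum date per group.
# Rows with identical totals collapse into one row with the earliest date.
deduped = (
    df.groupby([col for col in df.columns if col != "date"])["date"]
    .min()
    .reset_index()
)

print(deduped)
```

The row for 2021-01-20 is dropped because its total is identical to the one reported on 2021-01-19.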

To address this, I'd say we could modify the update_api_v1.py script to fill these gaps, potentially adding a new field like total_vaccinations_daily to indicate that there were 0 vaccinations that day and that the data was copied from the prior day.
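
A rough sketch of what such gap filling might look like, assuming one location per DataFrame and a single total_vaccinations column. The function name fill_date_gaps and the flag column copied_from_previous are hypothetical and are not part of update_api_v1.py; only total_vaccinations_daily is taken from the suggestion above.

```python
import pandas as pd


def fill_date_gaps(df):
    """Hypothetical helper: reindex one location to a daily calendar."""
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    df = df.set_index("date").sort_index()

    # Build a continuous daily range between the first and last reported date.
    full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
    out = df.reindex(full_range)

    # Flag the rows that did not exist before reindexing (assumed column name).
    copied = out["total_vaccinations"].isna()

    # Carry the last known cumulative total forward into the gaps.
    out["total_vaccinations"] = out["total_vaccinations"].ffill().astype("int64")
    out["copied_from_previous"] = copied

    # Day-over-day difference; the first day has no predecessor, so it stays <NA>.
    out["total_vaccinations_daily"] = out["total_vaccinations"].diff().astype("Int64")

    return out.rename_axis("date").reset_index()


# Made-up example: 2021-01-20 is absent from the input.
sparse = pd.DataFrame({
    "date": ["2021-01-19", "2021-01-21"],
    "total_vaccinations": [100, 150],
})
filled = fill_date_gaps(sparse)
print(filled)
```

Note that this sketch treats every gap as a day with no change, which is itself an assumption about the source data.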

Let me know what you think and thanks for your feedback

@sanyam-git
Contributor Author

I think it would be better to account for the zero-vaccination dates in both JSON and CSV (especially in JSON, as you mentioned). Some people prefer CSV over JSON, and it is good to keep both in a similar structure.

Regarding adding total_vaccinations_daily: yes, I think it would be helpful (it can be kept in the enhancement list).
Thanks for the reply :)

@lucasrodes lucasrodes added the enhancement New feature or request label Feb 17, 2021
@lucasrodes
Member

Thanks for your comment. Some notes:

  • Some countries do not provide information on some days. The question is: should we assume that the number of vaccinations was zero on those days? My opinion is that this should be treated as missing data. If these days were to be added, I'd suggest adding a flag stating that the entry was recovered from previous entries.
  • However, if the source data explicitly states that there were zero vaccinations, these days should probably be added, as this wouldn't count as missing data. To ensure this is reliable, some simple exploration of the source data should be done.
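
The distinction between the two cases above could be sketched as follows. This is only an illustration on made-up data: the column names daily_vaccinations and status, and the helper classify_day, are assumptions and do not come from the repository.

```python
import pandas as pd

# Made-up source rows: 0 is an explicitly reported zero, None is "not reported".
source = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-01-19", "2021-01-20", "2021-01-21", "2021-01-22"]
    ),
    "daily_vaccinations": [100, 0, None, 50],
})


def classify_day(value):
    """Hypothetical helper: separate missing data from reported zeros."""
    if pd.isna(value):
        return "missing"
    if value == 0:
        return "reported_zero"
    return "reported"


source["status"] = source["daily_vaccinations"].map(classify_day)
print(source)
```

Under this scheme, only rows marked as missing would need the "recovered from previous entries" flag, while reported zeros could be added as regular entries.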

What do you think? @sanyam-git
