Make NCHS data available at HHS, nation level #1041

Open
capnrefsmmat opened this issue May 5, 2021 · 4 comments · May be fixed by #1213, #1243 or #1258
Labels
good first issue, Priority-P1 (Hope to do; lab will survive without it)

Comments

@capnrefsmmat
Contributor

The NCHS mortality data is currently only available at the state level. It seems like it should be possible to aggregate it to the nation and HHS levels. (If it's not possible for some reason, we should document that, so nobody tries to aggregate the state data themselves.) Having all our signals consistently available at the HHS and nation levels, when possible, would make it easy to compare them.

There's no pressing need for this that I know of; I just noticed the inconsistency and think it'd be nice to fix.
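As a rough illustration of the proposed aggregation, here is a minimal pandas sketch that sums state-level counts up to HHS regions or the nation. The HHS region table covers the 50 states plus DC and leaves out territories. The `covid_deaths` column, the `"us"` nation label, and the `aggregate_counts` helper are hypothetical, and the `state_id`/`timestamp` column names are assumed to match the pipeline's output. This only works for count columns; percentages can't simply be summed.

```python
import pandas as pd

# HHS regions for the 50 states plus DC, keyed by lowercase postal code.
# Territories (PR, VI, GU, ...) are left out of this sketch.
HHS_REGIONS = {
    "1": ["ct", "me", "ma", "nh", "ri", "vt"],
    "2": ["nj", "ny"],
    "3": ["de", "dc", "md", "pa", "va", "wv"],
    "4": ["al", "fl", "ga", "ky", "ms", "nc", "sc", "tn"],
    "5": ["il", "in", "mi", "mn", "oh", "wi"],
    "6": ["ar", "la", "nm", "ok", "tx"],
    "7": ["ia", "ks", "mo", "ne"],
    "8": ["co", "mt", "nd", "sd", "ut", "wy"],
    "9": ["az", "ca", "hi", "nv"],
    "10": ["ak", "id", "or", "wa"],
}
# Invert to a state -> region lookup.
STATE_TO_HHS = {st: region for region, states in HHS_REGIONS.items() for st in states}


def aggregate_counts(df, count_cols, level):
    """Sum state-level count columns to 'hhs' or 'nation' per timestamp.

    Hypothetical helper; only valid for counts, not percentages.
    """
    out = df.copy()
    if level == "hhs":
        out["geo_id"] = out["state_id"].str.lower().map(STATE_TO_HHS)
        unmapped = out.loc[out["geo_id"].isna(), "state_id"].unique()
        if len(unmapped) > 0:
            raise ValueError(f"no HHS region for: {sorted(unmapped)}")
    elif level == "nation":
        out["geo_id"] = "us"  # hypothetical nation label
    else:
        raise ValueError(f"unsupported level: {level!r}")
    return out.groupby(["timestamp", "geo_id"], as_index=False)[count_cols].sum()
```

Raising on unmapped states, rather than letting `groupby` silently drop them, makes a gap in the lookup table visible instead of quietly shrinking a region's total.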

@zhuoran-Cheng16 linked a pull request Aug 20, 2021 that will close this issue
@krivard added the Priority-P1 (Hope to do; lab will survive without it) label Aug 24, 2021
@krivard
Contributor

krivard commented Sep 8, 2021

Hey @alexcoda, want to take a look at this? Cheryl has a good start in #1213, but it's missing the critical plumbing to actually do the geographic aggregations (it currently just outputs extra copies of the state df under different names).

I've attached a CSV file, the result of a recent run of pull.pull_nchs_mortality_data with our Socrata key, for you to use in testing.

socrata_df.csv

@alexcoda
Contributor

@krivard yep! I'll take a crack at it sometime this weekend and let you know if I have any more questions about it.

@chinandrew
Contributor

Looking at the CSV columns, there's a mix of count and percentage values (from the source), so presumably we'll have to do something like the following to do a weighted average of the percentages and an unweighted sum for the counts? And NCHS should cover all states that are included in HHS regions, so we don't need to worry about weird denominator handling, right?

df["weight"] = df["population"]
# gmpr is a GeoMapper instance; percent_cols / count_cols list all percentage / count columns
proportion_vals = gmpr.replace_geocode(df, "state_id", new_geo, ..., date_col="timestamp", data_cols=percent_cols)
# weight column gets removed
count_vals = gmpr.replace_geocode(df, "state_id", new_geo, ..., date_col="timestamp", data_cols=count_cols)
# combine the two dataframes back
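For comparison, here is a minimal pandas sketch of the same count/percentage split done without GeoMapper, so it makes no assumptions about how `replace_geocode` treats a `weight` column. It assumes a `geo_id` column already holds the target region (an HHS region or the nation); the `aggregate_state_rows` helper and the `covid_deaths`/`pct_deaths`/`population` names are hypothetical. Counts are summed, and each percentage becomes sum(w * p) / sum(w) within its group, with a state's weight dropped wherever its percentage is missing.

```python
import pandas as pd


def aggregate_state_rows(df, count_cols, percent_cols, weight_col="population"):
    """Collapse state rows to one row per (timestamp, geo_id).

    Hypothetical helper. Counts are summed. Each percentage column becomes a
    weighted average, sum(w * p) / sum(w), where states with a missing value
    get zero weight so they don't drag the average down.
    """
    keys = ["timestamp", "geo_id"]
    count_vals = df.groupby(keys, as_index=False)[count_cols].sum()

    parts = df[keys].copy()
    for col in percent_cols:
        w = df[weight_col].where(df[col].notna(), 0)
        parts[col] = df[col].fillna(0) * w  # numerator: w * p
        parts["_w_" + col] = w              # denominator: w
    sums = parts.groupby(keys, as_index=False).sum()
    for col in percent_cols:
        sums[col] = sums[col] / sums["_w_" + col]
    proportion_vals = sums[keys + percent_cols]

    # Combine the two frames back on the shared keys.
    return count_vals.merge(proportion_vals, on=keys, how="outer")
```

Normalizing by the summed weights is what turns a weighted sum into a weighted average; without that division, regions with larger populations would get inflated percentages.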

We ended up going down a huge rabbit hole after realizing GeoMapper doesn't convert state_id to FIPS, only state_code, but will finish this next week.
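One possible workaround for the state_id gap is to add a `state_code` column before handing the frame to GeoMapper. This sketch assumes `state_id` holds postal abbreviations and `state_code` holds zero-padded two-digit FIPS strings. `STATE_ID_TO_FIPS` is a deliberately partial, hypothetical lookup table (a real pipeline would need a complete crosswalk), and `add_state_code` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical, deliberately partial lookup: postal code -> two-digit FIPS.
STATE_ID_TO_FIPS = {
    "ca": "06", "dc": "11", "ny": "36", "pa": "42", "tx": "48", "va": "51",
}


def add_state_code(df, id_col="state_id", code_col="state_code"):
    """Return a copy of df with a FIPS state_code column added.

    Raises KeyError if any state_id has no entry in the lookup table,
    rather than leaving NaNs that a later merge could silently drop.
    """
    out = df.copy()
    out[code_col] = out[id_col].str.lower().map(STATE_ID_TO_FIPS)
    unmapped = out.loc[out[code_col].isna(), id_col].unique()
    if len(unmapped) > 0:
        raise KeyError(f"no FIPS code for: {sorted(unmapped)}")
    return out
```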

@krivard
Contributor

krivard commented Sep 13, 2021

> presumably we'll have to do something like the following to do

That looks right, yes.

> NCHS should cover all states that are included in HHS regions

That's correct; here's the NCHS coverage map for reference
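Full coverage means every region's member states exist in the data, but a value can still be missing for some state on some date, which would silently shrink that region's sum. A minimal sketch of a per-date completeness check, assuming a caller-supplied region-to-states map and hypothetical names (`incomplete_groups`, a `deaths` column):

```python
import pandas as pd


def incomplete_groups(df, region_map, value_col):
    """List (timestamp, region, missing_states) for every incomplete group.

    Hypothetical helper. region_map maps each region id to the state ids it
    should contain; a state counts as present only if value_col is non-null.
    """
    problems = []
    for ts, day in df.groupby("timestamp"):
        have = set(day.loc[day[value_col].notna(), "state_id"].str.lower())
        for region, members in region_map.items():
            missing = set(members) - have
            if missing:
                problems.append((ts, region, sorted(missing)))
    return problems
```

An empty result means every region has a value from each of its member states on every date, so sums and weighted averages over those regions use complete denominators.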

(and the five or so different ways of specifying states are indeed a pain)
