Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NCHS mortality geo aggregation at the HHS and nation levels #1243

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

alexcoda
Copy link
Contributor

@alexcoda alexcoda commented Sep 14, 2021

Description

Add NCHS mortality geo aggregation at the HHS and nation levels.

Note that because one of the columns being aggregated is a percentage, we're doing a weighted sum with the geomapper

Also there are some pylint disables for too many branches, this is because of the slight difference in logic for handling the percentage vs. count variables. This could be refactored later for the entire run.py function, but is not in scope for this change.

Fixes

@alexcoda
Copy link
Contributor Author

@krivard - ready for review!

@krivard krivard self-requested a review September 14, 2021 13:55
Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a bonus, resolving the code-duplication questions may get the linter to hush.
I could be convinced otherwise though if I've misread what's going on, wow that is a monster.

@@ -18,7 +19,7 @@
from .pull import pull_nchs_mortality_data


def run_module(params: Dict[str, Any]):
def run_module(params: Dict[str, Any]): # pylint: disable=too-many-branches, too-many-statements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this better than splitting it up into more-specific functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not, just pushed some changes to pull out some helper methods!

Comment on lines 97 to 99
df = df_pull.copy()
df["se"] = np.nan
df["sample_size"] = np.nan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be pulled out of the for loop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it can probably be pulled out of the if block as well

Comment on lines 101 to 105
if geo in ["hhs", "nation"]:
df = gmpr.replace_geocode(
df, "state_id", "state_code", from_col="geo_id", date_col="timestamp")
df = gmpr.replace_geocode(
df, "state_code", geo, date_col="timestamp").rename(columns={geo: "geo_id"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reduce some of the duplication with percent_of_expected_deaths here at all?

@@ -4,6 +4,7 @@
disable=logging-format-interpolation,
too-many-locals,
too-many-arguments,
fixme,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this for a TODO I left in the code. Let me know if this is something we'd want to keep consistent across all indicators and I can add it to others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make NCHS data available at HHS, nation level
3 participants