Allow Unknown DNC in Community Level #1367

BrettBoval · 2023-04-14T20:40:04Z

Situation

The state of Iowa has decided that, as of 1 April, it will no longer provide positive case reports at the state or county level (see Iowa Department of Health and Human Services Press Release). Florida has suspended reporting "due to a technical issue", and it looks unlikely that it will resume.

Complication

Divergent from CDC

We deliberately diverged from the CDC's implementation: ours more prominently flagged places where we believed the case data should be treated with caution.

The current implementation, from libs/metrics/community_levels.py#L20:

    # TODO(michael): The CDC footnotes say:
    #     If the number of cases in 7 days for a jurisdiction is missing, the
    #     7-day case rate is assigned to the “low” category. If both 7-day
    #     admissions and 7-day percentage inpatient beds indicators are N/A, the
    #     community burden category is assigned N/A.
    #
    # For now I'm allowing 1 hospital metric to be missing, but not allowing
    # cases to be missing since that is rare and usually indicates we're
    # blocking data or something. In that case, I'd rather have no community
    # level calculated. But we can revisit if it ends up being a problem.
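To make the difference concrete, here is a minimal sketch of the two missing-data policies side by side. The function and parameter names are hypothetical and are not taken from the actual community-levels module. Substituting 0.0 for a missing case rate is just one way to land in CDC's lower case-rate band (under 200 per 100k).

```python
from typing import Optional, Tuple

Inputs = Tuple[float, Optional[float], Optional[float]]


def resolve_inputs(case_rate: Optional[float],
                   admissions: Optional[float],
                   beds_pct: Optional[float],
                   *, cdc_style: bool) -> Optional[Inputs]:
    """Return the (case_rate, admissions, beds_pct) to grade, or None for no level.

    Hypothetical helper contrasting the current policy with the CDC footnote.
    """
    # Both policies: if neither hospital metric is available, there is no level.
    if admissions is None and beds_pct is None:
        return None
    if case_rate is None:
        if not cdc_style:
            # Current policy: missing cases usually means blocked data, so no level.
            return None
        # CDC footnote: a missing case rate is assigned to the "low" case category.
        # 0.0 stands in for any value below the 200 per 100k cutoff.
        case_rate = 0.0
    return case_rate, admissions, beds_pct


print("current", resolve_inputs(None, 5.0, None, cdc_style=False))   # current None
print("cdc_style", resolve_inputs(None, 5.0, None, cdc_style=True))  # cdc_style (0.0, 5.0, None)
```

The only behavioural difference is the `case_rate is None` branch; both policies agree that losing both hospital metrics means no level.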

Implementation Artifact in our Filtering

Our current combination of forward-filling heuristics and a "don't grade stale data" rule produces unexpected behaviour for the end user.

Specifically, we usually ingest cumulative data and then take the day-over-day differences to create a daily new-cases timeseries. When the cumulative number is the same from one day to the next, it can be difficult to tell "the county did not report today" apart from "the county affirmatively reported that there were no new cases". On a scraper-by-scraper basis there may be some meta-text or context that distinguishes the two, but we currently don't have a full solution for this.
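A small illustration of why the two cases collapse, using hypothetical numbers: once cumulative totals are differenced, a county that stopped reporting (whose last total keeps being re-scraped) looks exactly like one that affirmatively reported zero new cases.

```python
import numpy as np

# Hypothetical cumulative case counts, oldest first.
# County A stopped reporting after day 3; the scraper keeps seeing the last total.
county_a_cumulative = np.array([100, 110, 125, 125, 125, 125])
# County B reported every day and affirmatively stated 0 new cases after day 3.
county_b_cumulative = np.array([100, 110, 125, 125, 125, 125])

# np.diff turns cumulative totals into daily new cases (one element shorter).
daily_a = np.diff(county_a_cumulative)
daily_b = np.diff(county_b_cumulative)

print(daily_a.tolist())                  # [10, 15, 0, 0, 0]
print(np.array_equal(daily_a, daily_b))  # True: nothing left to tell them apart
```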

Our current solution has two heuristics whose cutoffs are staggered in time by one day (somewhat by chance):

We treat trailing zeros (i.e., when all of the most recent data points are zeros) with suspicion. For the first x days of consecutive zeros, we use the last non-zero value as the latest value for above-the-fold metrics, so [15,0,0,0,0,0,0] returns 15 as the "latest" value.
Eventually, after x days of consecutive zeros, we give up waiting and let all those zeros propagate, turning Daily New Cases (DNC) to zero. So [15,0,...,0] returns 0 once the gap is long enough.
Separately, a different process looks at all our datastreams and masks (likely too aggressively) any timeseries whose most recent data point is more than y days stale. This turns the metric to Unknown, which bubbles up and turns the metric gray.
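A minimal sketch of the two heuristics as described above, with x and y as parameters. The function names are hypothetical, and the sketch approximates a value's age as the number of trailing zero days, which glosses over exactly how a day is defined in the pipeline.

```python
from typing import List, Optional, Tuple


def latest_with_zero_ffill(daily: List[float], x: int) -> Tuple[float, int]:
    """Heuristic 1: return (latest value, age in days) for a daily series, oldest first.

    While there are fewer than `x` trailing zeros, report the last non-zero value
    and how many days old it is; after that, let the zeros through.
    """
    trailing_zeros = 0
    for value in reversed(daily):
        if value != 0:
            break
        trailing_zeros += 1
    if trailing_zeros == 0 or trailing_zeros >= x or trailing_zeros == len(daily):
        return daily[-1], 0
    return daily[-1 - trailing_zeros], trailing_zeros


def mask_if_stale(value: float, age_days: int, y: int) -> Optional[float]:
    """Heuristic 2: turn a value Unknown (None) once it is more than y days stale."""
    return None if age_days > y else value


print(latest_with_zero_ffill([15, 0, 0, 0, 0, 0, 0], x=14))  # (15, 6)
print(latest_with_zero_ffill([15] + [0] * 14, x=14))         # (0, 0)
print(mask_if_stale(15, age_days=6, y=14))                   # 15
print(mask_if_stale(15, age_days=15, y=14))                  # None
```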

The complication is that x and y are both ~14 days, but one of them is computed before an np.diff is applied, which shifts it by one day. So for the first ~13 days (I'm being deliberately vague about defining a day here) we return the last non-zero value in the series. Then, for one day, the "mask everything that is y days stale" step returns Unknown, and on the following day the zeros have been around long enough that we stop forward filling and let them through, so it returns 0.
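Which step carries the offset isn't pinned down here. One common way a single np.diff produces this kind of one-day shift, assuming (hypothetically) that both heuristics nominally look at a 14-row window, is that a window taken on the cumulative series before differencing covers one fewer daily value than the same window taken after:

```python
import numpy as np

WINDOW = 14  # hypothetical: both heuristics nominally look at 14 rows

# Cumulative counts, oldest first: one jump of 15, then 14 unchanged reports.
cumulative = np.array([100, 115] + [115] * 14)

# Window taken BEFORE differencing: 14 cumulative rows -> 13 daily values.
daily_window_before_diff = np.diff(cumulative[-WINDOW:])
# Window taken AFTER differencing: 14 daily values.
daily_window_after_diff = np.diff(cumulative)[-WINDOW:]

# Both windows contain only zeros, but one sees a day fewer of them.
print(len(daily_window_before_diff), len(daily_window_after_diff))  # 13 14
```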

The exact number of zeros is still to be confirmed.

t-1 day: [15,0,0,0,0,0,0,0,0,0,0,0,0,0] pipes out [15] from 13 days ago, which passes the stale test, so 15 counts for the risk level calculation.
t-0 day: [15,0,0,0,0,0,0,0,0,0,0,0,0,0,0] pipes out [15] from 14 days ago, which fails the stale test, so None is returned for the risk level calculation.
t+1 day: [15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] pipes out [0] from 1 day ago, which passes the stale test, so 0 is returned for the risk level calculation.

This causes the map and the location page to transiently turn gray and then green.
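The day-by-day sequence can be reproduced with a toy version of the pipeline. This is a minimal sketch, assuming (hypothetically) that the one-day np.diff offset leaves the forward-fill cutoff effectively at 15 zero days while the stale mask fires at 14 days, and that a value's age equals its number of trailing zero days. All names are illustrative.

```python
from typing import List, Optional

FFILL_MAX_ZERO_DAYS = 15  # hypothetical effective cutoff after the np.diff offset
STALE_AFTER_DAYS = 14     # hypothetical: values this many days old become Unknown


def latest_dnc(daily: List[float],
               ffill_max_zero_days: int = FFILL_MAX_ZERO_DAYS,
               stale_after_days: int = STALE_AFTER_DAYS) -> Optional[float]:
    """Toy pipeline: trailing-zero forward fill, then the staleness mask."""
    trailing_zeros = 0
    for value in reversed(daily):
        if value != 0:
            break
        trailing_zeros += 1

    if 0 < trailing_zeros < min(ffill_max_zero_days, len(daily)):
        # Still "waiting": report the last non-zero value and its age.
        latest, age = daily[-1 - trailing_zeros], trailing_zeros
    else:
        # No trailing zeros, or we gave up waiting: the latest row wins.
        latest, age = daily[-1], 0

    return None if age >= stale_after_days else latest


for zeros in (13, 14, 15):  # t-1, t-0 and t+1 from the example above
    print(zeros, latest_dnc([15] + [0] * zeros))
# 13 15    (forward-filled value, not yet stale)
# 14 None  (still forward-filled, but now masked as stale -> gray)
# 15 0     (zeros finally propagate -> green)
```

In this toy model, raising `stale_after_days` past the forward-fill cutoff, e.g. `latest_dnc([15] + [0] * 14, stale_after_days=16)`, returns 15 instead of None, so the series goes straight from 15 to 0 without a gray day.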

Resolutions

Summary

Update community risk level logic to handle unknown DNC as 0 (the lesser of two evils)
Add the appropriate disclaimers for Iowa and Florida in our normal disclaimer code.
Tweak the "turn this Unknown after y days" threshold to something longer than the DNC forward-fill threshold.
Explore special-casing DNC so that it is frozen as Unknown starting from a specific day, overriding the "turn it unknown and hide it" behaviour.
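A minimal sketch of the first resolution: grading an Unknown case rate as 0 instead of refusing to compute a level. The thresholds are CDC's published COVID-19 Community Levels cutoffs as we understand them (case rate split at 200 per 100k; 7-day admissions per 100k and percent of inpatient beds each split at 10, plus 20 and 15 respectively in the lower case band) and should be checked against the community-levels module before relying on them. All names are hypothetical.

```python
from enum import IntEnum
from typing import Optional, Tuple


class CommunityLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def indicator_level(value: Optional[float], high_case_band: bool,
                    cutoffs: Tuple[float, float]) -> Optional[CommunityLevel]:
    """Grade one hospital indicator; cutoffs are (medium, high) for the low-case band."""
    if value is None:
        return None
    medium_cut, high_cut = cutoffs
    if high_case_band:
        # In the high-case band the floor is MEDIUM and the medium cutoff means HIGH.
        return CommunityLevel.HIGH if value >= medium_cut else CommunityLevel.MEDIUM
    if value >= high_cut:
        return CommunityLevel.HIGH
    if value >= medium_cut:
        return CommunityLevel.MEDIUM
    return CommunityLevel.LOW


def community_level(case_rate: Optional[float], admissions: Optional[float],
                    beds_pct: Optional[float],
                    unknown_cases_as_zero: bool = True) -> Optional[CommunityLevel]:
    """Overall level = worst of the available hospital indicators."""
    if case_rate is None:
        if not unknown_cases_as_zero:
            return None  # previous behaviour: no level without cases
        case_rate = 0.0  # proposed: treat Unknown DNC as 0 (the lesser of two evils)
    high_case_band = case_rate >= 200
    levels = [
        level
        for level in (
            indicator_level(admissions, high_case_band, (10, 20)),
            indicator_level(beds_pct, high_case_band, (10, 15)),
        )
        if level is not None  # LOW == 0 is falsy, so compare against None explicitly
    ]
    return max(levels) if levels else None


level = community_level(None, 5.0, 4.0)
print(level.name)  # LOW
```

Note that the sketch still returns None when both hospital metrics are missing, matching the CDC footnote.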

	Current	Proposed	Ideal
US Map Color	Grey/Green	Green	Green w/ Flag
Community Risk Level	Grey/Green	Green	Green w/ Flag
DNC Latest Value	None/0	None/0	None
Timeseries	Masked/ffilled	Masked/ffilled	Frozen @ Last Reporting

This test was passing incorrectly because the assert was wrong. Now it should be failing, which represents the broken state of this PR.

Adds a test that DNC defaults to 0 once it has been stale for 14 days. We've had some off-by-one uncertainty about what "14 days lookback before blocking" really means. Here's an updated test, which currently fails, that captures the expected behavior. I propose that we either change the code to handle this correctly or update our docs to say 15 wherever they used to say 14.
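The 14-vs-15 question comes down to whether the boundary day counts. A minimal sketch of the two readings, with hypothetical helper names and an arbitrary example date:

```python
from datetime import date, timedelta


def fresh_exclusive(last_report: date, today: date, lookback_days: int = 14) -> bool:
    """Reading A: data is usable only if it is strictly less than lookback_days old."""
    return (today - last_report).days < lookback_days


def fresh_inclusive(last_report: date, today: date, lookback_days: int = 14) -> bool:
    """Reading B: data exactly lookback_days old still counts (a 15-day window)."""
    return (today - last_report).days <= lookback_days


today = date(2023, 4, 14)
last_report = today - timedelta(days=14)
print(fresh_exclusive(last_report, today), fresh_inclusive(last_report, today))  # False True
```

Pinning the test to one explicit comparison like this makes it unambiguous whether the docs should say 14 or 15.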

smcclure17 · 2023-04-28T18:35:10Z

Action item from this PR #1368

BrettBoval and others added 8 commits April 14, 2023 14:36

Allow Unknown DNC in Community Level

988165c

Updating saved datasets at Fri Apr 14 21:32:45 UTC 2023

3768283

Pushing Broken Test

b19a1cf


merge main into accept-unknown-DNC-for-community-risk

5cf7445

add null checking for metrics in community level, remove forward filling

91c4018

Merge branch 'main' into accept-unknown-DNC-for-community-risk

5e43b9e

fix off by one error

8cc41bf

BrettBoval merged commit df963dd into main Apr 28, 2023
5 checks passed

BrettBoval deleted the accept-unknown-DNC-for-community-risk branch April 28, 2023 18:15

smcclure17 mentioned this pull request Apr 28, 2023

Audit metric threshold/cutoff date behavior #1368

Open
