Allow Unknown DNC in Community Level #1367
Merged
Situation
The state of Iowa has decided, as of 1 April, to no longer provide positive case reports at the state or county level (see Iowa Department of Health and Human Services Press Release). Florida has suspended reporting "due to a technical issue" and the odds don't look good that they will restart.
Complication
Divergent from CDC
We deliberately had a different implementation than the CDC, which more prominently flagged places where we believed the case data should be treated with caution.
The current implementation from libs/metrics/community_levels.py#L20:
Implementation Artifact in our Filtering
Our current combination of forward-filling heuristics and the don't-grade-stale-data rule creates unexpected behaviour for the end user.
Specifically, we usually ingest cumulative data and then calculate the diffs to create a daily new-cases timeseries. When the cumulative number is the same from day to day, it can be difficult to tell the difference between "the county did not report today" and "the county reported affirmatively that there were no new cases". On a scraper-by-scraper basis there may be some meta-text or context to discern between the two, but we currently don't have a full solution for this.
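To make the ambiguity concrete, here is a minimal sketch with made-up numbers. The helper daily_new and both county series are hypothetical and are not taken from the codebase; the helper simply does what np.diff does on a one-dimensional series.

```python
def daily_new(cumulative):
    """Day-over-day differences of a cumulative series (same result as np.diff)."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


# Hypothetical data. County A kept reporting a total of 115 every day.
# County B stopped reporting after day 2 and its last total was carried forward.
county_a_reported_no_new_cases = [100, 115, 115, 115, 115]
county_b_stopped_reporting = [100, 115, 115, 115, 115]

new_a = daily_new(county_a_reported_no_new_cases)
new_b = daily_new(county_b_stopped_reporting)

print(new_a)           # [15, 0, 0, 0]
print(new_a == new_b)  # True: the diff alone cannot tell the two situations apart
```

Once the cumulative totals are identical, any information about whether a report actually arrived has to come from outside the series itself.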
Our current solution has two heuristics that are temporally staggered (somewhat by chance) by one day.
- [15,0,0,0,0,0,0] returns 15 as the "latest" value.
- [15,0,...,0,0] returns 0 once the gap is long enough.
The complication is that the two heuristics' thresholds, x and y, are both ~14 days, but one of them is calculated before applying np.diff, which shifts it by one day. So for the first ~13 days (I'm being deliberately vague on defining a day here) we return the "last non-zero value in series". Then on one day the "if y days stale, mask everything" rule returns Unknown, and on the following day the case timeseries gets stale enough that we let the zeros forward fill and it returns a zero.
The exact number of zeros is still to be confirmed.
- [15,0,0,0,0,0,0,0,0,0,0,0,0,0] pipes out [15] 13 days ago, which passes the stale test, which then counts as 15 for risk level calculation.
- [15,0,0,0,0,0,0,0,0,0,0,0,0,0,0] pipes out [15] 14 days ago, which fails the stale test, which then returns None for risk level calculation.
- [15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] pipes out [0] 1 day ago, which passes the stale test, which then returns 0 for risk level calculation.
This causes the map and the location page to transiently filter to gray and then green.
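The 15, then None, then 0 sequence can be reproduced with a toy model of the two staggered heuristics. This is only a sketch: the function names and both threshold constants are assumptions chosen to match the three cases above, not the real pipeline code, and a "day" is modelled loosely (a value's age is the number of trailing zeros after it, and zeros that are kept count as current).

```python
from typing import List, Optional, Tuple

STALE_AFTER_DAYS = 14       # assumed: a latest value this old (or older) is masked
KEEP_ZEROS_AFTER_DAYS = 15  # assumed: a trailing zero run this long is treated as real zeros


def latest_observation(new_cases: List[int]) -> Tuple[int, int]:
    """Return (value, age_in_days) of the value treated as 'latest'."""
    trailing_zeros = 0
    for value in reversed(new_cases):
        if value != 0:
            break
        trailing_zeros += 1
    # Long enough gap (or nothing but zeros): keep the zeros as real data.
    if trailing_zeros == len(new_cases) or trailing_zeros >= KEEP_ZEROS_AFTER_DAYS:
        return 0, 0
    # Otherwise drop the trailing zeros and fall back to the last non-zero value.
    return new_cases[-1 - trailing_zeros], trailing_zeros


def graded_value(new_cases: List[int]) -> Optional[int]:
    """Apply the stale test on top of the trailing-zero heuristic."""
    value, age = latest_observation(new_cases)
    if age >= STALE_AFTER_DAYS:
        return None  # shown as Unknown (gray)
    return value


for zeros in (13, 14, 15):
    print(zeros, graded_value([15] + [0] * zeros))
# 13 15
# 14 None
# 15 0
```

Because the two thresholds differ by exactly one day, there is a single day on which the last non-zero value is already too stale to grade but the zeros are not yet trusted, which is the gray flash before the location turns green.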
Resolutions
Summary