hhs cadence&latency documentation is imprecise, outdated; hhs acquisition might be late #1889

brookslogan · 2023-08-12T00:31:54Z

Actual Behavior:

HHS issues updates to this timeseries once a week, and occasionally more often. We check for updates daily. Lag varies from 0 to 6 days.

There was a reporting cadence change around the beginning of July 2023 that affects this statement, and I don't think the statement precisely describes either the pre-change or the post-change state.

Prior to cadence change:

Lag 0 should be impossible. Since we do not report previous_day_* measurements as-is but instead shift them to the days they describe, we can't have a measurement about admissions etc. that happen today and are also reported today.
In practice, updates very regularly occurred more often than weekly, and lag was 2 days for most locations and versions, with extremes of 1 day and 16 days.
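A minimal sketch of why lag 0 can't occur, assuming a previous_day_* value on an upstream row describes the day before that row's date. The function names `reference_date` and `lag_days` are hypothetical and are not taken from the acquisition code:

```python
from datetime import date, timedelta


def reference_date(upstream_date: date) -> date:
    """Day a previous_day_* value describes (assumed: one day before the upstream row's date)."""
    return upstream_date - timedelta(days=1)


def lag_days(upstream_date: date, issue_date: date) -> int:
    """Days between the day a measurement describes and the day we record it."""
    return (issue_date - reference_date(upstream_date)).days


# Best case: the row is published and acquired on its own upstream date.
same_day_lag = lag_days(date(2023, 6, 1), date(2023, 6, 1))
print(same_day_lag)
```

Under this assumption the smallest possible lag is 1 day, which matches the observed minimum above.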

Post cadence change:

The expected cadence appears to be sort of weekly, technically twice a week:

Preliminary data for a Sat--Fri week is published the following Friday. (Upstream, this appears as Sun--Sat previous_day_* data.)
Revisions to these measurements are published on Monday, but initial measurements for later days aren't added at that point.

The lag encountered in our hhs endpoint is typically 8 to 14 days, plus some versions that were staler due to transient issues (up to 18 days of lag, or 26 for American Samoa). The typical range is 1 day higher than expected.
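The arithmetic behind the 1-day discrepancy can be sketched as follows, assuming the preliminary release for a Sat--Fri week arrives on the Friday 7 days after the week ends. The helper `weekly_lags` and the example week are illustrative only:

```python
from datetime import date, timedelta


def weekly_lags(week_end_friday: date, issue_date: date) -> list:
    """Lags in days for each Sat..Fri reference day of the week ending on week_end_friday."""
    days = [week_end_friday - timedelta(days=k) for k in range(6, -1, -1)]
    return [(issue_date - d).days for d in days]


week_end = date(2023, 7, 14)  # a Friday, chosen only as an example

# Recorded on the publication Friday (assumed to be 7 days after the week ends).
on_time = weekly_lags(week_end, week_end + timedelta(days=7))
# Recorded one day later, on Saturday.
late = weekly_lags(week_end, week_end + timedelta(days=8))

lag_range_on_time = (min(on_time), max(on_time))
lag_range_late = (min(late), max(late))
print(lag_range_on_time, lag_range_late)
```

Recording the release on its publication day gives a range of 7 to 13 days, while recording it a day later gives the 8 to 14 days observed in the endpoint.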

Expected behavior

Documentation should note the cadence change and accurately describe the current cadence, and maybe the prior cadence as well.
The typical lag range encountered should be 7 to 13 days, not 8 to 14. (I think I might have seen this already in an Issue or Slack thread about the acquisition being performed at the end of the day, but I can't find it again. It might have referenced covid_hosp rather than hhs.)

Context

I spotted this while reading the documentation for other purposes. However, I do have some plots of lag by location & version generated by this Rmd for the influenza admissions signal. Swapping for the covid signal seemed to give similar/identical results regarding lag.

brookslogan · 2023-08-26T20:05:28Z

The time of day at which reporting occurs may also have shifted. So if things are shifted back to the day they are published, we can't use the originally planned timing. See here.

brookslogan · 2023-10-13T23:53:19Z

Update: the upstream reporting cadence has changed to Wed + Fri. The acquisition pipeline has been updated to ensure we will get Wed data into the API on Wed moving forward, but Fri updates to the hhs data source may still be recorded on Sat.

brookslogan added the "data quality" label (Missing data, weird data, broken data) Aug 12, 2023

brookslogan assigned nolangormley Aug 12, 2023

brookslogan mentioned this issue Oct 14, 2023

hhs latest data and version history have mismatches with upstream timeseries and archive data sets #1903

Open
