hhs cadence&latency documentation is imprecise, outdated; hhs acquisition might be late #1889

Open
brookslogan opened this issue Aug 12, 2023 · 2 comments
Labels
data quality Missing data, weird data, broken data

Comments

@brookslogan

Actual Behavior:

Documentation states

HHS issues updates to this timeseries once a week, and occasionally more often. We check for updates daily. Lag varies from 0 to 6 days.

There was a reporting cadence change around the beginning of July 2023 that affects this statement, and I don't think the statement precisely describes either the pre-change or the post-change state.

Prior to cadence change:

  • Lag 0 should be impossible. Since we are not reporting previous_day_* measurements but instead shifting them to the relevant days, we won't have a measurement about admissions/etc. that happen today, reported today.
  • In practice, updates regularly arrived more often than weekly, and lag was 2 days for most locations and versions, with extremes of 1 day and 16 days.
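
To illustrate why lag 0 cannot occur, here is a minimal sketch of the date shift described above. The function names and the example date are hypothetical, and the sketch assumes that an upstream row dated `report_date` can be published on `report_date` itself at the earliest; it is not the actual acquisition code.

```python
from datetime import date, timedelta

def reference_date(report_date: date) -> date:
    """Day that a previous_day_* value describes (assumed: the day before the row's date)."""
    return report_date - timedelta(days=1)

def lag_days(reference: date, issue: date) -> int:
    """Lag = days between the day being measured and the day the value was issued."""
    return (issue - reference).days

# Illustrative date only; assumes the row is published on its own report date.
report = date(2023, 6, 1)
min_lag = lag_days(reference_date(report), issue=report)
print(min_lag)
```

Under that assumption the smallest possible lag is 1 day, which matches the low extreme mentioned above.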

Post cadence change:

The expected cadence appears to be roughly weekly, though technically twice a week:

  • Preliminary data pertaining to Sat--Fri is published the following Fri. (Upstream, this appears as Sun--Sat previous_day_* data.)
  • Revisions to these measurements are published on Monday, but initial measurements for later days aren't added.

The encountered lag in our hhs endpoint is 8 to 14 days, plus some versions that were staler due to transient issues (up to 18 days of lag, or 26 for American Samoa). That's 1 day higher than expected.

Expected behavior

  • Documentation should note the cadence change and describe the current cadence reasonably accurately, and perhaps the prior cadence as well.
  • The typical lag range encountered should be 7 to 13 days, not 8 to 14. (I think I may have seen this already in an Issue or Slack thread about the acquisition being performed at the end of the day, but I can't find it again. It might have referenced covid_hosp rather than hhs.)
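
The 7-to-13 versus 8-to-14 arithmetic can be checked with a short sketch. The specific week is illustrative only, and the one-day delay is an assumption standing in for "acquisition recorded a day after publication"; `lag_range` is a hypothetical helper, not part of the pipeline.

```python
from datetime import date, timedelta

# Illustrative Sat--Fri reference week (2023-07-08 is a Saturday).
week_start = date(2023, 7, 8)
week = [week_start + timedelta(days=i) for i in range(7)]

# Preliminary data for the week is published the following Friday.
published = week[-1] + timedelta(days=7)

def lag_range(reference_days, issue_day):
    """Smallest and largest lag, in days, for the given reference days at issue_day."""
    lags = [(issue_day - d).days for d in reference_days]
    return min(lags), max(lags)

expected = lag_range(week, published)                      # issue recorded on publication day
observed = lag_range(week, published + timedelta(days=1))  # assumed: recorded one day late
print(expected, observed)
```

If the issue is recorded on the publication day, the range is 7 to 13 days; recording it one day later gives the 8 to 14 days seen in the hhs endpoint.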

Context

I spotted this while reading the documentation for other purposes. However, I do have some plots of lag by location & version generated by this Rmd for the influenza admissions signal. Swapping for the covid signal seemed to give similar/identical results regarding lag.

@brookslogan brookslogan added the data quality Missing data, weird data, broken data label Aug 12, 2023
@brookslogan
Author

The time of day at which reporting occurs may also have shifted. So if things are shifted back to the day they are published, we can't use the originally planned timing. See here.

@brookslogan
Author

brookslogan commented Oct 13, 2023

Update: the upstream reporting cadence has changed to Wed + Fri. The acquisition pipeline has been updated to ensure we get Wed data into the API on Wed going forward, but Fri updates to the hhs data source may still be recorded on Sat.
