Fix covid_hosp state_daily #1225

Open
2 tasks
krivard opened this issue Jun 30, 2023 · 1 comment · May be fixed by #1244
krivard commented Jun 30, 2023

covid_hosp state daily has been failing since June 17 with the following error:

Traceback (most recent call last):
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/state_daily/update.py", line 42, in <module>
    Utils.launch_if_main(Update.run, __name__)
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 38, in launch_if_main
    entrypoint()
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/state_daily/update.py", line 38, in run
    return Utils.update_dataset(Database, network)
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 220, in update_dataset
    dataset = Utils.merge_by_key_cols([network.fetch_dataset(url, logger=logger) for url, _ in revisions],
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 162, in merge_by_key_cols
    dfs = [df.set_index(key_cols) for df in dfs
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 162, in <listcomp>
    dfs = [df.set_index(key_cols) for df in dfs
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pandas/core/frame.py", line 4727, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['reporting_cutoff_start'] are in the columns"

This suggests the file format changed for state daily. Indeed, there's a line on the state-daily healthdata.gov site that says the name of this column is now date:

[image: screenshot of the healthdata.gov note on the column rename]

We should:

  • Figure out exactly which date the format change was made (the screenshot above claims June 26 but the traceback above occurred on June 17)
  • Update the code to use date for files posted on or after that date, and reporting_cutoff_start for files posted before that date
    • or maybe check to see if date is present and if not use reporting_cutoff_start instead?
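The second option (check for date first, fall back to reporting_cutoff_start) could look roughly like the sketch below. pick_date_column is a hypothetical helper, not something that exists in covid_hosp/common/utils.py, and it only assumes the two column names mentioned above.

```python
from typing import Iterable

# Column names taken from the issue text; everything else here is illustrative.
NEW_COL = "date"
LEGACY_COL = "reporting_cutoff_start"


def pick_date_column(columns: Iterable[str]) -> str:
    """Return whichever date key column is present, preferring the new name.

    Hypothetical helper: accepts any iterable of column names,
    for example the `columns` attribute of a pandas DataFrame.
    """
    present = set(columns)
    if NEW_COL in present:
        return NEW_COL
    if LEGACY_COL in present:
        return LEGACY_COL
    # Fail loudly if the format changes again instead of guessing.
    raise KeyError(f"neither {NEW_COL!r} nor {LEGACY_COL!r} found in columns")
```

With a pandas frame this could be called as pick_date_column(df.columns) before df.set_index(...). Because the merge step indexes several revisions on the same key columns, one of the two names would probably still need to be renamed to match the other before merging.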
krivard added the bug label Jun 30, 2023
dshemetov linked a pull request Jul 25, 2023 that will close this issue

melange396 commented
the last file to use reporting_cutoff_start is 6xf2-c3ie_2023-06-16T01-05-09.csv
the first file to use date is 6xf2-c3ie_2023-06-16T12-07-16.csv

both were published on the same day. in fact, there are 2 files with each version of the column names, with all 4 files date stamped 16 June.

the typo on the healthdata.gov site is that it lists "26" which should be "16"
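
if we go with the date-based option instead, the cutoff could be read off the revision filename. a rough sketch, assuming every filename follows the <dataset id>_<YYYY-MM-DD>T<HH-MM-SS>.csv pattern seen in the two names above (only those two are confirmed); revision_timestamp and expected_date_column are hypothetical helpers, not existing code:

```python
from datetime import datetime

# Timestamp of the first file observed to use `date` (from the filenames above).
FIRST_DATE_FILE_TS = datetime(2023, 6, 16, 12, 7, 16)


def revision_timestamp(filename: str) -> datetime:
    """Parse the timestamp embedded in a revision filename.

    Assumes the pattern <dataset id>_<YYYY-MM-DD>T<HH-MM-SS>.csv.
    """
    stem = filename.rsplit(".", 1)[0]   # drop the .csv extension
    stamp = stem.rsplit("_", 1)[1]      # keep the part after the last underscore
    return datetime.strptime(stamp, "%Y-%m-%dT%H-%M-%S")


def expected_date_column(filename: str) -> str:
    """Return the date column name a revision file is expected to use."""
    if revision_timestamp(filename) >= FIRST_DATE_FILE_TS:
        return "date"
    return "reporting_cutoff_start"
```

this hard-codes the switchover, so it would misfire if any later file reverted to the old name; checking the columns directly avoids that.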
