Indicator runners should output files with issue date #1907

Open
melange396 opened this issue Nov 1, 2023 · 1 comment

Comments

@melange396
Contributor

Indicator runners should output CSV files with issue date, wherever possible.

Most indicators (if not all of the currently active ones) output CSV files without an "issue date" saved or encoded anywhere in or around them. They assume the issue date is "today" and that the files will be ingested into the database the same day; our acquisition process also assumes an issue date of "today" by default when reading these files. This can lead to inaccurate "issue" columns when the data finally makes it to the database if the acquisition jobs are broken, backed up by a long queue, or otherwise delayed.

If we export with an explicit issue date, it does not matter when the files are consumed: the "issue" will still be accurate. In fact, this can make re-importing the same CSV files multiple times an idempotent operation. It will help us when there are problems with our systems in real time (as listed above), and it will simplify things if we need to import CSV files at some later date (such as adding new data files on top of a restored database snapshot). It can also be useful when the external data source specifies an issue date explicitly.
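
To make the idea concrete, here is a rough sketch of an export that carries its issue date with it, plus a toy ingest that shows why re-importing becomes idempotent. Everything here is an assumption for illustration: the `issue_YYYYMMDD` directory name, the `export_with_issue` and `ingest` helpers, the column names, and the dict standing in for the database are all hypothetical and not taken from any existing indicator or from the acquisition code.

```python
import csv
from datetime import date
from pathlib import Path


def export_with_issue(rows, export_dir, issue, filename):
    """Write rows under a directory that names the issue date.

    The "issue_YYYYMMDD" layout is an assumed convention for this sketch.
    """
    out_dir = Path(export_dir) / f"issue_{issue:%Y%m%d}"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["geo_id", "val", "se", "sample_size"])
        writer.writeheader()
        writer.writerows(rows)
    return out_path


def ingest(table, csv_path, issue):
    """Upsert rows into a dict standing in for the database.

    The key includes the issue date, so ingesting the same file again
    overwrites identical entries instead of creating new ones.
    """
    csv_path = Path(csv_path)
    with csv_path.open(newline="") as f:
        for row in csv.DictReader(f):
            table[(csv_path.name, row["geo_id"], issue)] = row["val"]
```

Because the issue date comes from the export rather than from the clock at ingest time, running `ingest` on the same file a day or a week later yields the same keys and values.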

There is a provision in our acquisition process to use an issue date that is taken from the directory structure. The "nowcast" indicator seems to be able to produce this directory structure, but AFAICT this indicator is not being run successfully anywhere at present.
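
For illustration only, reading an issue date out of a path could look roughly like the sketch below. The `issue_YYYYMMDD` directory name, the regex, and the `issue_from_path` helper are assumptions made for this example; I have not checked them against what the acquisition code actually expects. The fallback to "today" mirrors the current default behavior described above.

```python
import re
from datetime import date, datetime

# Assumed convention: one path component named "issue_YYYYMMDD".
ISSUE_DIR = re.compile(r"(?:^|/)issue_(\d{8})(?:/|$)")


def issue_from_path(path, today=None):
    """Return the issue date encoded in the path, else fall back to today."""
    match = ISSUE_DIR.search(str(path).replace("\\", "/"))
    if match:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    # No explicit issue date in the path: behave like the current default.
    return today or date.today()
```

The `today` parameter exists only so the fallback can be pinned in tests or backfills.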

@melange396
Contributor Author

See my novelesque Slack message about applying this to the hhs indicator, whose source data includes an issue date that can be carried through to the output files.
