Indicator runners should output files with issue date #1907

melange396 · 2023-11-01T23:16:11Z

Indicator runners should output CSV files with issue date, wherever possible.

Most indicators (if not all of the currently active ones) output CSV files without an "issue date" saved or encoded anywhere in or around them. They assume the issue date is "today" and that the files will be ingested into the database the same day; our acquisition process likewise assumes an issue date of "today" by default when it reads these files. If the acquisition job(s) are broken, backed up by a long queue, or otherwise delayed, the "issue" column is inaccurate by the time the data finally reaches the database.

If we export with an explicit issue date, it does not matter when the files are consumed; the "issue" will still be accurate. It can even make re-importing the same CSV files multiple times an idempotent operation. This will help when our systems have problems in real time (as listed above), and it will simplify things if we need to import CSV files at some later date (such as adding new data files on top of a restored database snapshot). It is also useful when the external data source specifies an issue date explicitly.
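To illustrate the idempotency point, here is a minimal sketch, not our actual exporter or acquisition code. The function names (`export_rows`, `import_csv`), the column names, and the in-memory dict standing in for the database are all hypothetical. The idea is only that once the issue date travels with each row, the import can key on it, so repeating the import changes nothing.

```python
import csv
import io
from datetime import date


def export_rows(rows, issue):
    """Hypothetical exporter: write rows as CSV text, stamping each with an explicit issue date."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["geo_id", "time_value", "val", "issue"])
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "issue": issue.isoformat()})
    return buf.getvalue()


def import_csv(text, table):
    """Hypothetical importer: upsert rows keyed on (geo_id, time_value, issue).

    `table` is a plain dict standing in for the database. The issue date is
    read from the file, never taken from the wall clock at import time.
    """
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["geo_id"], row["time_value"], row["issue"])
        table[key] = float(row["val"])
    return table


rows = [
    {"geo_id": "pa", "time_value": "20231030", "val": 1.5},
    {"geo_id": "ny", "time_value": "20231030", "val": 2.0},
]
csv_text = export_rows(rows, issue=date(2023, 11, 1))

table = import_csv(csv_text, {})
snapshot = dict(table)
import_csv(csv_text, table)  # a delayed or repeated import leaves the table unchanged
```

Because the key includes the issue date from the file, importing the same file a day or a month later yields the same rows as importing it immediately.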

There is a provision in our acquisition process to use an issue date that is taken from the directory structure. The "nowcast" indicator seems to be able to produce this directory structure, but AFAICT this indicator is not being run successfully anywhere at present.
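As a rough sketch of what reading the issue date from the directory structure could look like: the `issue_YYYYMMDD` directory name, the example paths, and the `issue_from_path` helper below are assumptions for illustration, not a description of what the acquisition code actually matches.

```python
import re
from datetime import date, datetime
from pathlib import PurePosixPath

# Assumed naming convention for illustration: a path component like "issue_20231101".
ISSUE_DIR = re.compile(r"^issue_(\d{8})$")


def issue_from_path(path, default=None):
    """Return the issue date encoded in a path component, else a fallback.

    The fallback is `default` if given, otherwise today's date, which
    mirrors the current "issue is today" assumption.
    """
    for part in PurePosixPath(path).parts:
        match = ISSUE_DIR.match(part)
        if match:
            return datetime.strptime(match.group(1), "%Y%m%d").date()
    return default if default is not None else date.today()


# Hypothetical paths, with and without an issue directory.
explicit = issue_from_path("receiving/hhs/issue_20231101/20231030_state_example.csv")
fallback = issue_from_path("receiving/hhs/20231030_state_example.csv", default=date(2000, 1, 1))
```

With a layout like this, an indicator runner would only need to write its files into a dated subdirectory for the issue date to survive any delay before ingestion.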

melange396 · 2023-11-03T20:15:55Z

See my novelesque Slack message about applying this to the hhs indicator, whose source data includes an issue date that can be carried through to the output files.
