
[commands] add replay GitHub csv commits option #210

Open
wants to merge 1 commit into
base: master


space-buzzer
Collaborator

@space-buzzer space-buzzer commented Apr 14, 2021

This PR adds a command that "replays" the API CSVs stored in GitHub as data updates, so that the changes are stored in the DB as batches.

The command takes an input file with a list of commits. It goes over the commits in order, fetches the raw CSV content from GH (not through the GH API, because that hits a rate limit pretty fast), and sends each commit as a new published batch.
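To illustrate the fetch step, here is a minimal sketch, not the code in this PR. It assumes the input file has one commit SHA per line and that the CSV is read from `raw.githubusercontent.com`; the function names and the owner/repo/path arguments are hypothetical placeholders.

```python
import csv
import io
import urllib.request

RAW_HOST = "https://raw.githubusercontent.com"


def raw_csv_url(owner, repo, sha, path):
    """Build the raw-content URL for one file at one commit.

    owner/repo/path are placeholders; the PR does not name them.
    """
    return f"{RAW_HOST}/{owner}/{repo}/{sha}/{path.lstrip('/')}"


def read_commit_list(lines):
    """Yield commit SHAs in file order (assumed format: one SHA per line).

    Blank lines and lines starting with '#' are skipped.
    """
    for line in lines:
        sha = line.strip()
        if sha and not sha.startswith("#"):
            yield sha


def fetch_csv_rows(url, opener=urllib.request.urlopen):
    """Fetch a CSV over plain HTTP and parse it into a list of dicts.

    `opener` is injectable so the function can be exercised without network access.
    """
    with opener(url) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Reading the file at a specific commit through the raw host avoids the API request quota, which matches the reason given above for not using the GH API.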

A few things happen locally (in the command) to reduce the size of updates and to calculate the batch message, changed fields, etc.
The heuristics are:

  • Updates that changed 56 rows for 56 different states, on a single day => daily
  • Everything else => edit
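The two rules above could be sketched roughly like this. The `state` and `date` field names and the `classify_batch` function are assumptions for illustration, not taken from the command itself.

```python
def classify_batch(changed_rows, expected_states=56):
    """Return "daily" or "edit" for a list of changed rows.

    A batch counts as daily only if it has exactly `expected_states` rows,
    covering that many distinct states, all on a single date.
    Field names "state" and "date" are assumed.
    """
    states = {row["state"] for row in changed_rows}
    dates = {row["date"] for row in changed_rows}
    is_daily = (
        len(changed_rows) == expected_states
        and len(states) == expected_states
        and len(dates) == 1
    )
    return "daily" if is_daily else "edit"
```

Anything that does not match the daily pattern exactly, including an empty change set, falls through to "edit".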

There are also some heuristics for date/time formatting.
Some commits with bad data are skipped completely.
Finding the diff between 2 consecutive commits and submitting only the rows that changed (or were added) is done locally in the command, because it's faster this way.
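As a rough sketch of the local diff (an illustration under assumptions, not the PR's implementation): rows are keyed by `(state, date)`, which is an assumed key, and `diff_rows` is a hypothetical name.

```python
def diff_rows(previous, current, key_fields=("state", "date")):
    """Return the rows of `current` that are new or differ from `previous`.

    Rows are matched on `key_fields` (an assumed key). Rows that exist only
    in `previous` are ignored, since only changed or added rows are submitted.
    """
    def key(row):
        return tuple(row[field] for field in key_fields)

    old = {key(row): row for row in previous}
    # A row is kept if its key is unseen (added) or its values differ (changed).
    return [row for row in current if old.get(key(row)) != row]
```

Building the lookup once makes each comparison between two commits linear in the number of rows.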

I changed some logging -- this is minor.
I commented out the requirement to submit states as part of the batch, because I didn't cross-reference the history of states_info against the commits.
