Add a maximum limit for how far BQ data is retrieved (event_time) #1574

Open
jonespm opened this issue Apr 23, 2024 · 0 comments

jonespm commented Apr 23, 2024

Thank you for contributing to this project!

Describe your problem or feature you'd like added

When the cron runs for the first time, it goes all the way back to the earliest start_date in the database. This could result in a huge query if there are old courses in the database. We should probably have a configured limit that can be overridden via the cron, as most of the time we don't need this old data. I'm not sure how else to avoid it without changes to BigQuery, such as partitions, that we are unable to make.

In addition, it will process older courses that don't actually have any activity and don't need to be processed. We should add a "last_accessed_date" field or similar to the course table to keep track of whether we actually need to update a course.
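
A rough sketch of how such a field could be used to skip inactive courses. The `Course` class, the `courses_needing_update` helper, and the 30-day threshold are all hypothetical and only illustrate the idea; a course with no recorded `last_accessed_date` is treated as unknown and still processed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Course:
    """Hypothetical stand-in for a row in the course table."""
    id: int
    # Proposed field; None means activity has never been recorded.
    last_accessed_date: Optional[datetime]


def courses_needing_update(
    courses: List[Course],
    now: datetime,
    inactive_after_days: int = 30,  # assumed threshold, not from the issue
) -> List[Course]:
    """Keep courses accessed recently, plus those with unknown activity."""
    cutoff = now - timedelta(days=inactive_after_days)
    return [
        c for c in courses
        if c.last_accessed_date is None or c.last_accessed_date >= cutoff
    ]
```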

Describe the solution you'd like

Add a date, either dynamic or set in config, that limits the earliest date for which data will be returned. This could be 4 months, 6 months, or a year back; just something less than everything. We also need a way to override this on the cron if absolutely necessary.
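
A minimal sketch of the limit, assuming the cron already knows the earliest start_date it would otherwise query from. The function name, the `max_lookback_days` parameter, and the 180-day default are hypothetical; passing `None` stands in for the override.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical default; the actual value would come from config.
DEFAULT_MAX_LOOKBACK_DAYS = 180


def earliest_event_time(
    earliest_start_date: datetime,
    now: datetime,
    max_lookback_days: Optional[int] = DEFAULT_MAX_LOOKBACK_DAYS,
) -> datetime:
    """Return the lower bound to apply to event_time in the query.

    max_lookback_days=None is the override: no limit is applied and the
    query goes back to earliest_start_date.
    """
    if max_lookback_days is None:
        return earliest_start_date
    cutoff = now - timedelta(days=max_lookback_days)
    # Use whichever date is later, so the query never reaches past the cutoff.
    return max(earliest_start_date, cutoff)
```

With this shape, a normal run would call `earliest_event_time(start, now)`, and a one-off full backfill would pass `max_lookback_days=None`.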

Describe any possible alternatives you've considered

We have considered only running on active courses, but in testing and on the first run it might not be known what's active and what isn't. Once this has run once, we shouldn't have this problem again.

jonespm changed the title from "Add a maximum limit for how far BQ data is" to "Add a maximum limit for how far BQ data is retrieved (event_time)" on Apr 23, 2024