Garbage collection #975

Open · 6 tasks
cavis opened this issue Feb 27, 2024 · 1 comment

cavis commented Feb 27, 2024

Clean up our stuff.

A bunch of our data is paranoid (soft-deleted) now, but we don't have any processes/hooks/crons to actually delete the S3 files.

- [ ] Write some async jobs to actually go delete S3 resources. MediaResources/Images should delete their files, maybe episodes should rm -rf their S3 directories, and maybe podcasts as well? (See the first sketch after this list.)
- [ ] Wire those jobs into the after_real_destroy callback, if that's not too dangerous.
- [ ] Also, we can reap Tasks more often. Write a cron that looks for tasks that aren't the latest for their owner, belong to soft-deleted owners, etc. (See the second sketch after this list.)
- [ ] Make sure owner destroys cascade to the Task.
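
A minimal sketch of what the async S3 cleanup and the callback wiring could look like, assuming ActiveJob, the aws-sdk-s3 gem, and a paranoia-style soft-delete setup; the `DeleteS3ObjectsJob` name and the `s3_bucket_name`/`s3_prefix` helpers are made up for illustration:

```ruby
# app/jobs/delete_s3_objects_job.rb -- hypothetical job name and arguments
class DeleteS3ObjectsJob < ApplicationJob
  queue_as :default

  # Delete every object under a prefix (e.g. an episode's S3 "directory"),
  # or a single file when the prefix is a full key.
  def perform(bucket_name, prefix)
    bucket = Aws::S3::Resource.new.bucket(bucket_name)
    bucket.objects(prefix: prefix).batch_delete!
  end
end

# app/models/media_resource.rb -- wiring into the real-destroy callback
class MediaResource < ApplicationRecord
  acts_as_paranoid

  after_real_destroy :delete_s3_files

  private

  def delete_s3_files
    # s3_bucket_name / s3_prefix are assumed helpers for how our keys are laid out
    DeleteS3ObjectsJob.perform_later(s3_bucket_name, s3_prefix)
  end
end
```

Doing the delete in a job rather than inline in the callback keeps really_destroy! fast and lets a failed S3 call just retry.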
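And a rough sketch of the Task reaper, meant to run from a cron/scheduled job. It assumes a polymorphic owner (owner_type/owner_id columns) and a paranoia-style only_deleted scope; the `TaskReaper` name and the column names are guesses about the schema:

```ruby
# Hypothetical reaper -- column and scope names are assumptions about the schema
class TaskReaper
  def self.reap!
    # Tasks that aren't the latest for their owner
    latest_ids = Task.group(:owner_type, :owner_id).select("MAX(id)")
    Task.where.not(id: latest_ids).find_each(&:destroy!)

    # Tasks whose owner has been soft-deleted (owner is polymorphic,
    # so check each owner class separately)
    Task.distinct.pluck(:owner_type).each do |type|
      klass = type.safe_constantize
      next unless klass.respond_to?(:only_deleted)

      deleted_ids = klass.only_deleted.pluck(:id)
      Task.where(owner_type: type, owner_id: deleted_ids).find_each(&:destroy!)
    end
  end
end
```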

There are some open questions around how long we retain data for old shows, but we should figure that out as part of this ticket. And then:

- [ ] Go through our oldest podcast_ids and delete the ones we don't care about (rough sketch below).
- [ ] Maybe we need this in staging, for integration tests, as well.
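
A rough console-style sketch for that old-podcast pass, assuming a paranoia-style with_deleted scope and really_destroy!; the cutoff and any "do we still care?" filter are placeholders to decide as part of this ticket:

```ruby
# Placeholder cutoff -- the actual retention window is one of the open questions
cutoff = 5.years.ago

Podcast.with_deleted.where("created_at < ?", cutoff).find_each do |podcast|
  # really_destroy! should fire the after_real_destroy hooks, which in turn
  # enqueue the S3 cleanup jobs sketched above
  podcast.really_destroy!
end
```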

cavis commented Feb 27, 2024

Also wondering: should BigQuery still have a record of really-deleted podcasts?

Right now dt_downloads and dt_impressions will have those forever. But podcasts/episodes/etc. get overwritten pretty frequently, so the gone show would disappear there.
