Better handling of "out of sync" Tables #1246

Open
noah-paige opened this issue May 2, 2024 · 1 comment

Comments

noah-paige commented May 2, 2024

Is your idea related to a problem? Please describe.
data.all has logic for when a user manually deletes a table (shares are checked, the table is removed, permissions are cleaned up, etc.), but we should also think about how best to handle tables that go out of sync during a table sync (I am not sure what the correct answer is here). For instance:

  • A sync starts, either when a user triggers a manual sync or when the scheduled table_syncer ECS task runs (both ultimately call DatasetTableService.sync_existing_tables() at some point)
  • If a Glue table exists in data.all but is missing from the Glue response --> we update the table status to Deleted
  • On the UI we no longer show those tables, because we filter for status != Deleted (ref: DatasetService.paginated_dataset_tables())
  • But these tables still have associated permission records defining who should be able to access the table
    • And if we clean those records up right away but the table DOES still exist in Glue (e.g. an error in the API response incorrectly returns 0 tables), then we may have just broken existing access or shares when the table reappears the next time a user hits Sync
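
The flow above can be illustrated with a minimal sketch. This is not the data.all implementation: the `Table` class, the `sync_table_statuses` and `visible_tables` helpers, and the `permissions` field are hypothetical stand-ins, and restoring a reappearing table to `InSync` is an assumption about what a later sync does.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a data.all table record."""
    name: str
    status: str = "InSync"
    # Principals that hold permission records on this table (illustrative).
    permissions: set[str] = field(default_factory=set)


def sync_table_statuses(tables: list[Table], glue_table_names: set[str]) -> None:
    """Mark tables missing from the Glue response as Deleted; restore the rest."""
    for table in tables:
        if table.name in glue_table_names:
            table.status = "InSync"  # assumption: a reappearing table is restored
        else:
            # Only the status changes; permission records are left in place.
            table.status = "Deleted"


def visible_tables(tables: list[Table]) -> list[Table]:
    """Mimic the UI filter: hide tables whose status is Deleted."""
    return [t for t in tables if t.status != "Deleted"]


tables = [
    Table("orders", permissions={"team-a"}),
    Table("users", permissions={"team-b"}),
]

# A faulty Glue response that wrongly returns zero tables hides everything.
sync_table_statuses(tables, glue_table_names=set())
hidden_after_bad_sync = len(visible_tables(tables))

# The next sync sees the real tables again. Access is intact only because
# the permission records were never cleaned up in between.
sync_table_statuses(tables, glue_table_names={"orders", "users"})
```

Had the first sync also removed the permission records, the second sync would bring the tables back without the access that users previously had, which is the risk described above.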

Describe the solution you'd like
Some ideas of what we can do:

  1. Do nothing. Stale tables and permission records will remain in RDS, but this avoids the risk of removing permissions inappropriately and should not greatly affect the logic of data.all
  2. Implement some type of garbage collection. Delete tables, and the associated shares on those tables, after the tables have been in Deleted status for an extended period of time (e.g. 30 days)
  3. As soon as a table status is updated to Deleted, still show it on the UI but with a Deprecated flag, where the only option available to the user is to delete the table and clean up its shares (already a prerequisite for table deletion)
    • Remove from the Catalog (already done by sync), prevent all new shares, and allow only share revocation and clean-up / deletion of the table
    • If the next sync finds the Glue table again, set it back to InSync and allow normal activity again (no longer Deprecated)
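
Options 2 and 3 could look roughly like the sketch below. Everything here is an assumption for illustration: the `Table` fields (including `deleted_at`), the action names, and the `RETENTION` constant are hypothetical and do not reflect data.all's actual models or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set

RETENTION = timedelta(days=30)  # example grace period taken from option 2


@dataclass
class Table:
    """Hypothetical table record; field names are illustrative only."""
    name: str
    status: str = "InSync"
    deleted_at: Optional[datetime] = None  # when a sync first marked it Deleted


def apply_sync_result(table: Table, found_in_glue: bool, now: datetime) -> None:
    """Update status after a sync, remembering when a table first went missing."""
    if found_in_glue:
        table.status, table.deleted_at = "InSync", None
    elif table.status != "Deleted":
        # Keep the original timestamp if the table was already Deleted.
        table.status, table.deleted_at = "Deleted", now


def due_for_garbage_collection(table: Table, now: datetime) -> bool:
    """Option 2: eligible for cleanup only after the full retention period."""
    return (
        table.status == "Deleted"
        and table.deleted_at is not None
        and now - table.deleted_at >= RETENTION
    )


def allowed_actions(table: Table) -> Set[str]:
    """Option 3: a Deleted table is shown as Deprecated with restricted actions."""
    if table.status == "Deleted":
        return {"revoke_share", "delete_table"}
    return {"create_share", "revoke_share", "delete_table"}


start = datetime(2024, 5, 2)
table = Table("orders")
apply_sync_result(table, found_in_glue=False, now=start)

due_day_29 = due_for_garbage_collection(table, start + timedelta(days=29))
due_day_30 = due_for_garbage_collection(table, start + timedelta(days=30))
deprecated_actions = allowed_actions(table)

# A later sync finds the table again, so it returns to normal activity.
apply_sync_result(table, found_in_glue=True, now=start + timedelta(days=31))
restored_actions = allowed_actions(table)
```

Tracking when the table first went missing keeps repeated failed syncs from resetting the grace period, and a single successful sync clears it, so a transient Glue error never leads directly to removed permissions.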

@noah-paige
Contributor Author

Adding to the backlog, to be picked up when we have some capacity.
