Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement tenant deletion for empty tenants #3611

Merged
merged 22 commits into from May 3, 2024

Conversation

zalegrala
Copy link
Contributor

@zalegrala zalegrala commented Apr 26, 2024

What this PR does:

In previous attempts to delete the tenant index, it was possible that undiscoverable objects remained in the backend as a partial blocks, which lead to unintended behavior for the poller, whereby the poller would discover a tenant, determine there were no blocks, and delete the index. However, the partial blocks kept the tenant in the list and so would repeat this forever, or until bucket retention policies removed the objects.

Here we take a more complete approach to tenant deletion. Upon deletion of a tenant index, we search the backend for objects on a given tenant, and then delete them after a configurable amount of time has expired since the object was last modified.

  • Implement Find() on backend readers
  • Implement Delete() on backend writers
  • Add configuration for EmptyTenantDeletionAge, default to 4x the polling cycle, which will delete objects from empty tenants after this time has expired.

Background Summary:

  • A partial block is written to a tenant and then stops sending spans.
  • Retention eventually removes all known blocks.
  • The partial block remains for the tenant because it is not known.
    • i.e: <tenant>/<blockID>/data.parquet exists
    • No meta.json nor meta.compacted.json exist for this block.
  • A poller owning this tenant calls Tenants() to know on which tenants to build an index.
    • This call is simply a list at / to determine all the tenant “directory” names.
  • For the given tenant, no blocks are found, and so the tenant performs a WriteTenantIndex()
    • If we are writing an index with no blocks, the tenant index is instead deleted.
  • A poller that does not own the tenant performs the above, with a subtle but important difference.
    • First attempt to read the index with a TenantIndex() call
    • The call will fail because the owning poller has deleted the index.
    • The failing index metric used to alert is incremented.
    • The poller will fallback to full polling as described above.

Which issue(s) this PR fixes:
Related #2678
Related #2754
Related #2781
Fixes #2878

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala zalegrala changed the title Implement tenant deletion for tenants which have no blocks Implement tenant deletion for empty tenants Apr 26, 2024
@zalegrala zalegrala marked this pull request as ready for review April 29, 2024 15:15
tempodb/backend/raw.go Outdated Show resolved Hide resolved
tempodb/blocklist/poller.go Show resolved Hide resolved
docs/sources/tempo/configuration/_index.md Show resolved Hide resolved
tempodb/blocklist/poller.go Show resolved Hide resolved
@zalegrala zalegrala self-assigned this May 1, 2024
Copy link
Contributor

@mdisibio mdisibio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Left a few more idle thoughts and responses, but nothing blocking.

docs/sources/tempo/configuration/_index.md Show resolved Hide resolved
tempodb/backend/raw.go Outdated Show resolved Hide resolved
tempodb/blocklist/poller.go Show resolved Hide resolved
tempodb/blocklist/poller.go Show resolved Hide resolved
tempodb/blocklist/poller.go Show resolved Hide resolved
@zalegrala zalegrala merged commit 3f831f9 into grafana:main May 3, 2024
15 checks passed
@zalegrala zalegrala deleted the tenantDeletionRedux branch May 3, 2024 20:48
joe-elliott pushed a commit to joe-elliott/tempo that referenced this pull request May 7, 2024
* Extend backend.Reader with Find() function

* Extend backend.Writer with Delete() function

* Add EmptyTenantDeletion age config to tempodb

* Add delete handling to poller

* Add integration tests for poller change

* Update backend implementations

* Fix lint

* Setup test defaults

* Integrate prefix test permutations

* Add doc for the new configuration option

* Update changelog

* Fix local implementation to ignore directories

* Tidy up todo after test coverage

* Update docs/sources/tempo/configuration/_index.md

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

* Address some PR feedback

* Rename backend.FindOpts -> backend.FindMatch
* Adjust default delete time from 20m -> 12h

* Add log message for object deletion

* Add additional safety check to avoid deletion

* Add test default value

* Godoc update after rename

* Drop logging from the backend used for testing

* Require empty tenant deletion to be enabled in the config

* Add docs for _enabled config

---------

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Tenant Index Failures for deleted tenants
3 participants