
s3 metrics always increasing #3529

Open · keyolk opened this issue Mar 28, 2024 · 1 comment

keyolk commented Mar 28, 2024

Describe the bug

The bucket's object count and total size keep growing and never go down.

[Screenshots: S3 object count and bucket size metrics, both steadily increasing]

I can also see about 5 GB of Parquet data in each block directory.

[Screenshot: block directory contents]

The compactor pods also log many lines like the following:

level=warn ts=2024-03-28T01:49:49.036459838Z caller=compactor.go:248 msg="max size of trace exceeded" tenant=mesg traceId=eddc0f76f1d19e6e898d1f2b60b9c431 discarded_span_count=19697

and some related metrics:

[Screenshot: compactor metrics]

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo (SHA or version)

/tempo -version
tempo, version 2.2.0 (branch: HEAD, revision: cce8df1b6)
  build user:
  build date:
  go version:       go1.20.4
  platform:         linux/arm64
  tags:             unknown
compactor:
  compaction:
    block_retention: 168h
    compacted_block_retention: 1h
    compaction_cycle: 30s
    compaction_window: 1h
    max_block_bytes: 1073741824
    max_compaction_objects: 600000
    max_time_per_tenant: 5m
    retention_concurrency: 10
    v2_in_buffer_bytes: 5242880
    v2_out_buffer_bytes: 20971520
    v2_prefetch_traces_count: 1000
  ring:
    kvstore:
      store: memberlist
...
storage:
  trace:
    backend: s3
    blocklist_poll: 5m
    cache: memcached
    local:
      path: /var/tempo/traces
    memcached:
      consistent_hash: true
      host: o11y-tempo-memcached
      service: memcached-client
      timeout: 500ms
    s3:
      bucket: tempo-apne2
      endpoint: s3.amazonaws.com
      region: ap-northeast-2
    wal:
      path: /var/tempo/wal

Expected behavior

The S3 object count and bucket size should go down once blocks pass retention.

Environment:

  • Infrastructure: EKS
  • Deployment tool: helm tempo-distributed v1.6.1

Additional Context

joe-elliott (Member) commented:

Based on your metrics it does seem like Tempo is performing retention, but the bucket size is still growing. If an ingester or compactor exits unexpectedly it will sometimes write a partial block that will then be "invisible" to Tempo.

We recommend setting bucket policies to remove all objects a day or so after your Tempo retention to clean up these objects. I'd recommend a similar policy for multipart uploads, which S3 also likes to keep around.

The docs on this are not great. We mention the multipart upload here:

https://grafana.com/docs/tempo/latest/configuration/hosted-storage/s3/#lifecycle-policy

but there is no real mention of the partial blocks. If this solves your issue, I'd like to turn this into a docs issue to add these details.
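For reference, here is a minimal sketch of the lifecycle rules described above, applied with boto3. It is not from the thread or the Tempo docs: the bucket name and region are taken from the reporter's config, while the 8-day expiration (block_retention of 168h plus roughly a day) and the 1-day multipart-abort window are illustrative assumptions.

import boto3

# Sketch only: the two cleanup rules suggested above, expressed as an S3
# lifecycle configuration. 8 days ~= block_retention (168h) plus a day;
# the 1-day multipart-abort window is an assumption, not a documented default.
s3 = boto3.client("s3", region_name="ap-northeast-2")

s3.put_bucket_lifecycle_configuration(
    Bucket="tempo-apne2",
    LifecycleConfiguration={
        "Rules": [
            {
                # Delete any object (including partial blocks left behind by a
                # crashed ingester or compactor) once it is older than Tempo's
                # own retention plus a safety margin.
                "ID": "expire-stale-tempo-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Expiration": {"Days": 8},
            },
            {
                # Abort incomplete multipart uploads that S3 would otherwise
                # keep (and bill for) indefinitely.
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
            },
        ]
    },
)

The same rules can be set in the S3 console or via Terraform. Note that lifecycle expiration runs relative to object creation time, so keep the expiration window comfortably longer than block_retention, or live blocks will be deleted out from under Tempo.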
