Backend blocks are missing compression/encoding #3562

Open
edgarkz opened this issue Apr 10, 2024 · 3 comments

@edgarkz
Contributor

edgarkz commented Apr 10, 2024

Describe the bug
According to the documentation, both backend and WAL blocks should use the configured default compression. However, listing a block's stats with tempo-cli shows no compression at all.

To Reproduce
Steps to reproduce the behavior:

  1. Run the CLI command: go run ./cmd/tempo-cli list block single-tenant xxxxxx --backend=s3
  2. The result shows no compression (Encoding : none):
       ID            : 18a2572b-cdea-40d8-b468-9845ccc4e95f
       Version       : vParquet3
       Total Objects : 91933
       Data Size     : 198 MB
       Encoding      : none
       Level         : 1
       Window        : 475701
       Start         : 2024-04-07 21:29:59 +0000 UTC
       End           : 2024-04-07 21:49:52 +0000 UTC
       Duration      : 19m53s
       Age           : 71h57m28s

Expected behavior
Blocks should be zstd-compressed by default.

Environment:
EKS 1.26
Tempo 2.4.1, deployed via the distributed Helm chart

@joe-elliott
Member

The compression field is used only by the old v2 format. The vParquet* formats compress columns individually.

You can review the schema to see how each column is handled differently:

https://github.com/grafana/tempo/blob/main/tempodb/encoding/vparquet3/schema.go#L111

@edgarkz
Contributor Author

edgarkz commented Apr 11, 2024

Hi @joe-elliott, thanks for the quick feedback.
So is the Snappy encoding the default and hardcoded in the schema, or is it a configurable option like in the v2 format?
I'll raise a PR for the docs to make it clear that those settings are relevant only for v2.

@joe-elliott
Member

The Snappy encoding is hardcoded. When we initially built the schema, we ran a series of tests to determine which encodings performed best for which columns, and those choices are now hardcoded in the schema. If we wanted to change a column's compression type, we would need to cut a new vParquet version.

> I'll raise a PR for the docs to make it clear that those settings are relevant only for v2.

Awesome! Thank you.

edgarkz mentioned this issue on Apr 25, 2024.