This issue was moved to a discussion.

Store metrics using explicit schema #165

Closed · ants opened this issue Apr 5, 2023 · 2 comments
Assignees
Labels
enhancement New feature or request refactoring Something done as it should've been done from the start sinks Where and how to store monitored data

Comments

@ants commented Apr 5, 2023

In pgw2, data and tags are stored as jsonb. This duplicates the schema information in every row, which hurts storage efficiency for uncompressed data and defeats timeseries compression mechanisms for compressed data. Additionally, any access to the data has to decompress the whole document instead of reading just the needed column, which matters a lot for columnar storage.

Storing series metadata (e.g. full query text) out-of-line in a separate table might also be a good idea.

TBD: measurements on real world data
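The per-row key duplication described above can be illustrated with a small sketch (the column names are hypothetical, not pgwatch's actual schema; `struct` packing stands in for fixed-width relational columns):

```python
import json
import struct

# Hypothetical metric row -- illustrative names, not pgwatch's actual schema.
row = {"queryid": 123, "calls": 10, "total_time": 1.5, "rows": 42}

# jsonb-style storage: every row carries its own copy of all key names.
jsonb_payload = json.dumps(row).encode()

# Relational storage: key names live once in the table definition;
# each row holds only the values (here: 3 x int64 + 1 x float64 = 32 bytes).
relational_payload = struct.pack(
    "<qqdq", row["queryid"], row["calls"], row["total_time"], row["rows"]
)

print(len(jsonb_payload), len(relational_payload))
```

The relational row is a fixed 32 bytes, while the jsonb document re-encodes the key names in every row, and this per-row overhead is also exactly what gets in the way of columnar compression.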

@pashagolub pashagolub self-assigned this Apr 5, 2023
@pashagolub pashagolub added enhancement New feature or request metrics Metrics related issues refactoring Something done as it should've been done from the start labels Apr 5, 2023
@pashagolub pashagolub added this to the Metrics format milestone Apr 5, 2023
@ants (author) commented Jun 14, 2023

Measurement results on 45M rows of real-world stat_statements data (6 databases × 3 months), plus a query fetching one month of data for a single queryid over 4 data columns.

| Schema     | Size uncompressed | Size compressed | Compression ratio | Query time | Buffers accessed |
|------------|-------------------|-----------------|-------------------|------------|------------------|
| jsonb      | 70 GB             | 8050 MB         | 8.9x              | 1600 ms    | 143,596          |
| relational | 11 GB             | 763 MB          | 14.5x             | 4 ms       | 522              |
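For scale, the table's query-side numbers work out to roughly a 400x speedup and ~275x fewer buffers touched:

```python
# Ratios derived from the benchmark table above.
query_speedup = 1600 / 4           # query time: jsonb ms vs relational ms
buffer_reduction = 143596 / 522    # buffers accessed: jsonb vs relational

print(f"{query_speedup:.0f}x faster, {buffer_reduction:.0f}x fewer buffers")
```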

@kmoppel (Contributor) commented Aug 30, 2023

My $.02: in practice, the extra data proliferation only becomes visible with this stat_statements metric, and TimescaleDB still reduced old data heavily (around ~10x). So it would indeed be nice, but I'm not sure it's worth the extra complexity, as users with very high instance counts would probably prefer Prometheus anyway.

@pashagolub pashagolub added sinks Where and how to store monitored data and removed metrics Metrics related issues labels Jan 11, 2024
@cybertec-postgresql cybertec-postgresql locked and limited conversation to collaborators May 15, 2024
@pashagolub pashagolub converted this issue into discussion #447 May 15, 2024
