multiple retentions in one file [BUG] #329

Open
dschmidt-itn opened this issue Jul 25, 2023 · 2 comments
@dschmidt-itn

If a file has several different retentions (archives), I can't get them to show together in one graph.
Is this expected, or am I doing something wrong? whisper-fetch also returns the wrong values if I set "--from".

Steps to reproduce the behavior:
whisper-resize $file 60s:5d 5m:30d 15m:365d 1h:5y

For example, with the retention periods above, if I set "--from=" to a timestamp from 7 days ago, I only get values going back 5 days (the 5-minute values). Before that the graph just "stops", and whisper-fetch only shows "None".
But if I choose a start time from 3 days ago, I get the correct values.

Environment (please complete the following information):

  • OS flavor: Ubuntu 22.04
  • Setup type: from OS packages
  • Python 3.10.6

rrendec commented May 9, 2024

I have a similar problem. Looking at your whisper-resize command, it seems to me that you're actually getting the 1-minute values (not the 5-minute values). So there's a problem with the 5m:30d set, which is supposed to be used when you go back further than 5 days.
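To make the archive selection concrete, here is a minimal sketch of how I understand the choice to work: a fetch is served from a single archive, namely the highest-resolution one whose retention still covers the requested start time. The helper names (parse_retention, select_archive) are made up for illustration and are not part of Whisper's API.

```python
import re

# Seconds per unit, for the units used in the retention strings of this thread.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 86400 * 365}


def parse_retention(spec):
    """Turn 'precision:retention' (e.g. '5m:30d') into (seconds_per_point, retention_seconds)."""
    def to_seconds(text):
        value, unit = re.fullmatch(r"(\d+)([smhdy])", text).groups()
        return int(value) * UNITS[unit]

    precision, retention = spec.split(":")
    return to_seconds(precision), to_seconds(retention)


def select_archive(specs, age_seconds):
    """Hypothetical helper: pick the finest archive whose retention covers the requested age."""
    archives = sorted(parse_retention(s) for s in specs)  # finest resolution first
    for seconds_per_point, retention in archives:
        if retention >= age_seconds:
            return seconds_per_point, retention
    return archives[-1]  # request is older than everything: fall back to the coarsest archive


specs = ["60s:5d", "5m:30d", "15m:365d", "1h:5y"]
three_days = select_archive(specs, 3 * 86400)
seven_days = select_archive(specs, 7 * 86400)
print(three_days)  # (60, 432000): served from the 1-minute archive
print(seven_days)  # (300, 2592000): served entirely from the 5-minute archive
```

If that reading is right, a 7-day query never touches the 1-minute archive at all, so whatever is missing from the 5-minute archive shows up as None for the whole range.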

In my case, I have this in storage-schemas.conf:

[default]
pattern = .*
retentions = 60s:1d,5m:30d,1h:3y

And I can only get values (or graphs) as far as 1 day back.

But this is where it gets interesting. I have two different sensors: one sends data once an hour and the other sends data every 15 minutes. For the first sensor, all values beyond 1 day come back as None. For the other sensor, I get some values, but still far more None gaps than expected.

I believe there's something wrong with the way data is aggregated for the lower-resolution archives. The documentation mentions aggregation, but it only says that the "average" method is the default; it doesn't say how to configure a different method.

I haven't looked at the code, but I think new values are not fed directly into the lower-resolution archives (as the documentation says). Instead, they are generated by applying the aggregation function over the higher-resolution values (which makes more sense if you think about it). The problem we're seeing is probably related to the fact that the higher-resolution values are "sparse", i.e. most of them are None.


rrendec commented May 9, 2024

There is no better documentation than the source code 😄

After looking at the __propagate() and create() functions in whisper.py, I quickly realized this was a feature, not a bug. There's something called the "X-Files factor" (xFilesFactor), which defines the minimum ratio of valid (i.e. non-None) values to the total number of aggregated data points that is required to generate a valid aggregated value. The default is 0.5 (half).

For example, in my case the retention is configured as 60s:1d,5m:30d,1h:3y, which means 5 data points in the first archive are aggregated into 1 data point in the second archive. With the default X-Files factor of 0.5, at least 3 values out of 5 must be valid. Since my sensor logs a value once an hour and the resolution of the first archive is 1 minute, I get runs of 1 valid data point followed by 59 invalid ones. When this is aggregated into the 5-minute archive, the one valid data point is always surrounded by 4 invalid data points, no matter how the 5 consecutive data points are picked. That is a ratio of 1/5 = 0.2, which falls below the 0.5 threshold.
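The arithmetic above can be sketched in a few lines of Python. This is a simplified model of the rule as described here, not the actual code from whisper.py; the aggregate() function, the sample reading, and its position in the hour are all made up for illustration.

```python
def aggregate(points, x_files_factor=0.5):
    """Average the known points, or return None if too few of them are known."""
    known = [p for p in points if p is not None]
    if not known:
        return None  # nothing to aggregate, regardless of the factor
    if len(known) / len(points) < x_files_factor:
        return None  # ratio of valid points is below the threshold
    return sum(known) / len(known)


# One hour of 1-minute slots from a sensor that reports once per hour.
hour = [None] * 60
hour[17] = 21.5  # hypothetical reading at minute 17

# Group the slots into 5-minute buckets, mimicking 60s -> 5m propagation.
buckets = [hour[i:i + 5] for i in range(0, 60, 5)]

default_xff = [aggregate(b) for b in buckets]
zero_xff = [aggregate(b, x_files_factor=0.0) for b in buckets]

print(default_xff)  # every bucket is None: 1/5 = 0.2 is below 0.5
print(zero_xff)     # the bucket containing minute 17 keeps the value 21.5
```

With a factor of 0.0, a single valid point is enough for the bucket to produce a value, which is what a sparse sensor like mine needs.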

Next, I looked at lib/carbon/storage.py in the carbon Git repository and found that the X-Files factor (and the aggregation method) can be configured in the storage-aggregation.conf file. This is actually pretty neat, because you can use different patterns for the retention and the aggregation.
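For anyone who wants to see what that looks like, below is a small sketch with a sample storage-aggregation.conf embedded as a string. The section names and the sensors.hourly pattern are invented for my case, and the aggregation_for() helper is only an illustration of "first matching pattern wins" using Python's standard configparser; it is not how carbon itself loads the file.

```python
import configparser
import re

# Sample storage-aggregation.conf content; section names and patterns are hypothetical.
SAMPLE_CONF = r"""
[hourly_sensors]
pattern = ^sensors\.hourly\.
xFilesFactor = 0.0
aggregationMethod = last

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average
"""


def aggregation_for(metric, conf_text=SAMPLE_CONF):
    """Return (xFilesFactor, aggregationMethod) from the first section whose pattern matches."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for name in parser.sections():  # sections are visited in file order
        section = parser[name]
        if re.search(section["pattern"], metric):
            return float(section["xFilesFactor"]), section["aggregationMethod"]
    return None


sparse = aggregation_for("sensors.hourly.temperature")
other = aggregation_for("servers.web01.load")
print(sparse)  # (0.0, 'last')
print(other)   # (0.5, 'average')
```

As far as I can tell, these settings are only applied when carbon creates a new Whisper file, so existing files would probably still need to be changed separately (for example with the --xFilesFactor option of whisper-resize).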
