
ReplicationSource cacheCapacity space available metric #1159

Open
reefland opened this issue Mar 7, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

reefland commented Mar 7, 2024

Describe the feature you'd like to have.
I was trying to get my existing PrometheusRules to alert me when the VolSync cache PVCs are not sized appropriately.

Then I noticed that none of the PVCs created by VolSync are visible in Prometheus. Perhaps this is just how VolSync uses PVCs, and kubelet can't gather metrics on PVCs that are not actively mounted.

What is the value to the end user? (why is it a priority?)
The docs state: "This volume contains cached metadata from the backup repository. It must be large enough to hold the non-pruned repository metadata."

  • I do not know how much space the restic metadata uses or how that changes over time
  • I would like to increase the cache size before the volume fills up and VolSync backups are impacted
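For reference, the cache size is set by the cacheCapacity field under spec.restic in the ReplicationSource. The sketch below is modeled on VolSync's restic ReplicationSource examples; every name and value in it (the object name, namespace, sourcePVC, repository Secret, schedule, retention, and the 2Gi size) is a placeholder assumption, not a recommendation.

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: mydata-backup            # placeholder name
  namespace: myapp               # placeholder namespace
spec:
  sourcePVC: mydata              # placeholder: the PVC being backed up
  trigger:
    # The mover Job (and therefore the cache volume mount) only exists while a sync runs.
    schedule: "0 * * * *"
  restic:
    repository: restic-config    # placeholder: Secret with the restic repository settings
    copyMethod: Snapshot
    pruneIntervalDays: 14
    retain:
      daily: 7
    # Size of the restic cache PVC; it must hold the non-pruned repository metadata.
    cacheCapacity: 2Gi
```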

How will we know we have a good solution? (acceptance criteria)
I'm going to assume that VolSync does not normally mount cache PVCs (and thus kubelet can't report on them). If this is true, would it be possible, when a trigger.schedule event happens, for VolSync to emit its own metric with the cache capacity, perhaps as percent free? Something like volsync_cache_capacity_available.

Maybe "-1" if unknown (no event triggered yet), otherwise a number between 0 and 100 giving the percentage of capacity left.

Then I can have an alert like:

- alert: VolSyncCacheVolumeCapacityLow
  annotations:
    summary: >-
      {{ $labels.obj_namespace }}/{{ $labels.obj_name }} cache volume space is almost full.
      Increase size of cacheCapacity value.
    description: >-
      {{ $labels.obj_namespace }}/{{ $labels.obj_name }} cache volume space is < 15%.
      VALUE = {{ $value }}
  expr: |
    volsync_cache_capacity_available > -1 and volsync_cache_capacity_available < 15
  for: 15m
  labels:
    severity: critical
reefland added the enhancement (New feature or request) label Mar 7, 2024
@tesshuflower (Contributor)

The VolSync controller doesn't mount the restic cache PVC itself; the PVC is mounted by the mover pod of the Job that runs during a sync. Can you see stats while the mover job is running?

As such, I'm not sure we want to capture this usage data and send it back to the controller to emit as events.

Depending on your CSI driver, it may be possible to get some stats via volume health monitoring: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1432-volume-health-monitor#kubelet-metrics-changes

I've never looked into this myself, but it looks like VolumeUsage could potentially be reported there.

reefland (Author) commented Mar 8, 2024

The kubelet_volume_stats_* series of metrics contain the data I want, such as used_bytes and capacity_bytes, but none of the PVCs created by VolSync are listed. Perhaps the mover pods mount the cache volume so briefly that kubelet never collects stats while it is mounted?

kube_persistentvolume_capacity_bytes does include PVCs created by VolSync, but only the total capacity of the volume is available; the kube_persistentvolume_* series of metrics contain no usage information.

I was unable to find anything about "VolumeUsage" beyond the link above.
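For PVCs that kubelet does report on, a percent-free alert can be built from kubelet_volume_stats_available_bytes and kubelet_volume_stats_capacity_bytes; both carry the namespace and persistentvolumeclaim labels, so the division matches series one-to-one. The sketch below packages such a rule as a Prometheus Operator PrometheusRule. The object name, namespace, and the volsync-.* PVC name pattern are assumptions (check actual cache PVC names with kubectl get pvc). Given the observations in this thread, VolSync cache PVCs may never appear in these series at all, in which case this rule simply stays silent rather than alerting.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: volsync-cache-capacity   # placeholder name
  namespace: monitoring          # placeholder; must match your Prometheus ruleSelector/namespace selector
spec:
  groups:
    - name: volsync-cache
      rules:
        - alert: VolSyncCacheVolumeCapacityLow
          annotations:
            summary: >-
              {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} cache volume space is almost full.
            description: >-
              Less than 15% of the volume is free. VALUE = {{ $value }}
          # Percent free per PVC; the PVC name pattern is an assumed naming convention.
          expr: |
            100 * kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"volsync-.*"}
              / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"volsync-.*"}
              < 15
          for: 15m
          labels:
            severity: critical
```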
