[Enhancement] Add capability for operators to monitor etcd data #597

Open
unmarshall opened this issue Mar 2, 2023 · 2 comments
Assignees
Labels
area/metering Metering related area/monitoring Monitoring (including availability monitoring and alerting) related kind/enhancement Enhancement, improvement, extension lifecycle/stale Nobody worked on this for 6 months (will further age) priority/2 Priority (lower number equals higher priority)

Comments

@unmarshall
Contributor

unmarshall commented Mar 2, 2023

Enhancement (What you would like to be added):
There is a need to get insights into the data that etcd stores in its DB (bbolt). This would show which resource types have the most keys and take up the most space.
@istvanballok recently ran the following commands to get that data out of etcd:

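# Install jq (JSON processing) and util-linux (provides `column`).
apk add jq util-linux
# Dump all keys, derive the resource type from each key, then sum value lengths and count keys per type.
# Note: with `-w json`, keys and values are base64-encoded, so `.value | length` is the encoded length (roughly 4/3 of the raw size).
etcdctl --insecure-skip-tls-verify \
  --cert /var/etcd/ssl/client/server/tls.crt \
  --key /var/etcd/ssl/client/server/tls.key \
  --cacert /var/etcd/ssl/client/ca/bundle.crt \
  get --prefix / -w json |
  jq '.kvs[] | {key: .key | @base64d, valueLength: .value | length} | "\(.key | sub("/[^/]+/((?<type>[^/.]+)/.*|[^/]+/(?<customtype>[^/]+)/.*)";"\(.type // .customtype)")) \(.valueLength)"' -r |
  awk '{sum[$1]+=$2; count[$1]++} END{for (key in sum) {printf "%s %s %s\n", sum[key], count[key], key}}' |
  sort -rn |
  column -t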

Example output (columns: summed value length, number of keys, resource type):

34156612  291   shootstates
17002464  7271  meteringreports
9932816   2592  secrets
5786756   476   shoots
3438780   38    cloudprofiles
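
For anyone who wants to post-process the dump outside of jq/awk, here is a minimal Python sketch of the same aggregation. It assumes the input is the JSON printed by `etcdctl get --prefix / -w json` (a `kvs` array with base64-encoded `key` and `value`). Unlike the pipeline above it decodes the values, so the sizes are raw bytes rather than base64 lengths. The function names `resource_type` and `summarize` are made up for illustration.

```python
import base64
import json
import re
from collections import defaultdict

# Same idea as the jq regex: the resource type is the second path segment,
# or the third one when the second segment is an API group (contains a dot).
KEY_PATTERN = re.compile(
    r"/[^/]+/(?:(?P<type>[^/.]+)/.*|[^/]+/(?P<customtype>[^/]+)/.*)"
)


def resource_type(key):
    """Return the resource type for an etcd key, or the key itself if it does not match."""
    match = KEY_PATTERN.match(key)
    if match is None:
        return key
    return match.group("type") or match.group("customtype")


def summarize(etcd_json):
    """Return (total_bytes, key_count, resource_type) tuples, largest first."""
    totals = defaultdict(lambda: [0, 0])
    for kv in json.loads(etcd_json).get("kvs", []):
        key = base64.b64decode(kv["key"]).decode("utf-8", errors="replace")
        # Decode the value so we count raw bytes, not base64 characters.
        size = len(base64.b64decode(kv.get("value", "")))
        entry = totals[resource_type(key)]
        entry[0] += size
        entry[1] += 1
    return sorted(((s, c, t) for t, (s, c) in totals.items()), reverse=True)
```

One way to use it would be to save the etcdctl output to a file and pass its contents to `summarize`, printing one tuple per line.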

It would be beneficial for operators and developers to get easy access to this data, either on demand or as custom metrics exposed to Prometheus.
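
As a rough illustration of the "custom metrics" option, the sketch below renders per-resource totals in the Prometheus text exposition format using only the standard library. The metric names `etcd_data_resource_bytes` and `etcd_data_resource_keys`, the `resource` label, and the `render_metrics` helper are hypothetical and not something etcd or this project exposes today. A real implementation would more likely use a Prometheus client library.

```python
def escape_label(value):
    """Escape a label value as required by the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(rows):
    """Render (total_bytes, key_count, resource_type) rows as gauge metrics.

    Metric and label names are hypothetical placeholders.
    """
    bytes_lines = [
        "# HELP etcd_data_resource_bytes Summed value size per resource type.",
        "# TYPE etcd_data_resource_bytes gauge",
    ]
    keys_lines = [
        "# HELP etcd_data_resource_keys Number of keys per resource type.",
        "# TYPE etcd_data_resource_keys gauge",
    ]
    for total_bytes, key_count, resource in rows:
        label = 'resource="%s"' % escape_label(resource)
        bytes_lines.append("etcd_data_resource_bytes{%s} %d" % (label, total_bytes))
        keys_lines.append("etcd_data_resource_keys{%s} %d" % (label, key_count))
    return "\n".join(bytes_lines + keys_lines) + "\n"


# Rows taken from the example output in this issue.
print(render_metrics([(34156612, 291, "shootstates"), (17002464, 7271, "meteringreports")]))
```

Gauges seem like the natural fit here because both the size and the key count can go down as well as up.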

NOTE: The above is just one set of information. Over time, we should identify additional information and custom metrics that are not available out of the box from etcd.

Motivation (Why is this needed?):
Use cases:

  • Operators can inspect the etcd data to understand why the etcd DB is close to the 8GB mark and take corrective action.
  • Developers can inspect this data over a period of time and fine-tune the resources that get stored in etcd.

Approach/Hint to the implement solution (optional):

@unmarshall unmarshall added kind/enhancement Enhancement, improvement, extension area/monitoring Monitoring (including availability monitoring and alerting) related area/metering Metering related labels Mar 2, 2023
@unmarshall
Contributor Author

unmarshall commented Mar 7, 2023

Apart from the above-mentioned metrics, the following additional requirements came out of a discussion with @istvanballok:

  • Rate of object writes by resource type
  • Top 10 property paths with frequent changes. E.g. if a resource is getting too many updates, expose a metric that also captures which properties change most often.
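
To make the second requirement more concrete, here is a minimal sketch of how changed property paths could be counted. It assumes successive versions of an object are already available as decoded dictionaries (how they are obtained, e.g. via a watch, is out of scope here). The helpers `changed_paths` and `top_changed_paths` are hypothetical names, and lists are compared as whole values to keep the sketch short.

```python
from collections import Counter


def changed_paths(old, new, prefix=""):
    """Yield dotted property paths whose values differ between two versions."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            path = f"{prefix}.{key}" if prefix else key
            if key not in old or key not in new:
                # Property was added or removed.
                yield path
            else:
                yield from changed_paths(old[key], new[key], path)
    elif old != new:
        # Leaf value (or list) changed.
        yield prefix


def top_changed_paths(versions, n=10):
    """Count changed paths across consecutive versions and return the n most frequent."""
    counts = Counter()
    for old, new in zip(versions, versions[1:]):
        counts.update(changed_paths(old, new))
    return counts.most_common(n)
```

The per-path counts could then be exposed as a labelled metric, although the label cardinality would need to be bounded, which is presumably why the requirement limits it to the top 10.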

@shreyas-s-rao shreyas-s-rao added the priority/2 Priority (lower number equals higher priority) label May 3, 2023
@shreyas-s-rao
Collaborator

/assign @abdasgupta

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Jan 10, 2024

4 participants