
Kubernetes cache optimization #2736

Open · wants to merge 21 commits into base: develop
Conversation

@r2k1 (Contributor) commented May 7, 2024

Summary

Currently, most Kubernetes objects are stored in memory with all their fields. Copies are also made on some requests, which creates multiple in-memory duplicates of etcd storage. This is particularly troublesome for large clusters, which end up requiring a large amount of memory.

This update aims to optimize the memory footprint by retaining only the fields actively used in the codebase. This is achieved by replacing the Informer with the Reflector and caching only the transformed data instead of the entire original object.

The core changes are located in pkg/clustercache.
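As a rough illustration of the approach, here is a minimal sketch (not the actual pkg/clustercache code; the slimPod type, its field set, and the helper names are assumptions for illustration). A Reflector's add/update/delete callbacks write into a cache of transformed objects, so the full originals are never retained:

package clustercache

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

// slimPod keeps only the fields the codebase actually reads, instead of the full *v1.Pod.
type slimPod struct {
	Namespace string
	Name      string
	NodeName  string
	Labels    map[string]string
}

// transformPod maps a full Kubernetes object to its reduced representation.
func transformPod(p *v1.Pod) slimPod {
	return slimPod{
		Namespace: p.Namespace,
		Name:      p.Name,
		NodeName:  p.Spec.NodeName,
		Labels:    p.Labels,
	}
}

// podCache holds only transformed objects, keyed by namespace/name.
type podCache struct {
	mu   sync.RWMutex
	pods map[string]slimPod
}

func (c *podCache) upsert(p *v1.Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pods[p.Namespace+"/"+p.Name] = transformPod(p)
}

func (c *podCache) remove(p *v1.Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pods, p.Namespace+"/"+p.Name)
}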

Testing and Results

Estimating memory usage can be challenging. To measure the impact of this change, I ran opencost locally while connected to a remote Kubernetes cluster, triggered requests to the /metrics endpoints, and recorded memory consumption. (Temporary changes in cmd/costmodel/main.go were used for testing purposes).
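For context, here is a minimal sketch of how heap figures like lastHeapAlloc/maxHeapAlloc can be sampled; the actual temporary instrumentation in cmd/costmodel/main.go is not part of this PR and may differ:

package main

import (
	"fmt"
	"runtime"
	"time"
)

// trackHeap samples the heap once per second and reports the last and maximum
// observed HeapAlloc when stop is closed.
func trackHeap(stop <-chan struct{}) {
	var lastAlloc, maxAlloc uint64
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			fmt.Printf("lastHeapAlloc=%.1fMiB maxHeapAlloc=%.1fMiB\n",
				float64(lastAlloc)/(1<<20), float64(maxAlloc)/(1<<20))
			return
		case <-ticker.C:
			var m runtime.MemStats
			runtime.ReadMemStats(&m)
			lastAlloc = m.HeapAlloc
			if lastAlloc > maxAlloc {
				maxAlloc = lastAlloc
			}
		}
	}
}

func main() {
	stop := make(chan struct{})
	time.AfterFunc(5*time.Minute, func() { close(stop) })
	// Trigger requests against the /metrics endpoint while sampling runs.
	trackHeap(stop)
}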

Test results from a simple cluster with a basic synthetic workload:

  • Baseline (develop branch):
    lastHeapAlloc=87.7MiB, maxHeapAlloc=122.1MiB

  • With Optimization:
    lastHeapAlloc=56.0MiB, maxHeapAlloc=82.7MiB

Analysis:

  • Static overhead is estimated at around 43MiB (due to kube configs, carbon footprint data, etc).
  • Adjusting for this static overhead:
    • The new maximum memory usage is approximately 50% of the original (83-43=40 MiB vs. 122-43=79 MiB).
    • The new average memory usage is approximately 30% of the original (56-43=13 MiB vs. 88-43=45 MiB).

Implications:

  • This optimization is expected to yield more substantial improvements for larger clusters handling real workloads.
  • Further reduction opportunities exist, as not all unused fields have been removed yet.

Additional Testing

A diff comparison between the original and optimized /metrics responses did not reveal any significant discrepancies, but further testing is required.

Breaking Changes

Here are the changes that might break compatibility:

  • Removal of clustercache importer/exporter. This feature was undocumented. If it's used anywhere, additional information about its purpose will be required.
  • There is no longer any blocking until the cluster cache is ready. (This appears to be a limitation of Reflectors; I couldn't find a way to achieve it. One possible workaround is sketched after this list.)
  • New responses for /allStorageClasses, /allStatefulSets, /allPods, etc. The purpose of these endpoints isn't fully clear. They might be replaced with direct proxy calls to the Kubernetes API instead of using an in-memory cache.
  • Any other areas where Kubernetes objects were serialized/deserialized might be affected.
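
One possible way to approximate blocking until the cache is ready (an untested suggestion, not something this PR implements): the Reflector delivers its initial listing through Store.Replace, so a wrapper store could signal readiness from there. The readyStore type and the names below are hypothetical:

package clustercache

import (
	"sync"

	"k8s.io/client-go/tools/cache"
)

// readyStore wraps any cache.Store and closes ready after the first Replace,
// which the Reflector issues with the result of its initial list.
type readyStore struct {
	cache.Store
	ready chan struct{}
	once  sync.Once
}

func newReadyStore(inner cache.Store) *readyStore {
	return &readyStore{Store: inner, ready: make(chan struct{})}
}

func (s *readyStore) Replace(items []interface{}, resourceVersion string) error {
	defer s.once.Do(func() { close(s.ready) })
	return s.Store.Replace(items, resourceVersion)
}

// WaitForCacheReady blocks until the initial listing has been stored or stop is closed.
func (s *readyStore) WaitForCacheReady(stop <-chan struct{}) bool {
	select {
	case <-s.ready:
		return true
	case <-stop:
		return false
	}
}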

r2k1 added 17 commits May 6, 2024 14:31

@r2k1 (Contributor Author) commented May 7, 2024

@opencost/opencost-maintainers any thoughts on it? Especially @mbolt35 and @AjayTripathy

@nikovacevic (Contributor) commented:

Thanks @r2k1 -- at a glance, this looks great. Totally supportive. If Bolt and Ajay aren't able to review this week, I'll loop back and give an official review ASAP.

pvWatch: NewCachingWatcher(coreRestClient, "persistentvolumes", &v1.PersistentVolume{}, "", fields.Everything()),
pvcWatch: NewCachingWatcher(coreRestClient, "persistentvolumeclaims", &v1.PersistentVolumeClaim{}, "", fields.Everything()),
storageClassWatch: NewCachingWatcher(storageRestClient, "storageclasses", &stv1.StorageClass{}, "", fields.Everything()),
jobsWatch: NewCachingWatcher(batchClient, "jobs", &batchv1.Job{}, "", fields.Everything()),
Contributor commented:

I'm pretty sure that everything that is being monitored here is done so for one reason or another, not limited to just opencost scope. If we're going to remove these, we would really want to make sure that everything downstream is updated.

Contributor Author (@r2k1) commented:

Either way, I'm not able to prepare a reduced structure if I don't know which fields are used. I check the usage and fix compilation errors; in that case I would just create empty structs.

It's unfortunate that kubecost's requirements add an extra footprint for opencost users.

I guess it will require someone from kubecost to mark all the used fields for all k8s objects.

Comment on lines 191 to 197
// Initialize cluster exporting if it's enabled
if env.IsExportClusterCacheEnabled() {
cacheLocation := confManager.ConfigFileAt(path.Join(configPrefix, "cluster-cache.json"))
clusterExporter = clustercache.NewClusterExporter(clusterCache, cacheLocation, ClusterExportInterval)
clusterExporter.Run()
}

Contributor commented:

Why remove this?

@r2k1 (Contributor Author) commented May 9, 2024:

I’m not sure what to do with this part yet. For now, I’ve just turned it off. This way, I can try other things without having to change too much at the start.

This change will affect the format in which data is saved and loaded, which might cause problems. I need to understand better how it's used before I can decide what to do next.

Contributor commented:

Could this kind of feature removal be behind a flag at all?

Contributor Author (@r2k1) commented:

I'll put it back in the final version.

But it will likely fail to load any existing files.

Can you provide any extra information on how it's used?

Contributor commented:

I think it's used by a few upstream dependencies that have a multicluster implementation. We can check whether it's necessary, but we should probably issue a deprecation notice even if we can't find anything that uses it.

Contributor Author (@r2k1) commented:

I put it back

Comment on lines 23 to 26
func IsExportClusterCacheEnabled() bool {
return env.GetBool(ExportClusterCacheEnabledEnvVar, false)
}

Contributor commented:

Why are we removing this option?

Contributor Author (@r2k1) commented:

Restored.

@mbolt35 (Contributor) commented May 9, 2024

I think there was a ResourceWatcher[T] instance written a while back -- I believe it achieves the same functionality as what is being proposed (I would just like to separately confirm that the Kubernetes API doesn't provide a very nice way to get around the internal Store implementation when using a watcher):

package watcher

import (
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"

	"k8s.io/klog"
)

// ResourceHandler is a function alias for a receiver of an update or remove.
type ResourceHandler[T any] func(*T)

// ResourceWatcher is a controller which constructs a lightweight store implementation for add/update/remove
// passthroughs via a kubernetes Reflector.
type ResourceWatcher[T any] struct {
	listWatch *cache.ListWatch
	reflector *cache.Reflector
}

// KubernetesResource is a generic constraint that ensures the T passed into NewWatcher will implement kubernetes runtime.Object
// and pointer receiver for the T instance.
type KubernetesResource[T any] interface {
	runtime.Object
	*T
}

// NewResourceWatcher creates a new ResourceWatcher[T] instance which calls the provided handlers on add, update, and removal.
func NewResourceWatcher[T any, U KubernetesResource[T]](
	restClient rest.Interface,
	resource string,
	namespace string,
	fieldSelector fields.Selector,
	updateHandler ResourceHandler[T],
	removeHandler ResourceHandler[T],
) *ResourceWatcher[T] {
	// we construct a lightweight store proxy which doesn't actually store
	// resources. instead, it passes lists, adds, updates, and removals
	// through to the handlers
	proxy := &storeProxy[T]{
		onUpdate:  updateHandler,
		onRemoved: removeHandler,
	}

	resourceType := new(T)
	listWatch := cache.NewListWatchFromClient(restClient, resource, namespace, fieldSelector)
	reflector := cache.NewReflector(listWatch, resourceType, proxy, 0)

	return &ResourceWatcher[T]{
		listWatch: listWatch,
		reflector: reflector,
	}
}

// Run starts the underlying kubernetes reflector and begins receiving watch updates
func (w *ResourceWatcher[T]) Run(stopCh <-chan struct{}) {
	w.reflector.Run(stopCh)
}

//--------------------------------------------------------------------------
//  store proxy
//--------------------------------------------------------------------------
// The kubernetes Reflector implementation only leverages 4 methods on the
// Store contract: Add, Update, Delete, and Replace. Since the point of this
// optimization is to completely avoid caching the full resource objects, we
// provide no-ops for the remaining methods: List, ListKeys, Get, GetByKey,
// and Resync.
//
// Unfortunately, without recreating the Reflector using a lighter interface,
// there's not another way to avoid this. The side-effect is that at any time
// the Reflector implementation could change and begin using the no-op methods.
// It would be unlikely, but possible and worth noting here
//--------------------------------------------------------------------------

// storeProxy is a write-only storage mechanism used to inject passthrough behavior
// to the kubernetes Reflector utility for resource watching.
type storeProxy[T any] struct {
	onUpdate  ResourceHandler[T]
	onRemoved ResourceHandler[T]
}

// Add checks that the object is of the expected type and, if so, passes it to the update handler.
func (ps *storeProxy[T]) Add(obj interface{}) error {
	if resource, ok := obj.(*T); ok {
		ps.onUpdate(resource)
	}
	return nil
}

// Update checks that the object is of the expected type and, if so, passes it to the update handler.
func (ps *storeProxy[T]) Update(obj interface{}) error {
	if resource, ok := obj.(*T); ok {
		ps.onUpdate(resource)
	}
	return nil
}

// Delete checks that the object is of the expected type and, if so, passes it to the remove handler.
func (ps *storeProxy[T]) Delete(obj interface{}) error {
	if resource, ok := obj.(*T); ok {
		ps.onRemoved(resource)
	}
	return nil
}

// Replace occurs on initial listing of the resource, so we simply iterate the replacement items
// and call add.
func (ps *storeProxy[T]) Replace(items []interface{}, resourceVersion string) error {
	for _, item := range items {
		ps.Add(item)
	}
	return nil
}

// No-Op
func (ps *storeProxy[T]) List() []interface{} {
	klog.Warning("Unexpected call: List() on resource store proxy")
	return nil
}

// No-Op
func (ps *storeProxy[T]) ListKeys() []string {
	klog.Warning("Unexpected call: ListKeys() on resource store proxy")
	return nil
}

// No-Op
func (ps *storeProxy[T]) Get(obj interface{}) (item interface{}, exists bool, err error) {
	klog.Warning("Unexpected call: Get() on resource store proxy")
	return nil, false, nil
}

// No-Op
func (ps *storeProxy[T]) GetByKey(key string) (item interface{}, exists bool, err error) {
	klog.Warning("Unexpected call: GetByKey() on resource store proxy")
	return nil, false, nil
}

// No-Op
func (ps *storeProxy[T]) Resync() error {
	klog.Warning("Unexpected call: Resync() on resource store proxy")
	return nil
}
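
For reference, a hedged usage sketch of the ResourceWatcher above (not from the PR; watchPods and the handler bodies are illustrative, written in the same watcher package so it compiles against the code shown):

package watcher

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/rest"
)

// watchPods wires a pod watcher whose handlers would transform each *v1.Pod into
// a reduced representation instead of caching the full object.
func watchPods(restClient rest.Interface, stop <-chan struct{}) *ResourceWatcher[v1.Pod] {
	onUpdate := func(p *v1.Pod) {
		// Transform p into a slim struct and store it (omitted here).
		_ = p
	}
	onRemove := func(p *v1.Pod) {
		// Drop the slim entry for p (omitted here).
		_ = p
	}

	w := NewResourceWatcher[v1.Pod, *v1.Pod](restClient, "pods", "", fields.Everything(), onUpdate, onRemove)
	go w.Run(stop)
	return w
}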

@mbolt35 (Contributor) commented May 9, 2024

Overall, removing so much functionality concerns me, as opencost is a public package that has dependents. Since we're now also providing our own flyweight model for the resources, any dependents are limited to opencost's requirements, not Kubernetes'. Since we're not providing any way to publicly extend the watcher list or supply custom model mappings, dependents would have to scrap using this completely.

I totally support the notion of reducing the footprint here, as the watchers and the resource objects themselves can be quite taxing on CPU and memory. Thanks for drafting this up -- I'd love to work with you to figure out how we can make this work. Give me some time to survey some of our dependent code's use cases and see if I can't propose something that would push this through without it being as severe of a breaking change.

Thanks again :)

@r2k1 (Contributor Author) commented May 9, 2024

The ResourceHandler[T any] approach seems to achieve a similar outcome.

@AjayTripathy (Contributor) commented:

Is there any way to switch between the two implementations, possibly with an interface? That way we wouldn't have to worry about deprecating potentially used fields. I won't block this PR on that, though, given that we can also review dependencies on our side.

@r2k1 (Contributor Author) commented May 13, 2024

I don't see a reasonable way to hide the two implementations behind an interface. However, depending on your use of interface{}, I would expect most potential issues to be caught at compile time (excluding serialization/deserialization).

r2k1 added 3 commits May 16, 2024 16:00

@r2k1 (Contributor Author) commented May 17, 2024

I ran some tests on a synthetic workload with 10 nodes × 250 pods each:

Before the change:
avg=171MiB max=266MiB

After the change:
avg=91MiB max=158MiB

After subtracting 40MiB of static load (the part that doesn't scale with cluster size):

Before the change:
avg=131MiB max=226MiB

After the change:
avg=51MiB max=118MiB

This is a synthetic workload with only the bare minimum of object fields populated, and memory consumption is still cut roughly in half. On a real cluster with more complex object specs, the memory savings should be more significant.

@r2k1 (Contributor Author) commented May 17, 2024

I'll need more help to find out whether any additional fields are being used outside of opencost.

r2k1 marked this pull request as ready for review May 26, 2024 00:17