
Kubernetes cache optimization #2736

Open · wants to merge 21 commits into base: develop
Conversation

@r2k1 (Contributor) commented May 7, 2024

Summary

Currently, most Kubernetes objects are stored in memory with all their fields. Copies are also made on some requests, which creates multiple in-memory duplicates of etcd storage. This is particularly troublesome for large clusters, which end up requiring a large amount of memory.

This update aims to optimize the memory footprint by retaining only the fields actively used in the codebase. This is achieved by replacing the Informer with the Reflector and caching only the transformed data instead of the entire original object.

The core changes are located in pkg/clustercache.
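As a rough illustration of the approach, here is a minimal sketch (not the actual pkg/clustercache code; the slimPod type, its field set, and the helper names are assumptions for illustration). A Reflector's add/update/delete callbacks write into a cache of transformed objects, so the full originals are never retained:

package clustercache

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

// slimPod keeps only the fields the codebase actually reads, instead of the full *v1.Pod.
type slimPod struct {
	Namespace string
	Name      string
	NodeName  string
	Labels    map[string]string
}

// transformPod maps a full Kubernetes object to its reduced representation.
func transformPod(p *v1.Pod) slimPod {
	return slimPod{
		Namespace: p.Namespace,
		Name:      p.Name,
		NodeName:  p.Spec.NodeName,
		Labels:    p.Labels,
	}
}

// podCache holds only transformed objects, keyed by namespace/name.
type podCache struct {
	mu   sync.RWMutex
	pods map[string]slimPod
}

func (c *podCache) upsert(p *v1.Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pods[p.Namespace+"/"+p.Name] = transformPod(p)
}

func (c *podCache) remove(p *v1.Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pods, p.Namespace+"/"+p.Name)
}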

Testing and Results

Estimating memory usage can be challenging. To measure the impact of this change, I ran opencost locally while connected to a remote Kubernetes cluster, triggered requests to the /metrics endpoints, and recorded memory consumption. (Temporary changes in cmd/costmodel/main.go were used for testing purposes).
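For context, here is a minimal sketch of how heap figures like lastHeapAlloc/maxHeapAlloc can be sampled; the actual temporary instrumentation in cmd/costmodel/main.go is not part of this PR and may differ:

package main

import (
	"fmt"
	"runtime"
	"time"
)

// trackHeap samples the heap once per second and reports the last and maximum
// observed HeapAlloc when stop is closed.
func trackHeap(stop <-chan struct{}) {
	var lastAlloc, maxAlloc uint64
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			fmt.Printf("lastHeapAlloc=%.1fMiB maxHeapAlloc=%.1fMiB\n",
				float64(lastAlloc)/(1<<20), float64(maxAlloc)/(1<<20))
			return
		case <-ticker.C:
			var m runtime.MemStats
			runtime.ReadMemStats(&m)
			lastAlloc = m.HeapAlloc
			if lastAlloc > maxAlloc {
				maxAlloc = lastAlloc
			}
		}
	}
}

func main() {
	stop := make(chan struct{})
	time.AfterFunc(5*time.Minute, func() { close(stop) })
	// Trigger requests against the /metrics endpoint while sampling runs.
	trackHeap(stop)
}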

Test results from a simple cluster with a basic synthetic workload:

  • Baseline (develop branch):
    lastHeapAlloc=87.7MiB, maxHeapAlloc=122.1MiB

  • With Optimization:
    lastHeapAlloc=56.0MiB, maxHeapAlloc=82.7MiB

Analysis:

  • Static overhead is estimated at around 43MiB (due to kube configs, carbon footprint data, etc).
  • Adjusting for this static overhead:
    • The new maximum memory usage is approximately 50% of the original (83-43=40 MiB vs. 122-43=79 MiB).
    • The new average memory usage is approximately 30% of the original (56-43=13 MiB vs. 88-43=45 MiB).

Implications:

  • This optimization is expected to yield more substantial improvements for larger clusters handling real workloads.
  • Further reduction opportunities exist, as not all unused fields have been removed yet.

Additional Testing

A diff comparison between the original and optimized /metrics responses did not reveal any significant discrepancies, but further testing is required.

Breaking Changes

Here are the changes that might break compatibility:

  • Removal of clustercache importer/exporter. This feature was undocumented. If it's used anywhere, additional information about its purpose will be required.
  • There is no longer any blocking until the cluster cache is ready. (This appears to be a limitation of Reflectors; I couldn't find a way to achieve it. One possible workaround is sketched after this list.)
  • New responses for /allStorageClasses, /allStatefulSets, /allPods, etc. The purpose of these endpoints isn't fully clear. They might be replaced with direct proxy calls to the Kubernetes API instead of using an in-memory cache.
  • Any other areas where Kubernetes objects were serialized/deserialized might be affected.
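
One possible way to approximate blocking until the cache is ready (an untested suggestion, not something this PR implements): the Reflector delivers its initial listing through Store.Replace, so a wrapper store could signal readiness from there. The readyStore type and the names below are hypothetical:

package clustercache

import (
	"sync"

	"k8s.io/client-go/tools/cache"
)

// readyStore wraps any cache.Store and closes ready after the first Replace,
// which the Reflector issues with the result of its initial list.
type readyStore struct {
	cache.Store
	ready chan struct{}
	once  sync.Once
}

func newReadyStore(inner cache.Store) *readyStore {
	return &readyStore{Store: inner, ready: make(chan struct{})}
}

func (s *readyStore) Replace(items []interface{}, resourceVersion string) error {
	defer s.once.Do(func() { close(s.ready) })
	return s.Store.Replace(items, resourceVersion)
}

// WaitForCacheReady blocks until the initial listing has been stored or stop is closed.
func (s *readyStore) WaitForCacheReady(stop <-chan struct{}) bool {
	select {
	case <-s.ready:
		return true
	case <-stop:
		return false
	}
}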

r2k1 added 17 commits May 6, 2024 14:31

@r2k1 (Contributor Author) commented May 7, 2024

@opencost/opencost-maintainers any thoughts on it? Especially @mbolt35 and @AjayTripathy

@nikovacevic (Contributor) commented:

Thanks @r2k1 -- at a glance, this looks great. Totally supportive. If Bolt and Ajay aren't able to review this week, I'll loop back and give an official review ASAP.

pvWatch: NewCachingWatcher(coreRestClient, "persistentvolumes", &v1.PersistentVolume{}, "", fields.Everything()),
pvcWatch: NewCachingWatcher(coreRestClient, "persistentvolumeclaims", &v1.PersistentVolumeClaim{}, "", fields.Everything()),
storageClassWatch: NewCachingWatcher(storageRestClient, "storageclasses", &stv1.StorageClass{}, "", fields.Everything()),
jobsWatch: NewCachingWatcher(batchClient, "jobs", &batchv1.Job{}, "", fields.Everything()),
Contributor commented:

I'm pretty sure that everything that is being monitored here is done so for one reason or another, not limited to just opencost scope. If we're going to remove these, we would really want to make sure that everything downstream is updated.

Contributor Author (@r2k1) commented:

Either way, I'm not able to prepare a reduced structure if I don't know which fields are used. I check the usage and fix compilation errors; in that case I would just create empty structs.

It's unfortunate that kubecost's requirements add an extra footprint for opencost users.

I guess it will require someone from kubecost to mark all the used fields for all k8s objects.

Comment on lines 191 to 197
// Initialize cluster exporting if it's enabled
if env.IsExportClusterCacheEnabled() {
cacheLocation := confManager.ConfigFileAt(path.Join(configPrefix, "cluster-cache.json"))
clusterExporter = clustercache.NewClusterExporter(clusterCache, cacheLocation, ClusterExportInterval)
clusterExporter.Run()
}

Contributor commented:

Why remove this?

@r2k1 (Contributor Author) commented May 9, 2024:

I’m not sure what to do with this part yet. For now, I’ve just turned it off. This way, I can try other things without having to change too much at the start.

This change will affect the format in which data is saved and loaded, which might cause problems. I need to understand better how it's used before I can decide what to do next.

Contributor commented:

Could this kind of feature removal be behind a flag at all?

Contributor Author (@r2k1) commented:

I'll put it back in the final version.

But it will likely fail to load any existing files.

Can you provide any extra information on how it's used?

Contributor commented:

I think it's used by a few upstream dependencies that have a multicluster implementation. We can check whether it's necessary, but we should probably issue a deprecation notice even if we can't find anything that uses it.

Contributor Author (@r2k1) commented:

I put it back

Comment on lines 23 to 26
func IsExportClusterCacheEnabled() bool {
return env.GetBool(ExportClusterCacheEnabledEnvVar, false)
}

Contributor commented:

Why are we removing this option?

Contributor Author (@r2k1) commented:

Restored.

@mbolt35 (Contributor) commented May 9, 2024

I think there was a ResourceWatcher[T] instance written a while back -- I believe it achieves the same functionality as what is being proposed (I would just like to separately confirm that the Kubernetes API doesn't provide a very nice way to get around the internal Store implementation when using a watcher):

package watcher

import (
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"

	"k8s.io/klog"
)

// ResourceHandler is a function alias for a receiver of an update or remove.
type ResourceHandler[T any] func(*T)

// ResourceWatcher is a controller which constructs a lightweight store implementation for add/update/remove
// passthroughs via a kubernetes Reflector.
type ResourceWatcher[T any] struct {
	listWatch *cache.ListWatch
	reflector *cache.Reflector
}

// KubernetesResource is a generic constraint that ensures the T passed into NewWatcher will implement kubernetes runtime.Object
// and pointer receiver for the T instance.
type KubernetesResource[T any] interface {
	runtime.Object
	*T
}

// NewResourceWatcher creates a new ResourceWatcher[T] instance which calls the provided handlers on add, update, and removal.
func NewResourceWatcher[T any, U KubernetesResource[T]](
	restClient rest.Interface,
	resource string,
	namespace string,
	fieldSelector fields.Selector,
	updateHandler ResourceHandler[T],
	removeHandler ResourceHandler[T],
) *ResourceWatcher[T] {
	// we construct a lightweight store proxy which doesn't actually store
	// resources. instead, it passes lists, adds, updates, and removals
	// through to the handlers
	proxy := &storeProxy[T]{
		onUpdate:  updateHandler,
		onRemoved: removeHandler,
	}

	resourceType := new(T)
	listWatch := cache.NewListWatchFromClient(restClient, resource, namespace, fieldSelector)
	reflector := cache.NewReflector(listWatch, resourceType, proxy, 0)

	return &ResourceWatcher[T]{
		listWatch: listWatch,
		reflector: reflector,
	}
}

// Run starts the underlying kubernetes reflector and begins receiving watch updates
func (w *ResourceWatcher[T]) Run(stopCh <-chan struct{}) {
	w.reflector.Run(stopCh)
}

//--------------------------------------------------------------------------
//  store proxy
//--------------------------------------------------------------------------
// The kubernetes Reflector implementation only leverages 4 methods on the
// Store contract: Add, Update, Delete, and Replace. Since the point of this
// optimization is to completely avoid caching the full resource objects, we
// provide no-ops for the remaining methods: List, ListKeys, Get, GetByKey,
// and Resync.
//
// Unfortunately, without recreating the Reflector using a lighter interface,
// there's not another way to avoid this. The side-effect is that at any time
// the Reflector implementation could change and begin using the no-op methods.
// It would be unlikely, but possible and worth noting here
//--------------------------------------------------------------------------

// storeProxy is a write-only storage mechanism used to inject passthrough behavior
// to the kubernetes Reflector utility for resource watching.
type storeProxy[T any] struct {
	onUpdate  ResourceHandler[T]
	onRemoved ResourceHandler[T]
}

// Add checks that the object is of the expected type and, if so, passes it to the update handler.
func (ps *storeProxy[T]) Add(obj interface{}) error {
	if resource, ok := obj.(*T); ok {
		ps.onUpdate(resource)
	}
	return nil
}

// Update checks that the object is of the expected type and, if so, passes it to the update handler.
func (ps *storeProxy[T]) Update(obj interface{}) error {
	if resource, ok := obj.(*T); ok {
		ps.onUpdate(resource)
	}
	return nil
}

// Delete checks that the object is of the expected type and, if so, passes it to the remove handler.
func (ps *storeProxy[T]) Delete(obj interface{}) error {
	if resource, ok := obj.(*T); ok {
		ps.onRemoved(resource)
	}
	return nil
}

// Replace occurs on initial listing of the resource, so we simply iterate the replacement items
// and call add.
func (ps *storeProxy[T]) Replace(items []interface{}, resourceVersion string) error {
	for _, item := range items {
		ps.Add(item)
	}
	return nil
}

// No-Op
func (ps *storeProxy[T]) List() []interface{} {
	klog.Warning("Unexpected call: List() on resource store proxy")
	return nil
}

// No-Op
func (ps *storeProxy[T]) ListKeys() []string {
	klog.Warning("Unexpected call: ListKeys() on resource store proxy")
	return nil
}

// No-Op
func (ps *storeProxy[T]) Get(obj interface{}) (item interface{}, exists bool, err error) {
	klog.Warning("Unexpected call: Get() on resource store proxy")
	return nil, false, nil
}

// No-Op
func (ps *storeProxy[T]) GetByKey(key string) (item interface{}, exists bool, err error) {
	klog.Warning("Unexpected call: GetByKey() on resource store proxy")
	return nil, false, nil
}

// No-Op
func (ps *storeProxy[T]) Resync() error {
	klog.Warning("Unexpected call: Resync() on resource store proxy")
	return nil
}
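
For reference, a hedged usage sketch of the ResourceWatcher above (not from the PR; watchPods and the handler bodies are illustrative, written in the same watcher package so it compiles against the code shown):

package watcher

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/rest"
)

// watchPods wires a pod watcher whose handlers would transform each *v1.Pod into
// a reduced representation instead of caching the full object.
func watchPods(restClient rest.Interface, stop <-chan struct{}) *ResourceWatcher[v1.Pod] {
	onUpdate := func(p *v1.Pod) {
		// Transform p into a slim struct and store it (omitted here).
		_ = p
	}
	onRemove := func(p *v1.Pod) {
		// Drop the slim entry for p (omitted here).
		_ = p
	}

	w := NewResourceWatcher[v1.Pod, *v1.Pod](restClient, "pods", "", fields.Everything(), onUpdate, onRemove)
	go w.Run(stop)
	return w
}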

@mbolt35 (Contributor) commented May 9, 2024

Overall, removing so much functionality concerns me, as opencost is a public package that has dependents. Since we're now also providing our own flyweight model for the resources, any dependents are limited to opencost's requirements, not Kubernetes'. Since we're not providing any way to publicly extend the watcher list or supply custom model mappings, dependents would have to scrap using this completely.

I totally support the notion of reducing the footprint here, as the watchers and the resource objects themselves can be quite taxing on CPU and memory. Thanks for drafting this up -- I'd love to work with you to figure out how we can make this work. Give me some time to survey some of our dependent code's use cases and see if I can't propose something that would push this through without it being as severe of a breaking change.

Thanks again :)

@r2k1 (Contributor Author) commented May 9, 2024

The ResourceHandler[T any] approach seems to achieve a similar outcome.

@AjayTripathy (Contributor) commented:

Is there any way to switch between the two implementations, possibly with an interface? That way we wouldn't have to worry about deprecating potentially used fields. I won't block this PR on that, though, given that we can also review dependencies on our side.

@r2k1 (Contributor Author) commented May 13, 2024

I don't see a reasonable way to hide the two implementations behind an interface. However, depending on your use of interface{}, I would expect most potential issues to be caught at compile time (excluding serialization/deserialization).

r2k1 added 3 commits May 16, 2024 16:00

@r2k1 (Contributor Author) commented May 17, 2024

I ran some tests on a synthetic workload with 10 nodes × 250 pods each:

Before the change:
avg=171MiB max=266MiB

After the change:
avg=91MiB max=158MiB

After subtracting 40MiB of static load (the part that doesn't scale with cluster size):

Before the change:
avg=131MiB max=226MiB

After the change:
avg=51MiB max=118MiB

This is a synthetic workload with only the bare minimum of object fields populated, and memory consumption is still cut roughly in half. On a real cluster with more complex object specs, the memory savings should be more significant.

@r2k1 (Contributor Author) commented May 17, 2024

I'll need more help to find out whether any additional fields are being used outside of opencost.

r2k1 marked this pull request as ready for review May 26, 2024 00:17