Add two feature gates: Image based deployment and additional runtimeClasses #394

Closed · wants to merge 16 commits
55 changes: 41 additions & 14 deletions config/samples/featuregates.yaml
@@ -4,19 +4,46 @@ metadata:
   name: osc-feature-gates
   namespace: openshift-sandboxed-containers-operator
 data:
-  # timeTravel allows navigating through cluster states across time.
-  # It is useful for scenarios where you want to view historical data or
-  # predict future states based on current trends. Default is "false".
-  # timeTravel: "false"
+  # OpenShift supports image layering, which allows deploying packages by
+  # providing a custom RHCOS image. The LayeredImageDeployment feature enables
+  # support for deploying the Kata artifacts via an RHCOS image layer.
+
-  # quantumEntanglementSync enables instant state consistency across clusters
-  # using principles of quantum mechanics. This advanced feature ensures
-  # data is synchronized across different locations without traditional
-  # network latency. Default is "false".
-  # quantumEntanglementSync: "false"
+  # Additionally you'll need to create a configmap named
+  # layeredimagedeployment-config to provide the relevant configuration for
+  # this feature. See the example YAML later in this file.
+
-  # autoHealingWithAI employs artificial intelligence to automatically
-  # detect and resolve cluster issues. It leverages machine learning algorithms
-  # to predict potential problems before they occur, ensuring high availability.
-  # Default is "true".
-  # autoHealingWithAI: "true"
+  #LayeredImageDeployment: "false"
+
+  # The AdditionalRuntimeClasses feature enables creating additional
+  # runtimeClasses. It will typically be used together with other features,
+  # for example the layered image based deployment feature: the layered image
+  # might carry configuration for multiple runtime classes, which are then
+  # exposed to users as RuntimeClass objects created by the OSC operator when
+  # this feature gate is enabled.
+
+  # Additionally you'll need to create a configmap named
+  # additionalruntimeclasses-config to provide the relevant configuration for
+  # this feature. See the example YAML later in this file.
+
+  #AdditionalRuntimeClasses: "false"
+
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: layeredimagedeployment-config
+  namespace: openshift-sandboxed-containers-operator
+data:
+  # Provide the configuration for the LayeredImageDeployment feature.
+  # osImageURL is mandatory and should be a valid URL pointing to the RHCOS
+  # layered image.
+  #osImageURL: "quay.io/...."
+  #kernelArguments: "a=b c=d ..."
+
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: additionalruntimeclasses-config
+  namespace: openshift-sandboxed-containers-operator
+data:
+  # Use any one of the formats below to provide the runtime class
+  # configurations:
+  #runtimeClassConfig: "name1:cpuOverHead1:memOverHead1, name2:cpuOverHead2:memOverHead2"
+  #runtimeClassConfig: "name1, name2"
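The runtimeClassConfig value has to be parsed by the operator into per-class entries. A minimal standalone sketch of such a parser, handling both documented formats (parseRuntimeClassConfig and runtimeClassEntry are hypothetical names for illustration, not part of this PR):

```go
package main

import (
	"fmt"
	"strings"
)

// runtimeClassEntry holds one parsed entry; the overhead fields stay ""
// when the short "name1, name2" format is used.
// (Hypothetical type, not from the PR.)
type runtimeClassEntry struct {
	Name        string
	CPUOverhead string
	MemOverhead string
}

// parseRuntimeClassConfig is a hypothetical helper accepting both formats
// from the sample configmap: "name:cpu:mem, ..." and "name, ...".
func parseRuntimeClassConfig(config string) []runtimeClassEntry {
	var entries []runtimeClassEntry
	for _, item := range strings.Split(config, ",") {
		// Trim the spaces around each comma-separated item, then split
		// the optional ":"-separated overhead fields.
		fields := strings.Split(strings.TrimSpace(item), ":")
		entry := runtimeClassEntry{Name: fields[0]}
		if len(fields) == 3 {
			entry.CPUOverhead = fields[1]
			entry.MemOverhead = fields[2]
		}
		entries = append(entries, entry)
	}
	return entries
}

func main() {
	fmt.Println(parseRuntimeClassConfig("kata-cc:250m:350Mi, kata-remote"))
}
```

The short format simply leaves the overhead fields empty, so a single parser covers both cases.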
74 changes: 74 additions & 0 deletions controllers/cm_event_handler.go
@@ -0,0 +1,74 @@
package controllers

import (
	"context"

	"github.com/openshift/sandboxed-containers-operator/internal/featuregates"
	"k8s.io/client-go/util/workqueue"
	"sigs.k8s.io/controller-runtime/pkg/event"
)

type ConfigMapEventHandler struct {
	reconciler *KataConfigOpenShiftReconciler
}

func (ch *ConfigMapEventHandler) Create(ctx context.Context, event event.CreateEvent, queue workqueue.RateLimitingInterface) {
	if ch.reconciler.kataConfig == nil {
		return
	}

	cm := event.Object

	// Check if the configmap is the feature gates configmap or one of
	// the gated features' config configmaps
	if cm.GetNamespace() != OperatorNamespace || !featuregates.IsFeatureGateConfigMap(cm.GetName()) {
		return
	}

	log := ch.reconciler.Log.WithName("CMCreate").WithValues("cm name", cm.GetName())
	log.Info("FeatureGates configmap created")

	queue.Add(ch.reconciler.makeReconcileRequest())
}

func (ch *ConfigMapEventHandler) Update(ctx context.Context, event event.UpdateEvent, queue workqueue.RateLimitingInterface) {
	if ch.reconciler.kataConfig == nil {
		return
	}

	cm := event.ObjectNew

	// Check if the configmap is the feature gates configmap or one of
	// the gated features' config configmaps
	if cm.GetNamespace() != OperatorNamespace || !featuregates.IsFeatureGateConfigMap(cm.GetName()) {
		return
	}

	log := ch.reconciler.Log.WithName("CMUpdate").WithValues("cm name", cm.GetName())
	log.Info("FeatureGates configmap updated")

	queue.Add(ch.reconciler.makeReconcileRequest())
}

func (ch *ConfigMapEventHandler) Delete(ctx context.Context, event event.DeleteEvent, queue workqueue.RateLimitingInterface) {
Contributor:

If we reconcile on Create I guess it would make sense to also reconcile on Delete, wouldn't it? I understand handling Create since we want to handle creation of both the feature gate master configmap and individual gated features' configmaps. With the same reasoning though, wouldn't we want to handle their deletion as well?

Contributor Author:

How do you think the deletion of configmaps should be handled? Everything rolled back? Or the configmap recreated based on known state?
My thinking was to only handle create/update and document that on deletion the behaviour is not deterministic, and to handle these cases when the feature gate graduates to stable and becomes part of KataConfig.

Contributor:

Let me start by stating the wider context so that we can be sure we're on the same page.

I assume that we're only watching configmaps because we want to support modifications on-the-fly (ie. while a KataConfig exists). If we didn't we'd just sample the configmaps during installation and then no further modification would be possible.

I believe it would then make sense to support all such modifications rather than a (pretty much arbitrary) subset. I mean, we could say "you can enable a gated feature subsequently if you didn't enable it initially but once you do that you're stuck with it, no disabling possible" but does it make much sense?

I'm not sure what you mean when you mention non-deterministic behaviour. The feature gates are booleans so they fundamentally switch between two states, both of which necessarily need to be well defined. If you delete a configmap that enabled both images and additional runtimeclasses I imagine that the controller should deterministically respond by switching back to extension and by deleting the extra runtimeclasses.

Contributor Author:

> "you can enable a gated feature subsequently if you didn't enable it initially but once you do that you're stuck with it, no disabling possible" but does it make much sense?

What we say is you disable it by turning it off in the configmap, not by deleting the configmap.

I'm not comfortable doing a complete rollback of all extensions to default values if the configmap is deleted, IOW switching to the extension, removing all runtimeClasses and other features we add going forward.
It would be a heavy operation for, say, an accidental delete. Although one can argue that it's an admin operation and needs to be done carefully.
Do you think this is how we should handle it, or is there an alternative?

Contributor:

The question is, do we even have a choice? Cluster admin can delete the configmap as they see fit. Our only choice that I can see is, do we handle the deletion or do we allow the cluster to stay in an inconsistent state?

It's true that depending on the gated feature set a complete rollback might be a heavy operation and it could be done in error. But the same is true of just about any KataConfig operation, including creation, modification and deletion.

Contributor Author:

One approach I was thinking of was to use the in-memory state of the feature gate to decide. But then it deviates from using etcd as the source of truth. I'll do some experiments and update.

Contributor:

In-memory state wouldn't be reliable enough, the controller could be restarted at any time. And yes, it could be rather confusing for cluster admin, too, if the controller is guided by its hidden in-memory state rather than the visible cluster resources...

Contributor:

Is there a k8s way to "lock" the CM once it's created? or to block its deletion (finalizers?)? would it help in this case?

Contributor:

Adding a finalizer would indeed delay deletion; I'm afraid, however, that it would be quite a misuse (or even abuse) of finalizers.

Contributor Author:

Yeah, adding a finalizer will not give us anything here imho.
Anyway, I have added some changes to handle deletion. PTAL

	if ch.reconciler.kataConfig == nil {
		return
	}

	cm := event.Object

	// Check if the configmap is the feature gates configmap or one of
	// the gated features' config configmaps
	if cm.GetNamespace() != OperatorNamespace || !featuregates.IsFeatureGateConfigMap(cm.GetName()) {
		return
	}

	log := ch.reconciler.Log.WithName("CMDelete").WithValues("cm name", cm.GetName())
	log.Info("FeatureGates configmap deleted")

	queue.Add(ch.reconciler.makeReconcileRequest())
}

func (ch *ConfigMapEventHandler) Generic(ctx context.Context, event event.GenericEvent, queue workqueue.RateLimitingInterface) {
}
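All three handlers gate on featuregates.IsFeatureGateConfigMap, whose implementation is not shown in this diff. A minimal standalone sketch of such a predicate, assuming the configmap names from the sample featuregates.yaml above (the real name set lives in internal/featuregates and may differ):

```go
package main

import "fmt"

// Configmap names assumed from the sample featuregates.yaml; hypothetical,
// the actual set is defined in internal/featuregates.
var featureGateConfigMaps = map[string]bool{
	"osc-feature-gates":               true,
	"layeredimagedeployment-config":   true,
	"additionalruntimeclasses-config": true,
}

// isFeatureGateConfigMap reports whether the given configmap name is the
// feature gates configmap or one of the per-feature config configmaps.
func isFeatureGateConfigMap(name string) bool {
	return featureGateConfigMaps[name]
}

func main() {
	fmt.Println(isFeatureGateConfigMap("osc-feature-gates"))
	fmt.Println(isFeatureGateConfigMap("kube-root-ca.crt"))
}
```

A simple name-set lookup like this keeps the event handlers cheap, since they run for every configmap event in the watched namespace.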
7 changes: 5 additions & 2 deletions controllers/credentials_controller.go
@@ -20,6 +20,7 @@ import (
 	"context"
 	"fmt"
 	"os"
+	"path/filepath"
 
 	"github.com/go-logr/logr"
 	v1 "github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1"
@@ -293,7 +294,8 @@ func (kh *KataConfigHandler) getCredentialsRequest() (*v1.CredentialsRequest, er
 	}
 
 	fileName := fmt.Sprintf(peerpodsCredentialsRequestFileFormat, provider)
-	yamlData, err := readCredentialsRequestYAML(fileName)
+	credentialsRequestsYamlFile := filepath.Join(peerpodsCredentialsRequestsPathLocation, fileName)
+	yamlData, err := readYamlFile(credentialsRequestsYamlFile)
 	if os.IsNotExist(err) {
 		kh.reconciler.Log.Info("no CredentialsRequestYAML for provider", "err", err, "provider", provider)
 		return nil, nil
@@ -324,7 +326,8 @@ func (kh *KataConfigHandler) skipCredentialRequests() bool {
 	// check if CredentialsRequest implementation exists for the cloud provider
 	if provider, err := getCloudProviderFromInfra(kh.reconciler.Client); err == nil {
 		fileName := fmt.Sprintf(peerpodsCredentialsRequestFileFormat, provider)
-		if _, err := readCredentialsRequestYAML(fileName); os.IsNotExist(err) {
+		credentialsRequestsYamlFile := filepath.Join(peerpodsCredentialsRequestsPathLocation, fileName)
+		if _, err := readYamlFile(credentialsRequestsYamlFile); os.IsNotExist(err) {
 			kh.reconciler.Log.Info("no CredentialsRequest yaml file for provider, skipping", "provider", provider, "filename", fileName)
 			return true
 		}
8 changes: 6 additions & 2 deletions controllers/image_generator.go
@@ -20,6 +20,7 @@ import (
 	"errors"
 	"fmt"
 	"os"
+	"path/filepath"
 	"strconv"
 	"strings"
 	"sync"
@@ -242,7 +243,9 @@ func newImageGenerator(client client.Client) (*ImageGenerator, error) {
 func (r *ImageGenerator) createJobFromFile(jobFileName string) (*batchv1.Job, error) {
 	igLogger.Info("Create Job out of YAML file", "jobFileName", jobFileName)
 
-	yamlData, err := readJobYAML(jobFileName)
+	jobYamlFile := filepath.Join(peerpodsImageJobsPathLocation, jobFileName)
+
+	yamlData, err := readYamlFile(jobYamlFile)
Contributor:

Perhaps I'm confused but I don't see how this is related to feature gating. (?)

 	if err != nil {
 		return nil, err
 	}
@@ -630,7 +633,8 @@ func (r *ImageGenerator) createImageConfigMapFromFile() error {
 	}
 
 	filename := r.provider + "-podvm-image-cm.yaml"
-	yamlData, err := readConfigMapYAML(filename)
+	configMapYamlFile := filepath.Join(peerpodsImageJobsPathLocation, filename)
+	yamlData, err := readYamlFile(configMapYamlFile)
 	if err != nil {
 		return err
 	}
195 changes: 195 additions & 0 deletions controllers/mc_handler.go
@@ -0,0 +1,195 @@
package controllers

/*
This code handles installing the Kata artifacts using either an RHCOS extension
or an RHCOS image, deployed via MachineConfig.
*/

import (
	"context"
	"encoding/json"
	"fmt"
	"strings"

	ignTypes "github.com/coreos/ignition/v2/config/v3_2/types"
	mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
	"github.com/openshift/sandboxed-containers-operator/internal/featuregates"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
)

const (
	extension_mc_name = "50-enable-sandboxed-containers-extension"
	image_mc_name     = "50-enable-sandboxed-containers-image"
)

type DeploymentState struct {
	PreviousMethod string
	PreviousMcName string
}

// If the first return value is 'true' it means that the MC was just created
// by this call, 'false' means that it's already existed. As usual, the first
// return value is only valid if the second one is nil.
func (r *KataConfigOpenShiftReconciler) createMc(machinePool string) (bool, error) {

// In case we're returning an error we want to make it explicit that
// the first return value is "don't care". Unfortunately golang seems
// to lack syntax for creating an expression with a default bool value,
// hence this work-around.
	var dummy bool

	/* Create the MachineConfig object to deploy the sandboxed containers artifacts */
	mcName := extension_mc_name
	if r.FeatureGatesStatus[featuregates.LayeredImageDeployment] {
		mcName = image_mc_name
	}

	r.Log.Info("Checking if MachineConfig exists", "mc.Name", mcName)
	mc := &mcfgv1.MachineConfig{}
	err := r.Client.Get(context.TODO(), types.NamespacedName{Name: mcName}, mc)
	if err != nil && (k8serrors.IsNotFound(err) || k8serrors.IsGone(err)) {
		r.Log.Info("creating new MachineConfig")
		mc, err = r.newMCForCR(machinePool)
		if err != nil {
			return dummy, err
		}

		err = r.Client.Create(context.TODO(), mc)
		if err != nil {
			r.Log.Error(err, "Failed to create a new MachineConfig", "mc.Name", mc.Name)
			return dummy, err
		}
		r.Log.Info("MachineConfig successfully created", "mc.Name", mc.Name)
		return true, nil
	} else if err != nil {
		r.Log.Info("failed to retrieve MachineConfig", "err", err)
		return dummy, err
	} else {
		r.Log.Info("MachineConfig already exists")
		return false, nil
	}
}

// Create a new MachineConfig object for the Custom Resource
func (r *KataConfigOpenShiftReconciler) newMCForCR(machinePool string) (*mcfgv1.MachineConfig, error) {
	r.Log.Info("Creating MachineConfig for Custom Resource")

	ic := ignTypes.Config{
		Ignition: ignTypes.Ignition{
			Version: "3.2.0",
		},
	}

	icb, err := json.Marshal(ic)
	if err != nil {
		return nil, err
	}

	if r.FeatureGatesStatus[featuregates.LayeredImageDeployment] {
		r.Log.Info("Creating MachineConfig for Custom Resource with image")
		return r.newImageMCForCR(machinePool, icb)
	} else {
		r.Log.Info("Creating MachineConfig for Custom Resource with extension")
		return r.newExtensionMCForCR(machinePool, icb)
	}
}

// Create a new MachineConfig object for the Custom Resource with the extension
func (r *KataConfigOpenShiftReconciler) newExtensionMCForCR(machinePool string, icb []byte) (*mcfgv1.MachineConfig, error) {
	extension := getExtensionName()

	mc := mcfgv1.MachineConfig{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "machineconfiguration.openshift.io/v1",
			Kind:       "MachineConfig",
		},
		ObjectMeta: metav1.ObjectMeta{
			Name: extension_mc_name,
			Labels: map[string]string{
				"machineconfiguration.openshift.io/role": machinePool,
				"app": r.kataConfig.Name,
			},
			Namespace: OperatorNamespace,
		},
		Spec: mcfgv1.MachineConfigSpec{
			Extensions: []string{extension},
			Config: runtime.RawExtension{
				Raw: icb,
			},
		},
	}

	return &mc, nil
}

// Create a new MachineConfig object for the Custom Resource with the image
func (r *KataConfigOpenShiftReconciler) newImageMCForCR(machinePool string, icb []byte) (*mcfgv1.MachineConfig, error) {
	// Get the LayeredImageDeployment feature gate parameters
	imageParams := r.FeatureGates.GetFeatureGateParams(context.TODO(), featuregates.LayeredImageDeployment)
	// Ensure at least osImageURL is set
	if _, ok := imageParams["osImageURL"]; !ok {
		return nil, fmt.Errorf("osImageURL not set in feature gate %s", featuregates.LayeredImageDeployment)
	}

	mc := mcfgv1.MachineConfig{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "machineconfiguration.openshift.io/v1",
			Kind:       "MachineConfig",
		},
		ObjectMeta: metav1.ObjectMeta{
			Name: image_mc_name,
			Labels: map[string]string{
				"machineconfiguration.openshift.io/role": machinePool,
				"app": r.kataConfig.Name,
			},
			Namespace: OperatorNamespace,
		},
		Spec: mcfgv1.MachineConfigSpec{
			OSImageURL: imageParams["osImageURL"],
			Config: runtime.RawExtension{
				Raw: icb,
			},
		},
	}

	if kernelArguments, ok := imageParams["kernelArguments"]; ok {
		// In the configmap the kernel arguments are stored as a single
		// string, eg. "a=b c=d ...", while the MachineConfig expects
		// individual arguments, eg. ["a=b", "c=d", ...], so split on
		// whitespace.
		mc.Spec.KernelArguments = strings.Fields(kernelArguments)
	}

	return &mc, nil
}

// Delete the MachineConfig object
func (r *KataConfigOpenShiftReconciler) deleteMc(mcName string) error {
	r.Log.Info("Deleting MachineConfig", "mc.Name", mcName)
	mc := &mcfgv1.MachineConfig{}
	err := r.Client.Get(context.TODO(), types.NamespacedName{Name: mcName}, mc)
	if err != nil {
		if k8serrors.IsNotFound(err) || k8serrors.IsGone(err) {
			r.Log.Info("MachineConfig not found. Most likely it was already deleted.")
			return nil
		}
		r.Log.Info("failed to retrieve MachineConfig", "err", err)
		return err
	}

	r.Log.Info("deleting MachineConfig")
	err = r.Client.Delete(context.TODO(), mc)
	if err != nil {
		r.Log.Error(err, "Failed to delete MachineConfig")
		return err
	}

	r.Log.Info("MachineConfig successfully deleted")
	return nil
}