Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: remove keda objects during reconciliation #3676

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

converge
Copy link
Member

@converge converge commented Feb 6, 2024

Fixes #3669

Proposed Changes

Remove Keda objects during reconciliation if [configmap] config-kafka-features/controller-autoscaler-keda is set to disabled.

- remove Keda objects during reconciliation
  • early feedback welcome / will add unit tests before marking it as ready

@knative-prow knative-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/control-plane labels Feb 6, 2024
Copy link

knative-prow bot commented Feb 6, 2024

Hi @converge. Thanks for your PR.

I'm waiting for a knative-extensions member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow knative-prow bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 6, 2024
@converge
Copy link
Member Author

converge commented Feb 6, 2024

/cc @Cali0707

@knative-prow knative-prow bot requested a review from Cali0707 February 6, 2024 17:06
Copy link

codecov bot commented Feb 6, 2024

Codecov Report

Attention: Patch coverage is 31.42857% with 24 lines in your changes are missing coverage. Please review.

Project coverage is 52.40%. Comparing base (02a879b) to head (ffcae53).
Report is 118 commits behind head on main.

Files Patch % Lines
...lane/pkg/reconciler/consumergroup/consumergroup.go 31.42% 23 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3676      +/-   ##
============================================
- Coverage     61.99%   52.40%   -9.60%     
- Complexity      845      875      +30     
============================================
  Files           188      342     +154     
  Lines         12634    21449    +8815     
  Branches        273      284      +11     
============================================
+ Hits           7833    11241    +3408     
- Misses         4187     9296    +5109     
- Partials        614      912     +298     
Flag Coverage Δ
java-unittests 74.16% <ø> (-0.30%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

//
// When Keda creates a scaled object, it also generates a Horizontal Pod Autoscaler. This resource will also be removed
// when the scaled object is deleted.
func (r *Reconciler) deleteKedaObjects(ctx context.Context) error {
Copy link
Member

@Cali0707 Cali0707 Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this will delete all KEDA scaled objects and trigger authentications in the cluster, not just those created by the kafka-controller. Maybe we could leverage the fact that the name of the scaledobject (in keda.GenerateScaledObject) and trigger authentication (in keda.GenerateTriggerAuthentication) for each consumergroup is deterministic and we can generate it each time?

Then, since the ReconcileKind function will be called once per consumergroup when we update the configmap, we could:

  1. Get the name of the scaledobject associated with the consumergroup and then delete it
  2. Get the name of the trigger authentication associated with the consumergroup and then delete it

This way, we won't delete any extra KEDA resources

Copy link
Member

@pierDipi pierDipi Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to that there is also a method we could leverage metav1.IsControlledBy(<keda-object>, <consumergroup>) as we set the consumer group to be the controller of keda objects here:

OwnerReferences: []metav1.OwnerReference{
*kmeta.NewControllerRef(cg),
},

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds like a good idea. I tried to follow that on my last commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pierDipi sorry, I got your message just after I hit comment/send. could you double check my last commit? not sure if we still need the metav1.IsControlledBy(<keda-object>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use that function as it's a more reliable and readable than the check we're anyway implemented here

	var triggerAuthName string
	if len(cg.ObjectMeta.OwnerReferences) > 0 && cg.ObjectMeta.OwnerReferences[0].UID != "" {
		triggerAuthName = string(cg.ObjectMeta.OwnerReferences[0].UID)
	}
	// skip deleting trigger authentication if object reference doesn't exist
	if triggerAuthName == "" {
		return nil
	}

For example, it's not necessarily guaranteed that first item is our object (even though it's the case currently) it would be more future proof to loom owner references

Comment on lines 861 to 873
_, err := r.KedaClient.KedaV1alpha1().ScaledObjects(cg.Namespace).Get(ctx, scaledObjectName, metav1.GetOptions{})
if err != nil && !apierrors.IsNotFound(err) {
return fmt.Errorf("failed to get Keda scaled object: %w", err)
}
if !apierrors.IsNotFound(err) {
err = r.KedaClient.KedaV1alpha1().ScaledObjects(cg.Namespace).Delete(ctx, scaledObjectName, metav1.DeleteOptions{})
if err != nil {
return fmt.Errorf("failed to delete Keda scaled object: %w", err)
}
logging.FromContext(ctx).Debugw("Keda scaled object deleted",
"name", scaledObjectName,
"namespace", cg.Namespace)
}
Copy link
Member

@pierDipi pierDipi Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of GET and then DELETE, we can just call DELETE and ignore "404", also because it's not guaranteed that DELETE will not return 404 (not found) as anything can happen between GET and DELETE

Suggested change
_, err := r.KedaClient.KedaV1alpha1().ScaledObjects(cg.Namespace).Get(ctx, scaledObjectName, metav1.GetOptions{})
if err != nil && !apierrors.IsNotFound(err) {
return fmt.Errorf("failed to get Keda scaled object: %w", err)
}
if !apierrors.IsNotFound(err) {
err = r.KedaClient.KedaV1alpha1().ScaledObjects(cg.Namespace).Delete(ctx, scaledObjectName, metav1.DeleteOptions{})
if err != nil {
return fmt.Errorf("failed to delete Keda scaled object: %w", err)
}
logging.FromContext(ctx).Debugw("Keda scaled object deleted",
"name", scaledObjectName,
"namespace", cg.Namespace)
}
err := r.KedaClient.KedaV1alpha1().ScaledObjects(cg.Namespace).Delete(ctx, scaledObjectName, metav1.DeleteOptions{})
if err != nil && !apierrors.IsNotFound(err) {
return fmt.Errorf("failed to get Keda scaled object: %w", err)
}
logging.FromContext(ctx).Debugw("Keda scaled object deleted", "name", scaledObjectName, "namespace", cg.Namespace)

I don't know if the suggestion is properly formatted, it's mostly for describing the idea better

Comment on lines 885 to 897
_, err = r.KedaClient.KedaV1alpha1().TriggerAuthentications(cg.Namespace).Get(ctx, triggerAuthName, metav1.GetOptions{})
if err != nil && !apierrors.IsNotFound(err) {
return fmt.Errorf("failed to get Keda trigger authenticatoin")
}
if !apierrors.IsNotFound(err) {
err = r.KedaClient.KedaV1alpha1().TriggerAuthentications(cg.Namespace).Delete(ctx, triggerAuthName, metav1.DeleteOptions{})
if err != nil {
return fmt.Errorf("failed to delete Keda trigger authentication: %w", err)
}
logging.FromContext(ctx).Debugw("Keda trigger authentication deleted",
"name", triggerAuthName,
"namespace", cg.Namespace)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar idea here, we can just call Delete

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pierDipi, @Cali0707 that's nice, so we don't need to care so much about validation before deleting. I have just pushed a new commit with the proposed changes.

@@ -195,6 +195,9 @@ func (r *Reconciler) ReconcileKind(ctx context.Context, cg *kafkainternals.Consu
} else {
// If KEDA is not installed or autoscaler feature disabled, do nothing
cg.MarkAutoscalerDisabled()
if err := r.deleteKedaObjects(ctx); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err := r.deleteKedaObjects(ctx); err != nil {
if err := r.deleteKedaObjects(ctx, cg); err != nil {

Copy link
Member

@Cali0707 Cali0707 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@converge I think this set of changes looks good to me, would you be able to work on updating the unit tests so that they pass with the new changes?

If you want help with that (our unit tests can take some getting used to 😆 ), feel free to ping me on slack!

@converge
Copy link
Member Author

@converge I think this set of changes looks good to me, would you be able to work on updating the unit tests so that they pass with the new changes?

If you want help with that (our unit tests can take some getting used to 😆 ), feel free to ping me on slack!

thanks xD
I pushed one commit to fix the unit test. are the unit test auto generated somehow? should I write more unit tests?

@converge converge marked this pull request as ready for review February 10, 2024 20:46
@knative-prow knative-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 10, 2024
@pierDipi
Copy link
Member

/ok-to-test

@knative-prow knative-prow bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 14, 2024
@converge
Copy link
Member Author

/retest

@converge
Copy link
Member Author

/retest

3 similar comments
@converge
Copy link
Member Author

/retest

@converge
Copy link
Member Author

/retest

@converge
Copy link
Member Author

/retest

Comment on lines 883 to 889
err = r.KedaClient.KedaV1alpha1().ScaledObjects(cg.Namespace).Delete(ctx, scaledObjectName, metav1.DeleteOptions{})
if err != nil && !apierrors.IsNotFound(err) {
return fmt.Errorf("failed to delete Keda scaled object: %w", err)
}
if err == nil {
logging.FromContext(ctx).Infoln("Keda scaled object deleted")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@converge sorry I missed this earlier, but this should also be in the if metav1.IsControlledBy() statement, as we don't want to delete the scaled object if we don't control it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@converge sorry I missed this earlier, but this should also be in the if metav1.IsControlledBy() statement, as we don't want to delete the scaled object if we don't control it

make sense! I have just updated, tks!

Copy link
Member

@Cali0707 Cali0707 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thanks for all your persistence on this @converge 🎉

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Feb 28, 2024
Copy link

knative-prow bot commented Feb 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cali0707, converge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2024
@converge
Copy link
Member Author

/retest

2 similar comments
@converge
Copy link
Member Author

/retest

@converge
Copy link
Member Author

converge commented Mar 1, 2024

/retest

@pierDipi
Copy link
Member

pierDipi commented Apr 3, 2024

/retest-required

@pierDipi
Copy link
Member

pierDipi commented Apr 4, 2024

/retest

@pierDipi
Copy link
Member

/retest-required

@Cali0707
Copy link
Member

Cali0707 commented May 1, 2024

/test reconciler-tests-namespaced-broker

1 similar comment
@pierDipi
Copy link
Member

pierDipi commented May 9, 2024

/test reconciler-tests-namespaced-broker

@pierDipi
Copy link
Member

/retest-required

@pierDipi
Copy link
Member

/test reconciler-tests-namespaced-broker

Copy link

knative-prow bot commented May 15, 2024

@converge: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
reconciler-tests-namespaced-broker_eventing-kafka-broker_main ffcae53 link true /test reconciler-tests-namespaced-broker

Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

disabling Keda scale doesn't take effect
3 participants