New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implements Gateway Controller #3805
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3805 +/- ##
==========================================
- Coverage 75.81% 74.72% -1.10%
==========================================
Files 107 110 +3
Lines 7340 7537 +197
==========================================
+ Hits 5565 5632 +67
- Misses 1654 1780 +126
- Partials 121 125 +4
|
6b45a4e
to
83d4841
Compare
#3812 created to track the need for gateway controller integration tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly looking pretty good, but I think that we have to use the Scheduled
Condition, not just remove it. Scheduled
means that the resource is valid, has been accepted for processing, and infrastructure is being created. Ready
means that infrastructure is created and ready to route traffic.
Obviously, until we are handling Envoy management, we can't guarantee Ready
, but we should still be updating Scheduled
. I guess the upstream docs could use some improvement to make this flow clearer?
internal/controller/gateway.go
Outdated
// The gateway is safe to process, so check if it's valid. | ||
errs := validation.ValidateGateway(ctx, r.client, gw) | ||
if errs != nil { | ||
r.log.WithField("name", request.Name).WithField("namespace", request.Namespace). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also be setting the Scheduled
Condition to status: false
here, with a NotReconciled
Reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Gateway has been reconciled, but no infra is being provisioned since it's invalid. I will set Scheduled=false
and use an InvalidGateway
reason with the message indicating why the Gateway is invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it work if not provisioning infra? Need to keep that use-case in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not provisioning infra, Scheduled
is roughly equivalent to the Valid
condition we use on HTTPProxy, it indicates that whether the object is syntactically valid. Should this be more than one Condition? Possibly, but I wasn't able to make that argument convincingly enough upstream.
internal/controller/gateway.go
Outdated
r.log.WithField("name", request.Name).WithField("namespace", request.Namespace). | ||
Error("invalid gateway: ", int_errors.ParseFieldErrors(errs)) | ||
} else { | ||
// TODO: Finalize the gateway when managed infrastructure is added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should set Scheduled
to status: true
at this point, with a Reason of Scheduled
, I think? Reference is https://github.com/kubernetes-sigs/gateway-api/blob/16876c7362d58cae211e5599a6ef939ed1b859e9/apis/v1alpha2/gateway_types.go#L574-L614
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently status.SyncGateway()
sets Ready=true
, but I'm updating that to Scheduled=true
since the controller is not provisioning Envoy infra yet. I have also updated ensureClassFinalizer()
to addClassFinalizer()
so the func does not update the object. The object's status will now be sync'ed before being updated on the API server.
internal/controller/gateway.go
Outdated
// TODO: Ensure the gateway by creating manage infrastructure, i.e. the Envoy service. | ||
} | ||
|
||
if err := status.SyncGateway(ctx, r.client, gw, errs); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this status Sync pattern common for controller-runtime controllers? In any case, we need to ensure that the Scheduled
Condition is populated correctly (it signals that Contour has accepted the Gateway for processing and, eventually, started the process of creating infrastructure). We should add Ready
with status: false
when we add scheduled
.
Then, Ready
can be set to status: true
when the infrastructure is provisioned.
I think we may need to change SyncGateway()
to update Scheduled
first maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this status Sync pattern common for controller-runtime controllers?
It's a pattern I have seen with other controllers, essentially reconcile the object and calculate status before updating the object on the API server.
I think we may need to change SyncGateway() to update Scheduled first maybe?
Ack. I was still mentally operating in the operator world where the infra is provisioned. I have updated computeGatewayReadyCondition()
to computeGatewayScheduledCondition()
to manage the "Scheduled" condition type. When Envoy infra management is added, we can re-introduce computeGatewayReadyCondition()
to calculate the "Ready" condition type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we manage setting status from the result of the DAG build vs the controller? Seems like we need to move the status setting to the DAG processor like it is today, and leave the gateway controller to manage the Envoy instances (if desired).
internal/validation/gateway.go
Outdated
} | ||
|
||
// validateGatewayNamespace validates whether other gateways exist in the namespace of gw. | ||
func validateGatewayNamespace(ctx context.Context, cli client.Client, gw *gatewayapi_v1alpha1.Gateway, path *field.Path) field.ErrorList { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now all validation is done in the DAG builder processor. How do we manage validation here on top of validation there? There are a bunch of bits that could be pulled out and pushed upstream, however, other things, like specific configuration items can't have generic validation and need to set status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO validation is one of the first steps that a controller should perform on the object it reconciles. If utilizing the controller-runtime caches is the long-term direction, I think it's best to not add any more functionality to the DAG. As-is, the DAG will continue to perform validation, status mgt, etc. for xRoutes and Gateway listeners. The controller-runtime controllers will do the same for the objects they reconcile. When the xRoute controllers are implemented, we migrate the validation, status mgt, etc. from the Gateway API DAG processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't go down that road right now, I'd let the DAG builder validate its stuff and keep everything together.
To me, this new controller should just manage Envoy instances, everything else should stay consistent as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But don't let me distract, if other's don't agree then go with that. =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good chance to start pulling functionality out of the DAG, and decomposing it a bit. I think of Contour as having three stages: Ingestion of objects into the DAG, the DAG run, and then export to Envoy. I think that it's okay to have some validation at the ingestion stage, as long as we keep in mind what that means for the DAG. It's just a data model, and if we can split complexity out of it, it will be easier to code and maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to either process a full dag to find some of the validation errors. One simple one is a missing/invalid service reference should set status on the Gateway. Probably for now we'll need to have the Gateway Controller get feedback from the result of a dag run and see if errors exist that it cares about to set proper status.
@stevesloka I thought we discussed moving away from the DAG and start utilizing the runtime controllers cache? @skriss did you mention that you were going to explore this in more detail? If the intent is to transition to the runtime cache, I thought it makes sense to have the controller manage GatewayClass/Gateway status and migrate the DAG's existing xRoute status management to xRoute controllers in the future. |
The current DAG build is critical to how Contour works. Moving away from that is a core fundamental change to Contour. It's doable but needs a ton of work + overhaul to the core internals. We talked I think about using the controller cache as opposed to our current custom caching, but still, the DAG needs to stay. |
fac6fbd
to
dcac033
Compare
I agree that the DAG needs to stay, but I think that filtering objects before they go in is pretty reasonable (which is what pre-DAG validation logic will do). |
We also need to be careful to make sure if a cluster is running and something causes the Gateway to become invalid for one specific reason (say an invalid service), that we still keep the Gateway in the Contour cache so that we don't break all other routes that would be valid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small things, but overall I think this is okay. I'll leave to @stevesloka to give the final approval.
} | ||
|
||
// See if the referenced gatewayclass is admitted. | ||
gcAdmitted := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this needs to be here because we don't update status on admission? If so, could you put a comment to that effect, maybe with a TODO to do something about it?
if p.source.gateway == nil { | ||
p.Error("Gateway is not defined!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the error should say something like "No gateway pointing to the GatewayClass whatever could be found"?
entries: make(map[string]map[types.NamespacedName]CacheEntry), | ||
proxyUpdates: make(map[types.NamespacedName]*ProxyUpdate), | ||
gatewayRef: gateway, | ||
gatewayUpdates: make(map[types.NamespacedName]*GatewayConditionsUpdate), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base Gateway status is actually pretty simple, but the Listener status is not. It's where all the Route stuff shows up, currently, so I think that having a way to distinguish between those is good.
entries: make(map[string]map[types.NamespacedName]CacheEntry), | ||
proxyUpdates: make(map[types.NamespacedName]*ProxyUpdate), | ||
gatewayRef: gateway, | ||
gatewayUpdates: make(map[types.NamespacedName]*GatewayConditionsUpdate), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should get this working and come back for code duplication later.
@youngnick that is the gatewayclass referenced by the gateway. If a gateway references a gatewayclass that is not admitted, then the gateway is considered invalid and the appropriate status condition is surfaced. |
e2e tests should consider appending a UID to gateway names for each test to avoid this issue. Gateways/GatewayClasses will use finalizers so deletion may take time, especially as Contour manages Envoy infra. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm to get this in and unblock a bunch of other issues once we get CI passing.
The plan is to move to having each test create a Gateway in its own namespace, to avoid test pollution like this. Requires #3803 to be able to easily update Contour config to point to the right gateway. |
Only failing CI is the codecov one, I think this is good to merge @danehans. |
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: Steve Sloka <slokas@vmware.com>
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>
Looks like we've got some merge conflicts; @stevesloka were you going to take this one over? |
Yup I can figure this one out @skriss. |
@stevesloka if you can clear the merge conflicts and CI is still passing, then I think this one is good to go for v1.17? |
@youngnick nope there's an issue with finalizers. I'd like to pick this apart into pieces so we can make sure to test everything and find the flake. Personally, I'd delay the release, but need to chat about that. Otherwise would land in 1.18. |
Implemented in #3858, closing and will follow the new PR for updates. |
Adds an initial Gateway controller implementation.
cmd/contour/serve.go
: UpdatesNewGatewayController()
to accept a gatewayclass controller string.ControllerName
is now a required param, so the nil check is removed.examples/*
: Updates examples for Gateway support. Note that gateway/gatewayclass RBAC added the 'update' verb to support adding/removing the finalizer.internal/controller/gateway.go
: Adds a controller to reconcile Gateway objects owned by Contour. The controller validates, finalizes the referenced gatewayclass (if needed), and updates the status of managedGateway
objects.internal/gateway/*
: Adds functions for managing gateway objects.internal/k8s/informers.go
: Updates RBAC for managing status of Gateway objects.internal/slice/*
: Introduces a slice package for adding/removing a string from a []string.internal/status/*
: Adds support for managing Gateway conditions.internal/validation/*
: Adds initial support for validating Gateway objects.internal/validation/gatewayclass_test.go
Changes the scope of existing variables to support Gateway tests.pkg/config/parameters_test.go
UpdatescontrollerName
from optional to required.test/scripts/install-contour-working.sh
: Adds the controller name of the test gatewayclass to the contour config.Signed-off-by: Daneyon Hansen daneyonhansen@gmail.com