Create an Internal Load Balancer if configured #1222

Open: wants to merge 1 commit into base: main from internal-load-balancer

Conversation

@bfournie (Contributor) commented May 1, 2024

What type of PR is this?
/kind feature

What this PR does / why we need it:
Provide the ability to configure the types of Load Balancers to be created (Internal and/or External). By default, an External Proxy Load Balancer will be created per the current implementation. If set for an Internal Load Balancer, an Internal Passthrough Load Balancer will be created using resources in the specified region.

Which issue(s) this PR fixes:
Fixes #1221

Special notes for your reviewer:

TODOs:

  • [x] squashed commits
  • [x] includes documentation
  • [x] adds unit tests

Release note:

Added a new `LoadBalancerType` field to `LoadBalancerSpec` that defines the type of Load Balancers (External or Internal) which should be created. By default, a Global External Proxy Load Balancer will be created, as is currently implemented.
Also added an `InternalLoadBalancer` field to optionally configure the name and subnet of the Internal Load Balancer.
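For orientation, here is a minimal sketch of the API additions discussed in this thread. The field names and doc comments come from the review snippets quoted later on this page; the LoadBalancer struct name and overall layout are assumptions for this sketch, not necessarily the exact final diff:

// LoadBalancerType defines the type of Load Balancer that should be created.
type LoadBalancerType string

// LoadBalancerSpec additions (sketch; existing fields elided).
type LoadBalancerSpec struct {
	// LoadBalancerType defines the type of Load Balancer that should be created.
	// If not set, a Global External Proxy Load Balancer will be created by default.
	// +optional
	LoadBalancerType *LoadBalancerType `json:"loadBalancerType,omitempty"`

	// InternalLoadBalancer optionally configures the name and subnet of the
	// Internal Load Balancer (struct name below is assumed for this sketch).
	// +optional
	InternalLoadBalancer *LoadBalancer `json:"internalLoadBalancer,omitempty"`
}

// LoadBalancer holds per-Load-Balancer configuration (sketch).
type LoadBalancer struct {
	// Name is the name of the Load Balancer; a default name is used if unset.
	// +optional
	Name *string `json:"name,omitempty"`

	// Subnet is the name of the subnet to use for a regional Load Balancer.
	// +optional
	Subnet *string `json:"subnet,omitempty"`
}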

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 1, 2024
@k8s-ci-robot (Contributor):

Hi @bfournie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 1, 2024
@k8s-ci-robot k8s-ci-robot requested a review from cpanato May 1, 2024 15:42
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bfournie
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from dims May 1, 2024 15:42
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 1, 2024
netlify bot commented May 1, 2024

Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!

  • Latest commit: eb4a0ad
  • Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-gcp/deploys/664f4e5274ebca000856f350
  • Deploy Preview: https://deploy-preview-1222--kubernetes-sigs-cluster-api-gcp.netlify.app

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 1, 2024
@cpanato (Member) left a comment:

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 2, 2024
@bfournie bfournie force-pushed the internal-load-balancer branch 3 times, most recently from 2d101b7 to e312d9c Compare May 2, 2024 20:19
// APIInternalAddress is the IPV4 regional address assigned to the
// internal Load Balancer.
// +optional
APIInternalAddress *string `json:"apiInternalIpAddress,omitempty"`

I understand that this is a breaking change, but should we consider a structure to hold these values? The added values are the same as those for the api server, so a lot of duplicated values appear.

@bfournie (Contributor, Author):

The main reason I didn't do that is that it's a breaking change; I tried to keep the same API for consistency and just add the additional fields.

If the consensus is that it's OK to change this API, I can make it an array.

Member:

It makes sense, since we're adding this anyway, to add it as a struct.

wdyt @damdo

@bfournie (Contributor, Author):

OK, but how would we handle the API in this case? Would the existing fields (APIServerAddress, APIServerHealthCheck, etc.) need to be marked as deprecated but still handled for a few releases until the deprecation is complete? It seems like that would add a lot of complexity to this API.

Member:

I tend to agree with @bfournie here.
I'd probably stick with a non-breaking change, and then schedule a full API re-review for a new v1beta2 where we can improve the situation and plan the breaking changes we want to make to the API to clean it up.

Resolved (outdated) review threads: api/v1beta1/types.go (x2), cloud/scope/cluster.go
@damdo (Member) commented May 9, 2024

cc. @nrb

@bfournie (Contributor, Author):

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 14, 2024
@bfournie bfournie force-pushed the internal-load-balancer branch 4 times, most recently from 6db2752 to f5670bf Compare May 15, 2024 19:26
bfournie added a commit to bfournie/installer that referenced this pull request May 15, 2024
Remove the configuration of the GCP Internal Load Balancer from the
installer, as it's now being done in the CAPG provider. This allows
only the Internal LB to be created, to support Private Clusters.

This depends on kubernetes-sigs/cluster-api-provider-gcp#1222
which is still under review.
@bfournie (Contributor, Author):

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 16, 2024
@cpanato (Member) commented May 20, 2024

/test ls

@k8s-ci-robot (Contributor):

@cpanato: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-gcp-build
  • /test pull-cluster-api-provider-gcp-e2e-test
  • /test pull-cluster-api-provider-gcp-make
  • /test pull-cluster-api-provider-gcp-test
  • /test pull-cluster-api-provider-gcp-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-gcp-apidiff
  • /test pull-cluster-api-provider-gcp-capi-e2e
  • /test pull-cluster-api-provider-gcp-conformance
  • /test pull-cluster-api-provider-gcp-conformance-ci-artifacts
  • /test pull-cluster-api-provider-gcp-coverage
  • /test pull-cluster-api-provider-gcp-e2e-workload-upgrade

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-gcp-apidiff
  • pull-cluster-api-provider-gcp-build
  • pull-cluster-api-provider-gcp-e2e-test
  • pull-cluster-api-provider-gcp-make
  • pull-cluster-api-provider-gcp-test
  • pull-cluster-api-provider-gcp-verify

In response to this:

/test ls

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@cpanato (Member) commented May 20, 2024

/test pull-cluster-api-provider-gcp-conformance

@jwmay2012 commented May 20, 2024

Thanks for working on this @bfournie. We would really like this as well!

I pulled this down, and when running it with INTERNAL_ONLY it seems to get stuck on deletion.
It was only deleting global resources (e.g. via s.forwardingrules.Delete), but the existing resources were created as regionalforwardingrules.

When I added stuff like

if err := s.regionalforwardingrules.Delete(ctx, key); err != nil && !gcperrors.IsNotFound(err) {
	log.Error(err, "Error deleting a regional forwardingrule", "name", spec.Name)
	return err
}

to loadbalancers/reconcile.go functions

  • deleteForwardingRule()
  • deleteBackendService()
  • deleteHealthCheck()

I was then able to reliably get cluster deletion to work without manual intervention or getting stuck.

I'm also noticing the Internal Passthrough LB is getting created with 6443 as the kube api endpoint, but the kubeconfig generated expects 443 and never finishes setting up the cluster. I think I understand that passthrough LBs don't allow port mapping, so would the solution be one of the following?

  • change the kubeconfig port from 443 to 6443
  • change the kube api service and LB from 6443 to 443

I haven't found the code to change the generated kubeconfig port yet, but I think ClusterNetwork.APIServerPort is where I can change the kube api port.
Not sure what the plans are here, but I'll continue to play around with it.

// LoadBalancerType defines the type of Load Balancer that should be created.
// If not set, a Global External Proxy Load Balancer will be created by default.
// +optional
LoadBalancerType *LoadBalancerType `json:"loadBalancerType,omitempty"`
@bfournie (Contributor, Author) commented May 20, 2024:

I had thought about that, but felt that type of API presents too much ambiguity and potential for error. For example, with that type of array you could have the following combinations, all of which are in error:

  • empty (no LBs defined)
  • INTERNAL/INTERNAL
  • EXTERNAL/EXTERNAL

Explicitly defining the Load Balancer configuration options seems more useful.
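As a sketch of how that explicit single-field approach can also be enforced by the apiserver (the Enum marker below is an illustration of the idea, not confirmed to be in this PR; values taken from the constants quoted later in the review):

// LoadBalancerType only admits a fixed set of values, so the ambiguous
// combinations listed above (empty, INTERNAL/INTERNAL, EXTERNAL/EXTERNAL)
// cannot be expressed at all.
// +kubebuilder:validation:Enum=EXTERNAL_ONLY;INTERNAL_ONLY;Dual
type LoadBalancerType string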

Name *string `json:"name,omitempty"`

// Subnet is the name of the subnet to use for a regional Load Balancer.
Subnet *string `json:"subnet,omitempty"`

Can the Internal Load Balancer be created without a subnet? Afaik, it can't. So, omitempty is not a valid value.

@bfournie (Contributor, Author):

Yes, it must be created with a subnet. Currently it uses the first subnet if one is not defined here, so setting it in the API is optional.
https://github.com/kubernetes-sigs/cluster-api-provider-gcp/pull/1222/files#diff-a7274316ed5d60f25cc61e1bcb771a612c3566832a59ecc3a13b279a849c95ebR728-R730

I can change this to require it to be explicitly defined if that seems like the best direction.

Member:

Yes, it must be created with a subnet. Currently it uses the first subnet if one is not defined here, so setting it in the API is optional.
https://github.com/kubernetes-sigs/cluster-api-provider-gcp/pull/1222/files#diff-a7274316ed5d60f25cc61e1bcb771a612c3566832a59ecc3a13b279a849c95ebR728-R730

I'd explicitly write this in the comment for this field.
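For reference, a self-contained sketch of the defaulting behavior described above; pickSubnet is a hypothetical helper for illustration, not the actual function in the diff:

import (
	"fmt"
	"strings"

	compute "google.golang.org/api/compute/v1"
)

// pickSubnet returns the subnet whose name matches cfgSubnet, or the first
// subnet in the list when cfgSubnet is empty.
func pickSubnet(subnets []*compute.Subnetwork, cfgSubnet string) (*compute.Subnetwork, error) {
	for _, subnet := range subnets {
		// HasSuffix lets a short configured name match a longer resource name.
		if cfgSubnet == "" || strings.HasSuffix(subnet.Name, cfgSubnet) {
			return subnet, nil
		}
	}
	return nil, fmt.Errorf("no subnet matching %q", cfgSubnet)
}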

@bfournie (Contributor, Author):

(Quoting @jwmay2012's comment above in full: cluster deletion getting stuck because only global resources were deleted, and the 6443 vs. 443 kube-api port mismatch.)

Thank you for testing this! I wasn't seeing an error on deletion but agree that the regional interfaces were not being explicitly called, so I've changed this to do explicit deletions on the regional resources.

Regarding the port being used: the ControlPlaneEndpoint should be set to 443 if not overridden by APIServerPort (https://github.com/kubernetes-sigs/cluster-api-provider-gcp/blob/main/cloud/scope/cluster.go#L148-L154), and this should be used in the Internal-only case here: https://github.com/kubernetes-sigs/cluster-api-provider-gcp/pull/1222/files#diff-a7274316ed5d60f25cc61e1bcb771a612c3566832a59ecc3a13b279a849c95ebR235-R240. But perhaps it's not; I will investigate this.
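For reference, a sketch of that defaulting logic; apiServerPort is a hypothetical helper written against the CAPI v1beta1 types, while the real code lives in cloud/scope/cluster.go at the link above:

import clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"

// apiServerPort returns ClusterNetwork.APIServerPort when set, and the
// default of 443 otherwise.
func apiServerPort(cluster *clusterv1.Cluster) int32 {
	if cluster.Spec.ClusterNetwork != nil && cluster.Spec.ClusterNetwork.APIServerPort != nil {
		return *cluster.Spec.ClusterNetwork.APIServerPort
	}
	return 443
}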

Provide the ability to configure the types of Load Balancers
to be created (Internal and/or External). By default, an External
Proxy Load Balancer will be created per the current implementation.
If set for an Internal Load Balancer, an Internal Passthrough
Load Balancer will be created using resources in the specified
region.
@k8s-ci-robot (Contributor):

@bfournie: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-cluster-api-provider-gcp-apidiff · Commit: eb4a0ad · Required: false · Rerun command: /test pull-cluster-api-provider-gcp-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 23, 2024
@damdo (Member) left a comment:

Thanks @bfournie, great effort with this feature!
I left my review, most are nitpicks and some questions.

Also, were you planning on writing E2E tests for this?
I think this is a big enough feature to warrant that.

Comment on lines +141 to +152
// ExternalLoadBalancerOnly creates a Global External Proxy Load Balancer
// to manage traffic to backends in multiple regions. This is the default Load
// Balancer and will be created if no LoadBalancerType is defined.
ExternalLoadBalancerOnly = LoadBalancerType("EXTERNAL_ONLY")

// InternalLoadBalancerOnly creates a Regional Internal Passthrough Load
// Balancer to manage traffic to backends in the configured region.
InternalLoadBalancerOnly = LoadBalancerType("INTERNAL_ONLY")

// DualLoadBalancer creates both External and Internal Load Balancers to provide
// separate endpoints for managing both external and internal traffic.
DualLoadBalancer = LoadBalancerType("Dual")
Member:

I'd try and stick to values that are following the API Conventions:

values should be PascalCase and should be equivalent to the camelCase field name (json tag)

So I'd drop the _ONLY (IMO it is already clear without it) and change them to:

External, Internal, InternalExternal (Dual could be ambiguous if we add a third option or a different combination in the future).
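For example (constant names here are illustrative; only the values come from the suggestion above):

const (
	LoadBalancerTypeExternal         = LoadBalancerType("External")
	LoadBalancerTypeInternal         = LoadBalancerType("Internal")
	LoadBalancerTypeInternalExternal = LoadBalancerType("InternalExternal")
)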

// will be used. For an Internal Load Balancer service the default
// name is "api-internal".
// +kubebuilder:validation:Optional
// +kubebuilder:validation:MaxLength=32
Member:

Is this a limitation on the GCP side?


Comment on lines +88 to +107

// APIInternalAddress is the IPV4 regional address assigned to the
// internal Load Balancer.
// +optional
APIInternalAddress *string `json:"apiInternalIpAddress,omitempty"`

// APIInternalHealthCheck is the full reference to the health check
// created for the internal Load Balancer.
// +optional
APIInternalHealthCheck *string `json:"apiInternalHealthCheck,omitempty"`

// APIInternalBackendService is the full reference to the backend service
// created for the internal Load Balancer.
// +optional
APIInternalBackendService *string `json:"apiInternalBackendService,omitempty"`

// APIInternalForwardingRule is the full reference to the forwarding rule
// created for the internal Load Balancer.
// +optional
APIInternalForwardingRule *string `json:"apiInternalForwardingRule,omitempty"`
Member:

Are these fields all required when the Internal Load Balancer is set?
If not, are they defaulted to something in code?

Comment on lines +32 to +46
// loadBalancingMode describes the load balancing mode that the backend performs.
type loadBalancingMode string

const (
// Utilization determines how the traffic load is spread based on the
// utilization of instances.
loadBalancingModeUtilization = loadBalancingMode("UTILIZATION")

// Connection determines how the traffic load is spread based on the
// total number of connections that a backend can handle. This is the
// only mode available for passthrough Load Balancers.
loadBalancingModeConnection = loadBalancingMode("CONNECTION")

loadBalanceTrafficInternal = "INTERNAL"
)
Member:

Let's adhere to the PascalCase conventions here too.

Comment on lines +120 to +143
if err := s.deleteForwardingRule(ctx, name); err != nil {
return err
}
s.scope.Network().APIServerForwardingRule = nil

if err := s.deleteAddress(ctx, name); err != nil {
return err
}
s.scope.Network().APIServerAddress = nil

if err := s.deleteTargetTCPProxy(ctx); err != nil {
return err
}
s.scope.Network().APIServerTargetProxy = nil

if err := s.deleteBackendService(ctx, name); err != nil {
return err
}
s.scope.Network().APIServerBackendService = nil

if err := s.deleteHealthCheck(ctx, name); err != nil {
return err
}
s.scope.Network().APIServerHealthCheck = nil
Member:

I'd add an fmt.Errorf wrap to all these errors, for easier debugging when following the error log chain.
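For illustration, the suggested wrapping applied to the first call in the block above; the same pattern would repeat for the other delete calls:

if err := s.deleteForwardingRule(ctx, name); err != nil {
	return fmt.Errorf("failed to delete forwarding rule %q: %w", name, err)
}
s.scope.Network().APIServerForwardingRule = nil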

Comment on lines +152 to +169
return err
}
s.scope.Network().APIInternalForwardingRule = nil

if err := s.deleteInternalAddress(ctx, name); err != nil {
return err
}
s.scope.Network().APIInternalAddress = nil

if err := s.deleteRegionalBackendService(ctx, name); err != nil {
return err
}
s.scope.Network().APIInternalBackendService = nil

if err := s.deleteRegionalHealthCheck(ctx, name); err != nil {
return err
}
s.scope.Network().APIInternalHealthCheck = nil
Member:

Error wrapping here too

})
}
if mode == loadBalancingModeConnection {
be.MaxConnections = int64(2 ^ 32)
Member:

Can we add a comment specifying where this value is coming from?
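One thing worth noting while answering that: in Go, ^ on integers is bitwise XOR, not exponentiation, so 2 ^ 32 evaluates to 34 rather than 2^32. If a power of two was intended, a shift expresses it:

// ^ is bitwise XOR: 0b000010 ^ 0b100000 == 0b100010 == 34.
fmt.Println(2 ^ 32)  // 34
// A power of two is written with a shift instead:
fmt.Println(1 << 32) // 4294967296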

spec.Region = s.scope.Region()
spec.BackendService = backendSvc.SelfLink
// Ports are used instead of PortRange for passthrough Load Balancers
spec.Ports = []string{"6443", "22623"}
Member:

Where are these values coming from?
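(Background note, not stated in this thread: 6443 is the conventional kube-apiserver port, and 22623 is the machine config server port used by OpenShift-based installs, consistent with the installer commit referenced earlier in this PR.)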

return nil, err
}
// Return subnet that matches configuration or first one if not configured
if cfgSubnet == "" || strings.HasSuffix(subnet.Name, cfgSubnet) {
Member:

What is the reason for using HasSuffix instead of a full match?

Labels: cncf-cla: no · kind/feature · ok-to-test · release-note · size/XXL

Successfully merging this pull request may close these issues:

  • Allow for creation of additional load balancer for internal traffic (#1221)