Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NE-1517: Set EIP for NLB Ingress controller. #1593

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

miheer
Copy link
Contributor

@miheer miheer commented Mar 12, 2024

This EP describes how to set EIP for NLB ingress controller.

Epic link: https://issues.redhat.com/browse/NE-1274

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 12, 2024
@openshift-ci openshift-ci bot requested review from candita and frobware March 12, 2024 11:04
@miheer miheer force-pushed the eip branch 7 times, most recently from bc40070 to 4b14bb9 Compare March 22, 2024 09:46
@miheer miheer changed the title WIP: Set EIP for NLB Ingress controller. Set EIP for NLB Ingress controller. Mar 22, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2024
@miheer miheer changed the title Set EIP for NLB Ingress controller. NE-1517: Set EIP for NLB Ingress controller. Mar 22, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 22, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 22, 2024

@miheer: This pull request references NE-1517 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

This EP describes how to set EIP for NLB ingress controller.

Epic link: https://issues.redhat.com/browse/NE-1274

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@miheer miheer force-pushed the eip branch 3 times, most recently from 9e62c06 to 45ad01d Compare March 24, 2024 20:52
@candita
Copy link
Contributor

candita commented Mar 27, 2024

/assign @frobware
/assign @Miciah
/assign

Copy link
Contributor

@frobware frobware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concerns revolve around the expectation that updating EIP allocation IDs will automatically update an existing NLB. From my brief experimentation, it seems that the system only recognises these allocations, as specified by the annotation, at the time of creation. The document implies at several points that changing EIP allocation IDs is possible and that such changes will be reconciled with the existing ELB without necessitating its recreation. It would be beneficial to conduct further tests to determine if modifications to the annotation result in immediate updates on a functioning ELB.

This is a feature request to enhance the IngressController API to be able to support static IPs during
- Install time
- Custom NLB ingress controller creation
- Reconfiguration of the router.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As elastic IP allocations can only be added at creation time, what does reconfiguration imply recreation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how the router is involved at all. Should this say "Updating the IngressController"? Andy's question then applies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating the IngressController is accurate.


- As an administrator, I want to provision EIPs and use them with an NLB IngressController.
- As a user or admin, I want to ensure EIPs are used with NLB on default router at install time.
- As a user, I want to reconfigure default router to use EIPs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my previous comment: reconfiguration would have to be destruction followed by creation as elastic allocation IDs need to be done at creation time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the workflow has changed. We will recreate the LB.

Comment on lines 85 to 89
eip-allocations: eipalloc-<redacted>,eipalloc-<redacted>,eipalloc-<redacted>,eipalloc-<redacted>,eipalloc-<redacted>
subnets:
- subnet-1
- subnet-2
- subnet-3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the proposed design, will there be a check to ensure the count of EIP allocation IDs matches the number of public subnets? Without this, one might only discover that the ingress controller cannot deploy after much progress in the installation process...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option is we can add a field to mention the number of public subnets. Then using CEL at API level we could try to compare the number of eip-allocations they provided and the the number of public subnets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I think we may put this as a Non-Goal for this EP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Andy said "in the installation process". Validation to ensure installation succeeds can be done by the installer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this point in the Open Questions

Comment on lines 96 to 117
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe remove the noise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

aws:
type: NLB
networkLoadBalancer:
eip-allocations: eipalloc-<redacted>,eipalloc-<redacted>,eipalloc-<redacted>,eipalloc-<redacted>,eipalloc-<redacted>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we update the example to use sequential numbers for the EIP allocation IDs? This would make it clearer. Suggesting: "eipalloc-12345, eipalloc-12346, eipalloc-12347, eipalloc-12348, eipalloc-12349". Ditto for the other usage where we use .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Comment on lines 210 to 764
- The API field for `eip-allocations` field in the Ingress Controller CR needs to be set as follows
in the file `operator/v1/types_ingress.go`:
```go
// AWSNetworkLoadBalancerParameters holds configuration parameters for an
// AWS Network load balancer.
type AWSNetworkLoadBalancerParameters struct {
EIPAllocations EIPAllocations `json:"eip-allocations,omitempty"`
}

type EIPAllocations string

```
Copy link
Contributor

@frobware frobware Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be duplication. Aren't the types also described above?

Copy link
Contributor Author

@miheer miheer Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do something like this

// AWSNetworkLoadBalancerParameters holds configuration parameters for an
// AWS Network load balancer.
type AWSNetworkLoadBalancerParameters struct {
	EIPAllocations configv1.EIPAllocations `json:"eip-allocations,omitempty"`
}

Copy link
Contributor

@candita candita Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In operator/v1/types_ingress.go we already have a type ProviderLoadBalancerParameters with an AWSLoadBalancerParameters. Is that where this is supposed to go? https://github.com/openshift/api/blob/master/operator/v1/types_ingress.go#L473

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, my confusion was because the struct was empty to begin with. I thought you were adding a new struct type. Btw, your api PR shows this as a slice:

type AWSNetworkLoadBalancerParameters struct {
	EIPAllocations []configv1.EIPAllocations `json:"eipAllocations,omitempty"`
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so I hope your question is answered.

The admin sets the `eip-allocations` in the Ingress Controller CR. Then same process
as per point 1.c.ii and 1.c.iii is followed by the ingress operator.

#### 3. Update the existing EIP for the IngressController:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previously mentioned, will changes to EIP allocations actually be reconciled after creation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frobware we need to recreate the LB service.

#### 3. Update the existing EIP for the IngressController:

When the admin changes the eip-allocations field in the Ingress Controller CR, the watch on the
`eip-allocations` field by the ingress operator scales a new deployment for the Ingress Controller CR.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so if you change the EIP allocations then you get a "brand new" ingresscontroller? That might be surprising as it will drop all traffic. Seems subtle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frobware I need to test again. It looks like if the service is updated with the annotation the kcm should update the eips. https://github.com/brooksgarrett/kubernetes/blob/72dd54ec4856ccb7bd2b08258d95f3d0b932fb11/pkg/cloudprovider/providers/aws/aws.go#L4157

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like https://github.com/openshift/enhancements/compare/45ad01df72b72174f9347fcb54179af4661cbd19..1137a8acba5a4c72ae496c34fa93b97dd18d3990 resolved this question by removing "scales a new deployment" and instead describing the workflow for recreating the service.

in the Ingress Controller CR and then test if the Ingress Controller
with service type load balancer with the annotation `service.beta.kubernetes.io/aws-load-balancer-eip-allocations`
has value of `eip-allocations` field from the Ingress Controller CR was updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a plan for testing when the EIP allocations are plumbed in the install-config.yaml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be unit tests in the installer.


## Upgrade / Downgrade Strategy

During upgrade, user should be able to populate the eip_allocations_field via installer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If EIP allocations are only reconciled at creation time can this work? The cluster is upgrading which implies it has a deployed ingresscontroller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right. During the upgrade we won't have this option for populating the field. So, the cluster admin will have to set the eip-allocations field once the cluster is installed.

Comment on lines 43 to 46
- As an administrator, I want to provision EIPs and use them with an NLB IngressController.
- As a user or admin, I want to ensure EIPs are used with NLB on default router at install time.
- As a user, I want to reconfigure default router to use EIPs.
- As a user of OpenShift on AWS (ROSA), I want to use static IPs (and therefore AWS Elastic IPs) so that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These roles are ambiguous. Is the administrator or user in all cases the cluster administrator? If there are multiple roles, can you define them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. From my understanding we shall have only one user i.e the cluster-admin as he is the one who is responsible for creating routers.

Comment on lines 41 to 48
### User Stories

- As an administrator, I want to provision EIPs and use them with an NLB IngressController.
- As a user or admin, I want to ensure EIPs are used with NLB on default router at install time.
- As a user, I want to reconfigure default router to use EIPs.
- As a user of OpenShift on AWS (ROSA), I want to use static IPs (and therefore AWS Elastic IPs) so that
I can configure appropriate firewall rules.
I want the default AWS Load Balancer that they use (NLB) for their router to use these EIPs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another one:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@miheer
Copy link
Contributor Author

miheer commented Mar 29, 2024

My main concerns revolve around the expectation that updating EIP allocation IDs will automatically update an existing NLB. From my brief experimentation, it seems that the system only recognises these allocations, as specified by the annotation, at the time of creation. The document implies at several points that changing EIP allocation IDs is possible and that such changes will be reconciled with the existing ELB without necessitating its recreation. It would be beneficial to conduct further tests to determine if modifications to the annotation result in immediate updates on a functioning ELB.

@frobware #1593 (comment)

- ""
---

# Set AWS EIP For NLB Ingress Controller
Copy link
Contributor

@candita candita Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please define EIP and NLB at least once in this proposal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes added under "Proposal"

### Goals
- Users are able to use EIP for a NLB Ingress Controller.
- Users are able to create a NLB default Ingress Controller during install time with the EIPs specified during install.
- Any existing ingress controller should be able to be reconfigured to use EIPs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned earlier, "reconfigure" should really be "update with restart", right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that https://github.com/openshift/enhancements/compare/1137a8acba5a4c72ae496c34fa93b97dd18d3990..79d098963268d49c7e4b7d64ffe1e25878c2b6c0 implemented this suggestion.

To me, "restart" implies that the router pod has to be restarted. How about this?

- The ELB of an existing IngressController should be able to be reconfigured to use specific EIPs by updating the IngressController CR and recreating the ELB. 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added with slight changes.

In this enhancement we would be adding API fields in the installer and the ingress controller CRD
to set the AWS EIP for the NLB Ingress Controller. The cluster ingress operator will then create
the ingress controller and get the EIP assigned to the service LoadBalancer with the annotation
`service.beta.kubernetes.io/aws-load-balancer-eip-allocations`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note about why the annotation is in kubernetes.io rather than openshift.io?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was addressed in https://github.com/openshift/enhancements/compare/1137a8acba5a4c72ae496c34fa93b97dd18d3990..79d098963268d49c7e4b7d64ffe1e25878c2b6c0. However, that change uses service.beta.kubernetes.io/aws-load-balancer-eip-allocations as the target of a link, which will not work. Better to use https://kubernetes.io/docs/reference/labels-annotations-taints/#service-beta-kubernetes-io-aws-load-balancer-eip-allocations as a documentation reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes done

cluster. A cluster administrator also has the rights to modify the cluster level components.

#### Set EIP using an installer:
1. Cluster administrator creates adds the following in the install-config.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Cluster administrator creates adds the following in the install-config.yaml
1. Cluster administrator creates or adds the following in the install-config.yaml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in a different way

```sh
oc edit ingresscontroller default -o yaml
```
The yaml of the default ingresscontroller should look like this:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to see what it looks like prior to the edit. Please remove this for clarity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Comment on lines 979 to 629
Add the `ingress.operator.openshift.io/auto-delete-load-balancer:` to the ingress controller.
```sh
% oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/auto-delete-load-balancer=
ingresscontroller.operator.openshift.io/default annotated
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't they add this annotation at the same time that they add the EIP allocations?

Copy link
Contributor Author

@miheer miheer May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can using oc edit but I am showing it through oc patch commands.

service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
service.beta.kubernetes.io/aws-load-balancer-type: nlb
traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
creationTimestamp: "2024-04-10T08:04:41Z"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to see anything after the annotations in the Service yaml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just updated it.

Comment on lines 1070 to 1073
2. What happens to the subnets added in the installer config?

Addition of subnets is a different EP. However, the eips should match the number of public subnets which is a user responsibility.
More details on how to check the number of subnets are mentioned [here](#Set EIP using an installer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 1075 to 1078
3. What happens when someone updates the eipAllocations for the load balancer service of an Ingress Controller ?

It is not supported to manually edit the eipAllocations for the load balancer service of an Ingress Controller.
However, if someone does that then they will find the error in status as follows in the IngressController:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
3. What happens when someone updates the eipAllocations for the load balancer service of an Ingress Controller ?
It is not supported to manually edit the eipAllocations for the load balancer service of an Ingress Controller.
However, if someone does that then they will find the error in status as follows in the IngressController:
3. What happens when someone updates the `eipAllocations` directly in the load balancer service of an Ingress Controller ?
It is not supported to directly edit the `eipAllocations` for the load balancer service of an IngressController, and this will put the IngressController into an error state with the following status:

```yaml
- lastTransitionTime: "2024-04-17T03:09:32Z"
message: |-
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "ResourceInUse: The allocation IDs are not available for use\n\tstatus code: 400, request id: 58136c42-e1bd-4c1a-ac38-7078da2b1d95"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some line breaks so this shows without scrolling.

Suggested change
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "ResourceInUse: The allocation IDs are not available for use\n\tstatus code: 400, request id: 58136c42-e1bd-4c1a-ac38-7078da2b1d95"
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load
balancer: failed to ensure load balancer: error creating load balancer: "ResourceInUse: The allocation IDs
are not available for use\n\tstatus code: 400, request id: 58136c42-e1bd-4c1a-ac38-7078da2b1d95"

...
- lastTransitionTime: "2024-04-17T03:09:32Z"
message: |-
One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "ResourceInUse: The allocation IDs are not available for use\n\tstatus code: 400, request id: 58136c42-e1bd-4c1a-ac38-7078da2b1d95"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "ResourceInUse: The allocation IDs are not available for use\n\tstatus code: 400, request id: 58136c42-e1bd-4c1a-ac38-7078da2b1d95"
One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed:
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load
balancer: failed to ensure load balancer: error creating load balancer: "ResourceInUse: The allocation IDs
are not available for use\n\tstatus code: 400, request id: 58136c42-e1bd-4c1a-ac38-7078da2b1d95"


```

- Are there any pre-requisites for this EP ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are numbered questions above, please continue the numbering here and below. But also, these are not Open Questions if they have answers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this was confusing me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


- lastTransitionTime: "2024-04-20T06:37:13Z"
message: |-
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound: The allocation IDs 'eipalloc-0dba74f5f9888a37a', 'eipalloc-0436f6a636851e1ce', 'eipalloc-034f3c45b2b341752', 'eipalloc-094ada3572e093006', 'eipalloc-0eeb9f39681b1a015' do not exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidAllocationID.NotFound; Request ID: 50026879-8b24-4a0f-9606-87ac22f47692; Proxy: null)\n\tstatus code: 400, request id: 740d20ff-ae98-44d6-b86e-0762a1cf9c20"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound: The allocation IDs 'eipalloc-0dba74f5f9888a37a', 'eipalloc-0436f6a636851e1ce', 'eipalloc-034f3c45b2b341752', 'eipalloc-094ada3572e093006', 'eipalloc-0eeb9f39681b1a015' do not exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidAllocationID.NotFound; Request ID: 50026879-8b24-4a0f-9606-87ac22f47692; Proxy: null)\n\tstatus code: 400, request id: 740d20ff-ae98-44d6-b86e-0762a1cf9c20"
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load
balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound: The allocation
IDs 'eipalloc-0dba74f5f9888a37a', 'eipalloc-0436f6a636851e1ce', 'eipalloc-034f3c45b2b341752', 'eipalloc-
094ada3572e093006', 'eipalloc-0eeb9f39681b1a015' do not exist (Service: AmazonEC2; Status Code: 400;
Error Code: InvalidAllocationID.NotFound; Request ID: 50026879-8b24-4a0f-9606-87ac22f47692; Proxy:
null)\n\tstatus code: 400, request id: 740d20ff-ae98-44d6-b86e-0762a1cf9c20"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

type: DNSReady
- lastTransitionTime: "2024-04-20T06:37:16Z"
message: |-
One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound: The allocation IDs 'eipalloc-0dba74f5f9888a37a', 'eipalloc-0436f6a636851e1ce', 'eipalloc-034f3c45b2b341752', 'eipalloc-094ada3572e093006', 'eipalloc-0eeb9f39681b1a015' do not exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidAllocationID.NotFound; Request ID: 50026879-8b24-4a0f-9606-87ac22f47692; Proxy: null)\n\tstatus code: 400, request id: 740d20ff-ae98-44d6-b86e-0762a1cf9c20"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound: The allocation IDs 'eipalloc-0dba74f5f9888a37a', 'eipalloc-0436f6a636851e1ce', 'eipalloc-034f3c45b2b341752', 'eipalloc-094ada3572e093006', 'eipalloc-0eeb9f39681b1a015' do not exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidAllocationID.NotFound; Request ID: 50026879-8b24-4a0f-9606-87ac22f47692; Proxy: null)\n\tstatus code: 400, request id: 740d20ff-ae98-44d6-b86e-0762a1cf9c20"
One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The
service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer:
failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound: The allocation IDs
'eipalloc-0dba74f5f9888a37a', 'eipalloc-0436f6a636851e1ce', 'eipalloc-034f3c45b2b341752', 'eipalloc-
094ada3572e093006', 'eipalloc-0eeb9f39681b1a015' do not exist (Service: AmazonEC2; Status Code: 400;
Error Code: InvalidAllocationID.NotFound; Request ID: 50026879-8b24-4a0f-9606-87ac22f47692; Proxy:
null)\n\tstatus code: 400, request id: 740d20ff-ae98-44d6-b86e-0762a1cf9c20"

```yaml
- lastTransitionTime: "2024-04-20T06:40:43Z"
message: |-
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: Must have same number of EIP AllocationIDs (4) and SubnetIDs (5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: Must have same number of EIP AllocationIDs (4) and SubnetIDs (5)
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load
balancer: failed to ensure load balancer: error creating load balancer: Must have same number of EIP
AllocationIDs (4) and SubnetIDs (5)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

type: DNSReady
- lastTransitionTime: "2024-04-20T06:40:43Z"
message: |-
One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: Must have same number of EIP AllocationIDs (4) and SubnetIDs (5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: Must have same number of EIP AllocationIDs (4) and SubnetIDs (5)
One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable:
The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with
message: Deployment does not have minimum availability.), LoadBalancerReady=False
(SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like:
Error syncing load balancer: failed to ensure load balancer: error creating load balancer: Must have same
number of EIP AllocationIDs (4) and SubnetIDs (5)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

status: "True"
type: Progressing
- lastTransitionTime: "2024-04-20T06:40:43Z"
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test to see what happens if a change was made to EIPs without the auto-delete annotation, and the controller goes into Pending, but then you add the auto-delete annotation? Does it automatically delete at that point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it does not. annotation should be present for automatic deletion.

Comment on lines 109 to 110
Here, the administrator needs to know how many public subnets the VPC ID of their cluster has.
They may check it by clicking the `Load Balancers` tab then under the details click `VPC ID` under VPC, then click the link under `Main route table` and then check the column `Explicit subnet associations` which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set EIP using an installer:
the administrator needs to know how many public subnets the VPC ID of their cluster has.
the Load Balancers tab then under the details click VPC ID under VPC

Is this assuming the VPC is created (which is created by installer)? or any other VPC that eventually exists in the account?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it is VPC tagged with cluster name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The steps listed here are assuming you are bringing your own VPC, that's already created. I think that needs to be stated here.

Shouldn't we also have a workflow for when the installer creates the VPC and public subnets? In that case, you'd need the cluster admin to find how many AZs are in the region they are installing into and express that the number of public subnets == the number of AZs in a region.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The steps listed here are assuming you are bringing your own VPC, that's already created. I think that needs to be stated here.

OK I will state it.

Shouldn't we also have a workflow for when the installer creates the VPC and public subnets? In that case, you'd need the cluster admin to find how many AZs are in the region they are installing into and express that the number of public subnets == the number of AZs in a region

I need to think on this. Lets have a call to discuss this. I did not get this express that the number of public subnets == the number of AZs in a region

kind: Service
metadata:
annotations:
service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0956fea34de4cb7ab,eipalloc-0e9a3077a70de050a,eipalloc-0b69fc4691f54cdd0,eipalloc-01e6ba6cbba1a391b,eipalloc-0e242df173f906112
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the documentation I can see:

  • This configuration is optional, and you can use it to assign static IP addresses to your NLB
  • You must specify the same number of eip allocations as load balancer subnets annotation

Is the subnet annotation service.beta.kubernetes.io/aws-load-balancer-subnets required when using service.beta.kubernetes.io/aws-load-balancer-eip-allocations or is it something implicit and only the len(aws-load-balancer-eip-allocations) must match the discovered subnets from the VPC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of adding a validation in the installer on the install-config.yaml

436b954#diff-cfc19a244868c1f04e3b6abf6779c80f4938e934d4fb8913a4b72ada48b266aeR1108

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtulio : I believe the AWS LB controller doc you linked is a little misleading or it wants you to follow the subnet annotation link to see that the subnet discovery can be used too. ALBC uses the number of found subnets which can come either from the subnet annotation or from the subnet discovery.

However in the context of this EP we should check the aws cloud provider, which actually does the same thing as ALBC. It compares the number of EIPs to the number of discovered subnets. The discovered subnets in turn can come either from the annotation or from the discovery.

Is the subnet annotation service.beta.kubernetes.io/aws-load-balancer-subnets required when using service.beta.kubernetes.io/aws-load-balancer-eip-allocations or is it something implicit and only the len(aws-load-balancer-eip-allocations) must match the discovered subnets from the VPC?

So, either the annotation or the discovered. The latter may be more difficult to implement if we want to check this in the installer.

Comment on lines 135 to 137
type: NLB
networkLoadBalancer:
eipAllocations:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the spec.loadBalancer.platform.aws.networkLoadBalancer.eipAllocations path?

Suggested change
type: NLB
networkLoadBalancer:
eipAllocations:
type: NLB
networkLoadBalancer:
eipAllocations:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right. I changed it.

router-test LoadBalancer 172.30.230.109 a95d070adbbf045c9b356ed9db8d3cda-89fbf4dfc638408e.elb.us-east-1.amazonaws.com 80:30453/TCP,443:32704/TCP 77s
```

The annotation `service.beta.kubernetes.io/aws-load-balancer-eip-allocations:` is removed from the service as we updated the `eipAllocations` to an empty value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The annotation `service.beta.kubernetes.io/aws-load-balancer-eip-allocations:` is removed from the service as we updated the `eipAllocations` to an empty value.
The annotation `service.beta.kubernetes.io/aws-load-balancer-eip-allocations` is removed from the service as we updated the `eipAllocations` to an empty value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


If the admin wants to detect the change in the eipAllocations and update the LB service automatically.
rather than deleting the Service explicitly, it is possible to annotate the IngressController with
the newly defined ingress.operator.openshift.io/auto-delete-load-balancer annotation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the newly defined ingress.operator.openshift.io/auto-delete-load-balancer annotation.
the newly defined `ingress.operator.openshift.io/auto-delete-load-balancer` annotation.

minTLSVersion: VersionTLS12
```

Add the `ingress.operator.openshift.io/auto-delete-load-balancer:` to the ingress controller.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Add the `ingress.operator.openshift.io/auto-delete-load-balancer:` to the ingress controller.
Add the `ingress.operator.openshift.io/auto-delete-load-balancer` to the ingress controller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

I can configure appropriate firewall rules.
I want the default AWS Load Balancer that they use (NLB) for their router to use these EIPs.
- As a cluster administrator who has purchased a block of public IPv4 addresses, I want to use these IP addresses, so
that I can avoid Amazon's new charge for public IP addresses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always use External (Internet-facing) LB ?

Short answer yes - for this use case. From docs state:

An Elastic IP address is a public IPv4 address, which is reachable from the internet.

Private Network Load Balancers can assign private IP, but not through EIP. The ELB docs states:

For internal load balancers, you can specify one private IP address per subnet from the IPv4 range of the subnet.

FWIW additionally, there is another use case using Outpost - not applied to this EP as it is out of the scope to support Outpost/ALB:
EIPs can assign IPs from a Customer Owned IP (CoIP) Pool when using Outposts, which can be a private IP from a customer, not from AWS boarder group. From Customer-owned IP addresses docs, the example "Example: Internet connectivity through the on-premises networ" shares the use case of using an EIP allocated from CoIP with private IP:

  • The local gateway uses BGP advertisement to advertise the customer-owned IP address pool to the on-premises network.
  • An Elastic IP address association that maps 10.0.3.112 to 10.1.0.2.


### Non-Goals

- Creation of EIPs in AWS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT the use case of As a cluster administrator who has purchased a block of public IPv4 addresses, I want to use these IP addresses, so that I can avoid Amazon's new charge for public IP addresses. :

The installer is designing to provide BYO IP support consuming Public IPv4 that the user brought to AWS by setting the Pool ID on install-config.yaml platform.aws.publicIpv4Pool, then, all resources created by the installer will consume Public IPs (EIP) from that pool.

To provide that automation, the terraform (#7983) resources which claims EIP is setting that flag to consume from the pool when provided by the user. Furthermore, in the installer goal to migrate to CAPI (cluster-api), there is an implementation in progress to make the CAPA (cluster-api-provider-aws) to support the same capabilities performed on Terraform, which means, create the resources (NLB, Nat GW, instances) consuming from a custom pool, instead of delegating to the user to create the EIPs and associating it to the resources.
References on CAPA: request , Example API

Is there any chance the cluster ingress operator (or some controller) to consume an Public IPv4 Pool ID available on installer, calculate the subnets, and creates (manage lifecycle) of the Elastic IPs required to create a service load balancer to the default router? So the user, and CI, wont need extra steps prior installing a cluster.

cc @jupierce @patrickdillon @r4f4

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtulio that looks outside the scope of this enhancement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance the cluster ingress operator (or some controller) to consume an Public IPv4 Pool ID available on installer, calculate the subnets, and creates (manage lifecycle) of the Elastic IPs required to create a service load balancer to the default router? So the user, and CI, wont need extra steps prior installing a cluster.

No, this is not currently part of the epic. We may add it in the future if needed.

Comment on lines 98 to 105
lbType: NLB
networkLoadBalancer:
eipAllocations:
- eipalloc-1111
- eipalloc-2222
- eipalloc-3333
- eipalloc-4444
- eipalloc-5555
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the proposal A, or B?

  • A) platform.aws.networkLoadBalancer
  • B) platform.aws.lbType.networkLoadBalancer

edit: reading the EP it looks like A:

Suggested change
lbType: NLB
networkLoadBalancer:
eipAllocations:
- eipalloc-1111
- eipalloc-2222
- eipalloc-3333
- eipalloc-4444
- eipalloc-5555
lbType: NLB
networkLoadBalancer:
eipAllocations:
- eipalloc-1111
- eipalloc-2222
- eipalloc-3333
- eipalloc-4444
- eipalloc-5555

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtulio platform.aws.networkLoadBalancer I will fix the indentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

- cluster-ingress-operator
- installer

### This EP will also be covered by end to end tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a plan to add an e2e to installer/install time/Day-0?

cc @patrickdillon @r4f4

Copy link
Contributor Author

@miheer miheer May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to add one for the installer. I need to check on this. Can you point me to some examples ?

Copy link
Contributor

@gcs278 gcs278 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - a couple things. Some nits but also some more important things. I feel like we are missing a unmanaged annotation migration workflow, support procedures is missing real life examples, and we should talk Installer validation now if we are trying to implement the Installer part now.

Pre-requisites: One must [allocate Elastic IP address](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html#using-instance-addressing-eips-allocating)
in their AWS environment which could be then used in the `eipAllocations` field of the IngressController CR.
Note: The AWS account has some cap on the number of EIPs per region. By default, all AWS accounts have a quota of five (5) Elastic IP addresses per Region, because public (IPv4)
internet addresses are a scarce public resource. For more details please check [Elastic IP address quota](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html#using-instance-addressing-limit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit punctuation.

Suggested change
internet addresses are a scarce public resource. For more details please check [Elastic IP address quota](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html#using-instance-addressing-limit)
internet addresses are a scarce public resource. For more details please check [Elastic IP address quota](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html#using-instance-addressing-limit).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

sshKey: ssh-ed25519 AAAA...
```
Here, the administrator needs to know how many public subnets the VPC ID of their cluster has.
They may check it by clicking the `Load Balancers` tab then under the details click `VPC ID` under VPC, then click the link under `Main route table` and then check the column `Explicit subnet associations` which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to specifying it's the AWS Cloud Console, that wasn't super obvious to me what we were talking about.

Also, don't you need to specify that you need to find your load balancer in your cluster? Clicking the Load Balancers tab just presents you with a list of all load balancers. These instructions are a bit hard to follow.

Comment on lines 113 to 202
Other way to check VPC ID and subnet is go `Your VPCs` under `Virtual Public CLoud` section, then search for the cluster name using the
command `oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}'`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels disjoint with the steps above. This sentence here is an incomplete solution - it gets you to the your VPC view, but you still need to follow the steps above to get to the desired information (the public subnets). I get what you are saying, but it's confusing to me.

I think you should just pick one way to get to the VPC view to keep it simple. I think the one listed here is more direct than trying to go through and find a load balancer that belongs to your cluster.

in.

## Support Procedures

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you install a cluster with more or less EIPs than public subnets? What about if an EIP doesn't exist? I am very interested in how that will play out. Will cluster fail to install because the default IngressController is stuck in Progressing=True?

An installer-failure scenario seems quite likely and potentially very confusing, especially since we aren't validating EIP amounts.

I highly recommend adding a "installer" failure where EIPs are invalid as a support procedure here to clarify what will happen. You'll probably need to answer the open question about if you validating EIPs in the installer first.

Copy link
Contributor Author

@miheer miheer May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you install a cluster with more or less EIPs than public subnets? What about if an EIP doesn't exist? I am very interested in how that will play out. Will cluster fail to install because the default IngressController is stuck in Progressing=True?

The IngressController status will show an error message regarding either eips != subnets and the load balancer creation will go in Pending State.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but will it cause the installation to be stopped and marked unsuccessful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes because the cluster operators will degrade as they communicate with router

EIPs not present in the subnet of the VPC.
The number of EIP provided and the number of subnets of VPC ID don't match.

- If router pods fail to run then check the following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the router pod should ever fail from a failure to provision a load balancer. I think this is extra/generic information that isn't super helpful for this EP. Correct me if I'm wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I don't see the point in documenting how to list the logs of the router pods as a support procedure when it wouldn't help solve any bugs with this EP.

Comment on lines 109 to 110
Here, the administrator needs to know how many public subnets the VPC ID of their cluster has.
They may check it by clicking the `Load Balancers` tab then under the details click `VPC ID` under VPC, then click the link under `Main route table` and then check the column `Explicit subnet associations` which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The steps listed here are assuming you are bringing your own VPC, that's already created. I think that needs to be stated here.

Shouldn't we also have a workflow for when the installer creates the VPC and public subnets? In that case, you'd need the cluster admin to find how many AZs are in the region they are installing into and express that the number of public subnets == the number of AZs in a region.

Comment on lines 1112 to 1113
Note: The following questions are applicable when `auto-delete-load-balancer` annotation is set in the IngressController as it tries to recreate the update Load Balancer.
If not set then the LoadBalancerProgressing Status will be set to "True" with an message as described in [subsection of Test Plan](#update-an-ingress-controller-eipallocations-by-adding-auto-delete-load-balancer-annotation-in-the-ingress-controller-cr-which-already-has-eipallocations-set)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which following questions is this talking about? You mean What happens when a cluster-admin adds invalid EIPs ?, What happens when a cluster-admin adds number of EIPs..., and - What happens when a cluster-admin adds nothing under eipAllocations ??

I don't understand. If you don't have auto-delete-load-balancer set, then you will get a LoadBalancerProgressing Status, yes I agree, but that status won't tell them the EIPs are invalid.

They are still going to proceed with deleting the service, and then the service will fail to provision. So what's the point of adding this predicate note of The following questions...? I don't think it changes anything right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

has value of `eipAllocations` field from the Ingress Controller CR was updated.
We will check the status of IngressController controller for LoadBalancer which is `LoadBalacerReady` and `LoadBalancerProgressing`
and the cloud controller manager logs.
The `LoadBalancerProgressing` should display `False` and message `"'One or more status conditions indicate progressing: LoadBalancerProgressing=True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this doesn't make sense to me. LoadBalancerProgressing should display False, and then message with LoadBalancerProgressing=True? Is that a mistake?

Also, this E2E test doesn't talk about deleting the service. The flow doesn't make sense, you say 1. Update eipAllocations 2. Check service annotation 3. Check LoadBalancerProgressing.

The service annotation isn't going to change if you haven't deleted the service.

It feels like there's a lot of fluff, but it's missing some key parts. I think you could sum this up in one sentence:

Update an IngressController with new `eipAllocations`, observe `LoadBalancerProgressing=True`,
deleting the LoadBalancer-type service, and observe value of
`service.beta.kubernetes.io/aws-load-balancer-subnets`.

Feel free to use my E2E tests section as reference: https://github.com/openshift/enhancements/pull/1595/files#diff-f8f6da2366b6e9f9eede942d6a878c8043690acb03727fad6bbf71d37c95fd92R475-R487

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this doesn't make sense to me. LoadBalancerProgressing should display False, and then message with LoadBalancerProgressing=True? Is that a mistake?

Yes, I will correct it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after update of eip, the e2e test will just check for status message instructing the admin to manually delete the service.

I will correct it. Thanks for pointing out.

Automatic deletion will happen with auto-delete-loadbalancer annotation for which an e2e will be written.


- lastTransitionTime: "2024-04-20T06:37:13Z"
message: |-
The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound: The allocation IDs 'eipalloc-0dba74f5f9888a37a', 'eipalloc-0436f6a636851e1ce', 'eipalloc-034f3c45b2b341752', 'eipalloc-094ada3572e093006', 'eipalloc-0eeb9f39681b1a015' do not exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidAllocationID.NotFound; Request ID: 50026879-8b24-4a0f-9606-87ac22f47692; Proxy: null)\n\tstatus code: 400, request id: 740d20ff-ae98-44d6-b86e-0762a1cf9c20"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in another comment, this error, and any error you can generate, should be in Support Procedures.

If you look at the enhancement template:

## Support Procedures
Describe how to
- detect the failure modes in a support situation, describe possible symptoms (events, metrics,
alerts, which log output in which component)

I don't think you need to list the error condition out here in Test Plan.


### Drawbacks

Currently, there are not any drawbacks as such for this EP.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could add some draw backs:

Suggested change
Currently, there are not any drawbacks as such for this EP.
Requiring the cluster admin to delete the LoadBalancer-type
service leads to several minutes of ingress traffic disruption.
This enhancement brings additional engineering complexity for upgrade
scenarios because cluster admins have previously been allowed to directly
add this annotation on a service.
Debugging invalid EIP Allocations will be confusing for cluster admins and may
lead to extra support cases or bugs.
This design requires a cluster admin to check the IngressController's status after
updating `eipAllocations` for instructions on how to proceed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

observedGeneration: 3
selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=test
tlsProfile:
ciphers:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding point 2 done

Comment on lines 546 to 558
#### 1. Set EIP through installer for the default ingress controller:
a. The admin sets the `eipAllocations` in the installer using the install-config.yaml.

b. The installer then sets the ingress cluster object with the `eipAllocations`.

c. The cluster ingress operator then creates a default ingress controller by:

i. creating an default ingress controller CR.

ii. then scaling a deployment for the Ingress Controller CR which has the value of the `eipAllocations` from the
ingress cluster object.

iii. then creating a service of service type load balancer with the annotation `service.beta.kubernetes.io/aws-load-balancer-eip-allocations`,
which uses the value from the field `eipAllocations` of ingress controller CR.

Copy link
Contributor

@gcs278 gcs278 May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL When you indent a line a with 4-5 spaces, Github renders it as a panel. For example:

I'm indented 4 lines!

It's quite confusing to read. If you change the headers from a h4 ####, make it a number and just bold it, Github will render this exactly the way you want, 1.a.i.

Suggested change
#### 1. Set EIP through installer for the default ingress controller:
a. The admin sets the `eipAllocations` in the installer using the install-config.yaml.
b. The installer then sets the ingress cluster object with the `eipAllocations`.
c. The cluster ingress operator then creates a default ingress controller by:
i. creating an default ingress controller CR.
ii. then scaling a deployment for the Ingress Controller CR which has the value of the `eipAllocations` from the
ingress cluster object.
iii. then creating a service of service type load balancer with the annotation `service.beta.kubernetes.io/aws-load-balancer-eip-allocations`,
which uses the value from the field `eipAllocations` of ingress controller CR.
1. **Set EIP through installer for the default ingress controller:**
1. The admin sets the `eipAllocations` in the installer using the install-config.yaml.
2. The installer then sets the ingress cluster object with the `eipAllocations`.
3. The cluster ingress operator then creates a default ingress controller by
1. creating an default ingress controller CR.
2. then scaling a deployment for the Ingress Controller CR which has the value of the `eipAllocations` from the
ingress cluster object.
3. then creating a service of service type load balancer with the annotation `service.beta.kubernetes.io/aws-load-balancer-eip-allocations`,
which uses the value from the field `eipAllocations` of ingress controller CR.
[so on...]

Here's the preview:

  1. Set EIP through installer for the default ingress controller:
    1. The admin sets the eipAllocations in the installer using the install-config.yaml.
    2. The installer then sets the ingress cluster object with the eipAllocations.
    3. The cluster ingress operator then creates a default ingress controller by
      1. creating an default ingress controller CR.
      2. then scaling a deployment for the Ingress Controller CR which has the value of the eipAllocations from the
        ingress cluster object.
      3. then creating a service of service type load balancer with the annotation service.beta.kubernetes.io/aws-load-balancer-eip-allocations,
        which uses the value from the field eipAllocations of ingress controller CR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I got the suggestion wrong, if you look at my preview above, it works now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

- cluster-ingress-operator
- installer

### This EP will also be covered by end to end tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit consistency

Suggested change
### This EP will also be covered by end to end tests.
### This EP will also be covered by end to end tests:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

3. Ingress Operator will create the default ingress controller with the service type load balancer having the service annotation `service.beta.kubernetes.io/aws-load-balancer-eip-allocations` having values of the field `eipAllocations`
from the Ingress Controller CR.

#### Create an Ingress Controller using Ingress Controller CR and assign the EIP allocation mentioned in the CR to the kubernetes Load Balancer service for the Ingress Controller.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit consistency.

Suggested change
#### Create an Ingress Controller using Ingress Controller CR and assign the EIP allocation mentioned in the CR to the kubernetes Load Balancer service for the Ingress Controller.
#### Create an Ingress Controller using Ingress Controller CR and assign the EIP allocation mentioned in the CR to the kubernetes Load Balancer service for the Ingress Controller:

Comment on lines 198 to 205
To check VPC ID:
Go to `Your VPCs` under `Virtual Public CLoud` section, then search for the VPC ID tagged with `cluster name` using the
command `oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}'`
To check the public subnets for the `VPC ID` tagged with the cluster name we got from the above command:
Go the `AWS Cloud Console` click the `Load Balancers` tab then under the `details` click `VPC ID` under VPC, then click the link under `Main route table` and then check the column `Explicit subnet associations` which
will show you the number of public subnets available for that VPC ID.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could benefit from some organization updates, and I am struggling to repeat this procedure.

I just tried it out, and I didn't have a "Load Balancers" tab, but I wasn't sure why you needed that? You found the VPC, just click the VPC ID link, and you at the "Main Route Table". I don't think you need anything about a load balancer, that seemed like an extra step that can just be removed:

Suggested change
To check VPC ID:
Go to `Your VPCs` under `Virtual Public CLoud` section, then search for the VPC ID tagged with `cluster name` using the
command `oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}'`
To check the public subnets for the `VPC ID` tagged with the cluster name we got from the above command:
Go the `AWS Cloud Console` click the `Load Balancers` tab then under the `details` click `VPC ID` under VPC, then click the link under `Main route table` and then check the column `Explicit subnet associations` which
will show you the number of public subnets available for that VPC ID.
To locate the public subnets:
1. Navigate to your AWS Web Console for the account that contains the cluster
2. Go to `Your VPCs` under the `Virtual Public Cloud` section
3. Search for the VPC ID tagged with `cluster name` using the command `oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}'`
4. Click the VPC ID link for the VPC
5. Click the link under `Main route table` and then check the column `Explicit subnet associations` which
will show you the number of public subnets available for that VPC ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok done

@miheer miheer force-pushed the eip branch 2 times, most recently from aa57db5 to 266d7d7 Compare May 22, 2024 06:08
#### 2. Create a new ingress controller with EIPs:

The admin sets the `eipAllocations` in the Ingress Controller CR. Then same process
as per point 1.c.ii and 1.c.iii is followed by the ingress operator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your changes above, this needs to be:

Suggested change
as per point 1.c.ii and 1.c.iii is followed by the ingress operator.
as per point 1.3.ii and 1.3.iii is followed by the ingress operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be "as per point 1.iii.b and 1.iii.c is followed by the ingress operator. "

Copy link
Contributor

@alebedev87 alebedev87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2 cents about ALBC vs cloud-provider-aws.

// The number of Allocation IDs must match the number of subnets that are used for the load balancer.
// service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-xxxxxxxxxxxxxxxxx,eipalloc-yyyyyyyyyyyyyyyyy
// https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
// https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/service/annotations/#eip-allocations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a link to AWS LB controller. The ingress controller uses the service controller from the cloud provider aws: https://cloud-provider-aws.sigs.k8s.io/service_controller/.

kind: Service
metadata:
annotations:
service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0956fea34de4cb7ab,eipalloc-0e9a3077a70de050a,eipalloc-0b69fc4691f54cdd0,eipalloc-01e6ba6cbba1a391b,eipalloc-0e242df173f906112
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtulio : I believe the AWS LB controller doc you linked is a little misleading or it wants you to follow the subnet annotation link to see that the subnet discovery can be used too. ALBC uses the number of found subnets which can come either from the subnet annotation or from the subnet discovery.

However in the context of this EP we should check the aws cloud provider, which actually does the same thing as ALBC. It compares the number of EIPs to the number of discovered subnets. The discovered subnets in turn can come either from the annotation or from the discovery.

Is the subnet annotation service.beta.kubernetes.io/aws-load-balancer-subnets required when using service.beta.kubernetes.io/aws-load-balancer-eip-allocations or is it something implicit and only the len(aws-load-balancer-eip-allocations) must match the discovered subnets from the VPC?

So, either the annotation or the discovered. The latter may be more difficult to implement if we want to check this in the installer.

// The number of Allocation IDs must match the number of subnets that are used for the load balancer.
// service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-xxxxxxxxxxxxxxxxx,eipalloc-yyyyyyyyyyyyyyyyy
// https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
// https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/service/annotations/#eip-allocations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark about the usage of the ALBC doc vs https://cloud-provider-aws.sigs.k8s.io/service_controller/.

Copy link
Contributor

openshift-ci bot commented May 22, 2024

@miheer: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@patrickdillon patrickdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments on the open questions regarding installer validation, but I think it would be helpful to clarify some of Grant's questions in #1593 (comment)

Comment on lines +723 to +724
- For validation for Installer Created Subnets:
number of EIPs must match number of AZs in region (installer team please confirm this).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this seems reasonable. Users can specify a subset of AZs in the installconfig, but it should still be relatively straightforward.

Things to check with the installer team:
- How to find the number of public subnets for the VPC(which will be created by the installer)? Shall
we check compare the eips with the number of Availability Zones ?
- When VPC is already created and we already know the VPC ID can we check the number of subnets using the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users don't provide a VPC ID in the installconfig, they provide the subnets, and then we use the subnets to determine the VPC ID.

Comment on lines +718 to +721
To summarize shall we proceed with the following ?(Installer team to confirm):
- For validation for BYOB subnets:
The logic as per [e2e tests](https://github.com/openshift/cluster-ingress-operator/pull/1034/commits/fba43803e232277a96565b1eb8875d19df9bf66e#diff-160d46f77f47e8e0d35737b019c27a6e4a6f7e7eae4f6e85c0dcdb43b20a2379R163) will work
if added to [pkg/types/aws/validation/platform.go](https://github.com/openshift/installer/blob/master/pkg/types/aws/validation/platform.go). Is the installer team OK with that ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard for me to parse from that pull request what is being proposed that we add to the installer validation. Can you summarize?

You are linking to the installer's static validation (validation that does not connect to the aws api). Instead you want to look here: https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/aws/validation.go#L186

3. Are there any pre-requisites for this EP ?
Yes, please refer section `Pre-requisites` under [Workflow Description](#Workflow Description)

4. What happens when the cluster-admin specified invalid EIPs in install-config.yaml at installation time (day 0) ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes an EIP invalid? If we simply need to match the count of EIPs to the number of subnets that should be straightforward, but I don't think it's that simple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants