Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling Missing Ingress Domain for Registry Operator #89

Merged
merged 9 commits into from May 21, 2024

Conversation

Jdubrick
Copy link
Contributor

@Jdubrick Jdubrick commented Apr 30, 2024

Please specify the area for this PR
Bug fix

What does does this PR do / why we need it:
This PR adds logic for handling the possible situation that the devfile registry is being deployed via the operator without the ingress domain field being set. If you were to deploy using the operator with the following:

cat <<EOF | kubectl apply -f -
apiVersion: registry.devfile.io/v1alpha1
kind: DevfileRegistry
metadata:
  name: devfile-registry
spec:
  devfileIndex:
    image: quay.io/devfile/devfile-index:next
  telemetry:
    registryName: test
EOF

You would receive this error log within the operator:

2024-04-12T15:37:32Z    ERROR   Reconciler error        {"controller": "devfileregistry", "controllerGroup": "registry.devfile.io",
 "controllerKind": "DevfileRegistry", "DevfileRegistry": {"name":"test-registry","namespace":"dr"}, "namespace": "dr", "name": 
"test-registry", "reconcileID": "0868b66d-16d7-48b0-90cc-380a4ed351dc", "error": "Ingress.extensions \"test-registry-devfile-
registry\" is invalid: spec.rules[0].host: Invalid value: \"test-registry-devfile-registry-dr.\": a lowercase RFC 1123 subdomain must 
consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 
'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')"}

This fix skips the attempted deployment of the Ingress for k8s environments if the ingress domain field is not set and logs an explanation regarding why it was skipped.

If you deploy the following with the updated operator:

cat <<EOF | kubectl apply -f -
apiVersion: registry.devfile.io/v1alpha1
kind: DevfileRegistry
metadata:
  name: devfile-registry
spec:
  devfileIndex:
    image: quay.io/devfile/devfile-index:next
  telemetry:
    registryName: test
EOF

You would observe that an Ingress is not created and the following log appears inside the operator:

2024-04-30T17:44:57Z    INFO    controllers.DevfileRegistry     Deploying registry      {"devfileregistry": {"name":"devfile-registry","namespace":"devfile-registry"}}
2024-04-30T17:44:57Z    INFO    controllers.DevfileRegistry     Ingress creation skipped due to missing IngressDomain field.    {"devfileregistry": {"name":"devfile-registry","namespace":"devfile-registry"}}

This was accomplished by adding a piece of login to GetDevfileRegistryIngress() so that it will only use the Hostname instead of trying to concatenate the Hostname with and empty IngressDomain. The concatenation was happening with a . separator, so when the IngressDomain was empty the result was <Hostname>., leaving a trailing dot at the end. Additionally, if the IngressDomain is not set it will direct the operator to output the correct log explaining why this occurred.

Which issue(s) this PR fixes:

Fixes devfile/api#1520

PR acceptance criteria:

  • Test Coverage
    • Are your changes sufficiently tested, and are any applicable test cases added or updated to cover your changes?
  • Gosec scans

Documentation

  • Does the registry operator documentation need to be updated with your changes?

Testing changes

Running Unit Tests

Running Integration Tests

Special notes to the reviewer:

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Copy link
Member

@michael-valdron michael-valdron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jdubrick In addition to my comment, I think since this differs from how the helm chart currently works, it uses a default value, we should mirror this behavior we are using here to fix the operator bug.

We can create a separate issue for this.

pkg/registry/ingress.go Outdated Show resolved Hide resolved
@Jdubrick
Copy link
Contributor Author

Jdubrick commented May 3, 2024

@Jdubrick In addition to my comment, I think since this differs from how the helm chart currently works, it uses a default value, we should mirror this behavior we are using here to fix the operator bug.

We can create a separate issue for this.

For reference which operator bug are you referring to in this case? If you're referencing this one are you saying we should open an issue to mirror the behaviour from helm vs operator?

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
@michael-valdron
Copy link
Member

@Jdubrick In addition to my comment, I think since this differs from how the helm chart currently works, it uses a default value, we should mirror this behavior we are using here to fix the operator bug.
We can create a separate issue for this.

For reference which operator bug are you referring to in this case? If you're referencing this one are you saying we should open an issue to mirror the behaviour from helm vs operator?

Yeah, whatever changes introduce differences with the helm chart we should also apply those changes to the helm chart. In this case, the helm chart should also skip the ingress when k8s ingress domain is not set and should use the localhost port on registry-viewer fqdn.

Copy link
Member

@michael-valdron michael-valdron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just this change: #89 (comment)

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
@Jdubrick
Copy link
Contributor Author

Jdubrick commented May 7, 2024

@Jdubrick In addition to my comment, I think since this differs from how the helm chart currently works, it uses a default value, we should mirror this behavior we are using here to fix the operator bug.
We can create a separate issue for this.

For reference which operator bug are you referring to in this case? If you're referencing this one are you saying we should open an issue to mirror the behaviour from helm vs operator?

Yeah, whatever changes introduce differences with the helm chart we should also apply those changes to the helm chart. In this case, the helm chart should also skip the ingress when k8s ingress domain is not set and should use the localhost port on registry-viewer fqdn.

Issue opened here to track this: devfile/api#1539

@Jdubrick
Copy link
Contributor Author

Jdubrick commented May 7, 2024

/retest

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Copy link
Member

@michael-valdron michael-valdron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested locally and spotted a reconcile error:

2024-05-07T21:03:17Z    ERROR   controllers.DevfileRegistry     Devfile registry server failed to start after 30 seconds, re-queueing...        {"devfileregistry": {"name":"devf
ile-registry","namespace":"dr"}, "error": "Get \"https://http//localhost:8080\": dial tcp: lookup http on 10.96.0.10:53: no such host"}

This is because the protocol was included in the local hostname constant which is expected to not yet contain a protocol in this spot.

Moving the constant protocol back into the registry viewer's devfile registries url field as suggested fixes this error.

Update: I also ran into the following error after fixing the previous:

2024-05-07T21:22:30Z    ERROR   controllers.DevfileRegistry     Devfile registry server failed to start after 30 seconds, re-queueing...        {"devfileregistry": {"name":"devfile-registry","namespace":"dr"}, "error": "Get \"https://localhost:8080\": http: server gave HTTP response to HTTPS client"}

I think the protocol for the devfile registry server hostname should be set to http and skip this check anytime the ingress is not created.

pkg/registry/constants.go Outdated Show resolved Hide resolved
pkg/registry/deployment.go Outdated Show resolved Hide resolved
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Copy link
Member

@johnmcollier johnmcollier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, a Devfile Registry deployed on Kubernetes without ingress isn't all that useful. Likewise, depending on how cluster environments are set up, users creating DevfileRegistry CRs may not have access to the operator deployment to check its logs.

I think a better approach would be to block deployment altogether, and report an error status condition in https://github.com/devfile/registry-operator/blob/main/api/v1alpha1/devfileregistry_types.go#L170

@Jdubrick
Copy link
Contributor Author

To be honest, a Devfile Registry deployed on Kubernetes without ingress isn't all that useful. Likewise, depending on how cluster environments are set up, users creating DevfileRegistry CRs may not have access to the operator deployment to check its logs.

I think a better approach would be to block deployment altogether, and report an error status condition in https://github.com/devfile/registry-operator/blob/main/api/v1alpha1/devfileregistry_types.go#L170

@johnmcollier so you think as soon as a deployment is attempted once it is realized that ingress is unset it kills the deployment and then updates the status to represent said stoppage? Probably once it enters reconcile here it would catch the ingress? https://github.com/devfile/registry-operator/blob/main/controllers/devfileregistry_controller.go#L59

@johnmcollier
Copy link
Member

@johnmcollier so you think as soon as a deployment is attempted once it is realized that ingress is unset it kills the deployment and then updates the status to represent said stoppage? Probably once it enters reconcile here it would catch the ingress? https://github.com/devfile/registry-operator/blob/main/controllers/devfileregistry_controller.go#L59

At (or near) the start of the reconcile before we've deployed anything, we probably should check if we're on OpenShift, and if not, check if the ingress domain has been specified. If not, we should error out here with an error message in the status condition set.

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
@Jdubrick
Copy link
Contributor Author

New changes contain the removal of some items that were no longer necessary with the blocking of the deployment (had to cherry pick them as they were mixed in other commits that had items to keep) as well as blocking the deployment if no ingress is set and deploying to a k8s environment.

The following log will show in the operator logs when the deployment is blocked:

2024-05-14T19:14:28Z	INFO	controllers.DevfileRegistry	Blocked deployment due to unset Ingress domain	{"devfileregistry": {"name":"devfile-registry","namespace":"devfile-registry"}}

The following will show in the status for the blocked devfile-registry for those without access to the operator logs:

Status:
  Conditions:
    Last Transition Time:  2024-05-14T19:14:28Z
    Message:               No Ingress domain set for Devfile Registry - Deployment Blocked
    Reason:                DeploymentBlocked
    Status:                Unknown
    Type:                  NoDeployDevfileRegistry
  URL:                     

@openshift-ci openshift-ci bot added the lgtm label May 17, 2024
@Jdubrick
Copy link
Contributor Author

@michael-valdron you had changes requested, are you good with this being merged?

Copy link
Member

@michael-valdron michael-valdron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link

openshift-ci bot commented May 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jdubrick, johnmcollier, michael-valdron

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Jdubrick,johnmcollier,michael-valdron]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jdubrick Jdubrick merged commit 91b7165 into devfile:main May 21, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants