ReadyApp framework does not detect failed deployment #4639

Open
lastnico opened this issue Feb 1, 2024 · 15 comments

@lastnico

lastnico commented Feb 1, 2024

Following issue #2069, that was closed suggesting usage of ReadyApp framework would solve the case, it unfortunately does not, after testing it.

When using this API (declaring <wls:ready-registration>true</wls:ready-registration> and calling weblogic.application.ready.ReadyLifecycleManager.getInstance().ready() in some Web Listener, for instance), the startup proceeds as follows:

  • Weblogic server initializes, and is then ready to serve HTTP requests
  • At this stage, /weblogic/ready returns HTTP 200
  • We trigger a deployment of a webapp (using ReadyApp framework) - via command line, for this experiment
  • As expected, while webapp is initializing, /weblogic/ready now returns HTTP 503
  • At some point in the loading process, we interrupt the deployment via throwing an exception (hence, ReadyLifecycleManager.getInstance().ready() is never called)
  • Nevertheless, /weblogic/ready now returns HTTP 200, which causes the related pod to be used by the respective ingress object (returning HTTP 404 on the expected webapp context path), and the replica set never recreates a replica to attempt a new startup.

This behavior probably comes from the fact that the "applicationId" is unregistered from the ReadyApp framework. Still, it means this API cannot be used to detect failed deployments.
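
The suspected mechanism can be illustrated with a toy model. This is only a sketch of the behavior observed above, not WebLogic's implementation: the class `ReadyRegistryModel` and its methods are hypothetical names, and the idea that a failed deployment simply unregisters the application is an assumption inferred from the observed HTTP statuses.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the observed behavior; NOT WebLogic's actual implementation. */
final class ReadyRegistryModel {
    // Application IDs that registered but have not yet reported ready.
    private final Set<String> pending = new LinkedHashSet<>();

    /** Deployment starts: the app registers and blocks server readiness. */
    void register(String appId) {
        pending.add(appId);
    }

    /** The app called ready(): it no longer blocks readiness. */
    void ready(String appId) {
        pending.remove(appId);
    }

    /** Assumption: a failed deployment just unregisters the app. */
    void deploymentFailed(String appId) {
        pending.remove(appId);
    }

    /** Status a /weblogic/ready-like endpoint would return in this model. */
    int httpStatus() {
        return pending.isEmpty() ? 200 : 503;
    }

    public static void main(String[] args) {
        ReadyRegistryModel server = new ReadyRegistryModel();
        System.out.println(server.httpStatus()); // 200: server up, nothing pending
        server.register("my-webapp");
        System.out.println(server.httpStatus()); // 503: webapp is initializing
        server.deploymentFailed("my-webapp");
        System.out.println(server.httpStatus()); // 200: failure is indistinguishable from success
    }
}
```

In this model the last status is 200 whether the webapp called ready() or crashed, which matches what the probe sees.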

I read that the K8S startup probe is not used because the WebLogic K8S operator makes use of the ReadyApp framework for fine control over deployment startup, but I'm afraid there is no proper way for this operator to detect a WebLogic managed server with failed deployment(s) and trigger creation of a new pod.

@jshum2479
Member

We use the ReadyApp framework, and WebLogic Server determines whether all the registered applications are ready. I assume you have set up the WebLogic deployment descriptor to register with the framework.

<wls:ready-registration>true</wls:ready-registration>

For this to work, the application must be successfully registered and initialized before you can call the ready() method to tell the server that this app is ready. If your application fails to initialize before then, the framework will not be able to tell. Is that what you are trying to simulate?

@lastnico
Author

lastnico commented Feb 1, 2024

Hi @jshum2479!

Yes, I've tried different scenarios, but in all of them <wls:ready-registration>true</wls:ready-registration> is defined in weblogic-application.xml (if not, ReadyLifecycleManager.getInstance().ready() fails with an IllegalStateException anyway, as documented).

However, even if it is defined, whenever the webapp crashes at deployment time, the ReadyApp framework /weblogic/ready endpoint changes from HTTP 503 (server not ready) to HTTP 200 (server ready), even though it has just faced a webapp deployment crash.

I could understand this behavior because, in a way, a failed deployment does not exactly mean the server is not "ready". But it also means we cannot rely on this framework to detect a managed server in an improper state (= with an expected webapp that failed to start), so such pods could be used to serve traffic while the webapp is down.

@robertpatrick
Member

@lastnico You are going to need to file a Support case with WebLogic Server support. This has nothing to do with the operator.

@lastnico
Author

lastnico commented Feb 1, 2024

Thanks. I was indeed wondering where the boundary lies between:

  • the Kubernetes WebLogic operator, which should identify whether the containerised WebLogic server fits the end user's needs (including whether the underlying webapps started up properly). I read elsewhere that the Kubernetes startupProbe (which allows defining an HTTP endpoint to poll) is not supported by the WebLogic K8S operator, since the ReadyApp framework (at the WebLogic level) is meant for such fine tuning.
  • the ReadyApp framework, which, as such, only defines ready / not ready statuses, and nothing in between like "will never be ready" for when a webapp deployment fails.
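
To make the missing "will never be ready" status concrete, here is a minimal sketch of a three-state readiness model. Everything in it is hypothetical (the `TriStateReadiness` class, the `AppState` enum, and the method names); it does not describe how ReadyApp works today, only what a status distinguishing failed deployments could look like.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical three-state readiness model; not part of any WebLogic API. */
final class TriStateReadiness {
    enum AppState { STARTING, READY, FAILED }

    // Last known state per registered application ID.
    private final Map<String, AppState> apps = new LinkedHashMap<>();

    void register(String appId) {
        apps.put(appId, AppState.STARTING);
    }

    void ready(String appId) {
        apps.put(appId, AppState.READY);
    }

    /** Unlike plain unregistration, the failure is remembered. */
    void deploymentFailed(String appId) {
        apps.put(appId, AppState.FAILED);
    }

    /** 200 only when every registered app is READY; a FAILED app keeps it at 503. */
    int httpStatus() {
        boolean allReady = apps.values().stream().allMatch(s -> s == AppState.READY);
        return allReady ? 200 : 503;
    }

    /** True when at least one app cannot become ready without a redeploy or restart. */
    boolean willNeverBeReady() {
        return apps.containsValue(AppState.FAILED);
    }
}
```

With such a model, a probe would keep seeing 503 after a failed deployment, and `willNeverBeReady()` would tell a controller that waiting longer is pointless.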

I can (and will) open a Support ticket with WebLogic support, but I'm just afraid they'll redirect me here, asking for the K8S WebLogic operator to offer ways to identify when the WebLogic server is in an incorrect state (from a deployment perspective).

Thanks again!

@robertpatrick
Member

robertpatrick commented Feb 1, 2024

@lastnico Feel free to give the Oracle Support people my name and mention that I said it has nothing to do with WKO. They know me and should reach out to me if they have issues.

If you are not satisfied with the ReadyApp framework behavior, you can always use an endpoint for your app to handle readiness yourself...

@lastnico
Author

lastnico commented Feb 1, 2024

Thanks for the advice.

About your last sentence, according to K8S operator doc:

https://oracle.github.io/weblogic-kubernetes-operator/managing-domains/domain-lifecycle/liveness-readiness-probe-customization/

Here are the options for customizing the readiness probe and its tuning:

> By default, the readiness probe is configured to use the WebLogic Server ReadyApp framework. The ReadyApp framework allows fine customization of the readiness probe by the application’s participation in the framework. For more details, see [Using the ReadyApp Framework](https://docs.oracle.com/en/middleware/fusion-middleware/weblogic-server/12.2.1.4/depgd/managing.html#GUID-C98443B1-D368-4CA4-A7A4-97B86FFD3C28). The readiness probe is used to determine if the server is ready to accept user requests. The readiness is used to determine when a server should be included in a load balancer’s endpoints, in the case of a rolling restart, when a restarted server is fully started, and for various other purposes.

And the 5 properties under "readinessProbe"
https://github.com/oracle/weblogic-kubernetes-operator/blob/release/4.1/documentation/domains/Cluster.md#probe-tuning

failureThreshold
initialDelaySeconds
periodSeconds
successThreshold
timeoutSeconds

I can't find a way to assign another endpoint to the readinessProbe. This is actually the reason why I started to investigate the ReadyApp framework, which, in the end, is not suited to detecting failed deployments.

@jshum2479
Member

We do not allow configuration of a different probe path, just the tuning parameters.

@robertpatrick
Member

My mistake.

@rjeberhard The domain spec looks like this should be supported, so why is it being prevented?

% kubectl explain domain.spec.serverPod.containers.readinessProbe
GROUP:      weblogic.oracle
KIND:       Domain
VERSION:    v9

FIELD: readinessProbe <Object>

DESCRIPTION:
    <empty>
FIELDS:
  exec	<Object>
    <no description>

  failureThreshold	<integer>
    <no description>

  grpc	<Object>
    <no description>

  httpGet	<Object>
    <no description>

  initialDelaySeconds	<integer>
    <no description>

  periodSeconds	<integer>
    <no description>

  successThreshold	<integer>
    <no description>

  tcpSocket	<Object>
    <no description>

  terminationGracePeriodSeconds	<integer>
    <no description>

  timeoutSeconds	<integer>
    <no description>

@rjeberhard
Member

It has never been supported to override the liveness or readiness probes generated for the container running the WebLogic instance. The schema above is for the sidecar containers, if any, added to the pods. This is why there is validation to ensure that this functionality isn't used to try and add a sidecar container with the same name as the main container ("weblogic-server").

The only supported customizations are to timing-related fields, as described here: https://oracle.github.io/weblogic-kubernetes-operator/managing-domains/domain-lifecycle/liveness-readiness-probe-customization/#readiness-probe-customization

We could certainly look at an enhancement to support overriding more fields of the probes. @lastnico, was your idea that you would provide some other application endpoint that consulted the MBeans?

@robertpatrick
Member

Or maybe something more basic, such as exposing an endpoint in the app to ensure that the app is properly deployed. This is what WebLogic customers did prior to 12.2.1 (the release that introduced ReadyApp).
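
As a rough illustration of such an app-level endpoint, here is a minimal sketch. In a real webapp this would be a servlet inside the application's own context; the JDK's built-in `com.sun.net.httpserver.HttpServer` is used here only to keep the example self-contained. The `AppHealthEndpoint` class, the `/health` path, and the `markInitialized()` hook are all hypothetical names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical app-owned health endpoint: 503 until initialization completes, then 200. */
final class AppHealthEndpoint {
    private final AtomicBoolean initialized = new AtomicBoolean(false);
    private final HttpServer server;

    /** Pass port 0 to let the OS pick a free port. */
    AppHealthEndpoint(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/health", exchange -> {
            int status = initialized.get() ? 200 : 503;
            exchange.sendResponseHeaders(status, -1); // -1 means no response body
            exchange.close();
        });
    }

    void start() {
        server.start();
    }

    void stop() {
        server.stop(0);
    }

    int port() {
        return server.getAddress().getPort();
    }

    /** Call this as the last step of application startup. */
    void markInitialized() {
        initialized.set(true);
    }
}
```

When the endpoint lives inside the webapp itself, a deployment that fails leaves nothing to answer on that path, so a poller gets an error status (the thread reports HTTP 404 on the webapp context path) rather than 200.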

@lastnico
Author

lastnico commented Feb 2, 2024

@robertpatrick I initially thought of this, yes: a new httpGet property under readinessProbe, to allow overriding the /weblogic/ready URL polling.

Though, when multiple webapps are deployed on the same server, it'd mean multiple endpoints to poll, so in this case relying on the ReadyApp API calls in each webapp would probably be simpler.

However, returning HTTP 503 from ReadyApp for webapps that failed to deploy would require changing how it currently behaves, so there's maybe a lower chance this gets supported.
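
For the multi-webapp case, one option would be a single aggregating check that polls each webapp's endpoint. The sketch below is only an illustration under assumptions: the `MultiAppReadiness` class and both method names are hypothetical, the per-app endpoint URLs would be your own, and whether the operator's probe could ever point at such a check is a separate question.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical aggregator: ready only if every webapp endpoint answers 200. */
final class MultiAppReadiness {

    /** Returns 200 when all endpoints report 200, otherwise 503. */
    static int aggregate(List<URI> endpoints, ToIntFunction<URI> probe) {
        for (URI endpoint : endpoints) {
            if (probe.applyAsInt(endpoint) != 200) {
                return 503; // one missing or failing webapp makes the server not ready
            }
        }
        return 200;
    }

    /** Probes one endpoint over HTTP; any I/O error or timeout counts as not ready. */
    static int httpProbe(URI endpoint) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            return 503;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return 503;
        }
    }
}
```

A caller would use `MultiAppReadiness.aggregate(endpoints, MultiAppReadiness::httpProbe)`. Passing the probe as a function keeps the aggregation logic testable without a running server.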

@robertpatrick
Member

@lastnico let us know once you file a Support ticket with a reproducer and I will take a look. Note that your test case should not use Kubernetes…

@lastnico
Author

lastnico commented Feb 5, 2024

@robertpatrick Sorry for asking, do you mean a Support case to WebLogic Server support regarding ReadyApp framework, or also/both regarding the K8S weblogic operator, to possibly get a new httpGet property to override readinessProbe?

@robertpatrick
Member

do you mean a Support case to WebLogic Server support regarding ReadyApp framework

Yes

@lastnico
Author

@robertpatrick It took a while, but the issue was finally submitted to Oracle Weblogic Support under
https://support.oracle.com/epmos/faces/SrDetail?srNumber=3-35814160481

Thanks!
