ReadyApp framework does not detect failed deployment #4639

lastnico · 2024-02-01T15:19:39Z

Following issue #2069, which was closed with the suggestion that using the ReadyApp framework would solve the case: after testing it, it unfortunately does not.

When using this API (declaring <wls:ready-registration>true</wls:ready-registration> and calling weblogic.application.ready.ReadyLifecycleManager.getInstance().ready() in some Web Listener, for instance), the startup proceeds as follows:

1. WebLogic Server initializes and is then ready to serve HTTP requests.
2. At this stage, /weblogic/ready returns HTTP 200.
3. We trigger a deployment of a webapp (using the ReadyApp framework), via the command line for this experiment.
4. As expected, while the webapp is initializing, /weblogic/ready returns HTTP 503.
5. At some point in the loading process, we interrupt the deployment by throwing an exception (hence, ReadyLifecycleManager.getInstance().ready() is never called).
6. Nevertheless, /weblogic/ready then returns HTTP 200. As a result, the ingress routes traffic to the pod (which returns HTTP 404 on the expected webapp context path), and the replica set never creates a new replica to attempt another startup.

This behavior probably comes from the fact the "applicationId" is unregistered from the ReadyApp framework, but, still, it means this API cannot be used to detect failed deployments.
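
To make the hypothesis above concrete, here is a toy model of a readiness registry. It is not WebLogic's implementation; the class ToyReadyRegistry and its methods are invented for illustration, and the assumption that a failed deployment unregisters the application is only the guess stated above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a readiness registry; NOT WebLogic's actual implementation. */
final class ToyReadyRegistry {
    // applicationId -> has the application reported ready?
    private final Map<String, Boolean> apps = new LinkedHashMap<>();

    void register(String appId)   { apps.put(appId, false); }
    void ready(String appId)      { apps.replace(appId, true); }
    void unregister(String appId) { apps.remove(appId); }

    /** 200 when every currently registered app is ready, else 503. */
    int status() {
        return apps.values().stream().allMatch(Boolean::booleanValue) ? 200 : 503;
    }

    public static void main(String[] args) {
        ToyReadyRegistry registry = new ToyReadyRegistry();
        System.out.println(registry.status()); // no apps registered yet
        registry.register("my-webapp");
        System.out.println(registry.status()); // deployment in progress
        registry.unregister("my-webapp");      // assumed effect of a failed deployment
        System.out.println(registry.status()); // the failure is no longer visible
    }
}
```

In this model the aggregate only looks at applications that are still registered, so an application that disappears from the registry cannot keep the server in the "not ready" state.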

I read that the K8S startup probe is not used because the WebLogic K8S operator uses the ReadyApp framework for fine control over deployment startup, but I'm afraid there is no proper way for the operator to detect a WebLogic managed server with failed deployment(s) and trigger creation of a new pod.

jshum2479 · 2024-02-01T16:14:31Z

We use the ReadyApp framework, and WebLogic Server determines whether all the registered applications are ready. I assume you have set up the WebLogic deployment descriptor to register with the framework.

<wls:ready-registration>true</wls:ready-registration>

For this to work, the application must be successfully registered and initialized before you can call the ready() method to tell the server that the app is ready. If your application fails to initialize before then, the framework will not be able to tell. Is that what you are trying to simulate?

lastnico · 2024-02-01T16:27:08Z

Hi @jshum2479!

Yes, I've tried different scenarios, but in all of them <wls:ready-registration>true</wls:ready-registration> is defined in weblogic-application.xml (if it is not, ReadyLifecycleManager.getInstance().ready() fails with an IllegalStateException anyway, as documented).

However, even when it is defined, whenever the webapp crashes at deployment time, the ReadyApp framework's /weblogic/ready endpoint changes from HTTP 503 (server not ready) to HTTP 200 (server ready), even though the server has just experienced a webapp deployment crash.

I could understand this behavior because, in a way, a failed deployment does not exactly mean the server is not "ready". But it also means we cannot rely on this framework to detect a managed server in an improper state (that is, one whose expected webapp failed to start), so such pods could be used to serve traffic while the webapp is down.

robertpatrick · 2024-02-01T16:41:36Z

@lastnico You are going to need to file a Support case with WebLogic Server support. This has nothing to do with the operator.

lastnico · 2024-02-01T17:25:18Z

Thanks. I did wonder where the boundary lies between:

the Kubernetes WebLogic operator, which would determine whether the containerized WebLogic server meets the end user's needs (including whether the underlying webapps started properly). I ask because I read elsewhere that the Kubernetes startupProbe (which allows defining an HTTP endpoint to poll) is not supported by the WebLogic K8S operator, since the ReadyApp framework (at the WebLogic level) is meant for such fine-tuning.
the ReadyApp framework, which only defines ready / not ready statuses, with nothing in between such as "will never be ready" when a webapp deployment fails.

I can (and will) open a Support ticket to the Weblogic support, but I'm just afraid they'll redirect me here, asking for the K8S weblogic operator to offer ways to identify when the Weblogic server is in an incorrect state (from deployment perspective)

Thanks again!

robertpatrick · 2024-02-01T17:34:59Z

@lastnico Feel free to give the Oracle Support people my name and mention that I said it has nothing to do with WKO. They know me and should reach out to me if they have issues.

If you are not satisfied with the ReadyApp framework behavior, you can always use an endpoint for your app to handle readiness yourself...
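
As a rough sketch of that idea, the example below exposes a readiness endpoint that stays at HTTP 503 until the application explicitly marks itself ready, so a failed initialization never turns it into 200. It uses the JDK's built-in HttpServer only as a stand-in; in a WebLogic application this would more likely be a servlet. The class name AppReadinessEndpoint and the /ready path are hypothetical, not something defined by WebLogic or the operator.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical app-owned readiness endpoint (stand-in for a servlet). */
final class AppReadinessEndpoint {
    private final AtomicBoolean ready = new AtomicBoolean(false);
    private final HttpServer server;

    /** Pass port 0 to let the OS pick a free port. */
    AppReadinessEndpoint(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        // "/ready" is an illustrative path chosen for this sketch.
        server.createContext("/ready", exchange -> {
            int code = ready.get() ? 200 : 503;
            exchange.sendResponseHeaders(code, -1); // -1: no response body
            exchange.close();
        });
        server.start();
    }

    /** Call only after application initialization has fully succeeded. */
    void markReady() { ready.set(true); }

    int port() { return server.getAddress().getPort(); }

    void stop() { server.stop(0); }

    /** Returns the HTTP status that a probe polling this endpoint would see. */
    int probe() throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:" + port() + "/ready")).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding())
                .statusCode();
    }
}
```

If initialization throws before markReady() is reached, the endpoint keeps answering 503 (or, if the webapp is not deployed at all, the path simply does not exist), which is the signal a probe would need.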

lastnico · 2024-02-01T19:13:44Z

Thanks for the advice.

About your last sentence, according to K8S operator doc:

https://oracle.github.io/weblogic-kubernetes-operator/managing-domains/domain-lifecycle/liveness-readiness-probe-customization/

Here are the options for customizing the readiness probe and its tuning:

> By default, the readiness probe is configured to use the WebLogic Server ReadyApp framework. The ReadyApp framework allows fine customization of the readiness probe by the application’s participation in the framework. For more details, see [Using the ReadyApp Framework](https://docs.oracle.com/en/middleware/fusion-middleware/weblogic-server/12.2.1.4/depgd/managing.html#GUID-C98443B1-D368-4CA4-A7A4-97B86FFD3C28). The readiness probe is used to determine if the server is ready to accept user requests. The readiness is used to determine when a server should be included in a load balancer’s endpoints, in the case of a rolling restart, when a restarted server is fully started, and for various other purposes.

And the 5 properties under "readinessProbe"
https://github.com/oracle/weblogic-kubernetes-operator/blob/release/4.1/documentation/domains/Cluster.md#probe-tuning

failureThreshold
initialDelaySeconds
periodSeconds
successThreshold
timeoutSeconds

I can't find how we could assign another endpoint for the readinessProbe. This is actually why I started investigating the ReadyApp framework, which, in the end, is not suited to detecting failed deployments.

jshum2479 · 2024-02-01T20:43:06Z

We do not allow configuration of a different probe path, just the tuning parameters.

robertpatrick · 2024-02-01T22:59:21Z

My mistake.

@rjeberhard The domain spec looks like this should be supported, so why is it being prevented?

% kubectl explain domain.spec.serverPod.containers.readinessProbe
GROUP:      weblogic.oracle
KIND:       Domain
VERSION:    v9

FIELD: readinessProbe <Object>

DESCRIPTION:
    <empty>
FIELDS:
  exec	<Object>
    <no description>

  failureThreshold	<integer>
    <no description>

  grpc	<Object>
    <no description>

  httpGet	<Object>
    <no description>

  initialDelaySeconds	<integer>
    <no description>

  periodSeconds	<integer>
    <no description>

  successThreshold	<integer>
    <no description>

  tcpSocket	<Object>
    <no description>

  terminationGracePeriodSeconds	<integer>
    <no description>

  timeoutSeconds	<integer>
    <no description>

rjeberhard · 2024-02-01T23:23:11Z

It has never been supported to override the liveness or readiness probes generated for the container running the WebLogic instance. The schema above is for the sidecar containers, if any, added to the pods. This is why there is validation to ensure that this functionality isn't used to try and add a sidecar container with the same name as the main container ("weblogic-server").

The only supported customizations are to timing-related fields, as described here: https://oracle.github.io/weblogic-kubernetes-operator/managing-domains/domain-lifecycle/liveness-readiness-probe-customization/#readiness-probe-customization

We could certainly look at an enhancement to support overriding more fields of the probes. @lastnico, was your idea that you would provide some other application endpoint that consulted the MBeans?

robertpatrick · 2024-02-01T23:51:10Z

Or maybe something more basic, such as exposing an endpoint in the app to confirm that the app is properly deployed. This is what WebLogic customers did before 12.2.1 (the release that introduced ReadyApp).

lastnico · 2024-02-02T07:03:27Z

@robertpatrick I initially thought of this, yes: a new httpGet property under readinessProbe, to allow overriding the /weblogic/ready URL that is polled.

However, when multiple webapps are deployed on the same server, that would mean multiple endpoints to poll, so relying on ReadyApp API calls in each webapp would probably be simpler.

That said, having ReadyApp return HTTP 503 for webapps whose deployment failed would require changing its current behavior, so there is probably less chance of this being supported.
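
For illustration only, the behavior I have in mind could look like the sketch below: a single aggregate that knows which webapps are expected and keeps returning 503 while any of them is still starting or has failed. None of this exists in ReadyApp or the operator; ExpectedAppsReadiness, its State enum, and its method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical readiness aggregate that remembers failed deployments. */
final class ExpectedAppsReadiness {
    enum State { STARTING, READY, FAILED }

    private final Map<String, State> states = new ConcurrentHashMap<>();

    /** Every expected application starts out as STARTING. */
    ExpectedAppsReadiness(Set<String> expectedAppIds) {
        for (String appId : expectedAppIds) {
            states.put(appId, State.STARTING);
        }
    }

    // replace() ignores application ids that were never declared as expected.
    void markReady(String appId)  { states.replace(appId, State.READY); }
    void markFailed(String appId) { states.replace(appId, State.FAILED); }

    /** 200 only when every expected app is READY; STARTING and FAILED both yield 503. */
    int status() {
        return states.values().stream().allMatch(s -> s == State.READY) ? 200 : 503;
    }

    /** True once at least one app cannot become ready without a redeploy or restart. */
    boolean hasFailed() {
        return states.containsValue(State.FAILED);
    }
}
```

With a single endpoint backed by something like this, the probe would not need one URL per webapp, and a failed deployment would stay visible instead of silently dropping out of the readiness check.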

robertpatrick · 2024-02-02T12:05:46Z

@lastnico let us know once you file a Support ticket with a reproducer and I will take a look. Note that your test case should not use Kubernetes…

lastnico · 2024-02-05T06:44:57Z

@robertpatrick Sorry for asking, do you mean a Support case to WebLogic Server support regarding ReadyApp framework, or also/both regarding the K8S weblogic operator, to possibly get a new httpGet property to override readinessProbe?

robertpatrick · 2024-02-05T12:44:03Z

> do you mean a Support case to WebLogic Server support regarding ReadyApp framework

Yes

lastnico · 2024-02-21T13:59:21Z

@robertpatrick It took a while, but the issue was finally submitted to Oracle Weblogic Support under
https://support.oracle.com/epmos/faces/SrDetail?srNumber=3-35814160481

Thanks!

robertpatrick added the WLS Support label Feb 1, 2024
