
Reject task on worker still starting up #21921

Merged
merged 4 commits into trinodb:master from findepi/reject-too-early on May 13, 2024

Conversation

findepi (Member) commented May 10, 2024

When a worker node is restarted after a crash, the coordinator may still be
unaware of the situation and may attempt to schedule tasks on it.
Ideally the coordinator should not schedule tasks on a worker that is not
ready, but in pipelined execution there is currently no way to move a
task. Accepting a request too early will likely lead to some failure
and an HTTP 500 (INTERNAL_SERVER_ERROR) response. The coordinator won't
retry on this. Send 503 (SERVICE_UNAVAILABLE) instead so that the request is
retried.
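
As a rough illustration of the mechanism (a hypothetical sketch, not the code in this PR), a JAX-RS task endpoint could answer 503 while startup is incomplete; the `StartupStatus` accessor, class names, and endpoint signature below are assumptions for illustration only:

```java
// Hypothetical sketch: a task endpoint that answers 503 (SERVICE_UNAVAILABLE)
// until startup has completed, so the coordinator's HTTP client retries the
// request instead of failing the query on a 500.
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.core.Response;

@Path("/v1/task")
public class TaskResourceSketch
{
    private final StartupStatus startupStatus;

    public TaskResourceSketch(StartupStatus startupStatus)
    {
        this.startupStatus = startupStatus;
    }

    @POST
    @Path("{taskId}")
    public Response createOrUpdateTask(@PathParam("taskId") String taskId, String taskUpdateJson)
    {
        if (!startupStatus.isStartupComplete()) {
            // 503 signals a transient condition that the coordinator retries;
            // a 500 at this point would fail the query outright.
            return Response.status(Response.Status.SERVICE_UNAVAILABLE)
                    .entity("Server is still starting up")
                    .build();
        }
        // ... normal task creation/update handling would go here ...
        return Response.ok().build();
    }

    // Stand-in for the real StartupStatus; name and accessor are assumptions.
    public interface StartupStatus
    {
        boolean isStartupComplete();
    }
}
```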

Simpler alternative to #21744.
That other PR has the problem that, after a worker restarts, queries fail for some time because the new instance ID is not propagated.

The test being added here should prove that we can close #21735.

`StartupStatus.startupComplete` was set in production code, but not in
tests.
@findepi findepi marked this pull request as draft May 10, 2024 15:15
findepi (Member, Author) commented May 10, 2024

draft because of the TMP commit

findepi (Member, Author) commented May 10, 2024

stress test OK https://github.com/trinodb/trino/actions/runs/9034341743/job/24827391786?pr=21921

2024-05-10T17:14:30.7871255Z [INFO] Tests run: 286, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1489 s -- in io.trino.tests.TestWorkerRestart

(the actual numbers are probably a lie -- #20113)

@findepi findepi marked this pull request as ready for review May 10, 2024 17:38
findepi (Member, Author) commented May 10, 2024

cc @sajjoseph

sajjoseph (Contributor) commented:

I like the lighter version. Let me try this out with the 447 release.
Thanks!

sajjoseph (Contributor) commented:

I tried out the new PR and so far all signs are positive.
A worker node was restarted. Once the node rejoined the cluster, pending queries (that had tasks on the failed worker) all failed quickly with the following error: REMOTE_TASK_MISMATCH.

Error Type	INTERNAL_ERROR
Error Code	REMOTE_TASK_MISMATCH (65544)
Stack Trace

io.trino.spi.TrinoException: Could not communicate with the remote task. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (<worker_node>:<port>). Expected taskInstanceId: cb26d883-31f5-48a6-b5d1-66a52e8e9750, received taskInstanceId: 59f166fe-4f4f-4b18-b35b-e22589819788
	at io.trino.operator.HttpPageBufferClient$1.onSuccess(HttpPageBufferClient.java:389)
	at io.trino.operator.HttpPageBufferClient$1.onSuccess(HttpPageBufferClient.java:367)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)

The worker started processing fresh incoming queries.

There were a few exceptions in the worker node's server log file. I don't think these have anything to do with the current PR. I just included them here to see whether any zombie task status checks are still causing these errors in the log files. These entries are showing up even after the worker node was up for 10+ minutes.

2024-05-11T21:31:09.957Z	WARN	http-client-exchange-1339	org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy	Task run failed
java.lang.NullPointerException: Cannot invoke "org.eclipse.jetty.client.transport.HttpReceiver$NotifiableContentSource.onDataAvailable()" because "this.contentSource" is null
	at org.eclipse.jetty.client.transport.HttpReceiver.responseContentAvailable(HttpReceiver.java:325)
	at org.eclipse.jetty.http2.client.transport.internal.HttpReceiverOverHTTP2.lambda$new$0(HttpReceiverOverHTTP2.java:53)
	at org.eclipse.jetty.util.thread.Invocable$ReadyTask.run(Invocable.java:105)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:426)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)
	at org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:211)
	at org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:158)
	at org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:449)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.ssl.SslConnection$SslEndPoint.onFillable(SslConnection.java:574)
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:390)
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:150)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
	at org.eclipse.jetty.util.thread.MonitoredQueuedThreadPool$1.run(MonitoredQueuedThreadPool.java:73)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:979)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1209)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1164)
	at java.base/java.lang.Thread.run(Thread.java:1570)
	
2024-05-11T21:41:02.461Z        ERROR   async-http-response-87  io.airlift.concurrent.BoundedExecutor   Task failed
java.lang.IllegalStateException: Response does not exist (likely recycled)
        at org.eclipse.jetty.ee10.servlet.ServletChannel.getResponse(ServletChannel.java:297)
        at org.eclipse.jetty.ee10.servlet.ServletApiResponse.getResponse(ServletApiResponse.java:97)
        at org.eclipse.jetty.ee10.servlet.ServletApiResponse.containsHeader(ServletApiResponse.java:117)
        at jakarta.servlet.http.HttpServletResponseWrapper.containsHeader(HttpServletResponseWrapper.java:68)
        at org.glassfish.jersey.servlet.internal.ResponseWriter.writeResponseStatusAndHeaders(ResponseWriter.java:135)
        at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:650)
        at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:387)
        at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:377)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$3.run(ServerRuntime.java:913)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:950)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:901)
        at io.airlift.jaxrs.AsyncResponseHandler$1.onSuccess(AsyncResponseHandler.java:99)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
        at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1570)

2024-05-11T21:41:02.463Z        ERROR   async-http-response-101 io.airlift.concurrent.BoundedExecutor   Task failed
java.lang.IllegalStateException: Response does not exist (likely recycled)
        at org.eclipse.jetty.ee10.servlet.ServletChannel.getResponse(ServletChannel.java:297)
        at org.eclipse.jetty.ee10.servlet.ServletApiResponse.getResponse(ServletApiResponse.java:97)
        at org.eclipse.jetty.ee10.servlet.ServletApiResponse.containsHeader(ServletApiResponse.java:117)
        at jakarta.servlet.http.HttpServletResponseWrapper.containsHeader(HttpServletResponseWrapper.java:68)
        at org.glassfish.jersey.servlet.internal.ResponseWriter.writeResponseStatusAndHeaders(ResponseWriter.java:135)
        at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:650)
        at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:387)
        at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:377)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$3.run(ServerRuntime.java:913)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:950)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:901)
        at io.airlift.jaxrs.AsyncResponseHandler$1.onSuccess(AsyncResponseHandler.java:99)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
        at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1570)

findepi (Member, Author) commented May 13, 2024

There were a few exceptions in the worker node's server log file. [...] These entries are showing up even after the worker node was up for 10+ minutes.

They look wrong, cc @wendigo. I think we can assume this is not related to this PR, as it only changes behavior during the startup phase. We should have an issue about them.
@sajjoseph can you double-check that these also occur without any changes, on current master / 447?
Do these also happen with HTTP/2 disabled?

wendigo (Contributor) commented May 13, 2024

These errors are the use-after-free pattern that we saw in other cases. They basically mean that the async response was asked to complete (either with the response or an exception) but the response was already recycled. This can happen if the client disconnected, Jetty noticed that fact and closed/released the underlying resources (which is correct behaviour, btw), but the application didn't notice and tried to complete the request anyway. It looks bad but it doesn't necessarily mean a bad thing.
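
For illustration only (this is not necessarily how Trino or Airlift completes async responses), a completion callback could guard against this pattern by consulting the standard JAX-RS `AsyncResponse` state before resuming; there is still an inherent race, but the exception becomes avoidable in the common case:

```java
import jakarta.ws.rs.container.AsyncResponse;
import jakarta.ws.rs.core.Response;

// Sketch only: skip completing an async response whose client has already gone
// away, instead of triggering the "Response does not exist (likely recycled)"
// IllegalStateException seen in the logs above.
public final class SafeAsyncCompletion
{
    private SafeAsyncCompletion() {}

    public static void complete(AsyncResponse asyncResponse, Response result)
    {
        // isSuspended() is false once the response has completed, was cancelled,
        // or is otherwise no longer waiting for a result.
        if (!asyncResponse.isSuspended()) {
            return;
        }
        // resume() returns false if the response could not be resumed anymore;
        // both outcomes are benign when the client has disconnected.
        asyncResponse.resume(result);
    }
}
```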

findepi (Member, Author) commented May 13, 2024

It looks bad but it doesn't necessarily mean a bad thing.

Can it mean a bad thing, or should we just ignore these?

wendigo (Contributor) commented May 13, 2024

@findepi I think that we should ignore them.

@findepi findepi merged commit 5b9ae0d into trinodb:master May 13, 2024
96 checks passed
@findepi findepi deleted the findepi/reject-too-early branch May 13, 2024 16:31
@github-actions github-actions bot added this to the 448 milestone May 13, 2024
Development

Successfully merging this pull request may close these issues.

Failed worker doesn't handle requests properly on recovery
4 participants