
Reject task on worker still starting up #21921

Merged
merged 4 commits into trinodb:master from findepi/reject-too-early on May 13, 2024

Conversation

findepi (Member) commented May 10, 2024

When a worker node is restarted after a crash, the coordinator may still be
unaware of the situation and may attempt to schedule tasks on it.
Ideally the coordinator should not schedule tasks on a worker that is not
ready, but in pipelined execution there is currently no way to move a
task. Accepting a request too early will likely lead to some failure
and an HTTP 500 (INTERNAL_SERVER_ERROR) response. The coordinator won't
retry on this. Send 503 (SERVICE_UNAVAILABLE) instead so that the request is
retried.
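
As a rough illustration of the mechanism (a hypothetical sketch, not the code in this PR), a JAX-RS task endpoint could answer 503 while startup is incomplete; the `StartupStatus` accessor, class names, and endpoint signature below are assumptions for illustration only:

```java
// Hypothetical sketch: a task endpoint that answers 503 (SERVICE_UNAVAILABLE)
// until startup has completed, so the coordinator's HTTP client retries the
// request instead of failing the query on a 500.
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.core.Response;

@Path("/v1/task")
public class TaskResourceSketch
{
    private final StartupStatus startupStatus;

    public TaskResourceSketch(StartupStatus startupStatus)
    {
        this.startupStatus = startupStatus;
    }

    @POST
    @Path("{taskId}")
    public Response createOrUpdateTask(@PathParam("taskId") String taskId, String taskUpdateJson)
    {
        if (!startupStatus.isStartupComplete()) {
            // 503 signals a transient condition that the coordinator retries;
            // a 500 at this point would fail the query outright.
            return Response.status(Response.Status.SERVICE_UNAVAILABLE)
                    .entity("Server is still starting up")
                    .build();
        }
        // ... normal task creation/update handling would go here ...
        return Response.ok().build();
    }

    // Stand-in for the real StartupStatus; name and accessor are assumptions.
    public interface StartupStatus
    {
        boolean isStartupComplete();
    }
}
```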

Simpler alternative to #21744.
That other PR has the problem that, after a worker restarts, queries fail for some time because the new instance ID is not propagated.

The test being added here should prove that we can close #21735.

`StartupStatus.startupComplete` was set in production code, but not in
tests.
@findepi findepi marked this pull request as draft May 10, 2024 15:15
findepi (Member, Author) commented May 10, 2024

draft because of the TMP commit

findepi (Member, Author) commented May 10, 2024

stress test OK https://github.com/trinodb/trino/actions/runs/9034341743/job/24827391786?pr=21921

2024-05-10T17:14:30.7871255Z [INFO] Tests run: 286, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1489 s -- in io.trino.tests.TestWorkerRestart

(the actual numbers are probably a lie -- #20113)

@findepi findepi marked this pull request as ready for review May 10, 2024 17:38
findepi (Member, Author) commented May 10, 2024

cc @sajjoseph

sajjoseph (Contributor) commented:

I like the lighter version. Let me try this out with the 447 release.
Thanks!

sajjoseph (Contributor) commented:

I tried out the new PR and so far all signs are positive.
A worker node was restarted. Once the node rejoined the cluster, pending queries (that had tasks on the failed worker) all failed quickly with the following error: REMOTE_TASK_MISMATCH.

Error Type	INTERNAL_ERROR
Error Code	REMOTE_TASK_MISMATCH (65544)
Stack Trace

io.trino.spi.TrinoException: Could not communicate with the remote task. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (<worker_node>:<port>). Expected taskInstanceId: cb26d883-31f5-48a6-b5d1-66a52e8e9750, received taskInstanceId: 59f166fe-4f4f-4b18-b35b-e22589819788
	at io.trino.operator.HttpPageBufferClient$1.onSuccess(HttpPageBufferClient.java:389)
	at io.trino.operator.HttpPageBufferClient$1.onSuccess(HttpPageBufferClient.java:367)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)

The worker started processing fresh incoming queries.

There were a few exceptions in the worker node's server log file. I don't think these have anything to do with the current PR. I just included them here to see whether any zombie task status checks are still causing these errors in the log files. These entries are showing up even after the worker node was up for 10+ minutes.

2024-05-11T21:31:09.957Z	WARN	http-client-exchange-1339	org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy	Task run failed
java.lang.NullPointerException: Cannot invoke "org.eclipse.jetty.client.transport.HttpReceiver$NotifiableContentSource.onDataAvailable()" because "this.contentSource" is null
	at org.eclipse.jetty.client.transport.HttpReceiver.responseContentAvailable(HttpReceiver.java:325)
	at org.eclipse.jetty.http2.client.transport.internal.HttpReceiverOverHTTP2.lambda$new$0(HttpReceiverOverHTTP2.java:53)
	at org.eclipse.jetty.util.thread.Invocable$ReadyTask.run(Invocable.java:105)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:426)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)
	at org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:211)
	at org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:158)
	at org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:449)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.ssl.SslConnection$SslEndPoint.onFillable(SslConnection.java:574)
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:390)
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:150)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
	at org.eclipse.jetty.util.thread.MonitoredQueuedThreadPool$1.run(MonitoredQueuedThreadPool.java:73)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:979)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1209)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1164)
	at java.base/java.lang.Thread.run(Thread.java:1570)
	
2024-05-11T21:41:02.461Z        ERROR   async-http-response-87  io.airlift.concurrent.BoundedExecutor   Task failed
java.lang.IllegalStateException: Response does not exist (likely recycled)
        at org.eclipse.jetty.ee10.servlet.ServletChannel.getResponse(ServletChannel.java:297)
        at org.eclipse.jetty.ee10.servlet.ServletApiResponse.getResponse(ServletApiResponse.java:97)
        at org.eclipse.jetty.ee10.servlet.ServletApiResponse.containsHeader(ServletApiResponse.java:117)
        at jakarta.servlet.http.HttpServletResponseWrapper.containsHeader(HttpServletResponseWrapper.java:68)
        at org.glassfish.jersey.servlet.internal.ResponseWriter.writeResponseStatusAndHeaders(ResponseWriter.java:135)
        at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:650)
        at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:387)
        at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:377)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$3.run(ServerRuntime.java:913)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:950)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:901)
        at io.airlift.jaxrs.AsyncResponseHandler$1.onSuccess(AsyncResponseHandler.java:99)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
        at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1570)

2024-05-11T21:41:02.463Z        ERROR   async-http-response-101 io.airlift.concurrent.BoundedExecutor   Task failed
java.lang.IllegalStateException: Response does not exist (likely recycled)
        at org.eclipse.jetty.ee10.servlet.ServletChannel.getResponse(ServletChannel.java:297)
        at org.eclipse.jetty.ee10.servlet.ServletApiResponse.getResponse(ServletApiResponse.java:97)
        at org.eclipse.jetty.ee10.servlet.ServletApiResponse.containsHeader(ServletApiResponse.java:117)
        at jakarta.servlet.http.HttpServletResponseWrapper.containsHeader(HttpServletResponseWrapper.java:68)
        at org.glassfish.jersey.servlet.internal.ResponseWriter.writeResponseStatusAndHeaders(ResponseWriter.java:135)
        at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:650)
        at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:387)
        at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:377)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$3.run(ServerRuntime.java:913)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:950)
        at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:901)
        at io.airlift.jaxrs.AsyncResponseHandler$1.onSuccess(AsyncResponseHandler.java:99)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
        at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1570)

findepi (Member, Author) commented May 13, 2024

There were a few exceptions in the worker node's server log file. [...] These entries are showing up even after the worker node was up for 10+ minutes.

They look wrong, cc @wendigo. I think we can assume this is not related to this PR, as it only changes behavior during the startup phase. We should have an issue about them.
@sajjoseph can you double-check that these also occur without any changes, on current master / 447?
Do these also happen with HTTP/2 disabled?

wendigo (Contributor) commented May 13, 2024

These errors are the use-after-free pattern that we saw in other cases. They basically mean that the async response was asked to complete (either with the response or an exception) but the response was already recycled. This can happen if the client disconnected, Jetty noticed that fact and closed/released the underlying resources (which is correct behaviour, btw), but the application didn't notice and tried to complete the request anyway. It looks bad but it doesn't necessarily mean a bad thing.
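
For illustration only (this is not necessarily how Trino or Airlift completes async responses), a completion callback could guard against this pattern by consulting the standard JAX-RS `AsyncResponse` state before resuming; there is still an inherent race, but the exception becomes avoidable in the common case:

```java
import jakarta.ws.rs.container.AsyncResponse;
import jakarta.ws.rs.core.Response;

// Sketch only: skip completing an async response whose client has already gone
// away, instead of triggering the "Response does not exist (likely recycled)"
// IllegalStateException seen in the logs above.
public final class SafeAsyncCompletion
{
    private SafeAsyncCompletion() {}

    public static void complete(AsyncResponse asyncResponse, Response result)
    {
        // isSuspended() is false once the response has completed, was cancelled,
        // or is otherwise no longer waiting for a result.
        if (!asyncResponse.isSuspended()) {
            return;
        }
        // resume() returns false if the response could not be resumed anymore;
        // both outcomes are benign when the client has disconnected.
        asyncResponse.resume(result);
    }
}
```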

findepi (Member, Author) commented May 13, 2024

It looks bad but it doesn't necessarily mean a bad thing.

Can it mean a bad thing, or should we just ignore these?

wendigo (Contributor) commented May 13, 2024

@findepi I think that we should ignore them.

@findepi findepi merged commit 5b9ae0d into trinodb:master May 13, 2024
96 checks passed
@findepi findepi deleted the findepi/reject-too-early branch May 13, 2024 16:31
@github-actions github-actions bot added this to the 448 milestone May 13, 2024
Development

Successfully merging this pull request may close these issues.

Failed worker doesn't handle requests properly on recovery
4 participants