[JENKINS-44785] - Built-in request timeout support #174

oleg-nenashev · 2017-06-29T13:30:31Z

This is a PoC implementation of timeout support in the Remoting API. See one of the use cases in JENKINS-44785.

- Implement working PoC
- TBD make decision about switching to Java 8. The PoC needs it for the java.time.Duration class, but it can be replaced if we stay on Java 7
- If Java 8 is selected, implement new call() default implementations in interfaces (with TimeoutExceptions) to address the JENKINS-44785 request from @jglick
- channel#callAsync() should also support timeouts on the remote side
- TBD: If we stay on Java 7, stick to InterruptedException?
- Weaponize it! Set some default timeouts for user requests and RPC calls

@reviewbybees @jglick

oleg-nenashev · 2017-06-29T13:36:27Z

src/main/java/org/jenkinsci/remoting/util/Timeout.java

@@ -0,0 +1,108 @@
+/*


This class is an adapted version of jenkinsci/workflow-support-plugin@c810b38

ghost · 2017-06-29T15:24:26Z

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

jglick · 2017-06-29T21:59:14Z

channel#callAsync() should also support timeouts on the remote side

Why? I see no need for it.

pom.xml

jglick · 2017-06-29T22:06:20Z

src/main/java/hudson/remoting/Channel.java

+
+    public <V,T extends Throwable>
+    V call(@Nonnull Callable<V,T> callable, @CheckForNull Duration performTimeout, @CheckForNull Duration executionTimeout) 
+            throws IOException, T, InterruptedException {


What, no TimeoutException?

The call gets interrupted now instead of timeout. See the PR TODO list

I interpreted the TODO in the PR as referring to methods added to VirtualChannel with default implementations. This is just being added to a class so there is no Java 8 dependency. You chose the throws clause for the method and I am arguing that it should be throwing TimeoutException. See signatures of Future.get and so on.
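For readers following the `TimeoutException` argument, here is a minimal sketch of a call signature modelled on `Future.get(long, TimeUnit)`. It is not the Remoting API: `TimedCallSketch` and its executor-backed `call` are hypothetical stand-ins, and `java.util.concurrent.Callable` is used in place of `hudson.remoting.Callable`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a channel; it only illustrates the throws clause.
class TimedCallSketch {
    // Daemon threads so an abandoned call never keeps the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    <V> V call(Callable<V> callable, long timeout, TimeUnit unit)
            throws IOException, InterruptedException, TimeoutException {
        Future<V> future = executor.submit(callable);
        try {
            // Same contract as Future.get: a timeout is reported as TimeoutException,
            // which keeps it distinct from the caller's own thread being interrupted.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupt the abandoned work
            throw e;
        } catch (ExecutionException e) {
            throw new IOException("Call failed", e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        TimedCallSketch channel = new TimedCallSketch();
        System.out.println(channel.call(() -> "done", 1, TimeUnit.SECONDS));
        try {
            channel.call(() -> { Thread.sleep(5_000); return "late"; }, 100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

With this shape a caller can tell "the call took too long" apart from "my thread was interrupted", which is the distinction at stake in this sub-thread.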

jglick · 2017-06-29T22:09:28Z

src/main/java/hudson/remoting/Channel.java

+    }
+
+    public <V,T extends Throwable>
+    V call(@Nonnull Callable<V,T> callable, @CheckForNull Duration performTimeout, @CheckForNull Duration executionTimeout) 


I do not think executionTimeout is necessary. Just keep it to performTimeout. No one cares about the difference. A caller simply wants to ensure that a network call does not block forever.

Yes and no. We want to prevent hanging of requests on both sides.

I do not see any need for special infrastructure to control delays on the remote side, i.e., in the body of the callable. That is just up to the discretion of whoever is writing that callable—if it is in fact doing something which might block indefinitely, it may instead select a method variant with an appropriate timeout. But from the perspective of an RPC caller, what is important is just that the calling thread is not blocked for too long in networking.

jglick · 2017-06-29T22:17:25Z

src/main/java/org/jenkinsci/remoting/util/Timeout.java

+                    t.setStackTrace(thread.getStackTrace());
+                    LOGGER.log(Level.FINE, "Interrupting " + thread + " after " + delay + " " + unit, t);
+                }
+                thread.interrupt();


I think this utility is inappropriate here.

I introduced it in Pipeline code (later elsewhere) because I was calling predefined APIs (Channel.call, ultimately) which offered no timeout (à la Future.get(long, TimeUnit)), and from reading the source code I knew that the implementation would typically be inside an Object.wait() loop in Request.call, which I had no control over and which waited with no bound except a response, channel closing, or a thread interruption. So this was certainly a workaround.

But if you are implementing timeouts in remoting itself, there is no reason to resort to that. You can simply make Request.call not wait forever. It can wait for the configured timeout, and not use the while loop.

But if you are implementing timeouts in remoting itself, there is no reason to resort to that. You can simply make Request.call not wait forever. It can wait for the configured timeout, and not use the while loop.

It will solve hanging of the calls only on one side of the channel. I want to address it on both.

See above. I think this is not the right approach.
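The alternative suggested above, a bounded wait inside the request itself instead of an external interrupting timer, could look roughly like the sketch below. `PendingResponse` is a hypothetical class that only mimics the wait-for-response pattern; it is not the actual `Request.call` implementation. A loop is kept here solely to guard against spurious wakeups of `Object.wait`, and it is bounded by the deadline.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical response slot: one thread waits, another delivers the result.
class PendingResponse<V> {
    private V value;
    private boolean done;

    // Called by the thread that receives the response.
    synchronized void complete(V v) {
        value = v;
        done = true;
        notifyAll();
    }

    // Called by the requesting thread; never blocks longer than the timeout.
    synchronized V await(long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        final long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!done) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                throw new TimeoutException("No response within " + timeout + " " + unit);
            }
            // Timed Object.wait on this monitor; re-checked against the deadline on wakeup.
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        PendingResponse<String> response = new PendingResponse<>();
        Thread responder = new Thread(() -> response.complete("pong"));
        responder.start();
        System.out.println(response.await(1, TimeUnit.SECONDS));

        try {
            new PendingResponse<String>().await(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

No second thread and no `Thread.interrupt()` are needed on the caller side in this variant; it bounds only the local wait, which is exactly the point of disagreement in the thread.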

jglick · 2017-06-29T22:18:01Z

src/main/java/org/jenkinsci/remoting/util/Timer.java

+ * @since TODO
+ */
+@Restricted(NoExternalUse.class)
+public class Timer {


As above, unnecessary if you implement timeouts in the more natural way.

jglick · 2017-06-29T22:18:47Z

src/main/java/hudson/remoting/Channel.java

@@ -45,6 +45,7 @@
 import java.io.UnsupportedEncodingException;
 import java.lang.ref.WeakReference;
 import java.net.URL;
+import java.time.Duration;


Could use long + TimeUnit if you wanted to keep Java 7 compatibility. (But I see no reason not to move to 8 immediately.)

Yes, no reasons from the Jenkins community PoV since the LTS is on Java 8

jtnord · 2017-06-30T09:53:25Z

channel#callAsync() should also support timeouts on the remote side

Why? I see no need for it.

One reason: you ask for something to happen on the remote side, it takes a while to process, and if it cannot return within the default timeout you do not want to waste resources continuing to run it.
This may matter less for an agent, where you may decide resources are cheap, but it does matter where remoting is used between masters.
For example, consider executing a Groovy script on a group of interconnected masters where the script accidentally uses "while (true)" without a break clause. With a timeout on the remote side, the script can be interrupted while it is still running.

oleg-nenashev · 2017-06-30T10:08:48Z

I suppose @jglick just expected the timeout on one side. In this PR I have intentionally added timeouts on both sides. As @jtnord says, such a timeout may be useful if the request hangs for whatever reason, especially since we have a limited thread pool for them.

jglick · 2017-06-30T15:01:25Z

w.r.t. callAsync

you ask for something to happen on the remote side and it takes a while to process and if it can not return within the default you do not want to waste resources continuing to do it

So the body of the callable is free to impose any time limit it sees appropriate. For that matter a poorly written callable might be consuming hundreds of megs of heap for no good reason. This is no different from local method calls—not a concern of Remoting.
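As an illustration of a callable that imposes its own time limit, here is a minimal sketch. `SelfLimitingTask` and its `finished` supplier are hypothetical, and `java.util.concurrent.Callable` stands in for `hudson.remoting.Callable`; nothing here comes from the PR.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical callable that polices its own running time.
class SelfLimitingTask implements Callable<Integer> {
    private final Duration limit;
    private final BooleanSupplier finished; // stands in for "is the real work done yet?"

    SelfLimitingTask(Duration limit, BooleanSupplier finished) {
        this.limit = limit;
        this.finished = finished;
    }

    @Override
    public Integer call() throws TimeoutException {
        final long deadline = System.nanoTime() + limit.toNanos();
        int iterations = 0;
        while (!finished.getAsBoolean()) {
            // The body checks the clock itself, so no channel-level support is needed.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Gave up after " + limit);
            }
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        try {
            // A loop that would otherwise never end, cut off after 50 ms.
            new SelfLimitingTask(Duration.ofMillis(50), () -> false).call();
        } catch (TimeoutException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This only works when the author of the callable cooperates; it does nothing for code that was never written with a deadline in mind, such as the runaway script example given earlier in the thread.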

oleg-nenashev · 2017-07-03T06:53:31Z

This is no different from local method calls—not a concern of Remoting.

As a maintainer of Remoting, I think having a timeout for such calls is important. Do you block the PR due to that? Or just "IMHO YAGNI"?

stephenc · 2017-07-03T07:13:08Z

Imho I agree with @jglick

Timing out the remote operation should not be a concern of the remoting API... (or at best it should be something that the caller opts in to, e.g. by using a wrapper channel)

I am inclined to object, but at this point I will just give a stern look and ask you to critically self-review ;-)

oleg-nenashev · 2017-07-03T08:06:38Z

or at best it should be something that the caller opts-in

Currently it's opt-in since the default timeout is "no timeout"

stephenc · 2017-07-03T09:04:16Z

Currently it's opt-in since the default timeout is "no timeout"

But you are littering the API by multiplying methods.

When somebody needs a timeout on the remote execution they probably just want you to provide them with a wrapping callable that does the timeout for them.

That would simplify the API and simplify the implementation.

IMHO fewer methods to choose from is better. I could probably go so far as to provide a wrapper to the callable for the caller timeout too, so that, in effect, the caller just adds the timeouts to their callables

ch.call(localTimeout(remoteTimeout(new Callable<...>(...) {...})));

but it is fine to keep the local timeout as a parameter

ch.call(withTimeout(new Callable<...>(...){...}), localTimeout);

as that way the withTimeout static callable decorator can also be reused locally without confusion
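A rough sketch of what such a `withTimeout` static decorator might look like follows. The name comes from the comment above, but the signature and implementation are assumptions, and `java.util.concurrent.Callable` is used instead of `hudson.remoting.Callable`. A real Remoting callable would also need to be serializable, which this lambda-based version does not attempt.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper class; only the decorator idea is taken from the discussion.
final class Timeouts {
    private Timeouts() {}

    // Wraps a callable so that, wherever it runs, it gives up after the timeout.
    static <V> Callable<V> withTimeout(Callable<V> delegate, long timeout, TimeUnit unit) {
        return () -> {
            ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });
            try {
                Future<V> result = worker.submit(delegate);
                try {
                    return result.get(timeout, unit); // throws TimeoutException when too slow
                } catch (ExecutionException e) {
                    // Unwrap so callers see the delegate's own exception where possible.
                    Throwable cause = e.getCause();
                    if (cause instanceof Exception) {
                        throw (Exception) cause;
                    }
                    throw e;
                }
            } finally {
                worker.shutdownNow(); // interrupts the delegate if it is still running
            }
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(withTimeout(() -> 42, 1, TimeUnit.SECONDS).call());
        Callable<String> slow = withTimeout(
                () -> { Thread.sleep(5_000); return "late"; }, 100, TimeUnit.MILLISECONDS);
        try {
            slow.call();
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

Because the decorator is an ordinary callable, the same wrapper can be used for purely local work, which is the reuse benefit mentioned above.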

jglick · 2017-07-03T21:19:09Z

provide a wrapper to the callable for the caller timeout too

That is essentially what I attempted to do with the Timeout utility in Pipeline code, but it is not a good solution here—hence my RFE (which is overshot by this PR). More: #174 (comment)

Do you block the PR due to that?

Well, -0.5. I think this PR is solving problems it did not need to solve, and solving the problems it did need to solve the wrong way.

To reiterate, my original complaint in JENKINS-44785 was that, say,

FilePath f;
try {
    if (f.isDirectory()) {
        // …
    }
} catch (IOException | InterruptedException x) {
    // …
}

could block indefinitely due to problems in the network layer, being an example of one of the fundamental fallacies of distributed computing; and my request was for some variant like

FilePath f;
try {
    if (f.isDirectoryInterruptible()) {
        // …
    }
} catch (IOException | InterruptedException | TimeoutException x) {
    // …
}

or

FilePath f;
try {
    if (f.isDirectory(30, TimeUnit.SECONDS)) {
        // …
    }
} catch (IOException | InterruptedException | TimeoutException x) {
    // …
}

which would make explicit the fact that the network operation might hang for various arbitrary reasons and the caller must be prepared to deal with it.

Premature to approve or reject PR until there is consensus on goals.

# Conflicts:
#   pom.xml
#   src/main/java/hudson/remoting/Channel.java
#   src/main/java/hudson/remoting/RemoteInvocationHandler.java
#   src/main/java/hudson/remoting/Request.java
#   src/test/java/hudson/remoting/ChannelTest.java

oleg-nenashev

I merged it, almost no mistakes

oleg-nenashev · 2017-11-12T22:24:59Z

src/main/java/hudson/remoting/Channel.java

@@ -553,10 +555,10 @@ public void handle(Command cmd) {
                lastCommandReceivedAt = receivedAt;
                if (logger.isLoggable(Level.FINE)) {
                    logger.fine("Received " + cmd);
-                } else if (logger.isLoggable(Level.FINER)) {
-                    logger.log(Level.FINER, "Received command " + cmd, cmd.createdAt);


Merge defect. To be fixed.

timja · 2021-09-24T12:27:43Z

close?

jglick · 2021-09-24T13:31:54Z

Maybe, so long as stale-pr is added to the corresponding issue.

jeffret-b · 2021-09-24T13:36:54Z

I've been leaving it around as a reference for some possible ideas, though the actual value is small.

[JENKINS-44785] - Remoting request built-in timeout PoC

a0f8b7b

oleg-nenashev added needs-review work-in-progress labels Jun 29, 2017

oleg-nenashev removed the work-in-progress label Jun 29, 2017

jglick self-requested a review June 29, 2017 15:25

jglick previously requested changes Jun 29, 2017

oleg-nenashev self-assigned this Nov 12, 2017

oleg-nenashev added work-in-progress and removed needs-review labels Nov 12, 2017

jglick marked this pull request as draft September 24, 2021 11:59

jglick removed the work-in-progress label Sep 24, 2021
