Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-44785] - Built-in request timeout support #174

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

oleg-nenashev
Copy link
Member

@oleg-nenashev oleg-nenashev commented Jun 29, 2017

This is a PoC implementation of the timeout support in Remoting API, See one of the use-cases in JENKINS-44785

  • - Implement working PoC
  • - TBD make decision about switching to Java 8. The PoC needs it for the java.time.Duration class, but it can be replaced if we stay on Java 7
  • - If Java 8 is selected, implement new call() default implementations in interfaces (with TimeoutExceptions) to address the JENKINS-44785 request from @jglick
  • - channel#callAsync() should also support timeouts on the remote side
  • - TBD: If we stay on Java 7, stick to InterruptedException?
  • - Weaponize it! Set some default timeouts for user requests and RPC calls

@reviewbybees @jglick

@@ -0,0 +1,108 @@
/*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is an adapted version of jenkinsci/workflow-support-plugin@c810b38

@ghost
Copy link

ghost commented Jun 29, 2017

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

@jglick jglick self-requested a review June 29, 2017 15:25
@jglick
Copy link
Member

jglick commented Jun 29, 2017

channel#callAsync() should also support timeouts on the remote side

Why? I see no need for it.

jglick
jglick previously requested changes Jun 29, 2017
pom.xml Outdated Show resolved Hide resolved

public <V,T extends Throwable>
V call(@Nonnull Callable<V,T> callable, @CheckForNull Duration performTimeout, @CheckForNull Duration executionTimeout)
throws IOException, T, InterruptedException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What, no TimeoutException?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call gets interrupted now instead of timeout. See the PR TODO list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpreted the TODO in the PR as referring to methods added to VirtualChannel with default implementations. This is just being added to a class so there is no Java 8 dependency. You chose the throws clause for the method and I am arguing that it should be throwing TimeoutException. See signatures of Future.get and so on.

}

public <V,T extends Throwable>
V call(@Nonnull Callable<V,T> callable, @CheckForNull Duration performTimeout, @CheckForNull Duration executionTimeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think executionTimeout is necessary. Just keep it to performTimeout. No one cares about the difference. A caller simply wants to ensure that a network call does not block forever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. We want to prevent hanging of requests on both sides.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any need for special infrastructure to control delays on the remote side, i.e., in the body of the callable. That is just up to the discretion of whoever is writing that callable—if it is in fact doing something which might block indefinitely, it may instead select a method variant with an appropriate timeout. But from the perspective of an RPC caller, what is important is just that the calling thread is not blocked for too long in networking.

t.setStackTrace(thread.getStackTrace());
LOGGER.log(Level.FINE, "Interrupting " + thread + " after " + delay + " " + unit, t);
}
thread.interrupt();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this utility is inappropriate here.

I introduced it in Pipeline code (later elsewhere) because I was calling predefined APIs (Channel.call, ultimately) which offered no timeout (à la Future.get(long, TimeUnit)), and from reading the source code I knew that the implementation would typically be inside an Object.wait() loop in Request.call, which I had no control over and which waited with no bound except a response, channel closing, or a thread interruption. So this was certainly a workaround.

But if you are implementing timeouts in remoting itself, there is no reason to resort to that. You can simply make Request.call not wait forever. It can wait for the configured timeout, and not use the while loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you are implementing timeouts in remoting itself, there is no reason to resort to that. You can simply make Request.call not wait forever. It can wait for the configured timeout, and not use the while loop.

It will solve hanging of the calls only on one side of the channel. I want to address it on both

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. I think this is not the right approach.

* @since TODO
*/
@Restricted(NoExternalUse.class)
public class Timer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, unnecessary if you implement timeouts in the more natural way.

@@ -45,6 +45,7 @@
import java.io.UnsupportedEncodingException;
import java.lang.ref.WeakReference;
import java.net.URL;
import java.time.Duration;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use long + TimeUnit if you wanted to keep Java 7 compatibility. (But I see no reason not to move to 8 immediately.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, no reasons from the Jenkins community PoV since the LTS is on Java 8

@jtnord
Copy link
Member

jtnord commented Jun 30, 2017

channel#callAsync() should also support timeouts on the remote side

Why? I see no need for it.

One reason is that you ask for something to happen on the remote side and it takes a while to process and if it can not return within the default you do not want to waste resources continuing to do it.
This may not be say an Agent where you may decide resources are cheap, but where remoting is used inter master.
For example - executing a groovy script on a bunch of inter-connected masters - but the groovy script accidentally uses "while (true)" without a break clause, using the timeout on the remote side the script can be interrupted whilst it is still running.

@oleg-nenashev
Copy link
Member Author

I suppose @jglick just expected the timeout on one side. In this PR I have intentionally added timeouts on both sides. As @jtnord says, such timeout may be useful if the request hangs due to whatever reason, especially since we have a limited threadpool for them.

@jglick
Copy link
Member

jglick commented Jun 30, 2017

w.r.t. callAsync

you ask for something to happen on the remote side and it takes a while to process and if it can not return within the default you do not want to waste resources continuing to do it

So the body of the callable is free to impose any time limit it sees appropriate. For that matter a poorly written callable might be consuming hundreds of megs of heap for no good reason. This is no different from local method calls—not a concern of Remoting.

@oleg-nenashev
Copy link
Member Author

This is no different from local method calls—not a concern of Remoting.

As a maintainer of Remoting, I think having a timeout for such calls is important. Do you block the PR due to that? Or just "IMHO YAGNI"?

@stephenc
Copy link
Member

stephenc commented Jul 3, 2017

Imho I agree with @jglick

Timingboutvthe remote operation should not be a concern of the remoting api... (or at best it should be something that the caller opts-in e.g. By using a wrapper channel)

I am inclined to object, but at this point I will just give a stern look and ask you to critically self-review ;-)

@oleg-nenashev
Copy link
Member Author

or at best it should be something that the caller opts-in

Currently it's opt-in since the default timeout is "no timeout"

@stephenc
Copy link
Member

stephenc commented Jul 3, 2017

Currently it's opt-in since the default timeout is "no timeout"

But you are littering the API by multiplying methods.

When somebody needs a timeout on the remote execution they probably just want you to provide them with a wrapping callable that does the timeout for them.

That would simplify the API and simplify the implementation.

IMHO less methods to choose from is better. I could probably go so far as to provide a wrapper to the callable for the caller timeout too so that, in effect, the caller just adds the timeouts to their callables

ch.call(localTimeout(remoteTimeout(new Callable<...>(...) {...})));

but it is fine to keep the local timeout as a parameter

ch.call(withTimeout(new Callable<...>(...){...}), localTimeout);

as that way the withTimeout static callable decorator can also be reused locally without confusion

@jglick
Copy link
Member

jglick commented Jul 3, 2017

provide a wrapper to the callable for the caller timeout too

That is essentially what I attempted to do with the Timeout utility in Pipeline code, but it is not a good solution here—hence my RFE (which is overshot by this PR). More: #174 (comment)

Do you block the PR due to that?

Well, -0.5. I think this PR is solving problems it did not need to solve, and solving the problems it did need to solve the wrong way.

To reiterate, my original complaint in JENKINS-44785 was that, say,

FilePath f;
try {
    if (f.isDirectory()) {
        // …
    }
} catch (IOException | InterruptedException x) {
    // …
}

could block indefinitely due to problems in the network layer, being an example of one of the fundamental fallacies of distributed computing; and my request was for some variant like

FilePath f;
try {
    if (f.isDirectoryInterruptible()) {
        // …
    }
} catch (IOException | InterruptedException | TimeoutException x) {
    // …
}

or

FilePath f;
try {
    if (f.isDirectory(30, TimeUnit.SECONDS)) {
        // …
    }
} catch (IOException | InterruptedException | TimeoutException x) {
    // …
}

which would make explicit the fact that the network operation might hang for various arbitrary reasons and the caller must be prepared to deal with it.

@jglick jglick dismissed their stale review July 4, 2017 16:33

Premature to approve or reject PR until there is consensus on goals.

# Conflicts:
#	pom.xml
#	src/main/java/hudson/remoting/Channel.java
#	src/main/java/hudson/remoting/RemoteInvocationHandler.java
#	src/main/java/hudson/remoting/Request.java
#	src/test/java/hudson/remoting/ChannelTest.java
Copy link
Member Author

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged it, almost no mistakes

@@ -553,10 +555,10 @@ public void handle(Command cmd) {
lastCommandReceivedAt = receivedAt;
if (logger.isLoggable(Level.FINE)) {
logger.fine("Received " + cmd);
} else if (logger.isLoggable(Level.FINER)) {
logger.log(Level.FINER, "Received command " + cmd, cmd.createdAt);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge defect To be fixed

@jglick jglick marked this pull request as draft September 24, 2021 11:59
@timja
Copy link
Member

timja commented Sep 24, 2021

close?

@jglick
Copy link
Member

jglick commented Sep 24, 2021

Maybe, so long as stale-pr is added to the corresponding issue.

@jeffret-b
Copy link
Contributor

I've been leaving it around as a reference for some possible ideas, thought the actual value is small.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants