Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request Socket takes just over 30 seconds to send a message #912

Open
trumpetmonk opened this issue Feb 10, 2022 · 1 comment
Open

Request Socket takes just over 30 seconds to send a message #912

trumpetmonk opened this issue Feb 10, 2022 · 1 comment

Comments

@trumpetmonk
Copy link

We are experiencing an issue where occasionally, messages are taking just over 30 seconds to send on a simple request socket. This seems to occur when the system has many messages sent out at the same time. When we eventually DO send out the message, the response is received almost immediately.

  • We are running on a non-Oracle JVM that is compatible with jdk 1.8.
  • We previously were using JMQ v.3.4 and upgraded to v5.2 to be compatible with the rest of the system.
  • We are using 1 Zmq.Context in our application
  • When we want to send a message, we create a ClientSocketThread that gets a REQ socket from the context on instantiation and invokes the send after we call start() on the thread.
  • I've put in logging to try and determine if the send() or receive() call was taking a long time, but, due to the fact that the send() is non-blocking, I was able to determine the process running at the endpoint didn't receive our message for almost 30 due to their logging and when the send back their response.

Here is some code to give you an idea of what we're doing.

createClientSocketThread(serverPort, context, message, this).start();

Create the clientThread and start the thread. The instantiation looks like this:

	public ClientSocketThread(int port, OurZmqContext context, byte[] message, CallbackHandler handler, boolean sendResponseAsAnEvent) {
		super("ClientSocketThread on port " + port);

		this.port = port;
		this.message = message;
		this.handler = handler;
		this.context = context;
		this.sendResponseAsEvent = sendResponseAsAnEvent;

		SERVER_ENDPOINT = "localhost:" + this.port;

		requester = context.getSocket(SocketType.REQ);
	}

Once the thread starts, we do something like this:

	@Override
	public void run() {
		connectAndSendMessage();
	}

	private void connectAndSendMessage() {
			requester.connect(SERVER_ENDPOINT);
			sendRequestAndDispatchResponse(message);
			requester.close();
	}

Lastly, the send request function looks like this (with some debug logging added)

void sendRequestAndDispatchResponse(byte[] msg) {
		long timeBeforeSend = System.currentTimeMillis();
		long timeAfterSend;
		long timeAfterReceive;
		if (requester != null) {
			try {
				// requester.setReceiveTimeout(ZMQ_TIMEOUT); 
				requester.sendMessage(msg);
				timeAfterSend = System.currentTimeMillis();
				checkSendTime(timeBeforeSend, timeAfterSend);
				byte[] responseMessage = requester.receiveMessage();
				timeAfterReceive = System.currentTimeMillis();
				checkReceiveTime(timeAfterSend, timeAfterReceive);
				notifyHandler(responseMessage);
			} catch (ZMQException e) {
				throw new IllegalStateException("ZMQ Error when sending message. \n" + e);
			}
		} 
	}

The checkTimeout functions just check if the time difference is >30 seconds and log a message. When we were setting the ReceiveTimeout(30000), the socket would bail and return a null message to the handler but by letting the socket wait just a little longer, we were getting the response in 30150 milliseconds or so. Sometimes it takes up to 40 seconds, but, most instances of this issue show a time just a few hundred milliseconds longer than 30 seconds. The handler, in this case, calls back into the rest of our code to process the message response.

Some questions:

  1. Are we doing things horribly wrong by putting our asynchronous messages on a thread we create ourselves? My understanding was that the receive() call DOES block and so we don't want to tie down our main application code on that blocking call.
  2. Should our system have only 1 zmq context? We have it configured to use 2 IO threads, but, given the specs, I can't imagine it requires more.
  3. If the above are 'ok' but not ideal, what could possibly be causing the message to sit in the ZMQ code for 30+ seconds? This is where I'm stuck and can't seem to figure out why our messages take so long to get to their destination.

Thanks for any help you can provide and I'll happily share more information if possible.

@trumpetmonk
Copy link
Author

Further investigation and paying attention to when this issue occurs revealed the following info:

  • The issue happens when our system goes to 'sleep'
  • We currently run our java code on two different platforms, both of which go to 'sleep' differently.
    • In one case, the CPU is completely stopped, but in the other case, a sig stop (I believe) is sent to the Java process and its
      children processes.
  • I believe that if the Jeromq code used System.nanoTime, we would not encounter this issue on the first platform.
  • However, a simple java test showed that if a sig stop is sent, nanoTime and currentTimeMillis stay in sync.

Bottom line, is there a way to have a reasonable timeout set on a socket in JeroMQ that can account for the entire CPU going to sleep or an OS Sig stop sent to the java process?

Thanks for any insights you can provide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant