
Kafka Requester Produce/Consume Concerns #5

Open
eapache opened this issue Feb 13, 2016 · 2 comments

eapache commented Feb 13, 2016

It's not super-clear to me exactly what kind of round-trip behaviour you're trying to model, but I suspect the kafka requester isn't doing exactly what you think it's doing (or what you want it to do) for a few reasons:

  • Consuming the produced message is a whole different thing from waiting for the producer request to be ACKed at the protocol level - if the ACK is all you're after, use the SyncProducer instead and drop the consumer entirely (see the sketch after this list).
  • There's no guarantee that the consumer is returning the message you produced. I suppose if you lock down the cluster such that this is the only process talking to it, and you never call Request concurrently, then you're probably OK, but I'm not sure.
  • The consumer sends its consume requests to the server asynchronously where they are held until messages become available, so you're missing 1/2 of one RTT worth of network latency if you really did mean to measure two RTTs per request in the first place.
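
A minimal sketch of that SyncProducer approach, assuming the Shopify/sarama client, a broker at localhost:9092, and a topic named "bench" (all placeholders); it times only the produce-to-ACK round trip and involves no consumer:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/Shopify/sarama"
)

func main() {
	// Placeholder broker address and topic; adjust for your cluster.
	brokers := []string{"localhost:9092"}
	topic := "bench"

	config := sarama.NewConfig()
	// SyncProducer requires successes to be returned.
	config.Producer.Return.Successes = true
	// WaitForAll asks the leader to wait for the full ISR before ACKing.
	config.Producer.RequiredAcks = sarama.WaitForAll

	producer, err := sarama.NewSyncProducer(brokers, config)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Time a single produce round trip: request out, broker ACK back.
	start := time.Now()
	_, _, err = producer.SendMessage(&sarama.ProducerMessage{
		Topic: topic,
		Value: sarama.ByteEncoder([]byte("payload")),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("produce ACK latency:", time.Since(start))
}
```

With RequiredAcks set to sarama.WaitForAll the timing includes replication to the in-sync replicas; sarama.WaitForLocal would time only the leader's own ACK.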

kchristidis commented Dec 23, 2016

I discovered this package earlier today (I was reading your Benchmarking Commit Logs post), and after studying its source code, I had the same observation as @eapache regarding the producer you are using.

I am not sure why you're going with an AsyncProducer when you're explicitly looking for a synchronous request. (Also, as @eapache noted, "if the ACK is all you're after, use the SyncProducer instead and drop the consumer entirely.")

@eapache, regarding your second observation: this raised a flag here originally as well. If you inspect the code closely, however, you'll see that every process posts to its own topic. So effectively there is a "lock down" going on, and you're guaranteed that the consumer is returning the message that was produced just before.

This brings me to my third observation: the NewBenchmark method has a connections argument. I am not sure why a connection translates to a Requester that posts to a different topic in the Kafka case. Perhaps the question is naive, but shouldn't all of these connections reach out to the same topic?
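
For concreteness, a hypothetical sketch of the per-connection-topic arrangement described above; the broker address, the topicFor naming scheme, and the isolatedConsumer helper are all invented for illustration and are not the repo's actual code:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/Shopify/sarama"
)

// topicFor derives a topic unique to this process and connection, so the
// consumer attached to it can only ever see messages this requester produced.
func topicFor(connection int) string {
	return fmt.Sprintf("bench-%d-%d", os.Getpid(), connection)
}

// isolatedConsumer attaches a partition consumer to the connection's private
// topic, tailing from the newest offset.
func isolatedConsumer(brokers []string, connection int) (sarama.PartitionConsumer, error) {
	consumer, err := sarama.NewConsumer(brokers, nil)
	if err != nil {
		return nil, err
	}
	return consumer.ConsumePartition(topicFor(connection), 0, sarama.OffsetNewest)
}

func main() {
	brokers := []string{"localhost:9092"} // assumed broker address

	// Each "connection" in the benchmark gets its own topic and consumer.
	for i := 0; i < 3; i++ {
		pc, err := isolatedConsumer(brokers, i)
		if err != nil {
			log.Fatal(err)
		}
		defer pc.Close()
		fmt.Println("connection", i, "isolated on topic", topicFor(i))
	}
}
```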

tylertreat (Owner) commented

The Requester in this case is measuring the end-to-end latency from when a message is published to when it's read. It does this by publishing a message and then immediately waiting for it. IIRC there was no significant difference between using an AsyncProducer and a SyncProducer due to the nature of just publishing and waiting for the published message. The difference was more significant for the throughput test, but to make that meaningful we wait for all the acks to be received before considering the publisher finished (https://github.com/tylertreat/log-benchmarking/blob/master/cmd/throughput/benchmark/kafka.go).
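
As an illustration of that publish-then-immediately-wait pattern, here is a hedged sketch using Shopify/sarama; measureE2E, the broker address, and the topic name are assumptions made for the example rather than the repo's actual code:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/Shopify/sarama"
)

// measureE2E publishes one message and blocks until it is read back from the
// requester's private topic, returning the publish-to-read latency.
func measureE2E(producer sarama.AsyncProducer, consumer sarama.PartitionConsumer,
	topic string, payload []byte) (time.Duration, error) {
	start := time.Now()

	// Fire the publish through the async producer.
	producer.Input() <- &sarama.ProducerMessage{
		Topic: topic,
		Value: sarama.ByteEncoder(payload),
	}

	select {
	case <-consumer.Messages():
		// Only this process writes to the topic, so the first message to
		// arrive is taken to be the one just published.
		return time.Since(start), nil
	case err := <-consumer.Errors():
		return 0, err
	case perr := <-producer.Errors():
		return 0, perr
	}
}

func main() {
	brokers := []string{"localhost:9092"} // assumed broker address
	topic := "bench-e2e"                  // assumed per-process topic

	producer, err := sarama.NewAsyncProducer(brokers, sarama.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	consumer, err := sarama.NewConsumer(brokers, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()

	// Attach the consumer before publishing so the new message is not missed.
	pc, err := consumer.ConsumePartition(topic, 0, sarama.OffsetNewest)
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()

	latency, err := measureE2E(producer, pc, topic, []byte("payload"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("end-to-end latency:", latency)
}
```

The sketch leans on the same "lock down" assumption discussed above: because only this process writes to the topic, the first message the partition consumer delivers after the publish is treated as the one just produced.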
