
Kafka Requester Produce/Consume Concerns #5

Open
eapache opened this issue Feb 13, 2016 · 2 comments

eapache commented Feb 13, 2016

It's not super-clear to me exactly what kind of round-trip behaviour you're trying to model, but I suspect the kafka requester isn't doing exactly what you think it's doing (or what you want it to do) for a few reasons:

  • Consuming the produced message is a whole different thing from waiting for the producer request to be ACKed at the protocol level - if the ACK is all you're after, use the SyncProducer instead and drop the consumer entirely (see the sketch after this list).
  • There's no guarantee that the consumer is returning the message you produced. I suppose if you lock down the cluster such that this is the only process talking to it, and you never call Request concurrently, then you're probably OK, but I'm not sure.
  • The consumer sends its consume requests to the server asynchronously where they are held until messages become available, so you're missing 1/2 of one RTT worth of network latency if you really did mean to measure two RTTs per request in the first place.
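
A minimal sketch of that SyncProducer approach, assuming the Shopify/sarama client, a broker at localhost:9092, and a topic named "bench" (all placeholders); it times only the produce-to-ACK round trip and involves no consumer:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/Shopify/sarama"
)

func main() {
	// Placeholder broker address and topic; adjust for your cluster.
	brokers := []string{"localhost:9092"}
	topic := "bench"

	config := sarama.NewConfig()
	// SyncProducer requires successes to be returned.
	config.Producer.Return.Successes = true
	// WaitForAll asks the leader to wait for the full ISR before ACKing.
	config.Producer.RequiredAcks = sarama.WaitForAll

	producer, err := sarama.NewSyncProducer(brokers, config)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Time a single produce round trip: request out, broker ACK back.
	start := time.Now()
	_, _, err = producer.SendMessage(&sarama.ProducerMessage{
		Topic: topic,
		Value: sarama.ByteEncoder([]byte("payload")),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("produce ACK latency:", time.Since(start))
}
```

With RequiredAcks set to sarama.WaitForAll the timing includes replication to the in-sync replicas; sarama.WaitForLocal would time only the leader's own ACK.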

kchristidis commented Dec 23, 2016

I discovered this package earlier today (I was reading your Benchmarking Commit Logs post), and after studying its source code, I had the same observation as @eapache regarding the producer you are using.

I am not sure why you're going with an AsyncProducer when you're explicitly looking for a synchronous request. (Also, as @eapache noted, "if the ACK is all you're after, use the SyncProducer instead and drop the consumer entirely.")

@eapache, regarding your second observation: this raised a flag here originally as well. If you inspect the code closely, however, you'll see that every process posts to its own topic. So effectively there is a "lock down" going on, and you're guaranteed that the consumer is returning the message that was produced just before.

This brings me to my third observation: the NewBenchmark method has a connections argument. I am not sure why a connection translates to a Requester that posts to a different topic in the Kafka case. Perhaps the question is naive, but shouldn't all of these connections reach out to the same topic?
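
For concreteness, a hypothetical sketch of the per-connection-topic arrangement described above; the broker address, the topicFor naming scheme, and the isolatedConsumer helper are all invented for illustration and are not the repo's actual code:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/Shopify/sarama"
)

// topicFor derives a topic unique to this process and connection, so the
// consumer attached to it can only ever see messages this requester produced.
func topicFor(connection int) string {
	return fmt.Sprintf("bench-%d-%d", os.Getpid(), connection)
}

// isolatedConsumer attaches a partition consumer to the connection's private
// topic, tailing from the newest offset.
func isolatedConsumer(brokers []string, connection int) (sarama.PartitionConsumer, error) {
	consumer, err := sarama.NewConsumer(brokers, nil)
	if err != nil {
		return nil, err
	}
	return consumer.ConsumePartition(topicFor(connection), 0, sarama.OffsetNewest)
}

func main() {
	brokers := []string{"localhost:9092"} // assumed broker address

	// Each "connection" in the benchmark gets its own topic and consumer.
	for i := 0; i < 3; i++ {
		pc, err := isolatedConsumer(brokers, i)
		if err != nil {
			log.Fatal(err)
		}
		defer pc.Close()
		fmt.Println("connection", i, "isolated on topic", topicFor(i))
	}
}
```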

tylertreat (Owner) commented

The Requester in this case is measuring the end-to-end latency from when a message is published to when it's read. It does this by publishing a message and then immediately waiting for it. IIRC there was no significant difference between using an AsyncProducer and a SyncProducer due to the nature of just publishing and waiting for the published message. The difference was more significant for the throughput test, but to make that meaningful we wait for all the acks to be received before considering the publisher finished (https://github.com/tylertreat/log-benchmarking/blob/master/cmd/throughput/benchmark/kafka.go).
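
As an illustration of that publish-then-immediately-wait pattern, here is a hedged sketch using Shopify/sarama; measureE2E, the broker address, and the topic name are assumptions made for the example rather than the repo's actual code:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/Shopify/sarama"
)

// measureE2E publishes one message and blocks until it is read back from the
// requester's private topic, returning the publish-to-read latency.
func measureE2E(producer sarama.AsyncProducer, consumer sarama.PartitionConsumer,
	topic string, payload []byte) (time.Duration, error) {
	start := time.Now()

	// Fire the publish through the async producer.
	producer.Input() <- &sarama.ProducerMessage{
		Topic: topic,
		Value: sarama.ByteEncoder(payload),
	}

	select {
	case <-consumer.Messages():
		// Only this process writes to the topic, so the first message to
		// arrive is taken to be the one just published.
		return time.Since(start), nil
	case err := <-consumer.Errors():
		return 0, err
	case perr := <-producer.Errors():
		return 0, perr
	}
}

func main() {
	brokers := []string{"localhost:9092"} // assumed broker address
	topic := "bench-e2e"                  // assumed per-process topic

	producer, err := sarama.NewAsyncProducer(brokers, sarama.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	consumer, err := sarama.NewConsumer(brokers, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()

	// Attach the consumer before publishing so the new message is not missed.
	pc, err := consumer.ConsumePartition(topic, 0, sarama.OffsetNewest)
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()

	latency, err := measureE2E(producer, pc, topic, []byte("payload"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("end-to-end latency:", latency)
}
```

The sketch leans on the same "lock down" assumption discussed above: because only this process writes to the topic, the first message the partition consumer delivers after the publish is treated as the one just produced.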
