Skip to content

Producer

Shai Sachs edited this page Jan 19, 2021 · 15 revisions

Avoid producing synchronously if at all possible

You should be aware that producing synchronously will reduce maximum throughput dramatically. For example, you might achieve a rate of about 500,000 msg/s if you don't wait on the result of each produce call before sending the next. If you do, you'll get at best ~300 msg/s.

This means you should typically not await your ProduceAsync calls, the exception being if you are in a highly concurrent scenario (such as a web request handler). In this case, although an individual call my be slow, it will not impact other calls happening simultaneously.

ProduceAsync vs Produce

The two methods are equivalent, but tailored to different usage patterns. The Produce method is more efficient, and you should care about that if your throughput is high (>~ 20k msgs/s). Even if your throughput is low, the difference between Produce and ProduceAsync will be negligible compared to whatever else you application is doing. As a general rule, Produce is recommended.

Another difference is that tasks returned by ProduceAsync compete on thread pool threads, so ordering is not guaranteed. By comparison, delivery report callbacks passed to Produce are all called on the same thread, so the ordering will exactly match the order they are known to be completed by the client.

Errors

Most importantly, you should be checking the result of each produce call. In the case of ProduceAsync, any error will be exposed in the form of a KafkaException if you await the call, else it will show up in the Tasks Exception property (with the IsFaulted property set to true). In the case of Produce, errors are exposed via the Error property on the DeliveryReport instance returned in the callback.

Note: when transactions become available in v1.4, you will typically be able to rely on the result of the transaction, commit - won't need to consider the produce delivery reports for errors if you are using transactions.

Events sent to the error event handler and log event handler are typically informational in nature - you can typically ignore them. In very rare cases, events on the error handler may be marked as fatal (IsFatal set to true). This can occur if idempotence or transaction semantics cannot be satisfied due to an unlikely scenario on the server (e.g. unexpected log truncation), or bug in the system.

The word 'error', and even 'fatal' are used a fair bit in the logs in benign situations. Don't be easily alarmed by this - if the producer is still operating, these are likely nothing to worry about.

Optimal LingerMs Setting

A commonly adjusted configuration property on the producer is LingerMs - the minimum time between batches of messages being sent to the cluster. Larger values allow for more batching, which increases throughput. Smaller values may reduce latency. The tradeoff is complicated somewhat as throughput increases though because a smaller LingerMs will result in more broker requests, putting greater load on the server, which in turn may reduce throughput.

A good general purpose setting is 5 (the default is 0.5).

Disabling Nagle

If your throughput is very low, and you really care about latency, you should probably set SocketNagleDisable to true.

Acks

The default value for the Acks configuration property is All (prior to v1.0, the default was 1). This means that if a delivery report returns without error, the message has been replicated to all replicas in the in-sync replica set.

If you have EnableIdempotence set to true, Acks must be all.

You should generally prefer having acks set to all. There's no real benefit to setting it lower. Note that this won't improve end-to-end latency because messages must be replicated to all in-sync replicas before they are available for consumption - you will just get to know whether the message has been successfully written to the leader replica a little earlier.

How can I ensure delivery of messages in the order specified?

Set EnableIdempotence to true. This has very little overhead - there's not much downside to having this on by default

Before the idempotent producer was available, you could achieve this by setting MaxInFlight to 1 (at the expense of reduced throughput). Messages are sent in the same order as produced. However, when idempotence is not enabled, in case of failure (temporary network failure for example), Confluent.Kafka will try to resend the data (if the Retries config property is set to > 0, the default is 2), which can cause reordering (note that this will not happen frequently, only in case of failure).

Custom Partitioners

Custom partitioners are not available in the .NET Client yet. You can get around this simply by producing to specific partitions in your produce call (i.e. partition outside the framework of the client).