Sporadic crash in rd_kafka_buf_callback() #4673

Open
6 of 7 tasks
GerKr opened this issue Apr 5, 2024 · 0 comments

Description

In rare cases librdkafka.dll crashes. The crash dump shows an invalid memory access
while EnterCriticalSection() is executing. For more details, see the
"How to reproduce" section below.

How to reproduce

Because it happens very rarely, I could not reproduce it.
However, I analysed the crash dump and reconstructed the following call stack, with some manually added notes.
The crash dump comes from librdkafka v1.6.1, so the line numbers correspond to that version.
The line marked with "===>" was never reached when I tried to reproduce the error.

rd_kafka_broker_ops_serve() rdkafka_broker.c:3345 -> 3351
case RD_KAFKA_OP_TERMINATE
rd_kafka_broker_op_serve() rdkafka_broker.c:2950 -> 3276
rd_kafka_broker_fail(rkb, LOG_DEBUG, rdkafka_broker.c:520 -> 577
RD_KAFKA_RESP_ERR__DESTROY,
"Client is terminating");
rd_kafka_bufq_purge(..., 2. param: rd_kafka_bufq_t *rkbufq=&tmpq_waitresp, ...) rdkafka_buf.c:245 -> 256
TAILQ_FOREACH_SAFE(rkbuf, &rkbufq->rkbq_bufs, rkbuf_link, tmp) rdkafka_buf.c:255
===> rd_kafka_buf_callback(..., 5.param: rd_kafka_buf_t *request=rkbuf) rdkafka_buf.c:450 -> 495
rd_kafka_buf_destroy(rkbuf=request) rdkafka_buf.h:804 macro
=>
rd_refcnt_destroywrapper(REFCNT=&(rkbuf)->rkbuf_refcnt, ...) rd.h:355 macro
=>
rd_refcnt_sub(R=REFCNT) rd.h:401 macro
=>
rd_refcnt_sub0(rd_refcnt_t * R) rd.h:325 -> 328
mtx_lock(&R->lock)
EnterCriticalSection()

Additional info:
The crash inside EnterCriticalSection() can be reproduced exactly with a simple program
that calls EnterCriticalSection() without first calling InitializeCriticalSection().
This seems to be what happens when buffers are still queued and the marked line of the call stack is executed.

IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/confluentinc/librdkafka/releases), if it can't be reproduced on the latest version the issue has been fixed.
As I don't know how to reproduce the situation where buffers are still queued during the purge of the kafka bufq, I can't tell
whether the error is still present. A source comparison of v1.6.1 against v2.3.0 did not show me any fix in this area.

Proposal for making the code more defensive:
In mtx_init(), record that the initialization has taken place.
In mtx_lock(), check whether the initialization has been done. If not, perform it implicitly.
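
To illustrate the proposal, here is a minimal sketch in portable C using pthreads. It is not librdkafka code: guarded_mtx_t, guarded_mtx_init(), guarded_mtx_lock(), guarded_mtx_unlock() and GUARDED_MTX_MAGIC are hypothetical names, and the real mtx_t on Windows wraps a CRITICAL_SECTION instead. The sketch also assumes the mutex storage is zero-filled before first use. If the memory contains garbage or a freed object, the flag check cannot be trusted, so this would only mask some cases.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical marker value stored once the mutex has been initialized. */
#define GUARDED_MTX_MAGIC 0x4d545831u

/* Hypothetical wrapper type, not librdkafka's mtx_t. */
typedef struct guarded_mtx {
    unsigned int magic;   /* GUARDED_MTX_MAGIC after initialization */
    pthread_mutex_t lock; /* the underlying mutex */
} guarded_mtx_t;

/* Serializes the lazy initialization so two threads cannot both run it. */
static pthread_mutex_t guarded_mtx_init_guard = PTHREAD_MUTEX_INITIALIZER;

/* Initialize the mutex and record that initialization has taken place. */
int guarded_mtx_init(guarded_mtx_t *m) {
    int rc = pthread_mutex_init(&m->lock, NULL);
    if (rc == 0)
        m->magic = GUARDED_MTX_MAGIC;
    return rc;
}

/* Lock the mutex, initializing it first if that has not happened yet. */
int guarded_mtx_lock(guarded_mtx_t *m) {
    int rc = 0;

    pthread_mutex_lock(&guarded_mtx_init_guard);
    if (m->magic != GUARDED_MTX_MAGIC)
        rc = guarded_mtx_init(m);
    pthread_mutex_unlock(&guarded_mtx_init_guard);

    if (rc != 0)
        return rc;
    return pthread_mutex_lock(&m->lock);
}

int guarded_mtx_unlock(guarded_mtx_t *m) {
    return pthread_mutex_unlock(&m->lock);
}

/* Locks a zero-filled mutex that was never explicitly initialized.
 * Returns 1 if the lock call performed the initialization itself. */
int demo_lazy_lock(void) {
    guarded_mtx_t m;
    int rc;
    int ok;

    memset(&m, 0, sizeof(m));
    rc = guarded_mtx_lock(&m);
    assert(rc == 0);
    ok = (m.magic == GUARDED_MTX_MAGIC);
    guarded_mtx_unlock(&m);
    pthread_mutex_destroy(&m.lock);
    return ok;
}
```

Taking the global guard on every lock call keeps the sketch simple and race-free, but it adds contention. A real implementation would probably want a cheaper check, and it would still not protect against a refcount object that has already been destroyed.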

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version: v1.6.1, and most probably v2.3.0 as well
  • Apache Kafka version: N/A
  • librdkafka client configuration: N/A
  • Operating system: Win Server2019, Win10, Win11
  • Provide logs: call stack - see in "How to reproduce"
  • Provide broker log excerpts: N/A
  • Critical issue: crash kills the complete application