Enormous amount of TIME_WAIT connections with non-buffered (direct) output #237

Open
TaLoN1x opened this issue Jan 10, 2019 · 7 comments

@TaLoN1x

TaLoN1x commented Jan 10, 2019

I am using this configuration:

<match **>
  @type kafka
  brokers somebrokers:9092
  default_topic 'sometopic'
  output_data_type json
  max_send_retries 300
  required_acks 0
</match>

With this I get a large number of TIME_WAIT connections to the Kafka node:

# netstat -tulnap | grep TIME_WAIT | wc -l
19699
# netstat -tulnap | grep TIME_WAIT
...
tcp        0      0 192.168.100.22:41907     192.168.100.55:9092      TIME_WAIT   -
tcp        0      0 192.168.100.22:43715     192.168.100.55:9092      TIME_WAIT   -
tcp        0      0 192.168.100.22:35498     192.168.100.55:9092      TIME_WAIT   -
tcp        0      0 192.168.100.22:56291     192.168.100.55:9092      TIME_WAIT   -
tcp        0      0 192.168.100.22:53716     192.168.100.55:9092      TIME_WAIT   -
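
To see which remote endpoints the TIME_WAIT sockets belong to, the output above can be grouped by foreign address. The following is a minimal sketch, not part of the original report: the helper name `count_time_wait_by_remote` is made up for illustration, and it assumes the netstat column layout shown above (field 5 is the foreign address, field 6 is the state).

```shell
# Count TIME_WAIT sockets per remote endpoint from netstat-style input on stdin.
# Assumption: field 5 is the foreign address and field 6 is the TCP state,
# as in the netstat output quoted above.
count_time_wait_by_remote() {
    awk '$6 == "TIME_WAIT" { n[$5]++ }
         END { for (r in n) print n[r], r }' | sort -k1,1nr -k2,2
}

# Example (run on the Fluentd host):
#   netstat -tan | count_time_wait_by_remote
```

Each output line is a count followed by the remote `address:port`, highest count first.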

What could be the cause of fluentd keeping those connections alive until timeout? As far as I've managed to debug, fluentd is not reusing any established connection in this mode either. This produces an unreasonable load on network devices.

@repeatedly
Member

What could be the cause of fluentd keeping those connections alive until timeout?

fluent-plugin-kafka uses the ruby-kafka library internally, so it depends on ruby-kafka's settings.
The out_kafka plugin is for testing purposes and works without buffering, with 1 connection per event in most cases, so I'm not sure this problem happens with the buffer-based out_kafka_buffered or out_kafka2 plugins.
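
As a rough cross-check of the "1 connection per event" explanation: if every event opens and closes its own connection, the steady-state number of TIME_WAIT sockets is roughly the connection rate multiplied by the TIME_WAIT duration. The sketch below is a back-of-the-envelope estimate only. It assumes a Linux host (where TIME_WAIT lasts a fixed 60 seconds) and a steady rate, and `estimate_conn_rate` is a hypothetical helper name.

```shell
# Estimate connections closed per second from a TIME_WAIT socket count.
# Assumptions: steady connection rate, and a 60 s TIME_WAIT period
# (the fixed value on Linux); pass a second argument to override it.
estimate_conn_rate() {
    time_wait_count=$1
    time_wait_secs=${2:-60}
    echo $(( time_wait_count / time_wait_secs ))
}

estimate_conn_rate 19699   # prints 328 (integer division)
```

Under these assumptions, the 19699 TIME_WAIT sockets reported above would correspond to about 328 short-lived connections per second.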

@TaLoN1x
Author

TaLoN1x commented Jan 10, 2019

With buffered output it doesn't happen at such a scale, but connection closure is still handled the wrong way. With buffered output there is also the buffer flush problem, as described here:
#101

I am trying to understand which configuration best fits my use case: I don't need to deliver 100% of messages and can accept drops, but I want neither a growing buffer backlog (fluentd not keeping up with buffer flushes, so the buffer keeps growing) nor a bunch of TIME_WAIT connections.

Can u advise?

@repeatedly
Member

repeatedly commented Jan 10, 2019

Can u advise?

What kind of advice do you want?
How to reduce the TIME_WAIT connections, or something else?

@TaLoN1x
Author

TaLoN1x commented Jan 10, 2019

Ok, I've tested out both the buffered and non-buffered options. To my surprise, both gave mostly the same result...

For the buffered output I used this configuration:

<match **>
  @type kafka2

  brokers somebrokers:9092 # Set brokers directly

  default_topic sometopic
  <format>
    @type json
  </format>

  <buffer topic,time>
    timekey 10s
    timekey_wait 5s
    flush_at_shutdown true
    retry_wait 5s
    flush_mode immediate
    flush_thread_count 14
    overflow_action throw_exception
  </buffer>

  # ruby-kafka producer options
  max_send_retries 300
  required_acks 0
  max_bytes 1000000
  kafka_agg_max_bytes 1000000
  max_send_limit_bytes 1000000
  discard_kafka_delivery_failed false
  buffer_chunk_limit 1m
</match>

It still runs into the same problem:

# netstat -tulnap | grep TIME_WAIT | wc -l
16384

I've gone deeper and inspected the traffic...
The root cause is that while the connection is being closed, the Kafka side enters FIN_WAIT2 and expects Fluentd to answer with FIN/ACK, but this never happens...

What makes it even more interesting is that I could not reproduce the issue when doing the same thing with ruby-kafka alone.
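
In a TCP close, the side that closes first passes through FIN_WAIT1 and FIN_WAIT2 and ends in TIME_WAIT, while the other side passes through CLOSE_WAIT and LAST_ACK. Tallying socket states on both the Fluentd host and the broker may therefore help confirm which side initiates the close and where connections get stuck. This is a minimal sketch under the same netstat column-layout assumption as the output quoted earlier in this thread; `tcp_states_for_port` is a hypothetical helper name.

```shell
# Tally TCP states for sockets whose foreign address ends in the given port.
# Assumption: field 5 is the foreign address and field 6 is the TCP state.
tcp_states_for_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $5 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print s, n[s] }' | sort
}

# Example (on the Fluentd host, for connections to the broker port):
#   netstat -tan | tcp_states_for_port 9092
```

On the broker, the foreign address is the Fluentd host's ephemeral port, so the filter there would need to match the local address column (field 4) instead.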

@TaLoN1x
Author

TaLoN1x commented Jan 10, 2019

Just for reference: the old kafka_buffered output worked just fine!

@repeatedly
Member

Interesting. out_kafka2 and out_kafka_buffered use the same approach, so their behaviour should be the same.
I will check the out_kafka2 code.

@github-actions

github-actions bot commented Jul 6, 2021

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

@github-actions github-actions bot added the stale label Jul 6, 2021
@kenhys kenhys added the bug label Jul 7, 2021
@github-actions github-actions bot removed the stale label Jul 7, 2021

3 participants