
Pending records not sent when exiting #11

Open
piequi opened this issue Nov 9, 2023 · 1 comment · May be fixed by #12

Comments

@piequi

piequi commented Nov 9, 2023

Hi there!

We recently discovered that the last log lines of a short-lived container were not sent on shutdown.

Fluentbit with the SQS plugin is collecting the logs from a side-car container in an ECS task. When the main container exits, a SIGTERM signal is sent to Fluentbit, which then stops after the grace period.

Looking at the plugin code, when FLBPluginExit() is invoked, the pending records in SqsRecords are not sent to the SQS queue. A status is simply returned and nothing else is done:

func FLBPluginExit() int {

Our current workaround is to pad the logs with 10 dummy lines at the end. Obviously, some of those padding lines are never sent.

@piequi

piequi commented Dec 4, 2023

To describe the issue a bit differently: this output plugin uses its own buffering mechanism, which may delay sending some log records (by a lot) while it waits for BatchSize records to accumulate.

At the moment, the plugin sends an SQS message batch when, and only when, there are BatchSize records to batch together. This means that if fluentbit flushes its memory buffer to send the corresponding in-flight records and the record count isn't an exact multiple of BatchSize, some records remain pending in the plugin's memory. When fluentbit next flushes new records, those pending records become part of the first batch sent (if any).

There are several concerns here:

  • if fluentbit stops or crashes while there are pending records in the plugin memory, they'll be lost
  • if the source emits logs very slowly, some logs may be sent very late; the plugin could even wait indefinitely before sending them if BatchSize is never reached
  • fluentbit has many parameters to tweak buffering and backpressure, and the plugin logic should follow the same principles without introducing new ones (this is obviously largely opinionated 😄)

While experimenting with this issue, I came up with some code changes. A PR is coming.

@piequi piequi linked a pull request Dec 4, 2023 that will close this issue