stackdriver output plugin broken on arm32v7 docker images since v3.0.0 #8785

Open
rmsaad opened this issue May 2, 2024 · 2 comments

rmsaad commented May 2, 2024

Bug Report

Describe the bug

The stackdriver output plugin has been broken in arm32v7 release builds (i.e. the Docker images) since v3.0.0.

I have done some digging, and this does not seem to be caused by a recently introduced bug. Instead, it seems that prior to commit 71746b3, setting FLB_RELEASE=On did not produce a release binary unless FLB_DEBUG was also explicitly turned off, so the Docker images always shipped a debug build of fluent-bit until v3.0.0.
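Since the crash only shows up in the release flavor, it can help to confirm which flavor a given binary really is. The following is a minimal sketch, not part of Fluent Bit: the function names are hypothetical, and it assumes GCC or Clang (which define `__OPTIMIZE__` at -O1 and above) and a build system that passes -DNDEBUG for release-style builds, as CMake's default flag sets do.

```c
#include <stdio.h>

/* Returns 1 when the compiler was optimizing (GCC and Clang define
 * __OPTIMIZE__ at -O1 and above), 0 otherwise. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when NDEBUG is defined, which release-style flag sets
 * usually pass as -DNDEBUG; 0 otherwise. */
int built_with_ndebug(void)
{
#ifdef NDEBUG
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints a one-line summary of the build flavor. */
void print_build_flavor(void)
{
    printf("optimized=%d ndebug=%d\n",
           built_with_optimization(), built_with_ndebug());
}
```

Compiled with the same flags as the rest of a project, `print_build_flavor()` would report whether the translation unit was built optimized, which is the property that seems to matter for this crash.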

To Reproduce

  1. Build a release build (FLB_RELEASE=On) for arm32v7 from any commit since 71746b3.
  2. Add stackdriver as an output in your .conf file.
  3. fluent-bit crashes with a SIGSEGV signal:

Fluent Bit v3.0.2
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

___________.__                        __    __________.__  __          ________
\_   _____/|  |  __ __   ____   _____/  |_  \______   \__|/  |_  ___  _\_____  \
 |    __)  |  | |  |  \_/ __ \ /    \   __\  |    |  _/  \   __\ \  \/ / _(__  <
 |     \   |  |_|  |  /\  ___/|   |  \  |    |    |   \  ||  |    \   / /       \
 \___  /   |____/____/  \___  >___|  /__|    |______  /__||__|     \_/ /______  /
     \/                     \/     \/               \/                        \/

[2024/05/02 15:51:25] [ info] [fluent bit] version=3.0.2, commit=33ce918351, pid=19704
[2024/05/02 15:51:25] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/05/02 15:51:25] [ info] [cmetrics] version=0.7.3
[2024/05/02 15:51:25] [ info] [ctraces ] version=0.4.0
[2024/05/02 15:51:25] [ info] [input:cpu:cpu.0] initializing
[2024/05/02 15:51:25] [ info] [input:cpu:cpu.0] storage_strategy='memory' (memory only)
[2024/05/02 15:51:25] [ info] [output:stackdriver:stackdriver.0] metadata_server set to http://metadata.google.internal
[2024/05/02 15:51:25] [ warn] [output:stackdriver:stackdriver.0] GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_SERVICE_CREDENTIALS are both defined. Defaulting to GOOGLE_APPLICATION_CREDENTIALS
[2024/05/02 15:51:25] [ info] [oauth2] HTTP Status=200
[2024/05/02 15:51:25] [ info] [oauth2] access token from 'oauth2.googleapis.com:443' retrieved
[2024/05/02 15:51:25] [ info] [sp] stream processor started
[2024/05/02 15:51:25] [ info] [output:stackdriver:stackdriver.0] worker #0 started
[2024/05/02 15:52:24] [engine] caught signal (SIGSEGV)
Aborted (core dumped)

In the core dump backtrace below, the stack is corrupted and dns_ctx ends up with the illegal address 0x17cbb6dd.

#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
#1  0x76a5ba20 in __libc_signal_restore_set (set=0x749f937c) at ../sysdeps/unix/sysv/linux/internal-signals.h:86
#2  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0x76a4c322 in __GI_abort () at abort.c:79
#4  0x005362c6 in flb_signal_handler (signal=<optimized out>) at /src/fluent-bit/src/fluent-bit.c:602
#5  <signal handler called>
#6  0x0059efd8 in flb_net_dns_lookup_context_cleanup (dns_ctx=dns_ctx@entry=0x17cbb6dd) at /src/fluent-bit/src/flb_network.c:613
#7  0x00599720 in output_thread (data=0x7608fb80) at /src/fluent-bit/src/flb_output_thread.c:329
#8  0x005a8a0c in step_callback (data=0x7607f1c0) at /src/fluent-bit/src/flb_worker.c:43
#9  0x76f4999e in start_thread (arg=0x273f295a) at pthread_create.c:477
#10 0x76ad202c in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Expected behavior

The stackdriver output plugin should work in arm32v7 release builds, or at least in the Docker images. I tested arm64 and x86, and the problem does not occur there.

Your Environment

  • Version used: v3.0.2
  • Configuration:
[SERVICE]
    log_level    info
    Flush        60
    Daemon       Off
    HTTP_Server  Off

[INPUT]
    Name cpu
    Tag  gateway_cpu
    Interval_Sec 20

[FILTER]
    Name  modify
    Match *
    Add   labels.gateway_env development

[FILTER]
    Name nest
    Match *
    Operation nest
    Wildcard labels.*
    Nest_under logging.googleapis.com/labels
    Remove_prefix labels.

[OUTPUT]
    Name          stackdriver
    Match         *
    resource      generic_node
    namespace     ${DEV_CODE}
    node_id       ${DEV_ID}
    location      northamerica-northeast1-c
    severity_key  level
  • Operating System and version:
    Tested on:
    - Raspbian GNU/Linux 11 (bullseye) (raspberry pi 2b)
    - Debian (bullseye) (embedded Linux pc)
  • Filters and plugins: See config above.

Additional context

I have attached valgrind output below:
valgrind-out.txt

I will be falling back to the v2.2 docker image for now.

@braydonk
Contributor

braydonk commented May 2, 2024

FYI @edsiper @leonardo-albertovich @nokute78 this likely affects any threaded input plugin on this platform, not just out_stackdriver. The segfault occurs in the generic output thread loop:

flb_net_dns_lookup_context_cleanup(&dns_ctx);

I haven't actually run it, but the only way a segfault makes sense in this stack trace is if &dns_ctx is a bad address.
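To make that reasoning concrete, here is a minimal sketch of the call pattern, assuming dns_ctx is a local variable of the thread function and holds two intrusive circular list heads. All type and function names below are hypothetical stand-ins, not Fluent Bit's definitions. The cleanup function dereferences its argument immediately, so it can only fault if the pointer it receives is not the address of the caller's local.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an intrusive circular list head. */
struct list_head {
    struct list_head *prev;
    struct list_head *next;
};

/* Hypothetical stand-in for the per-thread DNS context. */
struct dns_ctx_sketch {
    struct list_head lookups;
    struct list_head lookups_drop;
};

/* An empty circular list head points at itself in both directions. */
void list_init(struct list_head *h)
{
    h->prev = h;
    h->next = h;
}

/* Counts the entries on the drop list. The first read through ctx is
 * where an invalid pointer would fault. */
int cleanup_sketch(struct dns_ctx_sketch *ctx)
{
    int pending = 0;
    struct list_head *it;

    assert(ctx != NULL);
    for (it = ctx->lookups_drop.next; it != &ctx->lookups_drop; it = it->next) {
        pending++;
    }
    return pending;
}

/* One iteration of a worker loop: the context lives in this frame, so
 * &dns_ctx is a valid address for the whole call. */
int worker_iteration_sketch(void)
{
    struct dns_ctx_sketch dns_ctx;

    list_init(&dns_ctx.lookups);
    list_init(&dns_ctx.lookups_drop);
    return cleanup_sketch(&dns_ctx);
}
```

In this sketch the callee always receives the address of a live local, so a crash inside the cleanup function would point at a corrupted argument or stack rather than at the list handling itself.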

@rmsaad
Author

rmsaad commented May 2, 2024

(gdb) break /src/fluent-bit/src/flb_output_thread.c:329
Breakpoint 1 at 0xe9718: file /src/fluent-bit/src/flb_output_thread.c, line 329.
(gdb) define print_sp
Type commands for definition of "print_sp".
End with a line saying just "end".
>x/40x $sp
>print &dns_ctx
>step
>print dns_ctx
>continue
>end
(gdb) run -c /etc/fluent/fluent-bit.conf

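I defined a gdb command that prints the stack memory and &dns_ctx, steps into flb_net_dns_lookup_context_cleanup(), and prints dns_ctx. The first three hits are identical. On the last hit, right before the segfault, several stack words differ (for example, the two words at 0x749f9994 change from 0x749f9994 to 0x754e7074), and although &dns_ctx is still 0x749f9994, the callee receives dns_ctx=0x17cbc827.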

Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329     /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920:     0x00890664      0x76170200      0x00000000      0x00000000
0x749f9930:     0x00000000      0x00000000      0x00000000      0x00000008
0x749f9940:     0x00000000      0x00000008      0x00000000      0x755d0000
0x749f9950:     0x761701c0      0x00000000      0x00000000      0xdeadbeef
0x749f9960:     0x760f7bdc      0x00000000      0x00000000      0x00000000
0x749f9970:     0x00000000      0x00000000      0x00000000      0x00000000
0x749f9980:     0x00000000      0x00000000      0x00000000      0x00000000
0x749f9990:     0x00000000      0x749f9994      0x749f9994      0x749f999c
0x749f99a0:     0x749f999c      0x00000023      0x00008000      0x00000001
0x749f99b0:     0x00000002      0x00000000      0x00000000      0x00000000
$47 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x749f9994) at /src/fluent-bit/src/flb_network.c:613
613     /src/fluent-bit/src/flb_network.c: No such file or directory.
$48 = (struct flb_net_dns *) 0x749f9994

Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329     /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920:     0x00890664      0x76170200      0x00000000      0x00000000
0x749f9930:     0x00000000      0x00000000      0x00000000      0x00000008
0x749f9940:     0x00000000      0x00000008      0x00000000      0x755d0000
0x749f9950:     0x761701c0      0x00000000      0x00000000      0xdeadbeef
0x749f9960:     0x760f7bdc      0x00000000      0x00000000      0x00000000
0x749f9970:     0x00000000      0x00000000      0x00000000      0x00000000
0x749f9980:     0x00000000      0x00000000      0x00000000      0x00000000
0x749f9990:     0x00000000      0x749f9994      0x749f9994      0x749f999c
0x749f99a0:     0x749f999c      0x00000023      0x00008000      0x00000001
0x749f99b0:     0x00000002      0x00000000      0x00000000      0x00000000
$49 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x749f9994) at /src/fluent-bit/src/flb_network.c:613
613     /src/fluent-bit/src/flb_network.c: No such file or directory.
$50 = (struct flb_net_dns *) 0x749f9994

Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329     /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920:     0x00890664      0x76170200      0x00000000      0x00000000
0x749f9930:     0x00000000      0x00000000      0x00000000      0x00000008
0x749f9940:     0x00000000      0x00000008      0x00000000      0x755d0000
0x749f9950:     0x761701c0      0x00000000      0x00000000      0xdeadbeef
0x749f9960:     0x760f7bdc      0x00000000      0x00000000      0x00000000
0x749f9970:     0x00000000      0x00000000      0x00000000      0x00000000
0x749f9980:     0x00000000      0x00000000      0x00000000      0x00000000
0x749f9990:     0x00000000      0x749f9994      0x749f9994      0x749f999c
0x749f99a0:     0x749f999c      0x00000023      0x00008000      0x00000001
0x749f99b0:     0x00000002      0x00000000      0x00000000      0x00000000
$51 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x749f9994) at /src/fluent-bit/src/flb_network.c:613
613     /src/fluent-bit/src/flb_network.c: No such file or directory.
$52 = (struct flb_net_dns *) 0x749f9994

Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329     /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920:     0x00890664      0x76170200      0x00000000      0x00000000
0x749f9930:     0x00000000      0x00000000      0x00000000      0x7552f000
0x749f9940:     0x760c1560      0x761b6000      0x760f7bec      0x755d0000
0x749f9950:     0x761701c0      0x00000000      0x00000000      0xdeadbeef
0x749f9960:     0x760f7bdc      0x00000000      0x760c1560      0x00000000
0x749f9970:     0x00000000      0x00006100      0x00000000      0x00000000
0x749f9980:     0x00000000      0x00000000      0x00000000      0x00000000
0x749f9990:     0x00000000      0x754e7074      0x754e7074      0x749f999c
0x749f99a0:     0x749f999c      0x00000023      0x00008000      0x00000001
0x749f99b0:     0x00000002      0x00000000      0x00000000      0x00000000
$53 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x17cbc827) at /src/fluent-bit/src/flb_network.c:613
613     /src/fluent-bit/src/flb_network.c: No such file or directory.
$54 = (struct flb_net_dns *) 0x17cbc827

Thread 4 "flb-out-stackdr" received signal SIGSEGV, Segmentation fault.
0x004eefd8 in flb_net_dns_lookup_context_cleanup (dns_ctx=0x17cbc827) at /src/fluent-bit/src/flb_network.c:613
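
One way to read the dumps above is sketched below. It assumes dns_ctx consists of two circular list heads stored as pairs of 32-bit pointers. That layout is an assumption that the self-referencing words at 0x749f9994 and 0x749f999c are consistent with, not something this report confirms. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 32-bit view of one list head as it appears in the dump:
 * two consecutive pointer-sized words. */
struct head_words {
    uint32_t first;
    uint32_t second;
};

/* Returns 1 when both link words hold the head's own address, i.e. the
 * head looks like an empty circular list; 0 otherwise. */
int head_looks_empty(uint32_t head_addr, struct head_words w)
{
    return w.first == head_addr && w.second == head_addr;
}

/* Returns 1 when the value is 4-byte aligned, as a pointer to a struct
 * of 32-bit pointers would normally be; 0 otherwise. */
int is_word_aligned(uint32_t value)
{
    return (value & 3u) == 0;
}
```

Under that reading, the first three hits show two empty lists, the last hit shows an entry linked at 0x754e7074 on the first list, and the argument values 0x17cbc827 and 0x17cbb6dd are not even word aligned, so they do not look like valid struct pointers at all.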
