Zebra crashes when an external vxlan interface is added to a bridge #15564

Closed
2 tasks done
varesa opened this issue Mar 16, 2024 · 7 comments · Fixed by #15647
Labels
triage Needs further investigation

Comments

@varesa
varesa commented Mar 16, 2024

Description

Zebra segfaults when a VXLAN interface created with the external property is added to a bridge as a member port. The segfault does not happen if the VXLAN interface is created with an id instead of external.

Version

FRRouting 9.1 (frr-test-1) on Linux(5.14.0-284.11.1.el9_2.x86_64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--localstatedir=/var/run/frr' '--disable-static' '--disable-werror' '--enable-irdp' '--enable-multipath=256' '--enable-vtysh' '--enable-ospfclient' '--enable-ospfapi' '--enable-rtadv' '--enable-ldpd' '--enable-pimd' '--enable-pim6d' '--enable-pbrd' '--enable-nhrpd' '--enable-eigrpd' '--enable-babeld' '--enable-vrrpd' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-fpm' '--enable-watchfrr' '--disable-bgp-vnc' '--enable-isisd' '--enable-rpki' '--enable-bfdd' '--enable-pathd' '--enable-snmp' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' 'CC=gcc' 'CXX=g++' 'LT_SYS_LIBRARY_PATH=/usr/lib64:'

From the FRR RPM repo:

frr-test-1 ~ # rpm -q frr
frr-9.1-01.el9.x86_64

Other versions which may or may not be relevant:

Kernel: 5.14.0-284.11.1.el9_2.x86_64
iproute: iproute-6.1.0-1.el9.x86_64

How to reproduce

Start FRR and run the following commands:

ip link add testbr type bridge
ip link set testbr up
ip link add vxlan1 type vxlan local $SOME_LOCAL_IP dstport 4789 nolearning external
ip link set vxlan1 master testbr

Expected behavior

Zebra/FRR does not segfault.

Actual behavior

Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: Received signal 11 at 1710623502 (si_addr 0x0, PC 0x56031c01ccc4); aborting...
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /lib64/libfrr.so.0(zlog_backtrace_sigsafe+0x71) [0x7fde82ebf061]
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: showing active allocations in memory group logging subsystem
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  syslog target                 :      1 *         56
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: showing active allocations in memory group libfrr
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  log thread-local buffer       :      6 *      24608
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  YANG module                   :      8 *         48
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Work queue name string        :      1 *         22
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Work queue item               :      1 *         24
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Work queue                    :      2 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  VTY server                    :      2 *         32
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  VTY                           :      2 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  VRF bit-map                   :      1 *          8
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  VRF                           :      1 *        216
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Vector index                  :  11561 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Vector                        :  11561 *         24
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Typed-heap array              :      1 *        576
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Typed-hash bucket             :     15 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Thread stats                  :     28 *        112
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Thread Poll Info              :     12 *       8192
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Thread master                 :     24 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Thread                        :     27 *        160
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Route node                    :     16 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Route table                   :     15 *         56
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Stream FIFO                   :      7 *         64
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Stream                        :      6 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Privilege information         :      4 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Prefix                        :      3 *         56
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /lib64/libfrr.so.0(zlog_signal+0xf5) [0x7fde82ebf265]
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Northbound Configuration      :      2 *         24
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Northbound Node               :    693 *       1192
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  NetNS Name                    :      1 *         18
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  NetNS Context                 :      2 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Nexthop                       :      4 *        152
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Bitfield memory               :      1 *       2052
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Temporary memory              :      1 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Link Node                     :    105 *         24
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Link List                     :     46 *         40
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Connected                     :      3 *         48
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Interface                     :      4 *        280
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Hash Index                    :     64 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Hash Bucket                   :   1002 *         32
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Hash                          :    128 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Graph Node                    :   5749 *         32
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Graph                         :     31 *          8
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  POSIX sync primitives         :     10 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  FRR POSIX Thread              :     10 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  RCU thread                    :      5 *        128
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Command Argument Name         :   1158 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Command Token Help            :   3498 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Command Token Text            :   3498 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Command Tokens                :   4919 *         72
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Host config                   :      6 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Buffer                        :      5 *         24
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: showing active allocations in memory group zebra
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /lib64/libfrr.so.0(+0xf36a5) [0x7fde82ef36a5]
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  ZClients                      :      3 *       3584
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Zebra neigh entry             :      2 *         80
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Zebra neigh table             :      1 *          8
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  MH global info                :      1 *        128
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  ZEBRA VRF                     :      1 *       5072
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Zebra VRF table               :      4 *         56
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  RIB table info                :      4 *         24
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  RIB destination               :      8 *         88
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Route Entry                   :      7 *        152
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  PTM BFD process reg table     :      1 *         32
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Zebra Name Space              :      1 *        480
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Nexthop Group Connected       :      4 *         40
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Nexthop Group Entry           :      4 *        144
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  DPlane NSes                   :      1 *         48
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Zebra DPlane Provider         :      1 *        248
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Zebra DPlane Ctx              :      3 *       2544
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Zebra Netlink buffers         :      5 * (variably sized)
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Zebra Interface Information   :      4 *        608
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: showing active allocations in memory group SRv6 Manager
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: showing active allocations in memory group Table Manager
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Table Manager Context         :      1 *         16
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: showing active allocations in memory group Label Manager
Mar 16 21:11:42 frr-test-1 frrinit.sh[15223]: core_handler: memstats:  Label Manager Chunk           :      1 *         20
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /lib64/libc.so.6(+0x54df0) [0x7fde82a54df0]
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /usr/lib/frr/zebra(zebra_if_dplane_result+0x1794) [0x56031c01ccc4]
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /usr/lib/frr/zebra(+0xfb180) [0x56031c080180]
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /lib64/libfrr.so.0(event_call+0x84) [0x7fde82f086a4]
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /lib64/libfrr.so.0(frr_run+0xd8) [0x7fde82eb90d8]
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /usr/lib/frr/zebra(main+0x3be) [0x56031c01069e]
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /lib64/libc.so.6(+0x3feb0) [0x7fde82a3feb0]
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /lib64/libc.so.6(__libc_start_main+0x80) [0x7fde82a3ff60]
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: /usr/lib/frr/zebra(_start+0x25) [0x56031c011095]
Mar 16 21:11:42 frr-test-1 ZEBRA[15223]: in thread rib_process_dplane_results scheduled from zebra/zebra_rib.c:4954 rib_dplane_results()
Mar 16 21:11:42 frr-test-1 systemd[1]: Started Process Core Dump (PID 15352/UID 0).
Mar 16 21:11:42 frr-test-1 systemd-coredump[15353]: Resource limits disable core dumping for process 15223 (zebra).
Mar 16 21:11:42 frr-test-1 systemd-coredump[15353]: Process 15223 (zebra) of user 92 dumped core.
Mar 16 21:11:42 frr-test-1 watchfrr[14941]: [HD38Q-0HBRT][EC 268435457] zebra state -> down : read returned EOF
Mar 16 21:11:42 frr-test-1 systemd[1]: systemd-coredump@26-15352-0.service: Deactivated successfully.
Thread 1 "zebra" received signal SIGSEGV, Segmentation fault.
0x00005555555ebcc4 in interface_bridge_vxlan_vlan_vni_map_update (ifp=0x5555559f0940, ctx=0x5555559f2c30) at zebra/interface.c:1797
1797		struct hash *vni_table = NULL;
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-60.el9.x86_64 json-c-0.14-11.el9.x86_64 libcap-2.48-8.el9.x86_64 libffi-3.4.2-7.el9.x86_64 libgcc-11.3.1-4.3.el9.alma.x86_64 libselinux-3.5-1.el9.x86_64 libxcrypt-4.4.18-3.el9.x86_64 libyang-2.1.128-1.el9.x86_64 openssl-libs-3.0.7-6.el9_2.x86_64 p11-kit-0.24.1-2.el9.x86_64 pcre2-10.40-2.el9.x86_64 protobuf-c-1.3.3-13.el9.x86_64 sssd-client-2.8.2-2.el9.x86_64 systemd-libs-252-13.el9_2.x86_64
(gdb) bt
#0  0x00005555555ebcc4 in interface_bridge_vxlan_vlan_vni_map_update (ifp=0x5555559f0940, ctx=0x5555559f2c30) at zebra/interface.c:1797
#1  interface_bridge_vxlan_update (ifp=0x5555559f0940, ctx=0x5555559f2c30) at zebra/interface.c:1864
#2  interface_bridge_handling (zif_type=<optimized out>, ifp=0x5555559f0940, ctx=0x5555559f2c30) at zebra/interface.c:1933
#3  zebra_if_dplane_ifp_handling (ctx=0x5555559f2c30) at zebra/interface.c:2034
#4  zebra_if_dplane_result (ctx=0x5555559f2c30) at zebra/interface.c:2313
#5  0x000055555564f180 in rib_process_dplane_results (thread=<optimized out>) at zebra/zebra_rib.c:4893
#6  0x00007ffff7d086a4 in event_call (thread=0x7fffffffdfe0) at lib/event.c:1970
#7  0x00007ffff7cb90d8 in frr_run (master=0x55555577a770) at lib/libfrr.c:1213
#8  0x00005555555df69e in main (argc=1, argv=0x7fffffffe2c8) at zebra/main.c:486
(gdb) print vni_table
$1 = (struct hash *) 0x0

Additional context

Config is minimal:

frr-test-1# show running-config
Building configuration...

Current configuration:
!
frr version 9.1
frr defaults traditional
hostname frr-test-1
log syslog informational
!
end
frr-test-1#

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.
@varesa varesa added the triage Needs further investigation label Mar 16, 2024
@varesa
Author

varesa commented Mar 16, 2024

After a bit more experimentation, it seems to me that the issue is that dplane_ctx_get_ifp_vxlan_vni_array(ctx) returns a null pointer, so the subsequent read of vniarray->count dereferences NULL and crashes:

(gdb) print vniarray
$3 = (const struct zebra_vxlan_vni_array *) 0x0
(gdb) call dplane_ctx_get_ifp_vxlan_vni_array(ctx)
$4 = (const struct zebra_vxlan_vni_array *) 0x0
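
For illustration, here is a minimal sketch of the failure mode described above and of one way to guard against it. It is not the FRR source and not the actual fix: the struct layouts and the function name `walk_vni_mappings` are hypothetical stand-ins, and only the idea of checking the array pointer before reading `count` is taken from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; these are NOT the FRR definitions. */
struct vni_entry_sketch {
	uint32_t vni;
	uint16_t vid;
};

struct vni_array_sketch {
	uint16_t count;
	const struct vni_entry_sketch *entries;
};

/*
 * Walk the VLAN-to-VNI mappings and return how many were visited.
 * A NULL array (no mappings configured yet) is treated as "nothing to do"
 * instead of being dereferenced.
 */
static size_t walk_vni_mappings(const struct vni_array_sketch *arr)
{
	size_t visited = 0;

	if (arr == NULL)
		return 0; /* without this check, arr->count below reads through NULL */

	/* Invariant of this sketch: a non-empty array must carry entries. */
	assert(arr->count == 0 || arr->entries != NULL);

	for (uint16_t i = 0; i < arr->count; i++) {
		(void)arr->entries[i]; /* real code would update its tables here */
		visited++;
	}
	return visited;
}
```

Calling `walk_vni_mappings(NULL)` returns 0 rather than faulting, which corresponds to the "interface added without any VNI to VLAN mapping" case; whether silently skipping the update is the right behaviour for zebra is a design decision for the maintainers.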

@varesa
Author

varesa commented Mar 25, 2024

Also reproduced with 6ee9610.

Zebra no longer crashes once vlan_tunnel on is set and a VNI to VLAN mapping is added. However, there seems to be no path from "no interfaces existing" to "single VLAN-aware bridge + single VXLAN interface configured with VNI to VLAN mapping" that avoids the intermediate step of "interfaces configured without mapping", and that intermediate step kills zebra.

@c-po
Contributor

c-po commented Apr 1, 2024

@varesa I experience the exact same issue on 9.1-106-g13f8f5eff https://vyos.dev/T6167#181765 while working on VyOS.

My commands to reproduce this issue:

ip link add red type vrf table 100
ip link set dev red up
vtysh -c "conf t" -c "vrf red" -c "vni 10000"

ip link add dev br1 type bridge
echo 1 > /sys/class/net/br1/bridge/vlan_filtering
ip link set dev br1 up

ip link add vxlan1 type vxlan dstport 4789 external df unset tos inherit ttl 16 nolearning local 10.0.0.1
ip link set dev vxlan1 master br1

ip link set dev vxlan1 up

The issue is caused by this line:

for (i = 0; i < vniarray->count; i++) {
where vniarray is NULL, so reading vniarray->count is a NULL pointer dereference.

EDIT: Oh you also referenced that line ;)

@mjstapp
Contributor

mjstapp commented Apr 1, 2024

I've opened a simple PR that looks like it protects against this condition.

@c-po
Contributor

c-po commented Apr 1, 2024

> I've opened a simple PR that looks like it protects against this condition.

Thanks. I can confirm this fixes the issue on my side.
