We are trying to use FRR/bgpd to originate a large number of prefixes and have run into issues where it occasionally starts up with incomplete config, especially when the system is loaded:
On our test system, the issue happens occasionally during a normal restart but is 100% reproducible when the system is CPU loaded (I run one instance of yes > /dev/null & for every CPU core).
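The CPU load described above can be scripted so it is easy to start and tear down around a restart. This is only a sketch: `start_load` and `stop_load` are hypothetical helper names, and it assumes a Linux system where `nproc` (coreutils) reports the core count.

```shell
#!/bin/sh
# Sketch: run one busy loop (yes > /dev/null) per CPU core, then clean up.
# start_load / stop_load are hypothetical names, not part of FRR.
LOAD_PIDS=""

start_load() {
    # $1: number of workers; defaults to the core count reported by nproc
    n="${1:-$(nproc)}"
    i=0
    while [ "$i" -lt "$n" ]; do
        yes > /dev/null &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
}

stop_load() {
    # Kill and reap every worker started by start_load
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    LOAD_PIDS=""
}

# Example use (assumes FRR runs as the systemd unit "frr"):
#   start_load; systemctl restart frr; stop_load
```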
While trying to debug this, I noticed that the problem goes away if I add --no-fork to the vtysh -b command in /usr/lib/frr/frrcommon.sh.
Version
Reproduced with both 9.1 and a fresh clone from master.
How to reproduce
Only bgpd is enabled in /etc/frr/daemons:
zebra_options=" -A 127.0.0.1 -s 90000000"
bgpd_options=" -A 127.0.0.1 --no_kernel"
frr_profile="traditional"
frr.conf I've been using to repro, with a lot of synthetic test /32s removed:
After debugging this some more, I see that watchfrr is giving up because reading the config takes too long. From the systemd journal:
Apr 22 17:09:03 ip-10-0-1-65 watchfrr[19556]: [ZE9RA-19PS5] restart all child process 19557 still running after 20 seconds, sending signal 15
Apr 22 17:09:03 ip-10-0-1-65 watchfrr[19556]: [SK7QP-A2GT9] restart all process 19557 terminated due to signal 15
Essentially, vtysh -b is being killed before config load completes, leaving a partially-configured system behind.
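To check whether a given startup hit this, the journal can be filtered for the two watchfrr messages quoted above. A minimal sketch: `find_watchfrr_timeouts` is a hypothetical helper that reads log text on stdin, and the unit name `frr` in the example is an assumption about how FRR is packaged on the system.

```shell
#!/bin/sh
# Sketch: print watchfrr log lines that show a restart command being killed
# on timeout. find_watchfrr_timeouts is a hypothetical helper name.
find_watchfrr_timeouts() {
    # Matches both "still running after N seconds" and
    # "terminated due to signal" lines emitted by watchfrr
    grep -E 'watchfrr\[[0-9]+\]: .*(still running after [0-9]+ seconds|terminated due to signal)'
}

# Example use (unit name "frr" is an assumption):
#   journalctl -u frr --no-pager | find_watchfrr_timeouts
```

Any output after a restart suggests the config load was cut short and the running config should be compared against frr.conf.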
It looks like watchfrr has code that's supposed to handle this - reading_configuration is set to true once watchfrr has finished its config load. Unfortunately, now that vtysh -b forks, watchfrr may process its config long before the other daemons, causing the above timeout.
I have been able to work around this by setting watchfrr_options="--restart-timeout=60" in /etc/frr/daemons. Editing frrcommon.sh to pass --no-fork to vtysh -b also works, but is of course slower.
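For reference, the first workaround is a one-line change in /etc/frr/daemons. The value 60 is just what was enough on my test system; the 20 seconds in the log above is the timeout it replaces, and a larger config or a more heavily loaded host may need more.

```shell
# /etc/frr/daemons (fragment)
# Allow restart commands such as vtysh -b up to 60 seconds before watchfrr
# sends them signal 15.
watchfrr_options="--restart-timeout=60"
```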
Expected behavior
FRR should start with a complete set of config (vtysh -c 'sh run' | wc -l should return > 24,000 lines).
Actual behavior
FRR starts without all config loaded.
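The expected-behavior check can be turned into a pass/fail test after each restart. A rough sketch: `check_config_lines` is a hypothetical helper that reads a config dump on stdin, and the 24000 threshold is specific to my frr.conf.

```shell
#!/bin/sh
# Sketch: succeed only if the config dump on stdin has more than $1 lines.
# check_config_lines is a hypothetical helper name.
check_config_lines() {
    min="$1"
    # wc -l may pad its output with spaces on some systems; strip them
    count=$(wc -l | tr -d ' ')
    echo "running config has $count lines (need more than $min)"
    [ "$count" -gt "$min" ]
}

# Example use:
#   vtysh -c 'sh run' | check_config_lines 24000 || echo "incomplete config"
```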
Additional context
No response