implement proper ECMP route handling #432

Draft: wants to merge 23 commits into main

KanjiMonster (Contributor) commented Apr 30, 2024

Implement ECMP route handling:

  • map each set of nexthops to an ECMP group ID, so groups stay stable regardless of which nexthops are reachable
  • make nh_{un}reachable_notification() update ECMP routes instead of just marking them as (un)routable

TODO:

  • reuse old ECMP IDs, as we do for L3 interface IDs, to avoid running out of IDs (may uncover bugs)
  • more testing

@KanjiMonster KanjiMonster changed the title from "implement proper ECMP route handlind" to "implement proper ECMP route handling" Apr 30, 2024
@KanjiMonster KanjiMonster linked an issue Apr 30, 2024 that may be closed by this pull request
@KanjiMonster KanjiMonster force-pushed the jogo_multipath_routing branch 2 times, most recently from 7fcd1e4 to a39899b May 8, 2024 08:52
@KanjiMonster KanjiMonster force-pushed the jogo_multipath_routing branch 2 times, most recently from ce9d99f to 76b9ba0 May 29, 2024 14:57
Currently we unconditionally delete the IPv6LL route when it gets
deleted from the routing table. But this is a per-interface route, so
just because one instance is removed does not mean that there are none
left.

So the following sequence

    $ ip link set up port1
    $ ip link set up port2
    $ ip link set down port1

will result in the IPv6LL route being disabled, although port2 still has
one.

So on deletion, check whether any interfaces with the IPv6LL route
remain, and only actually delete it if there are none.
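The check amounts to reference counting the IPv6LL route per interface. A minimal sketch, assuming interfaces are identified by ifindex; the class `ipv6ll_tracker` and its methods are hypothetical and not taken from basebox:

```cpp
#include <cassert>
#include <set>

// Hypothetical helper (not basebox code): tracks which interfaces, by
// ifindex, currently hold an IPv6 link-local route. The kernel has one
// such route per interface, but the switch needs only a single entry.
class ipv6ll_tracker {
public:
  // Returns true if this is the first interface, i.e. the hardware
  // route needs to be installed.
  bool add(int ifindex) {
    bool first = ifindexes_.empty();
    ifindexes_.insert(ifindex);
    return first;
  }

  // Returns true only if the last interface with an IPv6LL route went
  // away, i.e. the hardware route may actually be deleted.
  bool del(int ifindex) {
    if (ifindexes_.erase(ifindex) == 0)
      return false; // unknown interface: nothing changes
    return ifindexes_.empty();
  }

private:
  std::set<int> ifindexes_; // interfaces currently holding the route
};
```

Using a set rather than a plain counter makes repeated add or delete notifications for the same interface harmless.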

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
L3 neighbours are always directly connected, so they must be on
link-scoped routes.

Fixes neighbours being mistakenly considered routable.

Fixes: 0fdba0d ("nl_l3: only route l3 neighs if we have a route for them")
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
L3 neighbours are always on-link, so we only need to update their state
when handling link-scope routes.

Fixes enabling/disabling neighbours when handling default routes.
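Both neighbour fixes reduce to one scope check before touching neighbour state. A hypothetical sketch for IPv4 only; the `route` struct, `should_update_neigh()` and the redefined scope constants are illustrative, not basebox's API:

```cpp
#include <cassert>
#include <cstdint>

// Values mirror RT_SCOPE_UNIVERSE and RT_SCOPE_LINK from
// <linux/rtnetlink.h>; redefined here to keep the sketch standalone.
enum route_scope : uint8_t { SCOPE_UNIVERSE = 0, SCOPE_LINK = 253 };

// Hypothetical, simplified IPv4 route.
struct route {
  uint32_t dst; // network address in host byte order
  uint8_t prefixlen;
  route_scope scope;
};

bool in_prefix(uint32_t addr, uint32_t net, uint8_t len) {
  // Shifting a 32-bit value by 32 is undefined, so handle /0 separately.
  uint32_t mask = len == 0 ? 0 : ~uint32_t{0} << (32 - len);
  return (addr & mask) == (net & mask);
}

// Neighbours are always on-link, so only a link-scope route covering the
// neighbour's address can change whether it is routable. Universe-scope
// routes (e.g. a default route via a gateway) never do.
bool should_update_neigh(const route &r, uint32_t neigh_addr) {
  if (r.scope != SCOPE_LINK)
    return false;
  return in_prefix(neigh_addr, r.dst, r.prefixlen);
}
```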

Fixes: 0fdba0d ("nl_l3: only route l3 neighs if we have a route for them")
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
As we already do when adding a route, ignore routes with no nexthops.

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
We already rejected routes with nnhs == 0 earlier, so nnhs cannot be 0
at this stage. Instead, there are only two cases: nnhs == 1 (single
nexthop or on-link route), or nnhs > 1 (ECMP route).

So convert the switch to an if-else, and make the use of 0 for on-link
routes and routes with an unresolved neighbor explicit.

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
We used the total number of nexthops to determine whether a route is an
ECMP route on creation, so we need to do the same on deletion.
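The resulting if-else, deciding on the total number of nexthops for both creation and deletion, can be sketched as follows. The `nexthop` struct and both functions are hypothetical simplifications:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified nexthop; not basebox's actual type.
struct nexthop {
  bool on_link;                     // route has no gateway
  bool reachable;                   // neighbour currently resolved
  std::optional<uint32_t> l3_iface; // set once an l3 interface exists
};

enum class route_kind { unicast, ecmp };

// Decide on the total number of nexthops, not the reachable ones, so that
// route creation and deletion classify the same route identically.
route_kind classify(const std::vector<nexthop> &nhs) {
  assert(!nhs.empty()); // nnhs == 0 was rejected earlier
  return nhs.size() == 1 ? route_kind::unicast : route_kind::ecmp;
}

// Single-nexthop case: 0 is returned explicitly for on-link routes and
// for nexthops whose neighbour is not resolved yet.
uint32_t unicast_l3_interface_id(const nexthop &nh) {
  if (nh.on_link || !nh.reachable || !nh.l3_iface)
    return 0;
  return *nh.l3_iface;
}
```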

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
To allow dynamically adding or removing nexthops based on their
reachability, allow modifying L3 ECMP groups instead of just adding and
removing them.
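Modifying a group in place, rather than deleting and re-adding it, keeps the group ID stable for every route that references it. A minimal in-memory sketch; `ecmp_group_table` is a hypothetical stand-in for the switch's group table, not the actual switch API:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical in-memory stand-in for the switch's L3 ECMP group table.
class ecmp_group_table {
public:
  bool add(uint32_t id, std::set<uint32_t> members) {
    return groups_.emplace(id, std::move(members)).second;
  }

  // Replace the member l3 interfaces of an existing group in place;
  // routes referencing `id` keep pointing at the same group.
  bool modify(uint32_t id, std::set<uint32_t> members) {
    auto it = groups_.find(id);
    if (it == groups_.end())
      return false;
    it->second = std::move(members);
    return true;
  }

  bool remove(uint32_t id) { return groups_.erase(id) == 1; }

  const std::set<uint32_t> *get(uint32_t id) const {
    auto it = groups_.find(id);
    return it == groups_.end() ? nullptr : &it->second;
  }

private:
  std::map<uint32_t, std::set<uint32_t>> groups_; // id -> l3 interface ids
};
```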

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
We have multiple places where we want to add an L3 egress for an L3
neighbor, but need to resolve the port in case it is on a bridge.

So add a helper for that, significantly simplifying
nl_l3::add_l3_neigh_egress(), as we can now use the same path for both
cases.
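A sketch of such a helper, assuming the port behind a bridge is resolved by looking up the neighbour's MAC in the bridge FDB (an assumption; the commit does not say how). All names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified state; real code would query netlink caches.
struct link_info {
  bool is_bridge;
  uint32_t port; // switch port for non-bridge links
};

// (bridge ifindex, MAC) -> switch port learned in the bridge FDB
using fdb_t = std::map<std::pair<int, std::string>, uint32_t>;

// Resolve the switch port for a neighbour's egress: directly for a port,
// via the FDB when the neighbour sits behind a bridge.
std::optional<uint32_t> resolve_neigh_port(const std::map<int, link_info> &links,
                                           const fdb_t &fdb, int ifindex,
                                           const std::string &mac) {
  auto l = links.find(ifindex);
  if (l == links.end())
    return std::nullopt;
  if (!l->second.is_bridge)
    return l->second.port;
  auto f = fdb.find({ifindex, mac});
  if (f == fdb.end())
    return std::nullopt; // MAC not learned yet: cannot add an egress
  return f->second;
}
```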

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
In preparation for moving ECMP group mapping from l3 interface ids to
nh_stubs, split collecting the nh_stubs for a route's nexthops into its
own function.
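Collecting the stubs into an ordered set gives the same key for the same nexthops regardless of the order in which they are reported. A hypothetical sketch; the `nh_stub` fields (gateway string plus ifindex) are illustrative only:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical nexthop stub: identifies a nexthop by gateway address and
// outgoing interface, independent of whether its neighbour is resolved.
struct nh_stub {
  std::string gw; // gateway address as text, for simplicity
  int ifindex;

  bool operator<(const nh_stub &o) const {
    return std::tie(gw, ifindex) < std::tie(o.gw, o.ifindex);
  }
  bool operator==(const nh_stub &o) const {
    return gw == o.gw && ifindex == o.ifindex;
  }
};

struct route_nexthop { // hypothetical, simplified route nexthop
  std::string gw;
  int ifindex;
};

// Collect the nominal nexthops of a route into an ordered set.
std::set<nh_stub> get_nh_stubs(const std::vector<route_nexthop> &nhs) {
  std::set<nh_stub> stubs;
  for (const auto &nh : nhs)
    stubs.insert(nh_stub{nh.gw, nh.ifindex});
  return stubs;
}
```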

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Let nl_l3::add_l3_unicast_route() use the new helper.

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Add a std::set<nh_stub> to l3_ecmp_id mapping, to allow mapping ECMP
groups based on their nominal nexthops instead of their reachable
nexthops.

Add an appropriate hashing function for nh_stub to allow putting it in a
map. Since this pushes the object size over 64 bytes, it triggers a
new warning, so address this as well:

| ../git/src/netlink/nl_l3.cc:46:21: warning: loop variable 'v' creates a copy from type 'const basebox::nh_stub' [-Wrange-loop-construct]
|    46 |     for (const auto v : arg) {
|       |                     ^
| ../git/src/netlink/nl_l3.cc:46:21: note: use reference type to prevent copying
|    46 |     for (const auto v : arg) {
|       |                     ^
|       |                     &
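One way to make a `std::set<nh_stub>` usable as an `std::unordered_map` key is to specialize `std::hash` for both `nh_stub` and the set, iterating by reference as the warning suggests. A sketch under assumptions: the `nh_stub` fields, the boost-style `hash_combine` mixer and `ecmp_id_map` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <tuple>
#include <unordered_map>

// Hypothetical nexthop stub (fields are illustrative).
struct nh_stub {
  std::string gw;
  int ifindex;
  bool operator<(const nh_stub &o) const {
    return std::tie(gw, ifindex) < std::tie(o.gw, o.ifindex);
  }
  bool operator==(const nh_stub &o) const {
    return gw == o.gw && ifindex == o.ifindex;
  }
};

// Boost-style hash mixing (an assumption; any decent mixer works).
inline void hash_combine(std::size_t &seed, std::size_t h) {
  seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

namespace std {
template <> struct hash<nh_stub> {
  size_t operator()(const nh_stub &s) const {
    size_t seed = 0;
    ::hash_combine(seed, hash<string>{}(s.gw));
    ::hash_combine(seed, hash<int>{}(s.ifindex));
    return seed;
  }
};

template <> struct hash<set<nh_stub>> {
  size_t operator()(const set<nh_stub> &arg) const {
    size_t seed = 0;
    // Iterate by reference: copying each element is what
    // -Wrange-loop-construct warns about for large objects.
    for (const auto &v : arg)
      ::hash_combine(seed, hash<nh_stub>{}(v));
    return seed;
  }
};
} // namespace std

// Map the nominal nexthop set of a route to a stable ECMP group id.
class ecmp_id_map {
public:
  uint32_t get_or_create(const std::set<nh_stub> &nhs) {
    auto it = ids_.find(nhs);
    if (it != ids_.end())
      return it->second;
    uint32_t id = next_id_++;
    ids_.emplace(nhs, id);
    return id;
  }

private:
  std::unordered_map<std::set<nh_stub>, uint32_t> ids_;
  uint32_t next_id_ = 1; // ids are never reused in this sketch
};
```

Because the key is the nominal nexthop set, the ID stays the same while individual nexthops come and go; ID reuse is left out, matching the open TODO.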

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Properly handle ECMP routes in nh_{un}reachable_notification(), and
update ECMP groups dynamically. Since we now call get_l3_interface_id()
for nexthops that may not exist, demote the warning if there is no L3
interface yet.

As a side effect, we now also handle VRFs when a nexthop becomes
unreachable.
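The dynamic update can be sketched as recomputing a group's members from the reachable subset of its nominal nexthops, while the group ID stays fixed. A hypothetical model with integer nexthop IDs; it does not cover VRFs or the demoted warning:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical per-route ECMP state.
struct ecmp_route_state {
  std::set<int> nominal;      // all nexthops of the route (fixed)
  uint32_t ecmp_id;           // stable: derived from `nominal` only
  std::set<uint32_t> members; // l3 interfaces currently in the group
};

class ecmp_updater {
public:
  void add_route(int route_id, std::set<int> nominal, uint32_t ecmp_id) {
    auto &r = routes_[route_id];
    r = ecmp_route_state{std::move(nominal), ecmp_id, {}};
    refresh(r);
  }

  // Called from a nexthop reachable notification.
  void nh_reachable(int nh, uint32_t l3_iface_id) {
    reachable_[nh] = l3_iface_id;
    refresh_routes_using(nh);
  }

  // Called from a nexthop unreachable notification.
  void nh_unreachable(int nh) {
    reachable_.erase(nh);
    refresh_routes_using(nh);
  }

  const ecmp_route_state &route(int route_id) const {
    return routes_.at(route_id);
  }

private:
  // Members = l3 interfaces of the nominal nexthops that are reachable.
  void refresh(ecmp_route_state &r) const {
    r.members.clear();
    for (int nh : r.nominal) {
      auto it = reachable_.find(nh);
      if (it != reachable_.end())
        r.members.insert(it->second);
    }
  }

  void refresh_routes_using(int nh) {
    for (auto &entry : routes_)
      if (entry.second.nominal.count(nh) != 0)
        refresh(entry.second);
  }

  std::map<int, ecmp_route_state> routes_;
  std::map<int, uint32_t> reachable_; // nexthop -> l3 interface id
};
```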

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Libnl ops that support updating objects may modify the object even if
we haven't handled the notification yet, causing older stored
notifications to see an object "from the future".

Avoid this by cloning the new object. Both nl_object_clone() and
nl_object_put() are NULL-safe, so we don't need to do any nullptr
checks.
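The effect of the fix can be shown with a plain C++ analogue, where `std::make_shared` of a copy plays the role of nl_object_clone(). This is an illustration only; `nl_obj`, `notification` and `enqueue()` are hypothetical and do not use libnl:

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Stand-in for a libnl object that the cache may update in place.
struct nl_obj {
  std::string state;
};

struct notification {
  std::shared_ptr<const nl_obj> old_obj, new_obj;
};

// Queue a change, snapshotting `live_new` so later in-place updates by
// the cache cannot leak into this (older) notification. Like
// nl_object_clone(), a null input simply yields a null snapshot.
void enqueue(std::deque<notification> &q,
             std::shared_ptr<const nl_obj> old_obj,
             const std::shared_ptr<nl_obj> &live_new) {
  auto snapshot =
      live_new ? std::make_shared<const nl_obj>(*live_new) : nullptr;
  q.push_back({std::move(old_obj), std::move(snapshot)});
}
```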

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Add generic code for retrieving the route to a dst that has a nexthop on
a given link.

Since libnl may update route objects, make sure we clone it to ensure it
remains unchanged by concurrent updates.
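A hypothetical sketch of such a lookup, matching the destination and then any nexthop on the requested link, and returning a copy in place of a cloned libnl object. The types and `get_route_by_nh_link()` are illustrative, not the function added by this commit:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified route model.
struct nexthop {
  int ifindex;
  std::string gw;
};

struct route {
  std::string dst; // prefix, e.g. "10.1.0.0/16"
  int metric;
  std::vector<nexthop> nhs;
};

// Several routes may share a dst (e.g. with different metrics), so also
// require a nexthop on `ifindex`. The result is a copy, so later changes
// to the cache cannot alter it.
std::optional<route> get_route_by_nh_link(const std::vector<route> &routes,
                                          const std::string &dst,
                                          int ifindex) {
  for (const auto &r : routes) {
    if (r.dst != dst)
      continue;
    for (const auto &nh : r.nhs)
      if (nh.ifindex == ifindex)
        return r;
  }
  return std::nullopt;
}
```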

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Use the new function to ensure route lookups always retrieve the
expected route.

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
When a route and its nexthop (neigh) are removed at the same time, we
may process the route removal before the neigh removal.

On route removal we try to unregister from nexthop notifications, and
remove any pending l3 interfaces.

Since cnetlink::get_neighbour() only asks the cache, it only returns
live neighs, and since the neigh was already removed from the cache,
it won't find it.

At the same time, since we have not processed the neigh removal yet, we
are still registered for nh unreachable notifications. But from the
perspective of nl_l3::del_l3_route(), the nexthop is already
unreachable, so we fail to remove the unreachable notification.

Fix this by also going through the pending neigh deletions to find the
neigh in case a deletion for it is pending.
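The fallback lookup might look like the following sketch; `neigh_store` and its members are hypothetical, and real code would query the libnl cache instead of a `std::map`:

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct neigh {
  int ifindex;
  std::string addr;
  std::string mac;
};

struct neigh_store {
  std::map<std::pair<int, std::string>, neigh> cache; // live neighbours
  std::deque<neigh> pending_deletes; // gone from cache, not yet processed

  // Prefer the cache, but fall back to pending deletions so that a route
  // removal processed before its neigh removal can still find the
  // neighbour and unregister its notifications.
  std::optional<neigh> find(int ifindex, const std::string &addr) const {
    auto it = cache.find({ifindex, addr});
    if (it != cache.end())
      return it->second;
    for (const auto &n : pending_deletes)
      if (n.ifindex == ifindex && n.addr == addr)
        return n;
    return std::nullopt;
  }
};
```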

Fixes: dfe4485 ("add netlink functions")
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
With a recent fix, libnl now passes the full updated object as new to
callback_v2(), instead of just the update. Unfortunately the mdb code
relied on getting incremental updates as new, so this change broke mdb
handling.

To fix this, rework the mdb handling as follows:

1. convert all entries of old and new into separate, ordered sets
2. for any entry in old but not in new, leave the group
3. for any entry in new but not in old, join the group

With an empty set for old or new respectively, this also handles the
creation and deletion of the object.
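Steps 2 and 3 map directly onto `std::set_difference` over the two ordered sets. A sketch, assuming an mdb entry is identified by a (port ifindex, group address) pair, which is a simplification:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mdb entry: (port ifindex, group address).
using mdb_entry = std::pair<int, std::string>;

struct mdb_diff {
  std::vector<mdb_entry> leave; // in old, not in new
  std::vector<mdb_entry> join;  // in new, not in old
};

// Diff two snapshots of an mdb object. An empty `old_entries` handles a
// new object; an empty `new_entries` handles its deletion.
mdb_diff diff_mdb(const std::set<mdb_entry> &old_entries,
                  const std::set<mdb_entry> &new_entries) {
  mdb_diff d;
  std::set_difference(old_entries.begin(), old_entries.end(),
                      new_entries.begin(), new_entries.end(),
                      std::back_inserter(d.leave));
  std::set_difference(new_entries.begin(), new_entries.end(),
                      old_entries.begin(), old_entries.end(),
                      std::back_inserter(d.join));
  return d;
}
```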

Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Successfully merging this pull request may close these issues.

ECMP only supports previously configured/learned next-hops