Traceroute in batfish says no route but traceroute from the router works fine #8981

Open
bharmarsameer opened this issue Mar 28, 2024 · 27 comments

Comments

@bharmarsameer

Describe the bug and expected behavior

Screenshot 2024-03-28 at 3 59 45 PM

In the topology above, I am trying to trace from R1's lo0 to R6's lo0. When I log in to R1 directly and run traceroute with lo0 as the source, the trace completes, but Batfish reports no route.

R1(config-router-bgp)#traceroute 2.2.2.3 source lo0
traceroute to 2.2.2.3 (2.2.2.3), 30 hops max, 60 byte packets
1 192.168.1.2 (192.168.1.2) 0.088 ms 0.020 ms 0.017 ms
2 2.2.2.3 (2.2.2.3) 1.515 ms 1.922 ms 2.299 ms

Runnable example

from pybatfish.client.session import Session
from pybatfish.datamodel.flow import HeaderConstraints

# Connect to the Batfish service running locally
bf = Session(host="localhost")

# Initialize the example network and snapshot
NETWORK_NAME = "example_network"
BASE_SNAPSHOT_NAME = "base"
SNAPSHOT_PATH = "./snapshot"
bf.set_network(NETWORK_NAME)
bf.init_snapshot(SNAPSHOT_PATH, name=BASE_SNAPSHOT_NAME, overwrite=True)

# Trace from R1's Loopback0 towards R6's loopback address (2.2.2.3)
tr_answer = bf.q.traceroute(
    startLocation="/R1$/[Loopback0]",
    headers=HeaderConstraints(dstIps="2.2.2.3/32"),
    maxTraces=3,
).answer()
print(tr_answer.frame())

Additional context
R1-config.txt
R2-config.txt
R3-config.txt
R4-config.txt
R5-config.txt
R6-config.txt
SW1-config.txt
Screenshot 2024-03-28 at 4 32 33 PM

@dhalperi
Member

If you are using VRRP, you need to be supplying L1 topology. Are you doing that?

You may wish to run the vrrpProperties question to see what it's identified
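A minimal sketch of that check, assuming `bf` is the pybatfish `Session` from the runnable example above with the snapshot already initialized. The helper name `vrrp_summary` is made up for illustration; `vrrpProperties` is the question named in this comment.

```python
def vrrp_summary(bf, nodes=None):
    """Return Batfish's view of the VRRP groups as a pandas DataFrame.

    `bf` is assumed to be an initialized pybatfish Session.
    `nodes` optionally restricts the question to matching nodes.
    """
    kwargs = {"nodes": nodes} if nodes else {}
    return bf.q.vrrpProperties(**kwargs).answer().frame()
```

Calling `print(vrrp_summary(bf))` lists whatever VRRP groups Batfish extracted from the configs; an empty frame would suggest it found none.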

@dhalperi
Member

Same for the switch -- definitely need to supply L1 topology anytime L2 concepts are used!
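For illustration, here is one way to generate a Layer-1 topology file for the snapshot. This is a sketch under assumptions: the helper names are hypothetical, the file is written to `batfish/layer1_topology.json` under the snapshot directory, and the single edge listed (R1 Ethernet4/1 to R4 Ethernet3/1) is inferred from both interfaces sitting on 192.168.1.0/30 in the `show ip route` outputs in this thread. The remaining links would have to be filled in from the actual cabling.

```python
import json
from pathlib import Path


def layer1_edge(host_a, iface_a, host_b, iface_b):
    """Build one edge record for a Layer-1 topology file."""
    return {
        "node1": {"hostname": host_a, "interfaceName": iface_a},
        "node2": {"hostname": host_b, "interfaceName": iface_b},
    }


def write_layer1_topology(snapshot_dir, edges):
    """Write the edges to <snapshot_dir>/batfish/layer1_topology.json."""
    path = Path(snapshot_dir) / "batfish" / "layer1_topology.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"edges": edges}, indent=2))
    return path


# Only this link is inferred from the thread; hostnames are lowercased
# here as a cautious choice, and every other cable is left out.
edges = [layer1_edge("r1", "Ethernet4/1", "r4", "Ethernet3/1")]
```

Running `write_layer1_topology("./snapshot", edges)` before `bf.init_snapshot(...)` would place the file alongside the configs.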

@dhalperi
Member

Looking a bit deeper -- Batfish disables management interfaces by default as the management network usually gets in the way of analysis. I saw that you had that on the switch.

@bharmarsameer
Author

bharmarsameer commented Mar 28, 2024

But the management interface is not used for any routing, and I am specifying Lo0 as the source in the trace.
I just tried shutting down all the management interfaces. It still doesn't work. :(

@dhalperi
Member

I dug in a little bit, still only shallowly.

In Batfish on my machine with these configs, r4 is not advertising 2.2.2.3/32 to r1 because 2.2.2.3/32 is under RIB failure: it is the best BGP route, but the OSPF route is better.

Screenshot 2024-03-28 at 2 49 50 PM

What is happening in your emulator?

@bharmarsameer
Author

bharmarsameer commented Mar 28, 2024

OSPF is only used to distribute loopbacks within the AS; outside the AS, everything is advertised using eBGP. I see the routes being advertised from R4 to R1. Routing looks fine on the virtual switches themselves.

R4#sh ip bgp nei 192.168.1.1 advertised-routes
BGP routing table information for VRF default
Router identifier 2.2.2.1, local AS number 2
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Queued for advertisement
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      2.2.2.1/32             192.168.1.2           -       -          -       -       2 i
 * >      2.2.2.2/32             192.168.1.2           -       -          -       -       2 i
 * >      2.2.2.3/32             192.168.1.2           -       -          -       -       2 i
 * >      2.2.2.100/32           192.168.1.2           -       -          -       -       2 i

@dhalperi
Member

Can you add show route for the main RIB?

@dhalperi
Member

Also, note that I added ``` around your message so that it rendered correctly, not as markdown :)

@bharmarsameer
Author

show ip route for all the devices or any specific ones?

@dhalperi
Member

R4 and R1
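The Batfish-side counterpart of those tables can be pulled with the `routes` and `bgpRib` questions, which makes a side-by-side comparison with the device output possible. A minimal sketch, assuming `bf` is the initialized pybatfish `Session` from the runnable example; the helper name `rib_views` is hypothetical.

```python
def rib_views(bf, node, prefix):
    """Return (main RIB rows, BGP RIB rows) for one prefix on one node.

    `bf` is assumed to be an initialized pybatfish Session; both results
    are pandas DataFrames produced by Batfish's answers.
    """
    main_rib = bf.q.routes(nodes=node, network=prefix).answer().frame()
    bgp_rib = bf.q.bgpRib(nodes=node, network=prefix).answer().frame()
    return main_rib, bgp_rib
```

For example, `rib_views(bf, "R4", "2.2.2.3/32")` returns what Batfish computed for that prefix on R4 in both tables.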

@bharmarsameer
Author

bharmarsameer commented Mar 28, 2024

R1#sh ip route

VRF: default
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, O3 - OSPFv3, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

Gateway of last resort is not reachable

 C        1.1.0.0/30 is directly connected, Ethernet1/1
 C        1.1.0.4/30 is directly connected, Ethernet2/1
 O        1.1.0.8/30 [110/20] via 1.1.0.2, Ethernet1/1
                              via 1.1.0.6, Ethernet2/1
 C        1.1.1.1/32 is directly connected, Loopback0
 O        1.1.1.2/32 [110/20] via 1.1.0.2, Ethernet1/1
 O        1.1.1.3/32 [110/20] via 1.1.0.6, Ethernet2/1
 C        1.1.1.100/32 is directly connected, Loopback100
 C        1.1.2.0/24 is directly connected, Vlan100
 B E      2.2.2.1/32 [200/0] via 192.168.1.2, Ethernet4/1
 B E      2.2.2.2/32 [200/0] via 192.168.1.2, Ethernet4/1
 B E      2.2.2.3/32 [200/0] via 192.168.1.2, Ethernet4/1
 B E      2.2.2.100/32 [200/0] via 192.168.1.2, Ethernet4/1
 C        192.168.1.0/30 is directly connected, Ethernet4/1

R4#sh ip route

VRF: default
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, O3 - OSPFv3, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

Gateway of last resort:
 S        0.0.0.0/0 [1/0] via 192.168.123.1, Management1

 B E      1.1.1.1/32 [200/0] via 192.168.1.1, Ethernet3/1
 B E      1.1.1.2/32 [200/0] via 192.168.1.1, Ethernet3/1
 B E      1.1.1.3/32 [200/0] via 192.168.1.1, Ethernet3/1
 B E      1.1.1.100/32 [200/0] via 192.168.1.1, Ethernet3/1
 B E      1.1.2.0/24 [200/0] via 192.168.1.1, Ethernet3/1
 C        2.2.0.0/30 is directly connected, Ethernet1/1
 C        2.2.0.4/30 is directly connected, Ethernet2/1
 O        2.2.0.8/30 [110/20] via 2.2.0.2, Ethernet1/1
                              via 2.2.0.6, Ethernet2/1
 C        2.2.2.1/32 is directly connected, Loopback0
 O        2.2.2.2/32 [110/20] via 2.2.0.2, Ethernet1/1
 O        2.2.2.3/32 [110/20] via 2.2.0.6, Ethernet2/1
 C        2.2.2.100/32 is directly connected, Loopback100
 C        192.168.1.0/30 is directly connected, Ethernet3/1
 C        192.168.123.0/24 is directly connected, Management1

@dhalperi
Member

This is very surprising. Can you attach show run all from r4?

@dhalperi
Member

Sorry for not being explicit, but can you please include the all? show run does not have the hidden defaults I'm questioning :)

@bharmarsameer
Author

Sorry, I attached it as a file.
R4_sh_run_all.txt

@dhalperi
Member

dhalperi commented Mar 28, 2024

So I'll tell you why I'm confused:

  1. R4 has the OSPF 2.2.2.3/32 in its main RIB
  2. R4 has the BGP 2.2.2.3/32 in its BGP RIB (IBGP route learned from R6)
  3. R4 has no bgp advertise-inactive in the show run all.

To me, that says that 2.2.2.3/32 IBGP route should NOT be advertised to R1 -- it's inactive.

But your R1 show data says it is. Can you explain the difference?
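The three points above amount to a small decision rule, sketched here as a toy model. It is a simplification under stated assumptions: the main RIB winner is picked by lowest admin distance only, the values 110 (OSPF) and 200 (iBGP) are the ones discussed in this thread, and all names are hypothetical. It models the reasoning in this comment, not the actual EOS or Batfish implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    protocol: str
    admin_distance: int


def main_rib_winner(candidates):
    """Pick the route with the lowest admin distance (ties not modeled)."""
    return min(candidates, key=lambda r: r.admin_distance)


def should_advertise(bgp_route, candidates, advertise_inactive=False):
    """A BGP route is advertised if it is active in the main RIB,
    or if inactive routes are explicitly allowed."""
    if advertise_inactive:
        return True
    return main_rib_winner(candidates) == bgp_route


ospf = Route("2.2.2.3/32", "ospf", 110)
ibgp = Route("2.2.2.3/32", "ibgp", 200)

print(should_advertise(ibgp, [ospf, ibgp]))                           # False
print(should_advertise(ibgp, [ospf, ibgp], advertise_inactive=True))  # True
```

Under this model R4 would not advertise 2.2.2.3/32 to R1 by default, which is the behavior Batfish shows and the one the device output appears to contradict.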

@bharmarsameer
Author

bharmarsameer commented Mar 29, 2024

I haven't looked at bgp advertise-inactive. However, the routing looks fine: OSPF is only used as the IGP, and since R1 and R4 are eBGP neighbors, the routes learned by R4 are advertised to R1. In item 2 of the comment above, you mean learned from R6, right? R4 learns the iBGP route for 2.2.2.3/32 from R6.

  • R4 has the BGP 2.2.2.3/32 in its BGP RIB (IBGP route learned from R6)

In fact, we have this configuration everywhere in production, with the default no bgp advertise-inactive. I have run Batfish on the production device configurations and it works fine; I see the expected results.

@dhalperi
Member

Yes, I updated my prior comment to say R6.

@dhalperi
Member

Can you add show ip bgp for R4?

@bharmarsameer
Author

R4# sh ip bgp
BGP routing table information for VRF default
Router identifier 2.2.2.1, local AS number 2
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      1.1.1.1/32             192.168.1.1           0       -          500     0       1 i
 * >      1.1.1.2/32             192.168.1.1           0       -          500     0       1 i
 * >      1.1.1.3/32             192.168.1.1           0       -          500     0       1 i
 * >      1.1.1.100/32           192.168.1.1           0       -          500     0       1 i
 * >      1.1.2.0/24             192.168.1.1           0       -          500     0       1 i
 * >      2.2.2.1/32             -                     -       -          -       0       i
 * >      2.2.2.2/32             2.2.2.2               0       -          100     0       i
 * >      2.2.2.3/32             2.2.2.3               0       -          100     0       i
 * >      2.2.2.100/32           -                     -       -          -       0       i
 *        2.2.2.100/32           2.2.2.2               0       -          100     0       i

Here you go

@dhalperi
Member

I can't understand why 2.2.2.3/32 is considered active in BGP (the >) given that it is not installed in the main RIB. I thought that was the definition of active :).
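The mismatch can be spotted mechanically by cross-checking the two outputs. The sketch below is a rough parser written for the EOS output format pasted in this thread; the regular expressions and function names are assumptions for illustration, and the sample lines are copied from R4's `sh ip bgp` and `sh ip route` above.

```python
import re

# "* >" marks a valid, active BGP path; a bare "*" is valid but not active.
BGP_LINE = re.compile(r"^\s*\*\s*(>?)\s+(\d+\.\d+\.\d+\.\d+/\d+)\s")
# Route code(s) such as "O", "C", or "B E", followed by the prefix.
RIB_LINE = re.compile(r"^\s*([A-Z][A-Z0-9 ]*?)\s+(\d+\.\d+\.\d+\.\d+/\d+)\s")


def active_bgp_prefixes(show_ip_bgp):
    """Prefixes that have at least one path flagged '>' in the BGP table."""
    active = set()
    for line in show_ip_bgp.splitlines():
        m = BGP_LINE.match(line)
        if m and m.group(1):
            active.add(m.group(2))
    return active


def rib_codes(show_ip_route):
    """Map each prefix in the main RIB to its route code."""
    codes = {}
    for line in show_ip_route.splitlines():
        m = RIB_LINE.match(line)
        if m:
            codes[m.group(2)] = m.group(1)
    return codes


def active_in_bgp_but_not_bgp_in_rib(show_ip_bgp, show_ip_route):
    """Prefixes BGP calls active while the main RIB holds a non-BGP route."""
    codes = rib_codes(show_ip_route)
    return sorted(
        p for p in active_bgp_prefixes(show_ip_bgp)
        if p in codes and "B" not in codes[p].split()
    )


R4_BGP = """\
 * >      1.1.1.1/32             192.168.1.1           0       -          500     0       1 i
 * >      2.2.2.3/32             2.2.2.3               0       -          100     0       i
 *        2.2.2.100/32           2.2.2.2               0       -          100     0       i
"""
R4_RIB = """\
 B E      1.1.1.1/32 [200/0] via 192.168.1.1, Ethernet3/1
 O        2.2.2.3/32 [110/20] via 2.2.0.6, Ethernet2/1
 C        2.2.2.100/32 is directly connected, Loopback100
"""

print(active_in_bgp_but_not_bgp_in_rib(R4_BGP, R4_RIB))  # ['2.2.2.3/32']
```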

@bharmarsameer
Author

It is installed in the RIB via OSPF, right? sh ip route shows that. It's the Loopback0 IP address of R6.

@dhalperi
Member

Right. So here's the EOS documentation I'm referencing (which confirms what's in my head): https://www.arista.com/en/um-eos/eos-border-gateway-protocol-bgp

By default, BGP will advertise only those routes that are active in the switch’s RIB. This can contribute to dropped traffic. If a preferred route is available through another protocol (like OSPF), the BGP route will become inactive and not be advertised; if the preferred route is lost, there is no available route to the affected peers. Advertising inactive BGP routes minimizes traffic loss by providing alternative routes.

The bgp advertise-inactive command causes BGP to advertise inactive routes to BGP neighbors. Inactive route advertisement is configured globally, but the global setting can be overridden on a per-VRF basis.

Note the text I bolded: If a preferred route is available through another protocol (like OSPF), the BGP route will become inactive and not be advertised
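To check which way a given `show run all` dump sets this knob, a simple scan is enough. The sketch below is an assumption-laden illustration: the function name is hypothetical, the last matching line wins (so the per-VRF override described above is not modeled), and the sample fragment is made up rather than copied from R4_sh_run_all.txt.

```python
def advertise_inactive_setting(config_text):
    """Return True if enabled, False if explicitly disabled, None if absent.

    Simplification: the last matching line wins, regardless of which
    VRF section it appears in.
    """
    setting = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == "bgp advertise-inactive":
            setting = True
        elif line == "no bgp advertise-inactive":
            setting = False
    return setting


# Hypothetical fragment for illustration only.
SAMPLE = """\
router bgp 2
   no bgp advertise-inactive
"""

print(advertise_inactive_setting(SAMPLE))  # False
```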

@bharmarsameer
Author

Right, but I think this only applies to iBGP, which makes sense because the admin distance of iBGP is 200 and OSPF is 110. Here it is eBGP (admin distance 20) between R1 and R4. That's why the RIB learns the route via OSPF and then advertises it to its eBGP neighbor, which is the expected behaviour in this kind of topology.

@dhalperi
Member

But we're talking about whether R4 advertises it, not whether R1 does -- right? So on R4, 110 < 200, which is why the OSPF route is in R4's main RIB.
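The distances in question can be read straight off the `show ip route` lines pasted in this thread, where the bracketed pair is `[admin distance/metric]`. A small sketch, with a hypothetical regex and function names; the two parsed lines are copied from the R4 and R1 outputs above, while the iBGP candidate is an assumed value (200) matching the figure used in this discussion.

```python
import re

ROUTE = re.compile(
    r"^\s*(?P<code>[A-Z][A-Z0-9 ]*?)\s+"
    r"(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+)\s+"
    r"\[(?P<distance>\d+)/(?P<metric>\d+)\]"
)


def parse_route(line):
    """Return (code, prefix, admin_distance, metric), or None if no match."""
    m = ROUTE.match(line)
    if not m:
        return None
    return (m["code"], m["prefix"], int(m["distance"]), int(m["metric"]))


def preferred(*routes):
    """Pick the candidate with the lowest admin distance."""
    return min(routes, key=lambda r: r[2])


r4_ospf = parse_route(" O        2.2.2.3/32 [110/20] via 2.2.0.6, Ethernet2/1")
r1_ebgp = parse_route(" B E      2.2.2.3/32 [200/0] via 192.168.1.2, Ethernet4/1")
assumed_ibgp = ("B I", "2.2.2.3/32", 200, 0)  # assumption, not parsed output

print(r4_ospf)                           # ('O', '2.2.2.3/32', 110, 20)
print(r1_ebgp)                           # ('B E', '2.2.2.3/32', 200, 0)
print(preferred(r4_ospf, assumed_ibgp))  # ('O', '2.2.2.3/32', 110, 20)
```

Note that the eBGP-learned route on R1 is printed with `[200/0]` in the pasted output, so on these devices both BGP route types appear with distance 200.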

@bharmarsameer
Author

But that is fine. R4 advertises it to R1, which is an eBGP neighbor, not to its iBGP neighbors. R4, R5, and R6 all learn it via OSPF.

@dhalperi
Member

On R4, this should be true:

If a preferred route is available through another protocol (like OSPF), the BGP route will become inactive and not be advertised

So it should not advertise the route to R1.

As far as I can tell, this is not specific to IBGP routes or EBGP routes, or the remote-as of the neighbor

@bharmarsameer
Author

Hmm, AFAIK this is the standard design, and the actual switch shows the same behavior: all traces and pings work as expected from the switch. If you want to advertise 2.2.2.3/32 outside the BGP domain, eBGP is the way to go, and that's what is happening here. All loopbacks are learned via OSPF within the OSPF domain, and eBGP then advertises those loopbacks to the eBGP neighbors. The same happens from R5 to R2 as well. Take a look below:

R5#sh ip bgp nei 192.168.2.1 advertised-routes 
BGP routing table information for VRF default
Router identifier 2.2.2.2, local AS number 2
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Queued for advertisement
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      1.1.1.1/32             192.168.2.2           -       -          -       -       2 1 i
 * >      1.1.1.2/32             192.168.2.2           -       -          -       -       2 1 i
 * >      1.1.1.3/32             192.168.2.2           -       -          -       -       2 1 i
 * >      1.1.1.100/32           192.168.2.2           -       -          -       -       2 1 i
 * >      1.1.2.0/24             192.168.2.2           -       -          -       -       2 1 i
 * >      2.2.2.1/32             192.168.2.2           -       -          -       -       2 i
 * >      2.2.2.2/32             192.168.2.2           -       -          -       -       2 i
 * >      2.2.2.3/32             192.168.2.2           -       -          -       -       2 i
 * >      2.2.2.100/32           192.168.2.2           -       -          -       -       2 i
R5#sh ip route 

VRF: default
Codes: C - connected, S - static, K - kernel, 
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, O3 - OSPFv3, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

Gateway of last resort:
 S        0.0.0.0/0 [1/0] via 192.168.123.1, Management1

 B I      1.1.1.1/32 [200/0] via 2.2.0.1, Ethernet1/1
 B I      1.1.1.2/32 [200/0] via 2.2.0.1, Ethernet1/1
 B I      1.1.1.3/32 [200/0] via 2.2.0.1, Ethernet1/1
 B I      1.1.1.100/32 [200/0] via 2.2.0.1, Ethernet1/1
 B I      1.1.2.0/24 [200/0] via 2.2.0.1, Ethernet1/1
 C        2.2.0.0/30 is directly connected, Ethernet1/1
 O        2.2.0.4/30 [110/20] via 2.2.0.1, Ethernet1/1
                              via 2.2.0.10, Ethernet2/1
 C        2.2.0.8/30 is directly connected, Ethernet2/1
 O        2.2.2.1/32 [110/20] via 2.2.0.1, Ethernet1/1
 C        2.2.2.2/32 is directly connected, Loopback0
 O        2.2.2.3/32 [110/20] via 2.2.0.10, Ethernet2/1
 C        2.2.2.100/32 is directly connected, Loopback100
 C        192.168.2.0/30 is directly connected, Ethernet3/1
 C        192.168.123.0/24 is directly connected, Management1
