QEMU vm on xcon hub stops receiving packets #238

Open
klambrec opened this issue Aug 27, 2020 · 5 comments

@klambrec

This issue is related to #231.

I'm building a vr-xcon TcpHub topology with 3 VMs on the bridge, all part of the Cisco SD-WAN solution:

  • vmanage
  • vsmart
  • vbond

I bring up the topology and everything works fine: I can ping between the 3 VMs without issue. I then start the registration process between the vmanage and the vsmart. The proprietary protocol exchange results in a stream of packets between those two components. More often than not, the vbond is no longer pingable afterwards. It is important to note that, because this is a hub, the vbond also receives those packets from the hub.

I always see that the vbond QEMU instance closes the TCP connection towards the hub. I first fixed the hub to rebuild any failed connections in #237, so the hub now re-establishes a socket with the failed QEMU fairly quickly. However, even after the TCP connection is re-established, my ping requests do not reach the vbond inside the VM, although they do arrive inside the container.

tcpdump output taken from the vbond around the time of the outage (i.e. when the vmanage and vsmart started communicating):

13:54:29.809778 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 159933, win 1462, options [nop,nop,TS val 69154213 ecr 69154213], length 0
13:54:29.827789 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 159933:160067, ack 78841, win 229, options [nop,nop,TS val 69154217 ecr 69154213], length 134
13:54:29.827895 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 160067, win 1462, options [nop,nop,TS val 69154217 ecr 69154217], length 0
13:54:29.828422 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 160067:160201, ack 78841, win 229, options [nop,nop,TS val 69154217 ecr 69154217], length 134
13:54:29.828521 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 160201, win 1462, options [nop,nop,TS val 69154217 ecr 69154217], length 0
13:54:29.829839 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 160201:160319, ack 78841, win 229, options [nop,nop,TS val 69154218 ecr 69154217], length 118
13:54:29.829940 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 160319, win 1462, options [nop,nop,TS val 69154218 ecr 69154218], length 0
13:54:29.831553 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 160319:160469, ack 78841, win 229, options [nop,nop,TS val 69154218 ecr 69154218], length 150
13:54:29.831642 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 160469, win 1462, options [nop,nop,TS val 69154218 ecr 69154218], length 0
13:54:29.833381 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 160469:161003, ack 78841, win 229, options [nop,nop,TS val 69154219 ecr 69154218], length 534
13:54:29.833479 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 161003, win 1462, options [nop,nop,TS val 69154219 ecr 69154219], length 0
13:54:29.874050 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 161003:161073, ack 78841, win 229, options [nop,nop,TS val 69154229 ecr 69154219], length 70
13:54:29.874107 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 161073, win 1462, options [nop,nop,TS val 69154229 ecr 69154229], length 0
13:54:29.874412 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 161073:161207, ack 78841, win 229, options [nop,nop,TS val 69154229 ecr 69154229], length 134
13:54:29.874452 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 161207, win 1462, options [nop,nop,TS val 69154229 ecr 69154229], length 0
13:54:29.874704 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 161207:161277, ack 78841, win 229, options [nop,nop,TS val 69154229 ecr 69154229], length 70
13:54:29.874741 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 161277, win 1462, options [nop,nop,TS val 69154229 ecr 69154229], length 0
13:54:29.913378 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 161277:161459, ack 78841, win 229, options [nop,nop,TS val 69154239 ecr 69154229], length 182
13:54:29.913438 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 161459, win 1462, options [nop,nop,TS val 69154239 ecr 69154239], length 0
13:54:29.914774 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 161459:161593, ack 78841, win 229, options [nop,nop,TS val 69154239 ecr 69154239], length 134
13:54:29.914880 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 161593, win 1462, options [nop,nop,TS val 69154239 ecr 69154239], length 0
13:54:29.915538 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 161593:161759, ack 78841, win 229, options [nop,nop,TS val 69154239 ecr 69154239], length 166
13:54:29.915644 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 161759, win 1462, options [nop,nop,TS val 69154239 ecr 69154239], length 0
13:54:29.930676 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 161759:161893, ack 78841, win 229, options [nop,nop,TS val 69154243 ecr 69154239], length 134
13:54:29.930796 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 161893, win 1462, options [nop,nop,TS val 69154243 ecr 69154243], length 0
13:54:29.932247 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 161893:163941, ack 78841, win 229, options [nop,nop,TS val 69154243 ecr 69154243], length 2048
13:54:29.932355 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 163941, win 1462, options [nop,nop,TS val 69154243 ecr 69154243], length 0
13:54:29.932382 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [.], seq 163941:165389, ack 78841, win 229, options [nop,nop,TS val 69154243 ecr 69154243], length 1448
13:54:29.932411 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 165389:165989, ack 78841, win 229, options [nop,nop,TS val 69154243 ecr 69154243], length 600
13:54:29.932433 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 165989, win 1462, options [nop,nop,TS val 69154243 ecr 69154243], length 0
13:54:29.932516 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 165989:166123, ack 78841, win 229, options [nop,nop,TS val 69154243 ecr 69154243], length 134
13:54:29.932615 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 166123, win 1462, options [nop,nop,TS val 69154243 ecr 69154243], length 0
13:54:29.932648 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [.], seq 166123:167571, ack 78841, win 229, options [nop,nop,TS val 69154243 ecr 69154243], length 1448
13:54:29.932677 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 167571:168171, ack 78841, win 229, options [nop,nop,TS val 69154243 ecr 69154243], length 600
13:54:29.932694 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 168171, win 1462, options [nop,nop,TS val 69154243 ecr 69154243], length 0
13:54:29.932778 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 168171:170219, ack 78841, win 229, options [nop,nop,TS val 69154243 ecr 69154243], length 2048
13:54:29.932828 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 170219, win 1462, options [nop,nop,TS val 69154243 ecr 69154243], length 0
13:54:29.933866 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 170219:170359, ack 78841, win 229, options [nop,nop,TS val 69154244 ecr 69154243], length 140
13:54:29.933914 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 170359, win 1462, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.933993 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 170359:172407, ack 78841, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 2048
13:54:29.934034 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 172407, win 1462, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.934137 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 172407:174455, ack 78841, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 2048
13:54:29.934176 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 174455, win 1462, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.934272 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 174455:176503, ack 78841, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 2048
13:54:29.934311 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 176503, win 1462, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.934348 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 176503:176573, ack 78841, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 70
13:54:29.934385 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 176573, win 1462, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.934448 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 176573:178621, ack 78841, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 2048
13:54:29.934487 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [.], ack 178621, win 1462, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.934546 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [F.], seq 78841, ack 178621, win 1462, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.934550 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 178621:180669, ack 78841, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 2048
13:54:29.934598 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [R], seq 1102685821, win 0, length 0
13:54:29.934625 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [.], ack 78842, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.934644 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [R], seq 1102685822, win 0, length 0
13:54:29.934646 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [.], seq 180669:182117, ack 78842, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 1448
13:54:29.934672 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [R], seq 1102685822, win 0, length 0
13:54:29.935820 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [S], seq 2990841023, win 29200, options [mss 1460,sackOK,TS val 69154244 ecr 0,nop,wscale 7], length 0
13:54:29.935858 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [S.], seq 1095724192, ack 2990841024, win 28960, options [mss 1460,sackOK,TS val 69154199 ecr 69154244,nop,wscale 7], length 0
13:54:29.935897 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [.], ack 1, win 229, options [nop,nop,TS val 69154244 ecr 69154199], length 0
13:54:29.936393 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 1:2049, ack 1, win 229, options [nop,nop,TS val 69154244 ecr 69154199], length 2048
13:54:29.936413 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 2049, win 261, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.936490 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 2049:2119, ack 1, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 70
13:54:29.936503 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 2119, win 261, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.936603 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 2119:4167, ack 1, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 2048
13:54:29.936616 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 4167, win 293, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.936695 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 4167:4439, ack 1, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 272
13:54:29.936707 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 4439, win 315, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.936763 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 4439:4749, ack 1, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 310
13:54:29.936775 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 4749, win 338, options [nop,nop,TS val 69154244 ecr 69154244], length 0
13:54:29.937186 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 4749:4819, ack 1, win 229, options [nop,nop,TS val 69154245 ecr 69154244], length 70
13:54:29.937202 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 4819, win 338, options [nop,nop,TS val 69154245 ecr 69154245], length 0
13:54:29.937557 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 4819:4889, ack 1, win 229, options [nop,nop,TS val 69154245 ecr 69154245], length 70
13:54:29.937583 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 4889, win 338, options [nop,nop,TS val 69154245 ecr 69154245], length 0
13:54:29.970482 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 4889:5199, ack 1, win 229, options [nop,nop,TS val 69154253 ecr 69154245], length 310
13:54:29.970502 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 5199, win 360, options [nop,nop,TS val 69154253 ecr 69154253], length 0
13:54:29.970578 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 5199:5333, ack 1, win 229, options [nop,nop,TS val 69154253 ecr 69154253], length 134
13:54:29.970590 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 5333, win 383, options [nop,nop,TS val 69154253 ecr 69154253], length 0
13:54:30.009423 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 5333:5415, ack 1, win 229, options [nop,nop,TS val 69154263 ecr 69154253], length 82
13:54:30.009437 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 5415, win 383, options [nop,nop,TS val 69154263 ecr 69154263], length 0
13:54:30.049829 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 5415:5485, ack 1, win 229, options [nop,nop,TS val 69154273 ecr 69154263], length 70
13:54:30.049849 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 5485, win 383, options [nop,nop,TS val 69154273 ecr 69154273], length 0
13:54:30.368851 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 5485:5587, ack 1, win 229, options [nop,nop,TS val 69154352 ecr 69154273], length 102
13:54:30.369198 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 5587, win 383, options [nop,nop,TS val 69154353 ecr 69154352], length 0
13:54:30.723499 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [P.], seq 5587:5689, ack 1, win 229, options [nop,nop,TS val 69154441 ecr 69154353], length 102
13:54:30.723518 IP 192.168.96.5.10001 > 192.168.96.6.36004: Flags [.], ack 5689, win 383, options [nop,nop,TS val 69154441 ecr 69154441], length 0
^C
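
To make the teardown easier to spot in a long capture like the one above, here is a minimal sketch that filters tcpdump text output down to the segments carrying SYN, FIN or RST. The regular expression and the `connection_events` helper are assumptions written for this output format only; they are not part of vr-xcon.

```python
import re

# Matches the tcpdump text format shown above: timestamp, source, destination, flags.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]"
)


def connection_events(lines):
    """Return (time, src, dst, flags) for segments that carry SYN, FIN or RST."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and set(match["flags"]) & set("SFR"):
            events.append(
                (match["time"], match["src"], match["dst"], match["flags"])
            )
    return events


# Four lines copied from the capture above.
sample = [
    "13:54:29.934546 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [F.], seq 78841, ack 178621, win 1462, options [nop,nop,TS val 69154244 ecr 69154244], length 0",
    "13:54:29.934550 IP 192.168.96.6.35990 > 192.168.96.5.10001: Flags [P.], seq 178621:180669, ack 78841, win 229, options [nop,nop,TS val 69154244 ecr 69154244], length 2048",
    "13:54:29.934598 IP 192.168.96.5.10001 > 192.168.96.6.35990: Flags [R], seq 1102685821, win 0, length 0",
    "13:54:29.935820 IP 192.168.96.6.36004 > 192.168.96.5.10001: Flags [S], seq 2990841023, win 29200, options [mss 1460,sackOK,TS val 69154244 ecr 0,nop,wscale 7], length 0",
]

for event in connection_events(sample):
    print(*event)
```

On these sample lines the filter keeps the FIN and RST sent from 192.168.96.5.10001 and the new SYN from 192.168.96.6.36004, which is the close-then-reconnect sequence described above.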

Communication can be restored by either:

  • Restarting xcon, which resets all TCP connections
  • Waiting a couple of minutes (probably 300 seconds; possibly a TCP timeout in QEMU?)

This is reproducible with QEMU 4.2.1 running all 3 containers.

@klambrec
Author

@plajjan as discussed, here is a dedicated issue for this behavior. If need be I can share the entire setup with you, but I assume this should be reproducible with any 3 VMs and sufficient application traffic.
@bdreisbach FYI

@bdreisbach

Thanks for looking into this a bit more. Today I was playing with the patch in #188 and made it work with vr-bgp. I tested it all manually and haven't yet tried running tests or sending a bunch of packets, but it "works"... I still need to write the vr-bgp code for using native docker networking, but I suspect that if we can implement this, it will work much better than the xcon/hub implementation.

@bdreisbach

Slight follow-up: I tested a dot1q interface to 2 vr-bgp instances on 2 different VLANs, as well as a "LAN"/"IX" interface with 2 vr-bgp instances on the LAN.

RP/0/0/CPU0:xrv1#show int  bri
Thu Aug 27 20:40:03.702 UTC

               Intf       Intf        LineP              Encap  MTU        BW
               Name       State       State               Type (byte)    (Kbps)
--------------------------------------------------------------------------------
                Lo0          up          up           Loopback  1500          0
                Nu0          up          up               Null  1500          0
       Mg0/0/CPU0/0          up          up               ARPA  1514          0
          Gi0/0/0/0          up          up               ARPA  1514    1000000
     Gi0/0/0/0.2000          up          up             802.1Q  1518    1000000
     Gi0/0/0/0.2001          up          up             802.1Q  1518    1000000
          Gi0/0/0/1          up          up               ARPA  1514    1000000
          Gi0/0/0/2  admin-down  admin-down               ARPA  1514    1000000

RP/0/0/CPU0:xrv1#show int description
Thu Aug 27 20:40:17.721 UTC

Interface          Status      Protocol    Description
--------------------------------------------------------------------------------
Lo0                up          up
Nu0                up          up
Mg0/0/CPU0/0       up          up
Gi0/0/0/0          up          up          vr-bgp trunk
Gi0/0/0/0.2000     up          up          vr-bgp1
Gi0/0/0/0.2001     up          up          vr-bgp2
Gi0/0/0/1          up          up          vr-bgp LAN
Gi0/0/0/2          admin-down  admin-down

RP/0/0/CPU0:xrv1#show bgp sum
Thu Aug 27 20:40:21.441 UTC
BGP router identifier 2.2.2.1, local AS number 65000
BGP generic scan interval 60 secs
Non-stop routing is enabled
BGP table state: Active
Table ID: 0xe0000000   RD version: 2
BGP main routing table version 2
BGP NSR Initial initsync version 2 (Reached)
BGP NSR/ISSU Sync-Group versions 0/0
BGP scan interval 60 secs

BGP is operating in STANDALONE mode.


Process       RcvTblVer   bRIB/RIB   LabelVer  ImportVer  SendTblVer  StandbyVer
Speaker               2          2          2          2           2           0

Some configured eBGP neighbors (under default or non-default vrfs)
do not have both inbound and outbound policies configured for IPv4 Unicast
address family. These neighbors will default to sending and/or
receiving no routes and are marked with '!' in the output below.
Use the 'show bgp neighbor <nbr_address>' command for details.

Neighbor        Spk    AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down  St/PfxRcd
100.0.0.10        0  2000      80      80        2    0    0 01:17:18          0!
100.1.0.10        0  2001      80      80        2    0    0 01:17:30          0!
100.3.0.10        0  2002      31      31        2    0    0 00:28:15          0!
100.3.0.11        0  2003      30      30        2    0    0 00:27:47          0!

RP/0/0/CPU0:xrv1#

@plajjan
Collaborator

plajjan commented Aug 28, 2020

I think there are multiple courses of action here:

  • build a reproduction case to show the problem to the qemu folks
    • potentially find the root cause and patch it ourselves, then send the patch to the qemu folks for upstreaming
  • implement an ethernet switch, i.e. add MAC learning functionality
    • works around the problem by not forwarding VM1->VM2 packets to VM3, which we believe is what resets the connection in the first place
  • regardless of the above, we need to fix the reconnect code and add more test cases etc.
  • also, vr-xcon is up for a rewrite... I've always wanted to implement it with asyncio, using its protocol support - https://docs.python.org/3/library/asyncio-protocol.html
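
To illustrate the MAC learning idea from the list above, here is a minimal sketch of the forwarding decision a learning switch makes. It is not vr-xcon code: the `MacLearningTable` class, its `output_ports` method and the port names are hypothetical, and the sketch assumes untagged Ethernet frames with no aging of learned entries.

```python
class MacLearningTable:
    """Hypothetical forwarding table: learn source MACs, forward by destination MAC."""

    def __init__(self):
        self._port_by_mac = {}

    def output_ports(self, frame, in_port, all_ports):
        """Return the ports a frame received on in_port should be sent to."""
        if len(frame) < 14:
            raise ValueError("frame shorter than an Ethernet header")
        dst, src = frame[0:6], frame[6:12]

        # Learn: a unicast source address is reachable via the ingress port.
        if not src[0] & 0x01:
            self._port_by_mac[src] = in_port

        # Broadcast/multicast (group bit set) and unknown unicast are flooded,
        # which is what a plain hub does for every frame.
        known_port = self._port_by_mac.get(dst)
        if dst[0] & 0x01 or known_port is None:
            return [port for port in all_ports if port != in_port]

        # Known unicast: send only to the learned port, never back to the sender.
        return [] if known_port == in_port else [known_port]


# Example MAC addresses and port names are made up for illustration.
ports = ["vmanage", "vsmart", "vbond"]
vmanage_mac = bytes.fromhex("020000000001")
vsmart_mac = bytes.fromhex("020000000002")
broadcast = b"\xff" * 6

table = MacLearningTable()
# An ARP-style broadcast from vmanage is flooded to the other two ports.
print(table.output_ports(broadcast + vmanage_mac + b"\x08\x06", "vmanage", ports))
# The reply from vsmart goes only to vmanage, whose MAC was just learned.
print(table.output_ports(vmanage_mac + vsmart_mac + b"\x08\x00", "vsmart", ports))
# Later vmanage -> vsmart traffic no longer reaches the vbond port.
print(table.output_ports(vsmart_mac + vmanage_mac + b"\x08\x00", "vmanage", ports))
```

Once both MACs are learned, the vmanage/vsmart stream would stay off the vbond connection, which is the workaround described above. A real implementation would also need entry aging and would have to clear entries when a connection is rebuilt.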

@bdreisbach

I have been playing with the native docker networking patch. I wonder if we should just scrap xcon and use that instead. I have had it working with vr-bgp and topomachine since earlier today. There are some missing bits in that patch, but I am working on a complete implementation out of necessity (SROS mainly).
