Optimize BMP module, notably when removing a peer #253

Open
vincentbernat opened this issue Nov 8, 2022 · 48 comments · Fixed by #254 or #255
Labels
bug Something isn't working

Comments

@vincentbernat
Member

The BMP module can prevent the inlet from working when removing a peer. Several solutions:

  • optimize for the several bottlenecks that may be present
  • don't hold the lock for too long and sleep a bit during large removals

See #241

@vincentbernat vincentbernat added the bug Something isn't working label Nov 8, 2022
@vincentbernat
Member Author

After any BGP session flap, the following sequence is visible in the logs:

1. The BMP module removes the peer.
2. The healthcheck endpoint returns error 503.
3. About 10 "dropping flow due to queue full (size 250000)" errors follow.

{"level":"info","time":"2022-11-07T17:33:20Z","caller":"akvorado/inlet/bmp/events.go:104","module":"akvorado/inlet/bmp","message":"remove peer x.x.x.x for exporter x.x.x.x"}
{"level":"info","handler":"/api/","method":"GET","url":"/api/v0/healthcheck","ip":"127.0.0.1:51396","user-agent":"Go-http-client/1.1","status":503,"size":233,"duration":5000.73073,"time":"2022-11-07T17:33:55Z","caller":"akvorado/common/http/root.go:113","module":"akvorado/common/http","message":"HTTP request"}
{"level":"warn","worker":"12","listen":"0.0.0.0:2055","time":"2022-11-07T17:34:10Z","caller":"akvorado/inlet/flow/input/udp/root.go:196","module":"akvorado/inlet/flow/input/udp","message":"dropping flow due to queue full (size 250000)"}

This seems to happen when we receive more than 10k flows/s.
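
As a rough sanity check on that threshold, here is a back-of-the-envelope sketch. It assumes the inlet drains nothing while the RIB is locked and that each queue slot holds one flow; both are simplifications, and the function name is hypothetical. Under those assumptions, a 250000-entry queue fed at 10k flows/s overflows after about 25 seconds of stall, which is the same order of magnitude as the delay between the peer removal and the first drop in the log above.

```python
def seconds_until_full(queue_size: int, arrival_rate: float, drain_rate: float = 0.0) -> float:
    """Time before a bounded queue overflows when arrivals outpace draining."""
    net_rate = arrival_rate - drain_rate
    if net_rate <= 0:
        # The consumer keeps up: the queue never fills.
        return float("inf")
    return queue_size / net_rate


# Queue size taken from the log message, rate from the observation above.
print(seconds_until_full(250_000, 10_000))  # 25.0
```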

vincentbernat added a commit that referenced this issue Nov 9, 2022
When the RIB is locked for too long, the inlet hangs. Try to give a
bit of time for the inlet to move forward between two flushes of the RIB.
There are various knobs, not documented yet, until we get better defaults:

- `inlet.bmp.peer-removal-max-time`: how long to keep the lock
- `inlet.bmp.peer-removal-sleep-interval`: how long to sleep between two
  runs if we were unable to flush the whole peer
- `inlet.bmp.peer-removal-max-queue`: maximum number of flush requests
- `inlet.bmp.peer-removal-min-routes`: minimum number of routes to flush
  before yielding

May fix #253
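
To illustrate the idea in this commit message, here is a minimal sketch of a time-bounded flush that releases the lock between batches. It is not akvorado's actual code: the RIB is modeled as a plain dict mapping prefix to peer, `remove_peer_routes` is a hypothetical name, and the default values are placeholders rather than the real defaults. The parameters mirror `peer-removal-max-time`, `peer-removal-sleep-interval` and `peer-removal-min-routes`; the `peer-removal-max-queue` knob is not modeled.

```python
import threading
import time


def remove_peer_routes(rib, lock, peer, max_time=0.2, sleep_interval=0.5,
                       min_routes=1000, sleep=time.sleep):
    """Delete all routes of `peer` from `rib`, yielding the lock between batches.

    Returns the number of times the lock was taken.
    """
    runs = 0
    while True:
        runs += 1
        with lock:
            start = time.monotonic()
            removed = 0
            # Toy lookup: a real RIB would index routes by peer instead of scanning.
            pending = [prefix for prefix, owner in rib.items() if owner == peer]
            for prefix in pending:
                # Stop once the time budget is spent, but only after a minimum
                # number of routes, so that every run makes progress.
                if removed >= min_routes and time.monotonic() - start >= max_time:
                    break
                del rib[prefix]
                removed += 1
            done = removed == len(pending)
        if done:
            return runs
        # Lock is released here: readers can move forward before the next run.
        sleep(sleep_interval)


rib = {f"192.0.2.{i}/32": "peer-a" for i in range(25)}
rib["198.51.100.0/24"] = "peer-b"
print(remove_peer_routes(rib, threading.Lock(), "peer-a", max_time=0, min_routes=10))
```

With a zero time budget, each run removes exactly `min_routes` routes, so the 25 routes of `peer-a` need three runs and the route of `peer-b` is left alone.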
@vincentbernat
Member Author

vincentbernat commented Nov 9, 2022

@kostik2022 can you test whether it works well enough for you? To test, change the image to `akvorado:main` in your docker-compose file and run `docker-compose up akvorado-inlet`. The BMP metrics you get after a peer goes down will be useful for tuning. I have added a few of them.

@kostik2022

kostik2022 commented Nov 11, 2022

Yes, we switched to akvorado:main.
Here are new metrics:

# HELP akvorado_inlet_bmp_closed_connections_total Number of closed connections.
# TYPE akvorado_inlet_bmp_closed_connections_total counter
akvorado_inlet_bmp_closed_connections_total{exporter="10.0.0.1"} 1
akvorado_inlet_bmp_closed_connections_total{exporter="10.0.0.7"} 1
akvorado_inlet_bmp_closed_connections_total{exporter="10.0.0.3"} 1
akvorado_inlet_bmp_closed_connections_total{exporter="10.0.0.4"} 1
akvorado_inlet_bmp_closed_connections_total{exporter="10.0.0.5"} 1
akvorado_inlet_bmp_closed_connections_total{exporter="10.0.0.6"} 1
# HELP akvorado_inlet_bmp_locked_duration_seconds Duration during which the RIB is locked.
# TYPE akvorado_inlet_bmp_locked_duration_seconds summary
akvorado_inlet_bmp_locked_duration_seconds{reason="stale",quantile="0.5"} NaN
akvorado_inlet_bmp_locked_duration_seconds{reason="stale",quantile="0.9"} NaN
akvorado_inlet_bmp_locked_duration_seconds{reason="stale",quantile="0.99"} NaN
akvorado_inlet_bmp_locked_duration_seconds_sum{reason="stale"} 0.00018218700000000002
akvorado_inlet_bmp_locked_duration_seconds_count{reason="stale"} 5
# HELP akvorado_inlet_bmp_messages_received_total Number of BMP messages received.
# TYPE akvorado_inlet_bmp_messages_received_total counter
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.1",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.1",type="peer-down-notification"} 56
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.1",type="peer-up-notification"} 279
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.1",type="route-monitoring"} 1.3885835e+07
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.1",type="statistics-report"} 14272
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.1",type="termination"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.7",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.7",type="peer-down-notification"} 54
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.7",type="peer-up-notification"} 252
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.7",type="route-monitoring"} 1.1202851e+07
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.7",type="statistics-report"} 12666
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.7",type="termination"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.3",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.3",type="peer-down-notification"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.3",type="peer-up-notification"} 111
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.3",type="route-monitoring"} 5.513878e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.3",type="statistics-report"} 9500
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.3",type="termination"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.4",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.4",type="peer-down-notification"} 65
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.4",type="peer-up-notification"} 258
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.4",type="route-monitoring"} 1.0649005e+07
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.4",type="statistics-report"} 12350
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.4",type="termination"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.5",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.5",type="peer-down-notification"} 162
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.5",type="peer-up-notification"} 420
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.5",type="route-monitoring"} 1.6141111e+07
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.5",type="statistics-report"} 16760
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.5",type="termination"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.6",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.6",type="peer-down-notification"} 17
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.6",type="peer-up-notification"} 120
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.6",type="route-monitoring"} 8.765908e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.6",type="statistics-report"} 10215
akvorado_inlet_bmp_messages_received_total{exporter="10.0.0.6",type="termination"} 1
# HELP akvorado_inlet_bmp_opened_connections_total Number of opened connections.
# TYPE akvorado_inlet_bmp_opened_connections_total counter
akvorado_inlet_bmp_opened_connections_total{exporter="10.0.0.1"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.0.0.7"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.0.0.3"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.0.0.4"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.0.0.5"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.0.0.6"} 1
# HELP akvorado_inlet_bmp_peers_total Number of peers up.
# TYPE akvorado_inlet_bmp_peers_total gauge
akvorado_inlet_bmp_peers_total{exporter="10.0.0.1"} 0
akvorado_inlet_bmp_peers_total{exporter="10.0.0.7"} 0
akvorado_inlet_bmp_peers_total{exporter="10.0.0.3"} 0
akvorado_inlet_bmp_peers_total{exporter="10.0.0.4"} 0
akvorado_inlet_bmp_peers_total{exporter="10.0.0.5"} 0
akvorado_inlet_bmp_peers_total{exporter="10.0.0.6"} 0
# HELP akvorado_inlet_bmp_routes_total Number of routes up.
# TYPE akvorado_inlet_bmp_routes_total gauge
akvorado_inlet_bmp_routes_total{exporter="10.0.0.1"} 0
akvorado_inlet_bmp_routes_total{exporter="10.0.0.7"} 0
akvorado_inlet_bmp_routes_total{exporter="10.0.0.3"} 0
akvorado_inlet_bmp_routes_total{exporter="10.0.0.4"} 0
akvorado_inlet_bmp_routes_total{exporter="10.0.0.5"} 0
akvorado_inlet_bmp_routes_total{exporter="10.0.0.6"} 0

It looks like drops occur every time a new BMP connection is added.
Metrics with drops are:

# HELP akvorado_inlet_flow_input_udp_in_drops Dropped packets due to listen queue full.
# TYPE akvorado_inlet_flow_input_udp_in_drops gauge
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="0"} 2354
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="1"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="2"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="3"} 1814
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="5"} 4584
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="6"} 5300
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="7"} 337
# HELP akvorado_inlet_flow_input_udp_out_drops Dropped packets due to internal queue full.
# TYPE akvorado_inlet_flow_input_udp_out_drops counter
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.1",listener="0.0.0.0:2055",worker="5"} 1.86614e+06
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.1",listener="0.0.0.0:2055",worker="7"} 4159
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.2",listener="0.0.0.0:2055",worker="0"} 6.365746e+06
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.3",listener="0.0.0.0:2055",worker="2"} 5728
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.3",listener="0.0.0.0:2055",worker="3"} 2.517691e+06
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.4",listener="0.0.0.0:2055",worker="2"} 5534
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.4",listener="0.0.0.0:2055",worker="6"} 3.371851e+06
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.5",listener="0.0.0.0:2055",worker="3"} 42469
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.5",listener="0.0.0.0:2055",worker="7"} 4.481151e+06
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.6",listener="0.0.0.0:2055",worker="3"} 3056
akvorado_inlet_flow_input_udp_out_drops{exporter="10.0.0.6",listener="0.0.0.0:2055",worker="5"} 1.553003e+06

Kafka queue-size is 50000, internal flow queue-size is 300000.
Kafka has 21 partitions.

@vincentbernat
Member Author

It's odd that you don't have peer_removal_done_total. Maybe try `docker pull ghcr.io/akvorado/akvorado:main` to ensure you use the latest version. However, the change only impacts lock times when a peer goes down. If you also have issues when a peer comes up, this is not fixed.

@vincentbernat vincentbernat reopened this Nov 11, 2022
@kostik2022

kostik2022 commented Nov 11, 2022

We use the main branch (just checked again).
For BMP, I probably did not export all of the metrics. Here they are, for BMP and for drops separately. I also sent you profiles by email.

#### curl -s http://127.0.0.1:8080/api/v0/inlet/metrics | grep bmp

# HELP akvorado_inlet_bmp_locked_duration_seconds Duration during which the RIB is locked.
# TYPE akvorado_inlet_bmp_locked_duration_seconds summary
akvorado_inlet_bmp_locked_duration_seconds{reason="stale",quantile="0.5"} 0.052741393
akvorado_inlet_bmp_locked_duration_seconds{reason="stale",quantile="0.9"} 0.111159825
akvorado_inlet_bmp_locked_duration_seconds{reason="stale",quantile="0.99"} 0.232681078
akvorado_inlet_bmp_locked_duration_seconds_sum{reason="stale"} 69.47352256500002
akvorado_inlet_bmp_locked_duration_seconds_count{reason="stale"} 1000
# HELP akvorado_inlet_bmp_messages_received_total Number of BMP messages received.
# TYPE akvorado_inlet_bmp_messages_received_total counter
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-down-notification"} 15
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-up-notification"} 33
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="route-monitoring"} 139105
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="statistics-report"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-down-notification"} 247
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-up-notification"} 278
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="route-monitoring"} 147771
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="statistics-report"} 31
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-down-notification"} 226
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-up-notification"} 250
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="route-monitoring"} 143610
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="statistics-report"} 24
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="peer-up-notification"} 103
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="route-monitoring"} 154988
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="statistics-report"} 103
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-down-notification"} 224
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-up-notification"} 253
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="route-monitoring"} 144889
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="statistics-report"} 29
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-down-notification"} 225
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-up-notification"} 297
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="route-monitoring"} 148525
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="statistics-report"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="peer-up-notification"} 112
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="route-monitoring"} 150016
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="statistics-report"} 112
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-down-notification"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-up-notification"} 78
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="route-monitoring"} 143264
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="statistics-report"} 60
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.26",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.26",type="peer-down-notification"} 6
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.26",type="peer-up-notification"} 6
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-down-notification"} 16
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-up-notification"} 50
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="route-monitoring"} 146916
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="statistics-report"} 34
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-down-notification"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="route-monitoring"} 146634
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="statistics-report"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-down-notification"} 13
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-up-notification"} 82
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="route-monitoring"} 143046
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="statistics-report"} 69
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="peer-up-notification"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="route-monitoring"} 98029
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="statistics-report"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="peer-up-notification"} 63
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="route-monitoring"} 104894
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="statistics-report"} 63
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.35",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.36",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.37",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.39",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.41",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.42",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="route-monitoring"} 148869
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="statistics-report"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="peer-up-notification"} 40
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="route-monitoring"} 152612
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="statistics-report"} 40
# HELP akvorado_inlet_bmp_opened_connections_total Number of opened connections.
# TYPE akvorado_inlet_bmp_opened_connections_total counter
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.1"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.10"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.19"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.20"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.21"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.22"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.23"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.25"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.26"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.27"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.30"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.31"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.33"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.34"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.35"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.36"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.37"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.39"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.41"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.42"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.43"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.44"} 1
# HELP akvorado_inlet_bmp_peer_removal_done_total Number of peers removed from the RIB.
# TYPE akvorado_inlet_bmp_peer_removal_done_total counter
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.1"} 15
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.10"} 247
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.19"} 226
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.21"} 224
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.22"} 225
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.25"} 18
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.26"} 6
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.27"} 16
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.30"} 10
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.31"} 13
# HELP akvorado_inlet_bmp_peers_total Number of peers up.
# TYPE akvorado_inlet_bmp_peers_total gauge
akvorado_inlet_bmp_peers_total{exporter="10.16.0.1"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.10"} 31
akvorado_inlet_bmp_peers_total{exporter="10.16.0.19"} 24
akvorado_inlet_bmp_peers_total{exporter="10.16.0.20"} 103
akvorado_inlet_bmp_peers_total{exporter="10.16.0.21"} 29
akvorado_inlet_bmp_peers_total{exporter="10.16.0.22"} 72
akvorado_inlet_bmp_peers_total{exporter="10.16.0.23"} 112
akvorado_inlet_bmp_peers_total{exporter="10.16.0.25"} 60
akvorado_inlet_bmp_peers_total{exporter="10.16.0.26"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.27"} 34
akvorado_inlet_bmp_peers_total{exporter="10.16.0.30"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.31"} 69
akvorado_inlet_bmp_peers_total{exporter="10.16.0.33"} 72
akvorado_inlet_bmp_peers_total{exporter="10.16.0.34"} 63
akvorado_inlet_bmp_peers_total{exporter="10.16.0.35"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.36"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.37"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.39"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.41"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.42"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.43"} 28
akvorado_inlet_bmp_peers_total{exporter="10.16.0.44"} 40
# HELP akvorado_inlet_bmp_routes_total Number of routes up.
# TYPE akvorado_inlet_bmp_routes_total gauge
akvorado_inlet_bmp_routes_total{exporter="10.16.0.1"} 535636
akvorado_inlet_bmp_routes_total{exporter="10.16.0.10"} 545492
akvorado_inlet_bmp_routes_total{exporter="10.16.0.19"} 504388
akvorado_inlet_bmp_routes_total{exporter="10.16.0.20"} 497837
akvorado_inlet_bmp_routes_total{exporter="10.16.0.21"} 570012
akvorado_inlet_bmp_routes_total{exporter="10.16.0.22"} 475002
akvorado_inlet_bmp_routes_total{exporter="10.16.0.23"} 469779
akvorado_inlet_bmp_routes_total{exporter="10.16.0.25"} 540520
akvorado_inlet_bmp_routes_total{exporter="10.16.0.26"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.27"} 524770
akvorado_inlet_bmp_routes_total{exporter="10.16.0.30"} 500309
akvorado_inlet_bmp_routes_total{exporter="10.16.0.31"} 568367
akvorado_inlet_bmp_routes_total{exporter="10.16.0.33"} 421992
akvorado_inlet_bmp_routes_total{exporter="10.16.0.34"} 432389
akvorado_inlet_bmp_routes_total{exporter="10.16.0.35"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.36"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.37"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.39"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.41"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.42"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.43"} 600631
akvorado_inlet_bmp_routes_total{exporter="10.16.0.44"} 694579
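
For reading the `locked_duration_seconds` summary above, the cumulative `_sum` and `_count` series give a mean lock time per acquisition. The sketch below is a hypothetical helper, not part of akvorado, and it assumes a single label set per metric. On the values pasted above it gives about 69 ms, somewhat higher than the reported median of about 53 ms.

```python
def mean_from_summary(metrics_text: str, name: str) -> float:
    """Mean observation of a Prometheus summary, computed as _sum / _count."""
    total = count = None
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        metric = series.split("{", 1)[0]
        if metric == name + "_sum":
            total = float(value)
        elif metric == name + "_count":
            count = float(value)
    if total is None or not count:
        raise ValueError(f"no usable _sum/_count found for {name}")
    return total / count


sample = """\
akvorado_inlet_bmp_locked_duration_seconds_sum{reason="stale"} 69.47352256500002
akvorado_inlet_bmp_locked_duration_seconds_count{reason="stale"} 1000
"""
mean = mean_from_summary(sample, "akvorado_inlet_bmp_locked_duration_seconds")
print(f"{mean * 1000:.1f} ms")  # 69.5 ms
```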


#### curl -s http://127.0.0.1:8080/api/v0/inlet/metrics | grep drop

# HELP akvorado_inlet_flow_input_udp_in_drops Dropped packets due to listen queue full.
# TYPE akvorado_inlet_flow_input_udp_in_drops gauge
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="1"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="10"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="11"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="12"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="13"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="16"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="17"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="19"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="2"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="20"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="21"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="22"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="23"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="25"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="26"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="28"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="29"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="3"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="31"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="32"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="33"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="35"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="36"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="37"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="38"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="4"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="40"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="41"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="43"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="44"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="46"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="47"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="48"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="49"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="5"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="50"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="51"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="52"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="53"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="54"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="55"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="56"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="57"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="59"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="6"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="7"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="8"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="9"} 0
# HELP akvorado_inlet_flow_input_udp_out_drops Dropped packets due to internal queue full.
# TYPE akvorado_inlet_flow_input_udp_out_drops counter
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.1",listener="0.0.0.0:2055",worker="22"} 251
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.1",listener="0.0.0.0:2055",worker="49"} 22
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.13",listener="0.0.0.0:2055",worker="35"} 14
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.13",listener="0.0.0.0:2055",worker="49"} 846
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.16",listener="0.0.0.0:2055",worker="4"} 701
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.16",listener="0.0.0.0:2055",worker="50"} 2
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.17",listener="0.0.0.0:2055",worker="29"} 783
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.17",listener="0.0.0.0:2055",worker="9"} 14
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.18",listener="0.0.0.0:2055",worker="20"} 877
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.18",listener="0.0.0.0:2055",worker="56"} 7
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.19",listener="0.0.0.0:2055",worker="13"} 1203
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.19",listener="0.0.0.0:2055",worker="52"} 2
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.20",listener="0.0.0.0:2055",worker="50"} 11
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.20",listener="0.0.0.0:2055",worker="56"} 466
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.21",listener="0.0.0.0:2055",worker="36"} 264
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.22",listener="0.0.0.0:2055",worker="47"} 17
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.22",listener="0.0.0.0:2055",worker="55"} 206
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.23",listener="0.0.0.0:2055",worker="12"} 8
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.23",listener="0.0.0.0:2055",worker="13"} 624
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.25",listener="0.0.0.0:2055",worker="31"} 500
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.25",listener="0.0.0.0:2055",worker="57"} 3
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.28",listener="0.0.0.0:2055",worker="3"} 199
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.28",listener="0.0.0.0:2055",worker="52"} 1
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.33",listener="0.0.0.0:2055",worker="35"} 5
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.33",listener="0.0.0.0:2055",worker="50"} 136
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.34",listener="0.0.0.0:2055",worker="29"} 6
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.34",listener="0.0.0.0:2055",worker="55"} 57
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.35",listener="0.0.0.0:2055",worker="25"} 9
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.36",listener="0.0.0.0:2055",worker="29"} 2
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.37",listener="0.0.0.0:2055",worker="13"} 19
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.37",listener="0.0.0.0:2055",worker="23"} 635
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.39",listener="0.0.0.0:2055",worker="2"} 16
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.39",listener="0.0.0.0:2055",worker="37"} 472
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.41",listener="0.0.0.0:2055",worker="8"} 56
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.42",listener="0.0.0.0:2055",worker="2"} 2
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.43",listener="0.0.0.0:2055",worker="33"} 35
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.43",listener="0.0.0.0:2055",worker="50"} 2
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.44",listener="0.0.0.0:2055",worker="4"} 2
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.44",listener="0.0.0.0:2055",worker="5"} 48

@vincentbernat
Member Author

Unfortunately, I think the second profile was taken after the internal RIB had been updated, so it does not show any hot path during the RIB update. Note that when a BGP peer goes down, it is removed from the RIB immediately. When a BMP peer goes down, all the associated peers are kept for 5 minutes by default. This is the inlet.bmp.keep setting. Try to set it to 10s, take a BMP peer down, then start the profiling. Since the profiling runs for 30s, it should capture the CPU time spent removing peers.

Unrelated, but there is also a lot of CPU spent for classification. It should be easy to optimize this with a cache. How big are your classification rules?

@vincentbernat
Member Author

The current implementation tries to minimize memory usage and relies heavily on locking. I think I need to rewrite it to strike a better compromise. Can you run the profile on memory instead of CPU? go tool pprof http://localhost:6060/debug/pprof/heap. I'd like to know how much space the tree inside the RIB uses in your case. Maybe its size is reasonable enough to have two copies of it (< 500MB). BTW, you can generate SVGs directly if you want to attach them here (they should not contain sensitive information) with go tool pprof theprofile, then svg. But I am OK to continue receiving them by email.

In the meantime, you can either make the internal pipeline 10× larger or reduce the number of BMP sessions.
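For readers who prefer to save the profiles to files and send them later, here is a minimal sketch in Go. It assumes the standard net/http/pprof endpoints are exposed at http://localhost:6060, as in the command above. The helper names (profileURL, saveProfile) and the output file names are made up for the illustration and are not part of Akvorado.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// profileURL builds a net/http/pprof URL. kind is "heap" or "profile"
// (CPU). seconds only applies to CPU profiles and is omitted when zero.
func profileURL(base, kind string, seconds int) string {
	u := base + "/debug/pprof/" + kind
	if seconds > 0 {
		u += fmt.Sprintf("?seconds=%d", seconds)
	}
	return u
}

// saveProfile downloads a profile and writes it to path.
func saveProfile(url, path string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	base := "http://localhost:6060" // address used in this thread
	if err := saveProfile(profileURL(base, "heap", 0), "heap.pb.gz", 30*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, "heap profile:", err)
	}
	// A 30s CPU profile needs a client timeout longer than 30s.
	if err := saveProfile(profileURL(base, "profile", 30), "cpu.pb.gz", 60*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, "cpu profile:", err)
	}
}
```

The saved files can then be opened with go tool pprof, in the same way as a profile fetched directly from the URL.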

@vincentbernat
Member Author

Unrelated, but there is also a lot of CPU spent for classification. It should be easy to optimize this with a cache. How big are your classification rules?

I have replaced the LRU cache with a TTL-based cache in 2e30798. I think this should help your use case: I suppose your cache size was set too small, and with the new commit there is no size to configure anymore.
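To illustrate the difference, here is a minimal sketch of a TTL-based cache in Go. It is not the code from the commit: the type and function names are hypothetical, the cache is not safe for concurrent use, and what Akvorado actually stores in its classification cache is not shown in this thread. The point is only that entries expire by age, so there is no capacity to tune.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a cached value with its expiration time.
type entry struct {
	value   string
	expires time.Time
}

// ttlCache is a minimal, non-concurrent TTL cache (hypothetical names).
type ttlCache struct {
	ttl     time.Duration
	entries map[string]entry
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, entries: make(map[string]entry)}
}

// put stores value under key; it expires ttl after now.
func (c *ttlCache) put(key, value string, now time.Time) {
	c.entries[key] = entry{value: value, expires: now.Add(c.ttl)}
}

// get returns the value if it is present and not expired at now.
// Expired entries are dropped on access.
func (c *ttlCache) get(key string, now time.Time) (string, bool) {
	e, ok := c.entries[key]
	if !ok || !now.Before(e.expires) {
		delete(c.entries, key)
		return "", false
	}
	return e.value, true
}

func main() {
	c := newTTLCache(10 * time.Second)
	start := time.Now()
	c.put("exporter-1", "edge", start)
	_, hit := c.get("exporter-1", start.Add(5*time.Second))
	_, late := c.get("exporter-1", start.Add(11*time.Second))
	fmt.Println(hit, late) // prints: true false
}
```

Unlike an LRU cache, nothing is evicted because of a size limit, so a working set larger than expected does not cause repeated misses. A real implementation would also need a periodic sweep to drop entries that are never read again.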

@kostik2022

Vincent, I compiled the latest version (with the cache commit). The BMP cache was already set to 10h. We still have drops. The metrics are:

# HELP akvorado_inlet_bmp_locked_duration_seconds Duration during which the RIB is locked.
# TYPE akvorado_inlet_bmp_locked_duration_seconds summary
akvorado_inlet_bmp_locked_duration_seconds{reason="peer-removal",quantile="0.5"} 19.691870483
akvorado_inlet_bmp_locked_duration_seconds{reason="peer-removal",quantile="0.9"} 19.691870483
akvorado_inlet_bmp_locked_duration_seconds{reason="peer-removal",quantile="0.99"} 19.691870483
akvorado_inlet_bmp_locked_duration_seconds_sum{reason="peer-removal"} 91.02272247700003
akvorado_inlet_bmp_locked_duration_seconds_count{reason="peer-removal"} 994
# HELP akvorado_inlet_bmp_messages_received_total Number of BMP messages received.
# TYPE akvorado_inlet_bmp_messages_received_total counter
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-down-notification"} 15
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-up-notification"} 33
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="route-monitoring"} 384565
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="statistics-report"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-down-notification"} 247
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-up-notification"} 278
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="route-monitoring"} 714385
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="statistics-report"} 31
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-down-notification"} 226
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-up-notification"} 250
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="route-monitoring"} 712684
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="statistics-report"} 24
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="peer-up-notification"} 103
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="route-monitoring"} 723841
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="statistics-report"} 103
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-down-notification"} 224
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-up-notification"} 253
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="route-monitoring"} 694541
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="statistics-report"} 29
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-down-notification"} 225
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-up-notification"} 297
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="route-monitoring"} 712860
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="statistics-report"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="peer-up-notification"} 112
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="route-monitoring"} 722911
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="statistics-report"} 112
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-down-notification"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-up-notification"} 78
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="route-monitoring"} 693124
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="statistics-report"} 60
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-down-notification"} 16
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-up-notification"} 50
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="route-monitoring"} 498637
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="statistics-report"} 34
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-down-notification"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="route-monitoring"} 415384
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="statistics-report"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-down-notification"} 13
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-up-notification"} 82
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="route-monitoring"} 694864
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="statistics-report"} 69
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="peer-up-notification"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="route-monitoring"} 613860
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="statistics-report"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="peer-up-notification"} 63
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="route-monitoring"} 617371
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="statistics-report"} 63
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.35",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.36",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.37",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.39",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.41",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.42",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="route-monitoring"} 519802
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="statistics-report"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="peer-up-notification"} 40
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="route-monitoring"} 682344
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="statistics-report"} 40
# HELP akvorado_inlet_bmp_opened_connections_total Number of opened connections.
# TYPE akvorado_inlet_bmp_opened_connections_total counter
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.1"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.10"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.19"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.20"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.21"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.22"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.23"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.25"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.27"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.30"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.31"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.33"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.34"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.35"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.36"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.37"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.39"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.41"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.42"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.43"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.44"} 1
# HELP akvorado_inlet_bmp_peer_removal_done_total Number of peers removed from the RIB.
# TYPE akvorado_inlet_bmp_peer_removal_done_total counter
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.1"} 15
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.10"} 247
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.19"} 226
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.21"} 224
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.22"} 225
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.25"} 18
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.27"} 16
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.30"} 10
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.31"} 13
# HELP akvorado_inlet_bmp_peers_total Number of peers up.
# TYPE akvorado_inlet_bmp_peers_total gauge
akvorado_inlet_bmp_peers_total{exporter="10.16.0.1"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.10"} 31
akvorado_inlet_bmp_peers_total{exporter="10.16.0.19"} 24
akvorado_inlet_bmp_peers_total{exporter="10.16.0.20"} 103
akvorado_inlet_bmp_peers_total{exporter="10.16.0.21"} 29
akvorado_inlet_bmp_peers_total{exporter="10.16.0.22"} 72
akvorado_inlet_bmp_peers_total{exporter="10.16.0.23"} 112
akvorado_inlet_bmp_peers_total{exporter="10.16.0.25"} 60
akvorado_inlet_bmp_peers_total{exporter="10.16.0.27"} 34
akvorado_inlet_bmp_peers_total{exporter="10.16.0.30"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.31"} 69
akvorado_inlet_bmp_peers_total{exporter="10.16.0.33"} 72
akvorado_inlet_bmp_peers_total{exporter="10.16.0.34"} 63
akvorado_inlet_bmp_peers_total{exporter="10.16.0.35"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.36"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.37"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.39"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.41"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.42"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.43"} 28
akvorado_inlet_bmp_peers_total{exporter="10.16.0.44"} 40
# HELP akvorado_inlet_bmp_routes_total Number of routes up.
# TYPE akvorado_inlet_bmp_routes_total gauge
akvorado_inlet_bmp_routes_total{exporter="10.16.0.1"} 1.430478e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.10"} 2.583942e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.19"} 2.553701e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.20"} 2.723493e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.21"} 2.519282e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.22"} 2.320789e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.23"} 2.342266e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.25"} 2.160849e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.27"} 1.870293e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.30"} 1.594783e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.31"} 2.668936e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.33"} 2.619983e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.34"} 2.330674e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.35"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.36"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.37"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.39"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.41"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.42"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.43"} 2.397524e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.44"} 2.510341e+06

# HELP akvorado_inlet_flow_input_udp_out_drops Dropped packets due to internal queue full.
# TYPE akvorado_inlet_flow_input_udp_out_drops counter
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.1",listener="0.0.0.0:2055",worker="20"} 40
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.1",listener="0.0.0.0:2055",worker="34"} 697
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.13",listener="0.0.0.0:2055",worker="25"} 2628
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.13",listener="0.0.0.0:2055",worker="46"} 39
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.16",listener="0.0.0.0:2055",worker="13"} 2126
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.16",listener="0.0.0.0:2055",worker="41"} 16
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.17",listener="0.0.0.0:2055",worker="31"} 48
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.17",listener="0.0.0.0:2055",worker="47"} 2296
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.18",listener="0.0.0.0:2055",worker="48"} 3025
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.18",listener="0.0.0.0:2055",worker="59"} 30
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.19",listener="0.0.0.0:2055",worker="18"} 9
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.19",listener="0.0.0.0:2055",worker="9"} 3548
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.20",listener="0.0.0.0:2055",worker="41"} 13
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.20",listener="0.0.0.0:2055",worker="56"} 1276
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.21",listener="0.0.0.0:2055",worker="11"} 1134
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.21",listener="0.0.0.0:2055",worker="2"} 9
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.22",listener="0.0.0.0:2055",worker="30"} 842
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.22",listener="0.0.0.0:2055",worker="49"} 57
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.23",listener="0.0.0.0:2055",worker="0"} 1871
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.23",listener="0.0.0.0:2055",worker="8"} 4
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.25",listener="0.0.0.0:2055",worker="14"} 1714
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.25",listener="0.0.0.0:2055",worker="45"} 12
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.28",listener="0.0.0.0:2055",worker="32"} 3
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.28",listener="0.0.0.0:2055",worker="37"} 591
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.33",listener="0.0.0.0:2055",worker="16"} 44
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.33",listener="0.0.0.0:2055",worker="28"} 651
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.34",listener="0.0.0.0:2055",worker="44"} 231
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.34",listener="0.0.0.0:2055",worker="59"} 18
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.35",listener="0.0.0.0:2055",worker="40"} 58
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.36",listener="0.0.0.0:2055",worker="54"} 14
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.37",listener="0.0.0.0:2055",worker="13"} 48
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.37",listener="0.0.0.0:2055",worker="15"} 1824
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.39",listener="0.0.0.0:2055",worker="13"} 1241
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.39",listener="0.0.0.0:2055",worker="31"} 70
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.41",listener="0.0.0.0:2055",worker="35"} 303
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.42",listener="0.0.0.0:2055",worker="13"} 18
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.43",listener="0.0.0.0:2055",worker="27"} 172
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.43",listener="0.0.0.0:2055",worker="40"} 7
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.44",listener="0.0.0.0:2055",worker="14"} 164
akvorado_inlet_flow_input_udp_out_drops{exporter="172.31.0.44",listener="0.0.0.0:2055",worker="36"} 8
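
A note on reading akvorado_inlet_bmp_locked_duration_seconds in the dump above: the three quantiles all show about 19.7 s, while the _sum and _count series give a much smaller average. The sketch below does the division in Go (the function name is hypothetical). One possible explanation, stated here as an assumption, is that summary quantiles are typically computed over a recent time window and may reflect a single long removal, while _sum and _count cover the whole process lifetime.

```go
package main

import "fmt"

// summaryMean returns the average observation of a Prometheus summary,
// computed from its _sum and _count series.
func summaryMean(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	// Values copied from the peer-removal series in the dump above.
	mean := summaryMean(91.02272247700003, 994)
	fmt.Printf("mean lock duration: %.4f s\n", mean)
}
```

With these values the average is roughly 0.09 s per removal, so most of the 994 removals were short and the 19.7 s lock is an outlier rather than the typical case.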

@kostik2022

kostik2022 commented Nov 11, 2022

Profile graphs:
Before drops: [image: Before]
Right after dropping a session: [image: After_drop]

@kostik2022

Vincent, thank you for your efforts. We set the internal queue (Kafka) to 1M (from 50k). It looks like that helps. We will observe over the weekend.

@vincentbernat
Member Author

Thanks for the debug!

  • For the memory profile, I'll have to investigate a bit to find the best course of action. Thanks to the deduplication of route attributes, the attributes (AS paths, etc.) take a small amount of memory (72MB). The prefix tree is also small (96MB). However, the non-deduplicated information attached to each prefix is big (2.6GB). I have pushed a set of commits to try to divide this number by two (or even a bit more).
  • There is some kind of leak with Kafka compression. 640MB is quite high for a dictionary. Maybe this could be fixed by IBM/sarama#2375 ("feat: improve memory usage of zstd encoder by using our own pool management"). This is not a problem right now.
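
As an illustration of why deduplicated attributes stay small while per-prefix data grows with the number of routes, here is a minimal interning sketch in Go. The types and field names are hypothetical and much simpler than real BGP attributes; this is not Akvorado's data structure.

```go
package main

import "fmt"

// routeAttrs is a hypothetical, simplified set of BGP attributes.
// It is comparable, so it can be used as a map key.
type routeAttrs struct {
	asPath      string
	communities string
	originAS    uint32
}

// attrTable stores each distinct attribute set once and hands out a
// small integer handle that routes can hold instead of a full copy.
type attrTable struct {
	index map[routeAttrs]uint32
	attrs []routeAttrs
}

func newAttrTable() *attrTable {
	return &attrTable{index: make(map[routeAttrs]uint32)}
}

// intern returns the handle for a, adding it to the table if needed.
func (t *attrTable) intern(a routeAttrs) uint32 {
	if id, ok := t.index[a]; ok {
		return id
	}
	id := uint32(len(t.attrs))
	t.attrs = append(t.attrs, a)
	t.index[a] = id
	return id
}

func main() {
	tbl := newAttrTable()
	a := routeAttrs{asPath: "64500 64501", originAS: 64501}
	id1 := tbl.intern(a)
	id2 := tbl.intern(a) // same attributes, same handle
	id3 := tbl.intern(routeAttrs{asPath: "64500 64502", originAS: 64502})
	fmt.Println(id1 == id2, id1 != id3, len(tbl.attrs)) // prints: true true 2
}
```

With such a scheme, millions of routes sharing a few attribute sets cost one small handle each for the attributes, but whatever else is stored per route (the non-deduplicated part) is still paid once per prefix and per peer.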

If you can, I would still welcome a CPU profile taken during a BMP session shutdown. Set keep to 10s (not 10h), shut down one BMP session from a router, then quickly request the profile. The BMP session cleanup will happen 10s after the shutdown and will appear in the CPU profile. No need to do it now; this is just in case there are other clues that could be actionable.

@vincentbernat
Member Author

So, I have merged a partial redesign of the RIB that moves away from locks. It would be great if you could test it with inlet.bmp.rib-mode set to performance. It will use a lot more memory (2 to 3 times), but thanks to the memory optimizations in the previous commits, I think it should be half of that. It would be nice if you could run a memory profile to see how much it takes and provide the metric values you get, notably rib_copies_total, which may help to see whether it is worth capping the memory use.
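
The merged code is not shown in this thread. As a rough illustration of how reads can avoid locks at the cost of extra memory, here is a copy-on-write sketch in Go (it needs Go 1.19 or later for atomic.Pointer). Everything in it is an assumption about the general technique: the types, the names and the map-based RIB are hypothetical and may differ from what inlet.bmp.rib-mode actually does.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// rib is a hypothetical, simplified RIB: prefix -> origin AS.
type rib map[string]uint32

// cowRIB lets readers look up routes without taking a lock. Writers
// copy the current map, modify the copy, and publish it atomically.
type cowRIB struct {
	current atomic.Pointer[rib]
	writeMu sync.Mutex // serializes writers only
	copies  int        // number of full copies made so far
}

func newCowRIB() *cowRIB {
	c := &cowRIB{}
	empty := rib{}
	c.current.Store(&empty)
	return c
}

// lookup never blocks, even while an update is in progress.
func (c *cowRIB) lookup(prefix string) (uint32, bool) {
	as, ok := (*c.current.Load())[prefix]
	return as, ok
}

// update applies fn to a private copy, then swaps the copy in.
func (c *cowRIB) update(fn func(rib)) {
	c.writeMu.Lock()
	defer c.writeMu.Unlock()
	old := *c.current.Load()
	next := make(rib, len(old))
	for k, v := range old {
		next[k] = v
	}
	fn(next)
	c.current.Store(&next)
	c.copies++
}

func main() {
	r := newCowRIB()
	r.update(func(m rib) { m["192.0.2.0/24"] = 64500 })
	r.update(func(m rib) {
		delete(m, "192.0.2.0/24") // e.g. a peer removal
		m["198.51.100.0/24"] = 64501
	})
	as, ok := r.lookup("198.51.100.0/24")
	fmt.Println(as, ok, r.copies)
}
```

During an update, the old and the new copy are both alive, which is where the extra memory goes in this kind of design. In exchange, a long removal no longer delays lookups made by the flow workers.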

@kostik2022

Sure, here is a profile (mem):
[image: profile001]
Here are the metrics:

###### rib_copies_total ######
akvorado_inlet_bmp_rib_copies_total{timer="maximum",quantile="0.5"} 9.74e-07
akvorado_inlet_bmp_rib_copies_total{timer="maximum",quantile="0.9"} 1.1690000000000002e-06
akvorado_inlet_bmp_rib_copies_total{timer="maximum",quantile="0.99"} 1.1690000000000002e-06
akvorado_inlet_bmp_rib_copies_total_sum{timer="maximum"} 7.473900000000002e-05
akvorado_inlet_bmp_rib_copies_total_count{timer="maximum"} 22
akvorado_inlet_bmp_rib_copies_total{timer="minimum",quantile="0.5"} 2.01e-06
akvorado_inlet_bmp_rib_copies_total{timer="minimum",quantile="0.9"} 2.2499999999999996e-06
akvorado_inlet_bmp_rib_copies_total{timer="minimum",quantile="0.99"} 2.2499999999999996e-06
akvorado_inlet_bmp_rib_copies_total_sum{timer="minimum"} 0.0005784980000000001
akvorado_inlet_bmp_rib_copies_total_count{timer="minimum"} 158

##### BMP #####
# HELP akvorado_inlet_bmp_closed_connections_total Number of closed connections.
# TYPE akvorado_inlet_bmp_closed_connections_total counter
akvorado_inlet_bmp_closed_connections_total{exporter="10.16.0.35"} 9
akvorado_inlet_bmp_closed_connections_total{exporter="10.16.0.36"} 9
akvorado_inlet_bmp_closed_connections_total{exporter="10.16.0.37"} 9
akvorado_inlet_bmp_closed_connections_total{exporter="10.16.0.39"} 9
akvorado_inlet_bmp_closed_connections_total{exporter="10.16.0.41"} 9
akvorado_inlet_bmp_closed_connections_total{exporter="10.16.0.42"} 9
# HELP akvorado_inlet_bmp_errors_total Number of fatal errors while processing BMP messages.
# TYPE akvorado_inlet_bmp_errors_total counter
akvorado_inlet_bmp_errors_total{error="cannot read BMP header",exporter="10.16.0.35"} 9
akvorado_inlet_bmp_errors_total{error="cannot read BMP header",exporter="10.16.0.36"} 9
akvorado_inlet_bmp_errors_total{error="cannot read BMP header",exporter="10.16.0.37"} 9
akvorado_inlet_bmp_errors_total{error="cannot read BMP header",exporter="10.16.0.39"} 9
akvorado_inlet_bmp_errors_total{error="cannot read BMP header",exporter="10.16.0.41"} 9
akvorado_inlet_bmp_errors_total{error="cannot read BMP header",exporter="10.16.0.42"} 9
# HELP akvorado_inlet_bmp_messages_received_total Number of BMP messages received.
# TYPE akvorado_inlet_bmp_messages_received_total counter
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-down-notification"} 15
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-up-notification"} 33
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="route-monitoring"} 473799
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="statistics-report"} 54
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-down-notification"} 249
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-up-notification"} 280
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="route-monitoring"} 1.925204e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="statistics-report"} 93
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-down-notification"} 228
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-up-notification"} 252
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="route-monitoring"} 1.752022e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="statistics-report"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="peer-up-notification"} 103
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="route-monitoring"} 1.003746e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="statistics-report"} 309
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-down-notification"} 225
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-up-notification"} 254
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="route-monitoring"} 1.712057e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="statistics-report"} 87
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-down-notification"} 228
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-up-notification"} 300
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="route-monitoring"} 2.513387e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="statistics-report"} 216
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="peer-up-notification"} 112
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="route-monitoring"} 1.546599e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="statistics-report"} 336
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-down-notification"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-up-notification"} 78
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="route-monitoring"} 2.307539e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="statistics-report"} 180
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-down-notification"} 16
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-up-notification"} 50
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="route-monitoring"} 562104
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="statistics-report"} 102
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-down-notification"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="route-monitoring"} 447042
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="statistics-report"} 54
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-down-notification"} 15
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-up-notification"} 84
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="route-monitoring"} 2.146264e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="statistics-report"} 211
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="peer-up-notification"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="route-monitoring"} 1.0747e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="statistics-report"} 216
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="peer-up-notification"} 63
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="route-monitoring"} 836135
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="statistics-report"} 189
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.35",type="initiation"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.36",type="initiation"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.37",type="initiation"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.39",type="initiation"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.41",type="initiation"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.42",type="initiation"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="route-monitoring"} 655748
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="statistics-report"} 84
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="peer-up-notification"} 40
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="route-monitoring"} 1.242163e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="statistics-report"} 120
# HELP akvorado_inlet_bmp_opened_connections_total Number of opened connections.
# TYPE akvorado_inlet_bmp_opened_connections_total counter
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.1"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.10"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.19"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.20"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.21"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.22"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.23"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.25"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.27"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.30"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.31"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.33"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.34"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.35"} 10
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.36"} 10
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.37"} 10
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.39"} 10
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.41"} 10
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.42"} 10
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.43"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.44"} 1
# HELP akvorado_inlet_bmp_peers_total Number of peers up.
# TYPE akvorado_inlet_bmp_peers_total gauge
akvorado_inlet_bmp_peers_total{exporter="10.16.0.1"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.10"} 31
akvorado_inlet_bmp_peers_total{exporter="10.16.0.19"} 24
akvorado_inlet_bmp_peers_total{exporter="10.16.0.20"} 103
akvorado_inlet_bmp_peers_total{exporter="10.16.0.21"} 29
akvorado_inlet_bmp_peers_total{exporter="10.16.0.22"} 72
akvorado_inlet_bmp_peers_total{exporter="10.16.0.23"} 112
akvorado_inlet_bmp_peers_total{exporter="10.16.0.25"} 60
akvorado_inlet_bmp_peers_total{exporter="10.16.0.27"} 34
akvorado_inlet_bmp_peers_total{exporter="10.16.0.30"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.31"} 69
akvorado_inlet_bmp_peers_total{exporter="10.16.0.33"} 72
akvorado_inlet_bmp_peers_total{exporter="10.16.0.34"} 63
akvorado_inlet_bmp_peers_total{exporter="10.16.0.35"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.36"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.37"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.39"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.41"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.42"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.43"} 28
akvorado_inlet_bmp_peers_total{exporter="10.16.0.44"} 40
# HELP akvorado_inlet_bmp_rib_copies_total Duration of RIB copies to read-only version.
# TYPE akvorado_inlet_bmp_rib_copies_total summary
akvorado_inlet_bmp_rib_copies_total{timer="maximum",quantile="0.5"} 9.74e-07
akvorado_inlet_bmp_rib_copies_total{timer="maximum",quantile="0.9"} 1.1690000000000002e-06
akvorado_inlet_bmp_rib_copies_total{timer="maximum",quantile="0.99"} 1.1690000000000002e-06
akvorado_inlet_bmp_rib_copies_total_sum{timer="maximum"} 7.473900000000002e-05
akvorado_inlet_bmp_rib_copies_total_count{timer="maximum"} 22
akvorado_inlet_bmp_rib_copies_total{timer="minimum",quantile="0.5"} 2.01e-06
akvorado_inlet_bmp_rib_copies_total{timer="minimum",quantile="0.9"} 3.783e-06
akvorado_inlet_bmp_rib_copies_total{timer="minimum",quantile="0.99"} 3.783e-06
akvorado_inlet_bmp_rib_copies_total_sum{timer="minimum"} 0.0005835950000000001
akvorado_inlet_bmp_rib_copies_total_count{timer="minimum"} 160
# HELP akvorado_inlet_bmp_routes_total Number of routes up.
# TYPE akvorado_inlet_bmp_routes_total gauge
akvorado_inlet_bmp_routes_total{exporter="10.16.0.1"} 1.431007e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.10"} 5.660126e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.19"} 5.319263e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.20"} 3.41618e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.21"} 5.249115e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.22"} 6.833826e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.23"} 4.608264e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.25"} 6.235529e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.27"} 1.874001e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.30"} 1.597232e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.31"} 6.872782e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.33"} 3.738601e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.34"} 2.557192e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.35"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.36"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.37"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.39"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.41"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.42"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.43"} 2.398579e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.44"} 3.704072e+06

##### Drops #####
# HELP akvorado_inlet_flow_input_udp_in_drops Dropped packets due to listen queue full.
# TYPE akvorado_inlet_flow_input_udp_in_drops gauge
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="0"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="1"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="10"} 684
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="11"} 3907
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="12"} 2878
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="13"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="14"} 489
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="17"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="18"} 121
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="19"} 2421
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="2"} 775
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="20"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="21"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="22"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="23"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="24"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="25"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="26"} 2085
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="27"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="28"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="29"} 825
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="3"} 59
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="30"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="4"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="5"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="6"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="7"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="8"} 0
akvorado_inlet_flow_input_udp_in_drops{listener="0.0.0.0:2055",worker="9"} 39

And here's a cut from top output for akvorado:
image

Anyway, it looks more stable right now.

@vincentbernat
Member Author

Thanks. So, previously, with 30M routes, you were using 2.6G. Now, with 60M routes, you are using 3.5G. So, this is a bit more memory efficient, but I was hoping for something more like 50%, so that the additional copy would be "free". Also, you have two additional copies, so you are using 10G, double what was used previously. I was more or less expecting that. As the copy time is really slow, I should be able to only use 7G in your use case by discarding the old copy before copying. I'll try to figure out how to reuse the memory of the old copy for the new one.

@vincentbernat
Member Author

To be able to optimize memory usage, I need to reuse the old unused tree instead of waiting for the GC to remove it from memory. While reusing arrays is easy, maps are more difficult. There is a Go proposal that would help with this.
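
As a rough illustration of the difference between reusing arrays and reusing maps, here is a minimal Go sketch. The helper names `resetSlice` and `resetMap` are hypothetical and are not taken from Akvorado; the sketch only shows how existing allocations can be emptied in place rather than left to the GC, and it is not necessarily what the mentioned proposal covers.

```go
package main

import "fmt"

// resetSlice empties a slice but keeps its backing array, so later
// appends reuse the same memory instead of allocating again.
func resetSlice(s []int) []int {
	return s[:0]
}

// resetMap removes every entry from a map in place. The Go compiler
// recognizes this loop pattern and clears the map in one operation.
// In current Go implementations the map keeps its allocated storage.
func resetMap(m map[string]int) {
	for k := range m {
		delete(m, k)
	}
}

func main() {
	routes := make([]int, 0, 1024)
	routes = append(routes, 1, 2, 3)
	routes = resetSlice(routes)
	fmt.Println(len(routes), cap(routes)) // 0 1024

	index := map[string]int{"192.0.2.0/24": 1, "198.51.100.0/24": 2}
	resetMap(index)
	fmt.Println(len(index)) // 0
}
```

The slice case is a one-liner, while the map case needs a full pass over the entries and offers no way to copy another map into the retained storage in bulk, which may be part of why maps are the harder case here.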

@kostik2022

Interesting!...
Thank you for your efforts!

@vincentbernat
Member Author

I have enhanced the memory mode. It is now able to interrupt large updates to answer lookups. However, all lookups are still handled by a single routine, so even when no update is running, its latency is always higher than in performance mode. In your use case, if memory is not a problem, you should stick to the performance mode.
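
The idea of a single routine that answers lookups and lets large updates be interrupted can be sketched as follows. This is only an illustration under assumptions: the `rib` type, its channels, the chunk size of 2 and the use of a plain map are all hypothetical and do not reflect Akvorado's actual code.

```go
package main

import "fmt"

type lookupResult struct {
	nextHop string
	found   bool
}

type lookupReq struct {
	prefix string
	reply  chan<- lookupResult
}

type removeReq struct {
	prefixes []string
	done     chan<- struct{}
}

// rib is a hypothetical, simplified RIB. A single goroutine owns the
// map, so no lock is needed, but every lookup goes through a channel.
type rib struct {
	routes   map[string]string
	lookups  chan lookupReq
	removals chan removeReq
}

func newRIB(routes map[string]string) *rib {
	r := &rib{routes: routes, lookups: make(chan lookupReq), removals: make(chan removeReq)}
	go r.run()
	return r
}

// run serves lookups and removals one request at a time.
func (r *rib) run() {
	for {
		select {
		case req := <-r.lookups:
			r.answer(req)
		case req := <-r.removals:
			r.removeInChunks(req.prefixes, 2)
			close(req.done)
		}
	}
}

func (r *rib) answer(req lookupReq) {
	nh, ok := r.routes[req.prefix]
	req.reply <- lookupResult{nh, ok}
}

// removeInChunks deletes a few routes at a time and, between chunks,
// answers any lookups that are already waiting.
func (r *rib) removeInChunks(prefixes []string, chunk int) {
	for len(prefixes) > 0 {
		n := chunk
		if n > len(prefixes) {
			n = len(prefixes)
		}
		for _, p := range prefixes[:n] {
			delete(r.routes, p)
		}
		prefixes = prefixes[n:]
		for pending := true; pending; {
			select {
			case req := <-r.lookups:
				r.answer(req)
			default:
				pending = false
			}
		}
	}
}

// Lookup asks the owning goroutine for a route.
func (r *rib) Lookup(prefix string) (string, bool) {
	reply := make(chan lookupResult, 1)
	r.lookups <- lookupReq{prefix, reply}
	res := <-reply
	return res.nextHop, res.found
}

// Remove blocks until all given prefixes have been deleted.
func (r *rib) Remove(prefixes []string) {
	done := make(chan struct{})
	r.removals <- removeReq{prefixes, done}
	<-done
}

func main() {
	r := newRIB(map[string]string{
		"192.0.2.0/24": "peer-a", "198.51.100.0/24": "peer-a", "203.0.113.0/24": "peer-b",
	})
	r.Remove([]string{"192.0.2.0/24", "198.51.100.0/24"})
	_, found := r.Lookup("192.0.2.0/24")
	fmt.Println(found) // false
	nh, _ := r.Lookup("203.0.113.0/24")
	fmt.Println(nh) // peer-b
}
```

Even with no removal in progress, each `Lookup` pays for two channel operations and a goroutine handoff, which is the kind of constant overhead that makes this design slower than reading a separate read-only copy.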

@kostik2022

kostik2022 commented Nov 15, 2022

Vincent, we have been running the latest codebase for the last day, using performance mode. Drops are gone. Yes, it consumes a lot of RAM, but that's OK for now.
But a new interesting effect was found: some of the flows (about 1/4) are processed several hours (!) after the moment they really happened. It looks like these flows get stuck somewhere in queues and are processed after a long, long time.
The exporters where this effect is observed are not the most loaded ones (not among the top flow generators). Some portions of our config:

kafka:
  topic-configuration:
    num-partitions: 21
    replication-factor: 1
    config-entries:
      segment.bytes: 1073741824
      retention.ms: 86400000
      cleanup.policy: delete
      compression.type: producer

clickhouse:
  kafka:
    consumers: 4
  resolutions:
    - interval: 0
      ttl: 360h  # 15 days
    - interval: 1m
      ttl: 168h  # 1 week
    - interval: 5m
      ttl: 2160h # 3 months
    - interval: 1h
      ttl: 8760h # 1 year

console:
  homepage-top-widgets: [ protocol, src-port, dst-country, etype, exporter]
  default-visualize-options:
    start: 1 hour ago
    end: now
    filter: OutIfBoundary = external
    dimensions:
      - DstAS

inlet:
  kafka:
    compression-codec: zstd
    queue-size: 1000000
  bmp:
    listen: 0.0.0.0:10179
    collect-asns: true
    collect-aspaths: true
    collect-communities: false
    keep: 1h
    rib-mode: performance
  snmp:
    workers: 12
    poller-timeout: 5s
    poller-retries: 1
    cache-duration: 20m
    cache-refresh: 20m
    cache-check-interval: 10m
  flow:
    inputs:
      - type: udp
        decoder: netflow
        listen: 0.0.0.0:2055
        workers: 30
        receive-buffer: 41943040
        queue-size: 500000
  core:
    workers: 66
    exporter-classifiers:
      - ClassifySiteRegex(Exporter.Name, "^([^-]+)-", "$1")
      - ClassifyRegion("europe")
    interface-classifiers:
      - |
        ClassifyConnectivityRegex(Interface.Description, "^.*(ipt|ix|peer)", "$1") &&
        ClassifyProviderRegex(Interface.Description, "^(\\w+.+)-l3(-1|-2)?/EXT/.+", "$1") &&
        ClassifyExternal()
      - ClassifyInternal()

@vincentbernat
Member Author

I don't have a clue about this. But since we are using a lot of memory, is there enough memory for ClickHouse? In Kafka-UI, could you check if ClickHouse is late to ingest from Kafka? Maybe it only happens for some partitions? Did you add more partitions? I know ClickHouse can take some time to read the metadata again.

@kostik2022

kostik2022 commented Nov 18, 2022

We studied this a little. It looks like in performance mode there are no more drops with our load. But it looks like some BMP info is lost: we didn't receive any route info from some routers (of course, we double-checked the router configuration). And an even stranger thing: sometimes DstAsPath is incorrect, and the info from another exporter is used.
Again, we faced it in performance mode only; in default mode the info is OK, but there are a lot of drops.

@vincentbernat
Member Author

I don't see anything obvious that would trigger that. I'll look more in the following days.

@vincentbernat vincentbernat reopened this Nov 18, 2022
@vincentbernat
Member Author

I have pushed another commit that should provide a metric with the lag between the live RIB and the read-only RIB. I doubt there is a lag, but maybe that's the case. If you can provide a dump of the BMP-related metrics again, it would be helpful. Do you feel confident that the errors occur only in performance mode, not in default mode? The differences between the two modes are small, so if that's the case, it will be easier for me to find.

@kostik2022

Give me 15-20 mins, I'll recompile and make a dump.
All I can say is that in default mode there were drops, but the processed BMP data was OK.
In performance mode, it is as I described. Let me run some additional tests.

@vincentbernat
Member Author

Unrelated, but in default mode, do you think there are more or fewer drops than previously (before the introduction of modes)? I think that during updates, it should be better now, but when there are no updates, the performance has been degraded (something that could be fixed at some point).

@vincentbernat
Member Author

If you still have the profiler enabled, you can also query /debug/pprof/goroutine?debug=2 to get the state of all routines. It could be helpful if we suspect something is stuck somewhere.
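
With `debug=2`, the goroutine endpoint returns a plain-text stack dump, not the binary profile format that profile-parsing tools expect, so it has to be fetched as ordinary HTTP content. The following Go sketch downloads the dump and counts goroutines per state. The helper names `fetchDump` and `countStates` are hypothetical, and the default address `http://localhost:6060` is an assumption based on the address quoted later in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
	"strings"
)

// fetchDump downloads the plain-text goroutine dump from a process
// exposing the standard net/http/pprof handlers.
func fetchDump(baseURL string) (string, error) {
	resp, err := http.Get(baseURL + "/debug/pprof/goroutine?debug=2")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// countStates counts goroutines per state, using header lines such as
// "goroutine 12 [chan receive, 3 minutes]:".
func countStates(dump string) map[string]int {
	states := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(dump))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "goroutine ") {
			continue
		}
		start := strings.Index(line, "[")
		end := strings.LastIndex(line, "]")
		if start < 0 || end < start {
			continue
		}
		state := line[start+1 : end]
		// Drop details such as ", 3 minutes" after the state name.
		if i := strings.Index(state, ","); i >= 0 {
			state = state[:i]
		}
		states[state]++
	}
	return states
}

func main() {
	base := "http://localhost:6060" // assumed profiler address
	if len(os.Args) > 1 {
		base = os.Args[1]
	}
	dump, err := fetchDump(base)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot fetch dump:", err)
		return
	}
	states := countStates(dump)
	names := make([]string, 0, len(states))
	for name := range states {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%-20s %d\n", name, states[name])
	}
}
```

A large number of goroutines sitting in the same blocked state for minutes is usually the first thing to look at when something is suspected to be stuck.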

@kostik2022

Here are the metrics in performance mode. The issue still exists.

# HELP akvorado_inlet_bmp_closed_connections_total Number of closed connections.
# TYPE akvorado_inlet_bmp_closed_connections_total counter
akvorado_inlet_bmp_closed_connections_total{exporter="10.16.0.39"} 1
# HELP akvorado_inlet_bmp_errors_total Number of fatal errors while processing BMP messages.
# TYPE akvorado_inlet_bmp_errors_total counter
akvorado_inlet_bmp_errors_total{error="cannot read BMP header",exporter="10.16.0.39"} 1
# HELP akvorado_inlet_bmp_messages_received_total Number of BMP messages received.
# TYPE akvorado_inlet_bmp_messages_received_total counter
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-down-notification"} 15
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-up-notification"} 33
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="route-monitoring"} 393539
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="statistics-report"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-down-notification"} 55
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-up-notification"} 278
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="route-monitoring"} 1.621488e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="statistics-report"} 223
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-down-notification"} 52
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-up-notification"} 250
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="route-monitoring"} 1.601612e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="statistics-report"} 198
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="peer-up-notification"} 103
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="route-monitoring"} 949015
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="statistics-report"} 103
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-down-notification"} 223
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-up-notification"} 252
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="route-monitoring"} 1.430867e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="statistics-report"} 29
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-down-notification"} 225
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-up-notification"} 297
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="route-monitoring"} 2.055513e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="statistics-report"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="peer-up-notification"} 112
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="route-monitoring"} 1.420644e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="statistics-report"} 112
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="peer-down-notification"} 11
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="peer-up-notification"} 67
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="route-monitoring"} 883462
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="statistics-report"} 56
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-down-notification"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-up-notification"} 78
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="route-monitoring"} 1.846017e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="statistics-report"} 60
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-down-notification"} 16
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-up-notification"} 50
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="route-monitoring"} 520175
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="statistics-report"} 34
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-down-notification"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="route-monitoring"} 426392
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="statistics-report"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-down-notification"} 13
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-up-notification"} 82
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="route-monitoring"} 1.855331e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="statistics-report"} 69
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="peer-up-notification"} 74
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="route-monitoring"} 997316
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="statistics-report"} 74
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="peer-up-notification"} 63
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="route-monitoring"} 705250
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="statistics-report"} 63
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.39",type="initiation"} 2
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="route-monitoring"} 574978
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="statistics-report"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="peer-up-notification"} 42
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="route-monitoring"} 1.079895e+06
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="statistics-report"} 42
# HELP akvorado_inlet_bmp_opened_connections_total Number of opened connections.
# TYPE akvorado_inlet_bmp_opened_connections_total counter
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.1"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.10"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.19"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.20"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.21"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.22"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.23"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.24"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.25"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.27"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.30"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.31"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.33"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.34"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.39"} 2
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.43"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.44"} 1
# HELP akvorado_inlet_bmp_peer_removal_done_total Number of peers removed from the RIB.
# TYPE akvorado_inlet_bmp_peer_removal_done_total counter
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.1"} 15
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.10"} 55
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.19"} 52
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.21"} 223
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.22"} 225
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.24"} 11
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.25"} 18
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.27"} 16
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.30"} 10
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.31"} 13
# HELP akvorado_inlet_bmp_peers_total Number of peers up.
# TYPE akvorado_inlet_bmp_peers_total gauge
akvorado_inlet_bmp_peers_total{exporter="10.16.0.1"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.10"} 223
akvorado_inlet_bmp_peers_total{exporter="10.16.0.19"} 198
akvorado_inlet_bmp_peers_total{exporter="10.16.0.20"} 103
akvorado_inlet_bmp_peers_total{exporter="10.16.0.21"} 29
akvorado_inlet_bmp_peers_total{exporter="10.16.0.22"} 72
akvorado_inlet_bmp_peers_total{exporter="10.16.0.23"} 112
akvorado_inlet_bmp_peers_total{exporter="10.16.0.24"} 56
akvorado_inlet_bmp_peers_total{exporter="10.16.0.25"} 60
akvorado_inlet_bmp_peers_total{exporter="10.16.0.27"} 34
akvorado_inlet_bmp_peers_total{exporter="10.16.0.30"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.31"} 69
akvorado_inlet_bmp_peers_total{exporter="10.16.0.33"} 74
akvorado_inlet_bmp_peers_total{exporter="10.16.0.34"} 63
akvorado_inlet_bmp_peers_total{exporter="10.16.0.39"} 0
akvorado_inlet_bmp_peers_total{exporter="10.16.0.43"} 28
akvorado_inlet_bmp_peers_total{exporter="10.16.0.44"} 42
# HELP akvorado_inlet_bmp_rib_copies_total Duration of RIB copies to read-only version.
# TYPE akvorado_inlet_bmp_rib_copies_total summary
akvorado_inlet_bmp_rib_copies_total{timer="idle",quantile="0.5"} 2.085e-06
akvorado_inlet_bmp_rib_copies_total{timer="idle",quantile="0.9"} 5.843e-05
akvorado_inlet_bmp_rib_copies_total{timer="idle",quantile="0.99"} 5.843e-05
akvorado_inlet_bmp_rib_copies_total_sum{timer="idle"} 6.0515e-05
akvorado_inlet_bmp_rib_copies_total_count{timer="idle"} 2
akvorado_inlet_bmp_rib_copies_total{timer="maximum",quantile="0.5"} NaN
akvorado_inlet_bmp_rib_copies_total{timer="maximum",quantile="0.9"} NaN
akvorado_inlet_bmp_rib_copies_total{timer="maximum",quantile="0.99"} NaN
akvorado_inlet_bmp_rib_copies_total_sum{timer="maximum"} 6.647599999999999e-05
akvorado_inlet_bmp_rib_copies_total_count{timer="maximum"} 9
akvorado_inlet_bmp_rib_copies_total{timer="minimum",quantile="0.5"} 3.1199999999999998e-06
akvorado_inlet_bmp_rib_copies_total{timer="minimum",quantile="0.9"} 4.243e-06
akvorado_inlet_bmp_rib_copies_total{timer="minimum",quantile="0.99"} 0.000864298
akvorado_inlet_bmp_rib_copies_total_sum{timer="minimum"} 0.00098448
akvorado_inlet_bmp_rib_copies_total_count{timer="minimum"} 21
# HELP akvorado_inlet_bmp_rib_lag_seconds How outdated is the readonly RIB.
# TYPE akvorado_inlet_bmp_rib_lag_seconds gauge
akvorado_inlet_bmp_rib_lag_seconds 3.180767335
# HELP akvorado_inlet_bmp_routes_total Number of routes up.
# TYPE akvorado_inlet_bmp_routes_total gauge
akvorado_inlet_bmp_routes_total{exporter="10.16.0.1"} 1.43191e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.10"} 6.052694e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.19"} 5.697585e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.20"} 3.445929e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.21"} 5.109545e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.22"} 6.898987e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.23"} 4.638608e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.24"} 3.082537e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.25"} 6.244517e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.27"} 1.877017e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.30"} 1.600476e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.31"} 6.882773e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.33"} 3.747829e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.34"} 2.565127e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.39"} 0
akvorado_inlet_bmp_routes_total{exporter="10.16.0.43"} 2.399924e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.44"} 3.713592e+06

Profiler at /debug/pprof/goroutine?debug=2 gives nothing (http://localhost:6060/debug/pprof/goroutine?debug=2: parsing profile: unrecognized profile format)

@kostik2022

Metrics in default mode:

# HELP akvorado_inlet_bmp_closed_connections_total Number of closed connections.
# TYPE akvorado_inlet_bmp_closed_connections_total counter
akvorado_inlet_bmp_closed_connections_total{exporter="10.16.0.39"} 1
akvorado_inlet_bmp_closed_connections_total{exporter="88.210.36.128"} 1
# HELP akvorado_inlet_bmp_errors_total Number of fatal errors while processing BMP messages.
# TYPE akvorado_inlet_bmp_errors_total counter
akvorado_inlet_bmp_errors_total{error="cannot read BMP body",exporter="10.16.0.39"} 1
akvorado_inlet_bmp_errors_total{error="cannot read BMP body",exporter="88.210.36.128"} 1
# HELP akvorado_inlet_bmp_messages_received_total Number of BMP messages received.
# TYPE akvorado_inlet_bmp_messages_received_total counter
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-down-notification"} 15
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="peer-up-notification"} 33
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="route-monitoring"} 381473
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.1",type="statistics-report"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-down-notification"} 55
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="peer-up-notification"} 278
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="route-monitoring"} 706547
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.10",type="statistics-report"} 223
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-down-notification"} 52
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="peer-up-notification"} 250
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="route-monitoring"} 676789
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.19",type="statistics-report"} 198
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="peer-up-notification"} 103
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="route-monitoring"} 727678
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.20",type="statistics-report"} 103
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-down-notification"} 223
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="peer-up-notification"} 252
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="route-monitoring"} 725977
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.21",type="statistics-report"} 29
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-down-notification"} 225
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="peer-up-notification"} 297
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="route-monitoring"} 723745
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.22",type="statistics-report"} 72
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="peer-up-notification"} 112
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="route-monitoring"} 727815
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.23",type="statistics-report"} 112
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="peer-down-notification"} 11
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="peer-up-notification"} 67
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="route-monitoring"} 728742
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.24",type="statistics-report"} 56
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-down-notification"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="peer-up-notification"} 78
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="route-monitoring"} 723256
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.25",type="statistics-report"} 60
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-down-notification"} 16
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="peer-up-notification"} 50
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="route-monitoring"} 510184
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.27",type="statistics-report"} 34
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-down-notification"} 10
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="route-monitoring"} 420850
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.30",type="statistics-report"} 18
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-down-notification"} 13
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="peer-up-notification"} 82
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="route-monitoring"} 701658
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.31",type="statistics-report"} 69
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="peer-up-notification"} 74
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="route-monitoring"} 625760
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.33",type="statistics-report"} 74
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="peer-up-notification"} 63
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="route-monitoring"} 639895
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.34",type="statistics-report"} 63
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.39",type="initiation"} 2
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.39",type="peer-up-notification"} 52
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.39",type="route-monitoring"} 4778
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.39",type="statistics-report"} 52
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="peer-up-notification"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="route-monitoring"} 555702
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.43",type="statistics-report"} 28
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="peer-up-notification"} 42
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="route-monitoring"} 737690
akvorado_inlet_bmp_messages_received_total{exporter="10.16.0.44",type="statistics-report"} 42
akvorado_inlet_bmp_messages_received_total{exporter="88.210.36.128",type="initiation"} 1
akvorado_inlet_bmp_messages_received_total{exporter="88.210.36.128",type="peer-up-notification"} 51
akvorado_inlet_bmp_messages_received_total{exporter="88.210.36.128",type="route-monitoring"} 172728
akvorado_inlet_bmp_messages_received_total{exporter="88.210.36.128",type="statistics-report"} 51
# HELP akvorado_inlet_bmp_opened_connections_total Number of opened connections.
# TYPE akvorado_inlet_bmp_opened_connections_total counter
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.1"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.10"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.19"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.20"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.21"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.22"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.23"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.24"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.25"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.27"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.30"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.31"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.33"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.34"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.39"} 2
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.43"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="10.16.0.44"} 1
akvorado_inlet_bmp_opened_connections_total{exporter="88.210.36.128"} 1
# HELP akvorado_inlet_bmp_peer_removal_done_total Number of peers removed from the RIB.
# TYPE akvorado_inlet_bmp_peer_removal_done_total counter
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.1"} 15
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.10"} 55
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.19"} 52
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.21"} 223
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.22"} 225
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.24"} 11
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.25"} 18
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.27"} 16
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.30"} 10
akvorado_inlet_bmp_peer_removal_done_total{exporter="10.16.0.31"} 13
# HELP akvorado_inlet_bmp_peers_total Number of peers up.
# TYPE akvorado_inlet_bmp_peers_total gauge
akvorado_inlet_bmp_peers_total{exporter="10.16.0.1"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.10"} 223
akvorado_inlet_bmp_peers_total{exporter="10.16.0.19"} 198
akvorado_inlet_bmp_peers_total{exporter="10.16.0.20"} 103
akvorado_inlet_bmp_peers_total{exporter="10.16.0.21"} 29
akvorado_inlet_bmp_peers_total{exporter="10.16.0.22"} 72
akvorado_inlet_bmp_peers_total{exporter="10.16.0.23"} 112
akvorado_inlet_bmp_peers_total{exporter="10.16.0.24"} 56
akvorado_inlet_bmp_peers_total{exporter="10.16.0.25"} 60
akvorado_inlet_bmp_peers_total{exporter="10.16.0.27"} 34
akvorado_inlet_bmp_peers_total{exporter="10.16.0.30"} 18
akvorado_inlet_bmp_peers_total{exporter="10.16.0.31"} 69
akvorado_inlet_bmp_peers_total{exporter="10.16.0.33"} 74
akvorado_inlet_bmp_peers_total{exporter="10.16.0.34"} 63
akvorado_inlet_bmp_peers_total{exporter="10.16.0.39"} 52
akvorado_inlet_bmp_peers_total{exporter="10.16.0.43"} 28
akvorado_inlet_bmp_peers_total{exporter="10.16.0.44"} 42
akvorado_inlet_bmp_peers_total{exporter="88.210.36.128"} 51
# HELP akvorado_inlet_bmp_routes_total Number of routes up.
# TYPE akvorado_inlet_bmp_routes_total gauge
akvorado_inlet_bmp_routes_total{exporter="10.16.0.1"} 1.432262e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.10"} 2.543878e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.19"} 2.350421e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.20"} 2.633345e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.21"} 2.568527e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.22"} 2.364352e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.23"} 2.351559e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.24"} 2.582051e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.25"} 2.347969e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.27"} 1.876958e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.30"} 1.600404e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.31"} 2.6844e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.33"} 2.613549e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.34"} 2.421327e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.39"} 21334
akvorado_inlet_bmp_routes_total{exporter="10.16.0.43"} 2.400135e+06
akvorado_inlet_bmp_routes_total{exporter="10.16.0.44"} 2.625317e+06
akvorado_inlet_bmp_routes_total{exporter="88.210.36.128"} 726881

We observe a delay between when data is gathered and when it is processed (as shown in Visualize). There was no delay in performance mode.

@kostik2022

kostik2022 commented Nov 18, 2022

Dear Vincent, our network team said that this is probably caused by some kind of network issue. So let's stop searching for data loss for now :) Sorry very much for the disturbance.

Anyway, above are the metrics for both modes.
We switched to performance mode again due to drops in default mode.
After the network team has checked everything, we will monitor the status and I will give you more feedback.
Thank you very much!

@vincentbernat
Member Author

OK!

  1. For the goroutines, use curl, not the pprof tool. The output will be in text format.
  2. The rib_lag_seconds value is small, so it does not explain the errors in performance mode.
  3. For the delay in processing, I could add a metric for that. I just don't know how costly it is.
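For reference, the text format mentioned in item 1 is what Go's `runtime/pprof` package writes for the goroutine profile at debug level 2, which is also what a `net/http/pprof` endpoint serves when queried with `?debug=2`. The sketch below is not Akvorado code; `goroutineDump` is a hypothetical helper, and the exact URL to pass to curl is not given in this thread.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// goroutineDump returns the stacks of all goroutines as plain text.
// Debug level 2 selects the human-readable format, the same one a
// net/http/pprof endpoint serves when queried with "?debug=2".
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	dump, err := goroutineDump()
	if err != nil {
		fmt.Println("cannot dump goroutines:", err)
		return
	}
	fmt.Print(dump)
}
```

Such a dump can be attached to an issue as a plain text file, without needing the pprof tool to read it.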

@kostik2022

kostik2022 commented Nov 19, 2022

Here's the debug output. It is too large to include inline.
debug.txt

@kostik2022

kostik2022 commented Nov 21, 2022

Well, we tested a lot, in performance mode only. Something is wrong with the BMP info, even if we turn on BMP on only one exporter. Compared to SNMP, the summary is OK (no drops or flow loss), but, for example, the AS path data becomes completely unusable after a short time. The zero-traffic periods are the moments when we restarted the Akvorado stack.

image

@vincentbernat
Member Author

OK. I'll have a look.

@vincentbernat
Member Author

I don't see why it would work in memory mode and not in performance mode. Do you see the same problem with memory mode (if you ignore the fact that packets are dropped, so you don't get a reliable view of the traffic)? I am unsure if your previous answers were definitive on this.

Your graph may have a reasonable explanation. When not all exporters have synced with BMP, Akvorado tries to find the best possible match (in fact the first match, ignoring the next hop). This may mean the AS path is less diverse than it should be. By logging into ClickHouse (docker-compose exec clickhouse clickhouse-client), you can look at the data manually. But first, maybe have a look at Dst1stAS. Does it make sense? How do you tell the data is incorrect? Is it because it was different before the introduction of performance mode?
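To illustrate why a first-match fallback can make the AS path less diverse, here is a minimal sketch. It is not Akvorado's lookup code: the `route` type and the `pickRoute` function are hypothetical, and the exact matching rules are an assumption based on the description above.

```go
package main

import "fmt"

// route is a hypothetical, simplified RIB entry for one prefix.
type route struct {
	nextHop string
	asPath  []uint32
}

// pickRoute prefers a route whose next hop matches the one seen in the
// flow. When none matches (for example because the exporter has not
// synced yet), it falls back to the first route, ignoring the next hop.
func pickRoute(routes []route, nextHop string) (route, bool) {
	for _, r := range routes {
		if r.nextHop == nextHop {
			return r, true
		}
	}
	if len(routes) > 0 {
		return routes[0], true
	}
	return route{}, false
}

func main() {
	routes := []route{
		{nextHop: "192.0.2.1", asPath: []uint32{64500, 64510}},
		{nextHop: "192.0.2.2", asPath: []uint32{64501, 64511, 64510}},
	}
	exact, _ := pickRoute(routes, "192.0.2.2")
	fallback, _ := pickRoute(routes, "198.51.100.9")
	fmt.Println(exact.asPath, fallback.asPath)
}
```

With the fallback, every flow whose next hop is unknown gets the AS path of the first route, so many flows end up sharing the same path.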

In ClickHouse, you can do requests like:

SELECT COUNT(*) AS c, DstASPath FROM flows
WHERE TimeReceived > now() - interval 5 minute
GROUP BY DstASPath
ORDER BY c DESC

@vincentbernat
Member Author

I have moved the new code to #278 to be able to make a release. Also, in a5d5b14, I have implemented a change in the old/current code (the one with locks) to ensure that after a costly peer removal, we sleep a bit while pausing all writes to let readers catch up. The interval is configurable and is currently 500ms. If you previously only had issues during flush, this may be enough to fix it.
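The idea of pausing writes after a removal can be sketched as follows. This is not the code from a5d5b14: the `rib` type and its methods are hypothetical, the sketch assumes the goroutine calling `removePeer` is the only writer, and it sleeps after any removal rather than only after a costly one.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// rib is a hypothetical RIB protected by a read/write lock.
type rib struct {
	mu     sync.RWMutex
	routes map[string]string // prefix -> peer
}

// removePeer deletes all routes of a peer under the write lock, then
// sleeps with the lock released. Assuming the caller is the only
// writer, no write happens during the pause and readers can catch up.
func (r *rib) removePeer(peer string, pause time.Duration) int {
	r.mu.Lock()
	removed := 0
	for prefix, p := range r.routes {
		if p == peer {
			delete(r.routes, prefix)
			removed++
		}
	}
	r.mu.Unlock()
	if removed > 0 {
		time.Sleep(pause)
	}
	return removed
}

// lookup is the reader side: it only needs the shared lock.
func (r *rib) lookup(prefix string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.routes[prefix]
	return p, ok
}

func main() {
	r := &rib{routes: map[string]string{
		"192.0.2.0/24":    "peer-a",
		"198.51.100.0/24": "peer-b",
		"203.0.113.0/24":  "peer-a",
	}}
	// 500ms is the interval quoted in the comment above.
	n := r.removePeer("peer-a", 500*time.Millisecond)
	_, found := r.lookup("198.51.100.0/24")
	fmt.Println(n, found)
}
```

The pause trades slower RIB updates for a window in which flow lookups are guaranteed not to wait on the write lock.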

@alarig

alarig commented Dec 30, 2022

Hello,

I tried to add BMP today, and it seems that I ran into a similar issue. Even without any peer going up or down, after about ten minutes, all flows were dropped with the error "dropping flow due to queue full (size 100000)".

I tried to git merge origin/fix/bmp-lockless to have the rib option but I can’t build from source:

Building akvorado-service
Step 1/18 : FROM nixpkgs/nix-flakes:latest AS build
 ---> 14acf8e95db7
Step 2/18 : WORKDIR /app
 ---> Using cache
 ---> 2f6376057af1
Step 3/18 : COPY flake.nix ./
 ---> Using cache
 ---> 08221a5c907f
Step 4/18 : COPY flake.lock ./
 ---> Using cache
 ---> 03fb111291e9
Step 5/18 : RUN nix develop -c true
 ---> Using cache
 ---> 6748bb1fb146
Step 6/18 : COPY . .
 ---> Using cache
 ---> 5dcde4f24d16
Step 7/18 : RUN mkdir -p /output/store
 ---> Using cache
 ---> b1e245d91373
Step 8/18 : RUN git describe --tags --always --dirty --match=v* > .version && git add -f .version
 ---> Using cache
 ---> 2f5a1067ceff
Step 9/18 : RUN nix build --option sandbox false
 ---> Running in 76052b659112
warning: Git tree '/app' is dirty
this derivation will be built:
  /nix/store/6dq464gx3sbvb7wamiq1fa8xap3px31m-akvorado.drv
building '/nix/store/6dq464gx3sbvb7wamiq1fa8xap3px31m-akvorado.drv'...
 error: builder for '/nix/store/6dq464gx3sbvb7wamiq1fa8xap3px31m-akvorado.drv' failed with exit code 2;
        last 10 log lines:
        > go: downloading modernc.org/mathutil v1.5.0
        > go: downloading modernc.org/memory v1.4.0
        > go: downloading github.com/remyoudompheng/bigfft v0.0.0-20200410134404-eec4a21b6bb0
        > # akvorado/common/helpers/intern
        > common/helpers/intern/intern.go:153:10: undefined: InternPool
        > common/helpers/intern/intern.go:154:13: undefined: InternPool
        > common/helpers/intern/intern.go:155:40: T does not implement Value[T] (missing Equal method)
        > common/helpers/intern/intern.go:156:28: undefined: InternReference
        > common/helpers/intern/intern.go:157:37: undefined: InternReference
        > make: *** [Makefile:34: all] Error 2
        For full logs, run 'nix log /nix/store/6dq464gx3sbvb7wamiq1fa8xap3px31m-akvorado.drv'.

I had some conflicts which I tried to fix by myself, so perhaps the code I’m trying to build doesn’t work at all :p

What I could see on the master branch is that while the flows are computed, the load is balanced between all inlet processes. Once it begins to fail, only one process is at 100% CPU while the others are idling.

@vincentbernat
Member Author

This branch is too far away to be merged. How many routes do you have inside BMP? Profiling would be helpful. I don't have a setup to test myself.

@alarig

alarig commented Dec 30, 2022

I had to remove the bmp configuration to have a working setup again so I don’t have the exact count from akvorado, but on my LG I have ~8M IPv4 routes and ~2M IPv6 routes.

@alarig

alarig commented Dec 30, 2022

I just reconfigured BMP from one router to get the Akvorado metric: akvorado_inlet_bmp_routes_total shows 2.729067e+06, and I have four exporters.

@alarig

alarig commented Dec 30, 2022

With only one router exporting BMP, akvorado seems to keep up, but the results displayed aren’t making sense because the actual BGP path isn’t known to akvorado.
Screenshot 2022-12-30 at 19-38-16 Akvorado Visualize Stacked areas · OutIfDescription Dst1stAS DstAS DstNetPrefix · OutIfConnectivity transit AND DstAS AS12322 AND NOT Dst1stAS 0 · December 30 2022 6 37 PM · 7 37 PM

@alarig

alarig commented Dec 30, 2022

> With only one router exporting BMP, akvorado seems to keep up

False alarm, the flows aren’t computed anymore :p It took a bit less than twenty minutes, and the last bmp count is 2.78298e+06
Here is a screenshot displaying what I was saying about only one process working with the others idling
2022-12-30-195909_1676x1011_scrot

@alarig

alarig commented Dec 31, 2022

Hi again o/
I was going to close my laptop when I saw something weird: akvorado/inlet/bmp removed the same peer several times.

akvorado-inlet_1            | {"level":"info","time":"2022-12-30T20:02:48Z","caller":"akvorado/inlet/bmp/events.go:102","module":"akvorado/inlet/bmp","message":"remove peer 217.70.176.70 for exporter 173.246.102.247 (reason: stale)"}
akvorado-inlet_1            | {"level":"info","time":"2022-12-30T20:02:50Z","caller":"akvorado/inlet/bmp/events.go:102","module":"akvorado/inlet/bmp","message":"remove peer 217.70.176.70 for exporter 173.246.102.247 (reason: stale)"}
akvorado-inlet_1            | {"level":"info","time":"2022-12-30T20:02:52Z","caller":"akvorado/inlet/bmp/events.go:102","module":"akvorado/inlet/bmp","message":"remove peer 217.70.176.70 for exporter 173.246.102.247 (reason: stale)"}

@vincentbernat
Member Author

That's not unexpected. If it takes a long time, the removal is queued several times, but only one removal is done at a time. As you have only one CPU at 100%, it would be interesting to get a CPU profile.

@vincentbernat
Member Author

Also, I should add the ability to get AS paths from sFlow/NetFlow. If that is your use case, it may be lighter.

@alarig

alarig commented Dec 31, 2022

I’m indeed using BMP to populate the Dst{1..3}AS fields

@DaAllexx

DaAllexx commented Feb 8, 2024

Is there any progress on this topic? We are currently unable to retrieve BMP feeds of all of our routers as Akvorado starts dropping packets when too many peer removals occur.

If more metrics, logs or CPU profiles are needed, I will be happy to assist. Or would you recommend using BioRIS as an alternative?

@vincentbernat
Member Author

No progress. CPU profiles would be helpful.

The BioRIS backend does a gRPC request for each lookup, so it is unlikely to scale.

@DaAllexx

Sorry for the late reply. I finally found some time to take a closer look and captured a CPU profile during peer-removal (disabling BMP on one exporter resulting in many peers scheduled for removal). Shortly after triggering the removal, the queues have filled up and all incoming packets were dropped. According to the metrics, the BMP RIB was locked almost the whole time.

To me, it seems that most of the time is spent iterating over the whole RIB for each peer, resulting in poor route removal speeds (only ~300 routes removed per second). Would it be an option to iterate over the RIB only once and compare each route against a short list of peers that need to be removed?
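The single-pass idea could look roughly like the sketch below. It does not reflect Akvorado's actual RIB structure: the flat map, the `routeKey` type, and `removePeers` are hypothetical and only show the shape of the loop.

```go
package main

import "fmt"

// routeKey identifies a route in a hypothetical flat RIB.
type routeKey struct {
	prefix string
	peer   string
}

// removePeers walks the RIB once and drops every route learned from a
// peer in the given set, instead of doing one full walk per peer.
func removePeers(rib map[routeKey]struct{}, peers map[string]struct{}) int {
	removed := 0
	for k := range rib {
		if _, gone := peers[k.peer]; gone {
			delete(rib, k)
			removed++
		}
	}
	return removed
}

func main() {
	rib := map[routeKey]struct{}{
		{prefix: "192.0.2.0/24", peer: "peer-a"}:    {},
		{prefix: "192.0.2.0/24", peer: "peer-b"}:    {},
		{prefix: "198.51.100.0/24", peer: "peer-c"}: {},
	}
	down := map[string]struct{}{"peer-a": {}, "peer-c": {}}
	fmt.Println(removePeers(rib, down), len(rib))
}
```

With a set lookup per route, the cost is one walk over the RIB regardless of how many peers are pending, instead of one walk per peer.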

akvorado_cpu_peer-removal

flamegraph_peer-removal
