Optimize BMP module, notably when removing a peer #253
When the RIB is locked for too long, the inlet hangs. Try to give the inlet a bit of time to move forward between two flushes of the RIB. There are various knobs, not documented yet until we get better defaults:

- `inlet.bmp.peer-removal-max-time`: how long to keep the lock
- `inlet.bmp.peer-removal-sleep-interval`: how long to sleep between two runs if we were unable to flush the whole peer
- `inlet.bmp.peer-removal-max-queue`: maximum number of flush requests
- `inlet.bmp.peer-removal-min-routes`: minimum number of routes to flush before yielding

May fix #253
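For readers following along, here is a minimal, hypothetical Go sketch of how such a time-bounded flush loop could look; the `rib` type, `flushPeer`, and the route layout are invented for illustration and are not Akvorado's actual code:

```go
package main

import (
	"sync"
	"time"
)

// Hypothetical RIB: one route list per peer, guarded by a single lock.
type rib struct {
	mu     sync.Mutex
	routes map[string][]uint32 // peer -> route IDs
}

// flushPeer removes a peer's routes in bounded batches, releasing the
// lock and sleeping between batches so the inlet can make progress.
func (r *rib) flushPeer(peer string, maxTime, sleep time.Duration, minRoutes int) {
	for {
		r.mu.Lock()
		start := time.Now()
		removed := 0
		rs := r.routes[peer]
		for len(rs) > 0 {
			rs = rs[:len(rs)-1] // drop one route under the lock
			removed++
			// Yield once at least minRoutes were flushed and the lock
			// has been held for maxTime.
			if removed >= minRoutes && time.Since(start) >= maxTime {
				break
			}
		}
		r.routes[peer] = rs
		done := len(rs) == 0
		r.mu.Unlock()
		if done {
			return
		}
		time.Sleep(sleep) // peer-removal-sleep-interval
	}
}

func main() {
	r := &rib{routes: map[string][]uint32{"peer1": make([]uint32, 100000)}}
	r.flushPeer("peer1", 10*time.Millisecond, 5*time.Millisecond, 1000)
}
```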
@kostik2022 can you test if it works well enough for you? You can change the image to
Yes, we switched to
Looks like drops occur every time a new BMP connection is added.
Kafka queue-size is 50000, internal flow queue-size is 300000.
It's odd that you don't have
We use main branch (just checked again).
Unfortunately, I think the second profile was done after the internal RIB had been updated, so it does not show any hot path during a RIB update. Note that when a BGP peer goes down, it is removed immediately from the RIB. When a BMP peer goes down, all the associated peers are kept for 5 minutes by default. This is configurable. Unrelated, but there is also a lot of CPU spent on classification. It should be easy to optimize this with a cache. How big are your classification rules?
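To illustrate the cache idea mentioned above, here is a minimal sketch, assuming a simple string-keyed memoization of classifier output; `classifierCache` and `rules` are hypothetical names, not the project's API:

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical cache so the (expensive) classification rules only run
// once per distinct input.
type classifierCache struct {
	mu    sync.RWMutex
	cache map[string]string // e.g. interface description -> group
}

func (c *classifierCache) classify(input string, rules func(string) string) string {
	c.mu.RLock()
	out, ok := c.cache[input]
	c.mu.RUnlock()
	if ok {
		return out // cache hit: no rule evaluation
	}
	out = rules(input) // run the real classification rules once
	c.mu.Lock()
	c.cache[input] = out
	c.mu.Unlock()
	return out
}

func main() {
	c := &classifierCache{cache: map[string]string{}}
	rules := func(s string) string { return "provider:" + s } // stand-in
	fmt.Println(c.classify("transit-1", rules))
	fmt.Println(c.classify("transit-1", rules)) // served from the cache
}
```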
The current implementation tries to minimize memory usage and heavily relies on locking. I think I need to rewrite it for a better compromise. Can you run the profile on memory instead of CPU? In the meantime, you can either increase the internal pipeline 10× or reduce the number of BMP sessions.
I have replaced the LRU cache with a TTL-based cache in 2e30798. I think this should help your use case (I suppose your cache was set too small; with the new commit, there is no size to configure anymore).
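As a rough illustration of why a TTL-based cache needs no size knob, here is a minimal, hypothetical sketch (not the code from 2e30798): entries simply expire after a fixed duration, so there is nothing to evict by size:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical TTL cache: an entry is valid until its deadline passes.
type ttlEntry[V any] struct {
	value   V
	expires time.Time
}

type ttlCache[K comparable, V any] struct {
	ttl  time.Duration
	data map[K]ttlEntry[V]
}

func (c *ttlCache[K, V]) get(key K) (V, bool) {
	e, ok := c.data[key]
	if !ok || time.Now().After(e.expires) {
		var zero V
		delete(c.data, key) // drop expired entries lazily
		return zero, false
	}
	return e.value, true
}

func (c *ttlCache[K, V]) put(key K, value V) {
	c.data[key] = ttlEntry[V]{value, time.Now().Add(c.ttl)}
}

func main() {
	c := &ttlCache[string, int]{ttl: 10 * time.Hour, data: map[string]ttlEntry[int]{}}
	c.put("peer", 42)
	v, ok := c.get("peer")
	fmt.Println(v, ok) // 42 true
}
```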
Vincent, I compiled the latest version (with the commit about the cache). The BMP cache was already set to 10h. We still have drops. Metrics are:
Vincent, thank you for your efforts. We set the internal queue (Kafka) to 1M (from 50k). Looks like that helps. We will observe during the weekend.
Thanks for the debug!
If you can, I'll still welcome a CPU profile as well during a BMP session shutdown. Set
So, I have merged a partial redesign of the RIB, moving away from locks. It would be great if you can test it with
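The redesign itself is not shown in the thread, but a common way to move lookups away from locks is to publish an immutable snapshot that readers load atomically. A minimal sketch of that pattern, with invented names and a plain map standing in for the RIB tree:

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Writers mutate the live RIB under a mutex; readers use an atomically
// published read-only snapshot and never block on the write lock.
type snapshotRIB struct {
	mu   sync.Mutex
	live map[string]string // prefix -> next hop (stand-in for a tree)
	ro   atomic.Pointer[map[string]string]
}

func (r *snapshotRIB) update(prefix, nh string) {
	r.mu.Lock()
	r.live[prefix] = nh
	r.mu.Unlock()
}

// publish copies the live RIB into a fresh read-only snapshot. This is
// the extra copy whose memory cost is discussed below.
func (r *snapshotRIB) publish() {
	r.mu.Lock()
	snap := make(map[string]string, len(r.live))
	for k, v := range r.live {
		snap[k] = v
	}
	r.mu.Unlock()
	r.ro.Store(&snap)
}

// lookup reads from the snapshot without taking any lock.
func (r *snapshotRIB) lookup(prefix string) (string, bool) {
	snap := r.ro.Load()
	if snap == nil {
		return "", false
	}
	nh, ok := (*snap)[prefix]
	return nh, ok
}

func main() {
	r := &snapshotRIB{live: map[string]string{}}
	r.update("192.0.2.0/24", "198.51.100.1")
	r.publish()
	r.lookup("192.0.2.0/24")
}
```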
Thanks. So, previously, with 30M routes, you were using 2.6G. Now, with 60M routes, you are using 3.5G. So, this is a bit more memory efficient, but I was hoping for something more like 50%, so that the additional copy is "free". Also, you have two additional copies, so you are using 10G, double what was used previously. I was expecting a bit better than that. As the copy time is really low, I should be able to use only 7G in your use case by discarding the old copy before copying. I'll try to figure out how to reuse the memory of the old copy for the new one.
To be able to optimize memory usage, I need to reuse the old unused tree instead of waiting for the GC to remove it from memory. While reusing arrays is easy, maps are more difficult. There is a Go proposal that would help with this.
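A small illustration of the asymmetry mentioned above (the comment does not name the proposal; possibly the one that later became the `clear` builtin in Go 1.21):

```go
package main

import "fmt"

func main() {
	// Reusing a slice is easy: reset the length, keep the backing array.
	routes := make([]int, 0, 1024)
	routes = append(routes, 1, 2, 3)
	routes = routes[:0] // capacity (and memory) is retained

	// Maps are harder: clear(m) (a builtin since Go 1.21) empties the
	// map, but whether its internal buckets are reused is up to the
	// runtime, unlike the slice's backing array above.
	index := map[string]int{"a": 1, "b": 2}
	clear(index)

	fmt.Println(cap(routes), len(index)) // 1024 0
}
```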
Interesting!... |
I have enhanced the
Vincent, we have been running the latest codebase for the last day, using performance mode. Drops are gone. Yes, it consumes a lot of RAM, but that's OK for now.
I don't have a clue about this. But since we are using a lot of memory, is there enough memory for ClickHouse? In Kafka-UI, could you check if ClickHouse is late to ingest from Kafka? Maybe it only happens for some partitions? Did you add more partitions? I know ClickHouse can take some time to read the metadata again.
We studied this a little. Looks like in
I don't see anything obvious that would trigger that. I'll look more in the following days.
I have pushed another commit that should provide a metric with the lag between the live RIB and the read-only RIB. I doubt there is a lag, but maybe that's the case. If you can provide again a dump of the metrics related to BMP, it would be helpful. Do you feel confident that the errors happen only in performance mode, not in the default mode? The differences between the two modes are small, so if that's the case, it will be easier for me to find.
Give me 15-20 mins, I'll recompile and make a dump.
Unrelated, but in default mode, do you think there are more or fewer drops than previously (before the introduction of modes)? I think that during updates, it should be better now, but when there are no updates, the performance has been degraded (something that could be fixed at some point).
If you still have the profiler enabled, you can also query
Here are the metrics in performance mode. The issue still exists.
Profiler at `/debug/pprof/goroutine?debug=2` gives nothing (
Metrics in default mode:
We observe a delay between when data is gathered and when it is processed (shown in Visualize). There was no delay in performance mode.
Dear Vincent, our network team said it may be caused by some kind of network issue, so let's stop searching for the data loss for now :) Sorry very much for the disturbance. Anyway, above are the metrics for both modes.
OK!
Here's the debug output. Too large to include inline.
Well, we tested a lot, in performance mode only. Something is wrong with the BMP info, even if we turn on BMP on only one exporter. Compared to SNMP, the summary is OK (no drops/flow loss), but, for example, the AS-path data becomes completely unusable after a short time. Zero traffic marks the moments when we restarted the Akvorado stack.
OK. I'll have a look.
I don't see why it would work in memory mode and not in performance mode. Do you see the same problem with memory mode (if you ignore the fact that packets are dropped, so you don't get a reliable view of the traffic)? I am unsure if your previous answers were definitive on this. Your graph may have a reasonable explanation. When not all exporters have synced with BMP, Akvorado tries to find the best possible match (in fact the first match, ignoring the next hop). This may mean the AS path is less diverse than it should be. By logging into ClickHouse, you can run requests like:

```sql
SELECT COUNT(*) AS c, DstASPath FROM flows
WHERE TimeReceived > now() - interval 5 minute
GROUP BY DstASPath
ORDER BY c DESC
```
I have moved the new code to #278 to be able to make a release. Also, in a5d5b14, I have implemented a change in the old/current code (the one with locks) to ensure that after a costly peer removal, we sleep a bit while pausing all writes to let readers catch up. The interval is configurable and is currently 500ms. If, previously, you only had issues during flush, maybe this is enough to fix it.
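A minimal sketch of the "pause writes after a costly removal" idea, with hypothetical names (the real change is in a5d5b14):

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// After flushing a peer, writers back off for a fixed interval so that
// readers (the inlet) can drain their backlog.
type pausingRIB struct {
	mu         sync.RWMutex
	pauseUntil atomic.Int64  // unix nanoseconds
	pauseAfter time.Duration // e.g. 500 * time.Millisecond
}

func (r *pausingRIB) write(apply func()) {
	// Writers wait out any pause window before taking the lock again.
	if wait := time.Until(time.Unix(0, r.pauseUntil.Load())); wait > 0 {
		time.Sleep(wait)
	}
	r.mu.Lock()
	apply()
	r.mu.Unlock()
}

func (r *pausingRIB) removePeer(flush func()) {
	r.mu.Lock()
	flush() // costly removal under the write lock
	r.mu.Unlock()
	r.pauseUntil.Store(time.Now().Add(r.pauseAfter).UnixNano())
}

func main() {
	r := &pausingRIB{pauseAfter: 500 * time.Millisecond}
	r.removePeer(func() { /* flush routes */ })
	r.write(func() { /* a route update, delayed by the pause */ })
}
```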
Hello, I tried to add BMP today, and it seems I ran into a similar issue. Even without any peer going up or down, after about ten minutes, all flows were dropped with error. I tried to
I had some conflicts which I tried to fix by myself, so perhaps the code I’m trying to build doesn’t work at all :p What I could see on the master branch is that while the flows are computed, the load is balanced between all
This branch has diverged too much to be merged. How many routes do you have inside BMP? Profiling would be helpful. I don't have a setup to test this myself.
I had to remove the BMP configuration to have a working setup again, so I don’t have the exact count from Akvorado, but on my LG I have ~8M IPv4 routes and ~2M IPv6 routes.
I just re-configured the BMP from one router to have the Akvorado metric,
Hi again o/
That's not unexpected. If it takes a long time, the removal is queued several times, but only one removal is done at a time. As you have only one CPU at 100%, it would be interesting to get a CPU profile.
Also, I should add the ability to get AS paths from sFlow/NetFlow. If that's your use case, it may be lighter.
I’m indeed using BMP to populate the `Dst{1..3}AS` fields.
Is there any progress on this topic? We are currently unable to retrieve the BMP feeds of all of our routers, as Akvorado starts dropping packets when too many peer removals occur. If more metrics, logs, or CPU profiles are needed, I will be happy to assist. Or would you recommend using BioRIS as an alternative?
No progress. CPU profiles would be helpful. The BioRIS backend does a gRPC request for each lookup, so it is unlikely to scale.
Sorry for the late reply. I finally found some time to take a closer look and captured a CPU profile during peer removal (disabling BMP on one exporter, resulting in many peers scheduled for removal). Shortly after triggering the removal, the queues filled up and all incoming packets were dropped. According to the metrics, the BMP RIB was locked almost the whole time. It seems most of the time is spent iterating the whole RIB for each peer, resulting in poor route-removal speed (only ~300 routes removed per second). Would it be an option to iterate the RIB only once and compare against a short list of peers that need to be removed?
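A sketch of the suggested single-pass removal: walk the RIB once and test each route against a set of peers to remove, turning O(peers × routes) into O(routes). The slice-based RIB and the names are stand-ins for the real tree:

```go
package main

import "fmt"

type route struct {
	peer   string
	prefix string
}

// removePeers walks the RIB once and drops routes belonging to any peer
// in the removal set, instead of doing one full scan per peer.
func removePeers(rib []route, peers map[string]bool) []route {
	kept := rib[:0] // filter in place, reusing the backing array
	for _, rt := range rib {
		if !peers[rt.peer] {
			kept = append(kept, rt)
		}
	}
	return kept
}

func main() {
	rib := []route{{"p1", "10.0.0.0/8"}, {"p2", "192.0.2.0/24"}, {"p3", "198.51.100.0/24"}}
	rib = removePeers(rib, map[string]bool{"p1": true, "p3": true})
	fmt.Println(rib) // [{p2 192.0.2.0/24}]
}
```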
The BMP module can prevent the inlet from working when removing a peer. Several solutions:
See #241