Slow performance and excessive SQL queries #156
Comments
I understand the issue. Unfortunately, this is not a problem with the plugin itself, but rather with NetBox, Django, and the underlying Django REST framework. The only things this plugin does differently from the standard NetBox API are removing the API paging and using a new serializer. Can you check the API performance for a direct comparison?

The serializer is another thing we could possibly improve. It has a lot of lookups to related objects. I'm not sure how much of the request cycle is spent on these lookups; this should be tested to see whether they account for a relevant part of it.

As a mitigation, you could add filters to your API query, or shard it across multiple scrape jobs, each of which fetches a different subset filtered via the URL.
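For the direct comparison suggested above, a small timing helper may be enough. This is only a sketch: `time_calls` and `fetch_url` are hypothetical helper names, and the URLs and token you pass in are whatever your own instance uses. The `Authorization: Token ...` header is NetBox's usual token scheme.

```python
import time
import urllib.request
from typing import Callable, Dict


def fetch_url(url: str, token: str, timeout: float = 120.0) -> bytes:
    """Download one URL using NetBox-style token auth and return the body."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def time_calls(fetchers: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Run each named callable `repeats` times; return the best wall-clock seconds."""
    results: Dict[str, float] = {}
    for name, fetch in fetchers.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fetch()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results
```

To use it, pass one zero-argument callable per endpoint, for example `time_calls({"plugin": lambda: fetch_url(plugin_url, token), "standard": lambda: fetch_url(standard_url, token)})`, where `plugin_url`, `standard_url`, and `token` are your own values. Taking the best of several runs reduces noise from caching and connection setup.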
I don't understand SQL well enough to know why these queries are happening, but they do seem to be caused by this plugin, because they don't happen with the endpoint you suggested. I've moved my testing to a local Docker instance to compare the two endpoints in a more repeatable way. I'm happy to share the scripts if that helps you reproduce it. Here are the results with 1000 devices:
I'm hoping to scale this up to around 10K devices, which will all be scraped by a single instance of blackbox exporter. I'm not sure yet if that's feasible or if it will need to be split up. I'm testing that system with this plugin and NetBox all at once.
Good catch already. It might be the lookups to related objects. I have to look into how Django's ORM issues queries even when there is nothing in the list. One option to reduce this could be a config setting on the plugin to disable some lookups, even before the
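To illustrate why per-object lookups to related objects can multiply the query count, and why many of those queries return nothing, here is a minimal sketch using Python's built-in `sqlite3`. It is not the plugin's code: the `device` and `service` tables are made up for the example, and every third device gets a related row so that roughly two thirds of the per-device lookups come back empty.

```python
import sqlite3


def build_db(n_devices: int = 30) -> sqlite3.Connection:
    """Create an in-memory database with hypothetical device/service tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE service (id INTEGER PRIMARY KEY, device_id INTEGER, name TEXT);
        """
    )
    ids = range(1, n_devices + 1)
    conn.executemany("INSERT INTO device (id, name) VALUES (?, ?)",
                     [(i, f"dev{i}") for i in ids])
    # Only every third device has a related service row.
    conn.executemany("INSERT INTO service (device_id, name) VALUES (?, ?)",
                     [(i, "ssh") for i in ids if i % 3 == 0])
    return conn


def per_device_lookups(conn):
    """One SELECT per device: returns (result, query_count, empty_query_count)."""
    devices = conn.execute("SELECT id, name FROM device").fetchall()
    queries, empty, result = 1, 0, {}
    for dev_id, name in devices:
        rows = conn.execute(
            "SELECT name FROM service WHERE device_id = ?", (dev_id,)
        ).fetchall()
        queries += 1
        if not rows:
            empty += 1  # the query still ran even though nothing matched
        result[name] = [row[0] for row in rows]
    return result, queries, empty


def batched_lookup(conn):
    """Two SELECTs in total, joined in Python: returns (result, query_count)."""
    devices = conn.execute("SELECT id, name FROM device").fetchall()
    services = conn.execute("SELECT device_id, name FROM service").fetchall()
    by_device = {}
    for device_id, service_name in services:
        by_device.setdefault(device_id, []).append(service_name)
    return {name: by_device.get(dev_id, []) for dev_id, name in devices}, 2
```

With 30 devices, the first approach issues 31 queries, 20 of them empty, while the second issues 2 and produces the same result. In Django, the usual tools for getting the batched behaviour are `select_related()` and `prefetch_related()` on the queryset. Whether they apply cleanly to this serializer is something that would need to be checked.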
This performance issue has led me to develop a different approach, which I’d like to share:
While this plugin works great with a small number of devices, it becomes very slow at scale. For example, with 1158 devices in my test, the following query takes 23 seconds.
I did a little investigation and found that 99% of that time is spent querying the database. The HTTP download is only 10ms. The Postgres log shows that this one API call causes 6157 `SELECT` statements, about 2/3 of which return 0 rows. You can see this log here: prometheus-sd-query.psql.zip

To easily replicate this issue, you can launch a fresh instance of netbox-docker and run the following code in `nbshell`. This way it's a bit faster, taking 13 seconds, perhaps because the database is local.