New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add circular loop detection to MaxDepth #15579
base: master
Are you sure you want to change the base?
Conversation
We could skip calling the device observer code in the cases we don't want it. I believe when saving you add |
I'm not sure how that can help me. The observer code is there to cause child devices to update if a parent's level changes. This needs to trigger any time a parent's depth changes. The only time save is being called is in updateMaxDepth, which doesn't know if it's safe to trigger the observer or not. The problem here is how to keep track of the devices that have already been triggered so you know not to trigger it again. The issue is that in an A->B->C->D->A dependency, you need to traverse the tree to know it's not safe to trigger. Even if you do keep track of devices that have been triggered, you will potentially have an issue of the levels being constantly increased if you use the existing method of parent depth + 1. The only way I could think of around this is to traverse the path for each device and use a fixed formula. The exception is where devices only have a single parent (which I assume is most configurations), which I have put in an exception for. |
For the record, I originally thought the update to daily.php was all I would need, but in the end it's the change to Devices.php that actually fixed my issue. The fix to daily.php would resolve an issue if an end user accidentally set up something like A->B->C->A, which is why I left it in there. |
Yeah, the daily.php changes are obviously correct in my opinion. The other changes requires a lot more thought and I'm not sure about it. The observer is for when the parent is changed (from the webui). If we disable it when updating in daily, it might prevent loops. I'm wondering why the daily process is necessary... when can depth be affected? It seems like only when dependencies are changed in the webui. |
The observer also causes loops when saving from the Web UI. If you have a dependency of A<->B, A updates its max depth to B+1, then B updates to A+1, triggering A again. The same will happen if you accidentally create A->B->C->A (or any length loop). |
A couple of thoughts:
|
The
I have no opinion on 1 and 2 (not sure why they were added in the first place, or under what conditions they are needed) |
So, the purpose of the max depth field is to simplify polling (only used by fast ping at this time) in hierarchical order. If it is incorrect, it is not catastrophic. |
By the way, the reason I am resistant to your current implementation is that it is a lot of SQL queries (possibly n^2). |
I'm not sure if this change would make too many more SQL queries for a simple network (one path to the root node) thanks to the shortcut. The existing LibreNMS code already times out 100% of the time if you have redundant dependencies because the observer code keeps triggering each end of the link until the request times out. On our system this happens after thousands of iterations (I can't remember if it was around 2,000 or 20,000). I'm happy if you want to implement a better solution, but based on your response I don't think I made it clear that the current code fails 100% of the time if you end up creating circular dependencies. |
When using a non-recursive version of BFS to update possible dependencies, there would not be circular loops. Would this really cause performance issues if the dependency chain was of manageable size? Also, could we somehow save each level of possible dependencies? About the SQL, could the list of dependencies be saved in a way that all of the data could be fetched efficiently and processed in PHP? |
Unclear, can we merge the daily.php change so it is not held up by the other discussion? |
I'm in favour of merging the daily.php, as well as removing the device observer because both of these trigger an infinite number of SQL queries. I had another thought about the observer code - could we check the last modified timestamp and only update if the last update was not recent. This would require timestamps on the device model, and also the choice of an arbitrary last update interval where the observer code won't update the max_depth. |
The underlying bug causes issues for me when doing updates. Can this be merged if I remove the observer and apply the new daily.php code? |
I think the daily.php update still triggers the observer function, so you will need to remove the observer function if you only want to apply the daily.php fix. |
@murrant - The underlying bug is causing me issues with applying updates (I need to reverse this patch in my environment, then the update process gets stuck in an endless loop). I'm happy to re-work as needed, but don't want to spend the effort unless there's a clear indication on the proposed solution being merged. At the moment this would be either merge as-is, or remove the observer code and have the daily.php fix up the levels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the bit I wasn't sure about so we could merge the part to prevent loops during dialy.sh
This means the web interface will still go into an endless loop with infinite SQL queries. |
The observer code needs to be removed if the device model doesn't have a fix applied. |
In fact I'm sure the observer code will still cause the endless loop in the daily script because I only had a look at the device model because the change tot he daily script didn't fix my issue. |
How does this look for getting the ping job to use the device dependencies instead of max_depth? |
…d of using max_depth
…d of using max_depth
…s are always trigered for alerts
… parent devices before children
If device dependencies are created with circular dependencies, the code currently runs in a tight loop due to the MaxDepth code. There is also a daily task to update the max_depth column, which runs forever until the server runs out of RAM.
This code adds checks in both of the loops above to make sure we don't try to process the same device twice.
Circular dependencies could be A->B->C->A, which is always an error. The code in this request will avoid these loops, but could produce undefined results.
The other type of circular dependency, which is what we have, is where there are redundant links. e.g. A<-B<->C<->D<->B->A. As long as B->A is a one-way dependency the code will produce consistent results. We have these configured in our network, which is why we were having this issue.
It does add more database requests when calculating the depth on devices, and the number of extra requests are based on the number of parents a device needs to traverse. I couldn't see another way of doing this given that updating one device triggers all its children to re-evaluate their depth through the Device Observer code.
DO NOT DELETE THE UNDERLYING TEXT
Please note
Testers
If you would like to test this pull request then please run:
./scripts/github-apply <pr_id>
, i.e./scripts/github-apply 5926
After you are done testing, you can remove the changes with
./scripts/github-remove
. If there are schema changes, you can ask on discord how to revert.