Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add circular loop detection to MaxDepth #15579

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

eskyuu
Copy link
Contributor

@eskyuu eskyuu commented Nov 15, 2023

If device dependencies are created with circular dependencies, the code currently runs in a tight loop due to the MaxDepth code. There is also a daily task to update the max_depth column, which runs forever until the server runs out of RAM.

This code adds checks in both of the loops above to make sure we don't try to process the same device twice.

Circular dependencies could be A->B->C->A, which is always an error. The code in this request will avoid these loops, but could produce undefined results.

The other type of circular dependency, which is what we have, is where there are redundant links. e.g. A<-B<->C<->D<->B->A.  As long as B->A is a one-way dependency the code will produce consistent results. We have these configured in our network, which is why we were having this issue.

It does add more database requests when calculating the depth on devices, and the number of extra requests are based on the number of parents a device needs to traverse. I couldn't see another way of doing this given that updating one device triggers all its children to re-evaluate their depth through the Device Observer code.
 
DO NOT DELETE THE UNDERLYING TEXT

Please note

Please read this information carefully. You can run ./lnms dev:check to check your code before submitting.

  • Have you followed our code guidelines?
  • If my Pull Request does some changes/fixes/enhancements in the WebUI, I have inserted a screenshot of it.
  • If my Pull Request makes discovery/polling/yaml changes, I have added/updated test data.

Testers

If you would like to test this pull request then please run: ./scripts/github-apply <pr_id>, i.e ./scripts/github-apply 5926
After you are done testing, you can remove the changes with ./scripts/github-remove. If there are schema changes, you can ask on discord how to revert.

@murrant
Copy link
Member

murrant commented Nov 15, 2023

We could skip calling the device observer code in the cases we don't want it.

I believe when saving you add withoutEvents

@eskyuu
Copy link
Contributor Author

eskyuu commented Nov 15, 2023

I'm not sure how that can help me. The observer code is there to cause child devices to update if a parent's level changes. This needs to trigger any time a parent's depth changes. The only time save is being called is in updateMaxDepth, which doesn't know if it's safe to trigger the observer or not. The problem here is how to keep track of the devices that have already been triggered so you know not to trigger it again. The issue is that in an A->B->C->D->A dependency, you need to traverse the tree to know it's not safe to trigger.

Even if you do keep track of devices that have been triggered, you will potentially have an issue of the levels being constantly increased if you use the existing method of parent depth + 1. The only way I could think of around this is to traverse the path for each device and use a fixed formula. The exception is where devices only have a single parent (which I assume is most configurations), which I have put in an exception for.

@eskyuu
Copy link
Contributor Author

eskyuu commented Nov 15, 2023

For the record, I originally thought the update to daily.php was all I would need, but in the end it's the change to Devices.php that actually fixed my issue. The fix to daily.php would resolve an issue if an end user accidentally set up something like A->B->C->A, which is why I left it in there.

@murrant
Copy link
Member

murrant commented Nov 16, 2023

Yeah, the daily.php changes are obviously correct in my opinion.

The other changes requires a lot more thought and I'm not sure about it.

The observer is for when the parent is changed (from the webui). If we disable it when updating in daily, it might prevent loops.

I'm wondering why the daily process is necessary... when can depth be affected? It seems like only when dependencies are changed in the webui.

@eskyuu
Copy link
Contributor Author

eskyuu commented Nov 16, 2023

Yeah, the daily.php changes are obviously correct in my opinion.

The other changes requires a lot more thought and I'm not sure about it.

The observer is for when the parent is changed (from the webui). If we disable it when updating in daily, it might prevent loops.

I'm wondering why the daily process is necessary... when can depth be affected? It seems like only when dependencies are changed in the webui.

The observer also causes loops when saving from the Web UI. If you have a dependency of A<->B, A updates its max depth to B+1, then B updates to A+1, triggering A again. The same will happen if you accidentally create A->B->C->A (or any length loop).

@murrant
Copy link
Member

murrant commented Nov 18, 2023

A couple of thoughts:

  1. remove the daily process, I can't see how max depth would change outside of manual edit of device dependencies.
  2. Possibly remove the device observer update method (or change it)
  3. Make the manual update call code similar to the daily process instead so we can hold state to avoid loops.

@eskyuu
Copy link
Contributor Author

eskyuu commented Nov 23, 2023

The

A couple of thoughts:

  1. remove the daily process, I can't see how max depth would change outside of manual edit of device dependencies.
  2. Possibly remove the device observer update method (or change it)
  3. Make the manual update call code similar to the daily process instead so we can hold state to avoid loops.

I have no opinion on 1 and 2 (not sure why they were added in the first place, or under what conditions they are needed)
I believe that this pull request already implements 3. It could possibly be made simpler if you stopped at the first device found with no parents, but I kept going until no parents have been found so we get the max depth as per the field name.

@murrant
Copy link
Member

murrant commented Nov 27, 2023

So, the purpose of the max depth field is to simplify polling (only used by fast ping at this time) in hierarchical order. If it is incorrect, it is not catastrophic.

@murrant
Copy link
Member

murrant commented Nov 27, 2023

By the way, the reason I am resistant to your current implementation is that it is a lot of SQL queries (possibly n^2).
I think it could be simplified and made more performant. This current code would probably hit the http timeout with not too many devices.

@eskyuu
Copy link
Contributor Author

eskyuu commented Nov 27, 2023

I'm not sure if this change would make too many more SQL queries for a simple network (one path to the root node) thanks to the shortcut.

The existing LibreNMS code already times out 100% of the time if you have redundant dependencies because the observer code keeps triggering each end of the link until the request times out. On our system this happens after thousands of iterations (I can't remember if it was around 2,000 or 20,000). I'm happy if you want to implement a better solution, but based on your response I don't think I made it clear that the current code fails 100% of the time if you end up creating circular dependencies.

@ottorei
Copy link
Contributor

ottorei commented Nov 28, 2023

When using a non-recursive version of BFS to update possible dependencies, there would not be circular loops. Would this really cause performance issues if the dependency chain was of manageable size? Also, could we somehow save each level of possible dependencies? About the SQL, could the list of dependencies be saved in a way that all of the data could be fetched efficiently and processed in PHP?

@murrant
Copy link
Member

murrant commented Jan 16, 2024

Unclear, can we merge the daily.php change so it is not held up by the other discussion?

@eskyuu
Copy link
Contributor Author

eskyuu commented Jan 18, 2024

I'm in favour of merging the daily.php, as well as removing the device observer because both of these trigger an infinite number of SQL queries.

I had another thought about the observer code - could we check the last modified timestamp and only update if the last update was not recent. This would require timestamps on the device model, and also the choice of an arbitrary last update interval where the observer code won't update the max_depth.

@eskyuu
Copy link
Contributor Author

eskyuu commented Feb 29, 2024

The underlying bug causes issues for me when doing updates. Can this be merged if I remove the observer and apply the new daily.php code?

@eskyuu
Copy link
Contributor Author

eskyuu commented Mar 20, 2024

Unclear, can we merge the daily.php change so it is not held up by the other discussion?

I think the daily.php update still triggers the observer function, so you will need to remove the observer function if you only want to apply the daily.php fix.

@eskyuu
Copy link
Contributor Author

eskyuu commented May 8, 2024

@murrant - The underlying bug is causing me issues with applying updates (I need to reverse this patch in my environment, then the update process gets stuck in an endless loop). I'm happy to re-work as needed, but don't want to spend the effort unless there's a clear indication on the proposed solution being merged. At the moment this would be either merge as-is, or remove the observer code and have the daily.php fix up the levels.

murrant
murrant previously approved these changes May 8, 2024
Copy link
Member

@murrant murrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the bit I wasn't sure about so we could merge the part to prevent loops during dialy.sh

@eskyuu
Copy link
Contributor Author

eskyuu commented May 8, 2024

This means the web interface will still go into an endless loop with infinite SQL queries.

@eskyuu
Copy link
Contributor Author

eskyuu commented May 8, 2024

The observer code needs to be removed if the device model doesn't have a fix applied.

@eskyuu
Copy link
Contributor Author

eskyuu commented May 8, 2024

In fact I'm sure the observer code will still cause the endless loop in the daily script because I only had a look at the device model because the change tot he daily script didn't fix my issue.

@eskyuu
Copy link
Contributor Author

eskyuu commented May 9, 2024

How does this look for getting the ping job to use the device dependencies instead of max_depth?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants