New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix broken runtime config sync #10013
base: master
Are you sure you want to change the base?
Conversation
c93869d
to
ceab5a6
Compare
This looks like it significantly changes what was only added in #9980. If it stays that way, it might make sense to revert #9980 as a whole and make a complete new attempt. That would make the backport of this cleaner (as you wouldn't have to backport a commit and its reversion at once). What's the reason for moving ceab5a6 seems to add a second independent |
Well, my first thought was that we might be using it even more than we thought in #9980, and that it might introduce a circular dependency. However, if you are strongly against it, then of course I can undo it.
Feels strange to use the same mutex for the cluster config sync and the API request. Otherwise, there's nothing against it, I guess. |
Plus |
Plus |
I mainly asked the question because if it is moved, it's more code that was added in #9980 that wouldn't stay the way it was added there. So if it's moved, that would be more reason for me to revert #9980 completely and do a clean new PR. (I haven't looked at the PR in enough detail to assess whether it should be moved.)
Why does it feel strange? Think of what this mutex is supposed to protect: the consistency of the configuration of a specific object. So anything that attempts to modify the object at runtime should use the same mutex. Imagine someone would send a
Does the locking in |
IMHO yes, it does! It acquires the lock before it does
That would be too late for the cluster config sync part. |
2b40c6f
to
a259a68
Compare
What about that part? Currently, the PR adds different locking based on whether an object is created via HTTP or JSON-RPC. Both would create the same object though, so both should use the same locking. Wouldn't you end up with race conditions between HTTP and JSON-RPC otherwise? |
I don't think so! In case of a JSON-RPC connection, the object is locked twice! There is only one case in which a race condition may occur. This's when the API request enters icinga2/lib/remote/apilistener-configsync.cpp Lines 117 to 133 in a259a68
|
5bd94ce
to
4a43f0e
Compare
I've tested all the use cases of the config sync and it is working perfectly now. However, we still need to address the use of |
Some synchronization has to happen there for sure. Am I missing something or isn't even using At first glance, |
There is no |
4a43f0e
to
4e265c1
Compare
|
||
static std::mutex m_Mutex; | ||
static std::condition_variable m_CV; | ||
static std::map<Type*, std::set<String>> m_LockedObjectNames; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider unordered_{map,set}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact types won't matter that much here as the overall size of those containers will stay rather small anyways.
For completeness, as I already talked with @yhabteab about it: I was thinking about maybe making this a std::set<std::pair<Type*, String>>
instead (or std::unordered_set
for that matter, there won't be much of a difference), as that would reduce the map + set operations down to just the set operations. But this seems to bring no huge improvements for simplifying the code (which would have been my motivation here, not performance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine in general to me. One commit message refers to the outdated class name ObjectNameMutex
though. And if you're touching the PR anyways, you might do a slight improvement to one of the comments (see my inline comment).
4e265c1
to
40e71dd
Compare
ref/NC/804054 |
This PR kind of reverts 008fcd1 but it's still using
AtomicFile
to persist the configs to disk. Additionally,ConfigObjectUtility::CreateObject()
utilises theObjectNameMutex
class to prevent an object of a given type from being created concurrently, even if it is hard to trigger that data race via API requests.Tests
Start two Icinga endpoints that are connected to each other and must accept config updates from each other (
accept_config = true
).Before
It silently aborts the synchronisation attempts.
After
Works as it should!
fixes #10012