New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JsonRpcConnection#Send*(): discard messages ASAP once shutting down #9991
base: master
Are you sure you want to change the base?
Conversation
Which - with the current state of the PR - means that the messages are discarded instead of enqueued.
However, |
3ac0ce0
to
c856ab8
Compare
c856ab8
to
351d4b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[2024-02-09 15:26:01 +0100] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'master2'
Error: Broken pipe [system:32 at /opt/homebrew/include/boost/asio/detail/reactive_socket_send_op.hpp:137:37 in function 'do_complete']
Stacktrace:
0# __cxa_throw in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
1# void boost::throw_exception<boost::system::system_error>(boost::system::system_error const&, boost::source_location const&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
2# boost::asio::detail::do_throw_error(boost::system::error_code const&, boost::source_location const&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
---
[2024-02-09 15:26:01 +0100] information/ApiListener: Endpoint 'master2' disconnected while replaying log.
[2024-02-09 15:26:01 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master2'
[2024-02-09 15:26:19 +0100] warning/ApiListener: Removing API client for endpoint 'master2'. 0 API clients left.
[2024-02-09 15:26:19 +0100] information/ApiListener: Replayed 6694 messages.
[2024-02-09 17:55:57 +0100] information/ApiListener: Sending config updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'master2' signed by our CA.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: The certificates for CN 'master2' and its root CA are valid and uptodate. Skipping automated renewal.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending config file updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing runtime objects to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master2'
[2024-02-09 17:55:57 +0100] information/ApiListener: Endpoint 'master2' disconnected while syncing runtime objects.
[2024-02-09 17:55:57 +0100] warning/ApiListener: Removing API client for endpoint 'master2'. 0 API clients left.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'master2' in zone 'master'. |
30be38e
to
67f5d68
Compare
e0bada4
to
b6964c3
Compare
by checking the now atomic #m_ShuttingDown outside of it.
Especially ApiListener#ReplayLog() enqueued lots of messages into JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.
5ee4133
to
0191ec3
Compare
0191ec3
to
630f201
Compare
lib/remote/jsonrpcconnection.cpp
Outdated
if (m_ShuttingDown) { | ||
BOOST_THROW_EXCEPTION(std::runtime_error("Cannot send message to already disconnected endpoint '" + GetIdentity() + "'!")); | ||
} | ||
|
||
Ptr keepAlive (this); | ||
|
||
m_IoStrand.post([this, keepAlive, message]() { SendMessageInternal(message); }); | ||
} | ||
|
||
void JsonRpcConnection::SendRawMessage(const String& message) | ||
{ | ||
if (m_ShuttingDown) { | ||
BOOST_THROW_EXCEPTION(std::runtime_error("Cannot send message to already disconnected endpoint '" + GetIdentity() + "'!")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON-RPC connections don't necessarily have an identity set (anonymous connections, like for initial certificate requests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON-RPC connections do always have an identity (a connection is considered to be anonymous if there can't be found a local endpoint matching a given identity).
630f201
to
3bb5baa
Compare
@yhabteab When should those new log messages you added appear? At least I didn't see them when simply restarting individual nodes in the cluster. Judging just from the message, they should probably have severity warning, I'm wondering though when/how often they might appear (so if this is something that can regularly appear during a restart, the severity probably shouldn't be too high). |
The first one is triggered by the timer handler, which gets scheduled every |
So it's possible that those messages appear during a normal reload depending on the timing, but it's rather unlikely? |
Yes, depending on the number of disconnected endpoints and on how busy |
Actually, the endpoint is dropped off the list as soon as it is disconnected, so there will be no attempts via the |
The implementation should be fine. Main remaining question for me is the log level of the new messages, like should they be downgraded to notice (if that's something that can just legitimately happen during a disconnect) or maybe promoted to warning (if that's actually something problematic). @Al2Klimov: please also review the current state of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I opt for getting the anti-memory-leak ready w/o changing too much for now. It shouldn't depend on the exceptions part.
@@ -163,23 +163,39 @@ ConnectionRole JsonRpcConnection::GetRole() const | |||
|
|||
void JsonRpcConnection::SendMessage(const Dictionary::Ptr& message) | |||
{ | |||
if (m_ShuttingDown) { | |||
BOOST_THROW_EXCEPTION(std::runtime_error("Cannot send message to already disconnected API client '" + GetIdentity() + "'!")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big friend of this tbh. All callers catch exceptions – ex. SyncSendMessage() which I didn't finish checking, yet. But so far I see that SyncSendMessage() is called by RelayMessageOne() which is called by SyncRelayMessage(). The latter misses PersistMessage() if RelayMessageOne() throws. Not good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But so far I see that SyncSendMessage() is called by RelayMessageOne() which is called by SyncRelayMessage(). The latter misses PersistMessage() if RelayMessageOne() throws.
This is never going happen! Nothing has changed in the previous synchronisation behaviour. The only thing that has changed is that we have added an error handling here.
icinga2/lib/remote/apilistener.cpp
Lines 1203 to 1208 in 3bb5baa
try { | |
client->SendMessage(message); | |
} catch (const std::runtime_error& ex) { | |
Log(LogInformation, "ApiListener") | |
<< "Error while sending message to endpoint '" << endpoint->GetName() << "': " << DiagnosticInformation(ex, false); | |
} |
Especially ApiListener#ReplayLog() enqueued lots of messages into
JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if
the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.
fixes #9985