Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JsonRpcConnection#Send*(): discard messages ASAP once shutting down #9991

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

Al2Klimov
Copy link
Member

Especially ApiListener#ReplayLog() enqueued lots of messages into
JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if
the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.

fixes #9985

@cla-bot cla-bot bot added the cla/signed label Feb 7, 2024
@icinga-probot icinga-probot bot added area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working ref/IP labels Feb 7, 2024
@julianbrost
Copy link
Contributor

Now #Disconnect() takes effect ASAP.

Which - with the current state of the PR - means that the messages are discarded instead of enqueued.

Especially ApiListener#ReplayLog() enqueued lots of messages into JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if the connection was shut(ting) down.

However, ApiListener#ReplayLog() will still generate messages for a dead connection that will be discarded anyways as it has no way to tell that the connection is dead. How much effort would it take to improve this?

lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
yhabteab
yhabteab previously approved these changes Feb 9, 2024
Copy link
Member

@yhabteab yhabteab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[2024-02-09 15:26:01 +0100] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'master2'
Error: Broken pipe [system:32 at /opt/homebrew/include/boost/asio/detail/reactive_socket_send_op.hpp:137:37 in function 'do_complete']

Stacktrace:
 0# __cxa_throw in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
 1# void boost::throw_exception<boost::system::system_error>(boost::system::system_error const&, boost::source_location const&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
 2# boost::asio::detail::do_throw_error(boost::system::error_code const&, boost::source_location const&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
---

[2024-02-09 15:26:01 +0100] information/ApiListener: Endpoint 'master2' disconnected while replaying log.
[2024-02-09 15:26:01 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master2'
[2024-02-09 15:26:19 +0100] warning/ApiListener: Removing API client for endpoint 'master2'. 0 API clients left.
[2024-02-09 15:26:19 +0100] information/ApiListener: Replayed 6694 messages.

@yhabteab
Copy link
Member

yhabteab commented Feb 9, 2024

[2024-02-09 17:55:57 +0100] information/ApiListener: Sending config updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'master2' signed by our CA.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: The certificates for CN 'master2' and its root CA are valid and uptodate. Skipping automated renewal.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending config file updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing runtime objects to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master2'
[2024-02-09 17:55:57 +0100] information/ApiListener: Endpoint 'master2' disconnected while syncing runtime objects.
[2024-02-09 17:55:57 +0100] warning/ApiListener: Removing API client for endpoint 'master2'. 0 API clients left.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'master2' in zone 'master'.

lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
@yhabteab yhabteab force-pushed the JsonRpcConnection-9985 branch 2 times, most recently from e0bada4 to b6964c3 Compare February 13, 2024 07:59
@yhabteab yhabteab self-assigned this Feb 21, 2024
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
Comment on lines 166 to 179
if (m_ShuttingDown) {
BOOST_THROW_EXCEPTION(std::runtime_error("Cannot send message to already disconnected endpoint '" + GetIdentity() + "'!"));
}

Ptr keepAlive (this);

m_IoStrand.post([this, keepAlive, message]() { SendMessageInternal(message); });
}

void JsonRpcConnection::SendRawMessage(const String& message)
{
if (m_ShuttingDown) {
BOOST_THROW_EXCEPTION(std::runtime_error("Cannot send message to already disconnected endpoint '" + GetIdentity() + "'!"));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON-RPC connections don't necessarily have an identity set (anonymous connections, like for initial certificate requests).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON-RPC connections do always have an identity (a connection is considered to be anonymous if there can't be found a local endpoint matching a given identity).

@julianbrost
Copy link
Contributor

@yhabteab When should those new log messages you added appear? At least I didn't see them when simply restarting individual nodes in the cluster. Judging just from the message, they should probably have severity warning, I'm wondering though when/how often they might appear (so if this is something that can regularly appear during a restart, the severity probably shouldn't be too high).

@yhabteab
Copy link
Member

@yhabteab When should those new log messages you added appear?

The first one is triggered by the timer handler, which gets scheduled every 5 seconds and is trying to send the SetLogPosition message. The second one actually applies to every communication attempt initiated with ApiListener::Relay* or directly with ApiListener::SyncSendMessage(), which can be quite a lot but also quite hard to trigger them manually.

@julianbrost
Copy link
Contributor

So it's possible that those messages appear during a normal reload depending on the timing, but it's rather unlikely?

@yhabteab
Copy link
Member

So it's possible that those messages appear during a normal reload depending on the timing, but it's rather unlikely?

Yes, depending on the number of disconnected endpoints and on how busy m_RelayQueue was, it could result into a bunch of warnings.

@yhabteab
Copy link
Member

So it's possible that those messages appear during a normal reload depending on the timing, but it's rather unlikely?

Yes, depending on the number of disconnected endpoints and on how busy m_RelayQueue was, it could result into a bunch of warnings.

Actually, the endpoint is dropped off the list as soon as it is disconnected, so there will be no attempts via the ApiListener::Relay*() methods after that.

@julianbrost
Copy link
Contributor

The implementation should be fine. Main remaining question for me is the log level of the new messages, like should they be downgraded to notice (if that's something that can just legitimately happen during a disconnect) or maybe promoted to warning (if that's actually something problematic).

@Al2Klimov: please also review the current state of this PR.

Copy link
Member Author

@Al2Klimov Al2Klimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I opt for getting the anti-memory-leak ready w/o changing too much for now. It shouldn't depend on the exceptions part.

@@ -163,23 +163,39 @@ ConnectionRole JsonRpcConnection::GetRole() const

void JsonRpcConnection::SendMessage(const Dictionary::Ptr& message)
{
if (m_ShuttingDown) {
BOOST_THROW_EXCEPTION(std::runtime_error("Cannot send message to already disconnected API client '" + GetIdentity() + "'!"));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big friend of this tbh. All callers catch exceptions – ex. SyncSendMessage() which I didn't finish checking, yet. But so far I see that SyncSendMessage() is called by RelayMessageOne() which is called by SyncRelayMessage(). The latter misses PersistMessage() if RelayMessageOne() throws. Not good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But so far I see that SyncSendMessage() is called by RelayMessageOne() which is called by SyncRelayMessage(). The latter misses PersistMessage() if RelayMessageOne() throws.

This is never going happen! Nothing has changed in the previous synchronisation behaviour. The only thing that has changed is that we have added an error handling here.

try {
client->SendMessage(message);
} catch (const std::runtime_error& ex) {
Log(LogInformation, "ApiListener")
<< "Error while sending message to endpoint '" << endpoint->GetName() << "': " << DiagnosticInformation(ex, false);
}

@Al2Klimov Al2Klimov added this to the 2.15.0 milestone Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working cla/signed ref/IP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JsonRpcConnection#Send*() queues messages in memory even after #Disconnect()
3 participants