Skip to content
This repository has been archived by the owner on Dec 1, 2021. It is now read-only.

[Bug] Running into a ratelimit with a custom logger causes segmentation fault #79

Open
NexAdn opened this issue May 9, 2021 · 0 comments

Comments

@NexAdn
Copy link

NexAdn commented May 9, 2021

Environment

  • Recent Gentoo Linux
  • gcc 10.2.0
  • CMake 3.18.5
  • aegis.cpp Version 2.7.0
  • spdlog 1.8.2
  • libfmt 7.0.3
  • ASIO (aegis fork) 1.12.2
  • Websocketpp (aegis fork) 0.7.0
  • nlohmann JSON 3.9.1

Description

When using aegis with a custom logger (tested with spdlog::sinks::basic_file_sink_mt, spdlog::sinks::basic_file_sink_st, spdlog::sinks::null_sink_mt and spdlog::sinks::stdout_color_sink_mt) running into a rate limit (e.g. by spamming messages to a channel using a while (true) {/*...*/} construct), the executable gets terminated with SIGSEGV.

Example code

An excerpt from my main() without all the #include directives and command line parsing.

	auto log_file_sink =
		std::make_shared<spdlog::sinks::basic_file_sink_mt>(logfile_path); // logfile_path is a std::string, same problem when using any other mentioned sink
	auto logger =
		std::make_shared<spdlog::logger>("my_cool_logger_name", log_file_sink);
	logger->flush_on(spdlog::level::info); // removing this line doesn't mitigate the issue

	if (log_verbose) // log_verbose is a bool and (in my tests) always true
		logger->set_level(spdlog::level::debug); // same results when using spdlog::level::trace or spdlog::level::info

	logger->debug("creating aegis::core");
        // token is a std::string
	aegis::core core(aegis::create_bot_t{}.token(token).logger(logger)); // remove .logger(logger) and everything works fine!
	core.run();

	while (true)
	{
		core.channel_create(TEST_CHANNEL_ID)->create_message("spam").get();
	}

Behaviour when the code gets executed

The program runs fine for some time, spamming the given channel with messages until the rate limit kicks in.
After that, the programm terminates immediately, leaving (at least to me) no reasonable output:

[2021-05-09 16:48:59.280] [my_cool_logger_name] [debug] creating aegis::core
[2021-05-09 16:48:59.281] [my_cool_logger_name] [info] Creating websocket
[2021-05-09 16:48:59.550] [my_cool_logger_name] [info] Shard count: 1
[2021-05-09 16:48:59.550] [my_cool_logger_name] [info] Starting shard manager with 1 shards
[2021-05-09 16:48:59.550] [my_cool_logger_name] [info] Starting bot with 1 shards
[2021-05-09 16:48:59.550] [my_cool_logger_name] [info] Websocket[s] connecting
[2021-05-09 16:48:59.650] [my_cool_logger_name] [debug] Shards to connect : 1
[2021-05-09 16:48:59.650] [my_cool_logger_name] [debug] Shard#0: connecting
[2021-05-09 16:48:59.782] [my_cool_logger_name] [debug] message::populate_self() user not found - created
[2021-05-09 16:49:01.152] [my_cool_logger_name] [debug] Shard#0: connection established
[2021-05-09 16:49:02.085] [my_cool_logger_name] [info] Shard#0 READY Processed

(same output with spdlog::level::debug and spdlog::level::trace)

gdb gives me the following:

GNU gdb (Gentoo 10.1 vanilla) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./src/(my test executable)...
(gdb) r
Starting program: (my test executable) -l output.log -v -t (my token)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7585640 (LWP 122985)]
[New Thread 0x7ffff6d84640 (LWP 122986)]
[New Thread 0x7ffff6583640 (LWP 122987)]
[New Thread 0x7ffff5d82640 (LWP 122988)]
[New Thread 0x7ffff5555640 (LWP 122989)]

Thread 3 "(my bot name)" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6d84640 (LWP 122986)]
0x00005555555b8bb7 in aegis::ratelimit::bucket::perform(aegis::rest::request_params) ()
(gdb) bt
#0  0x00005555555b8bb7 in aegis::ratelimit::bucket::perform(aegis::rest::request_params) ()
#1  0x00005555555ce781 in asio::detail::executor_op<asio::detail::work_dispatcher<aegis::core::async<aegis::ratelimit::ratelimit_mgr::post_task<aegis::gateway::objects::message, void>(aegis::rest::request_params)::{lambda()#1}, aegis::gateway::objects::message, void>(aegis::ratelimit::ratelimit_mgr::post_task<aegis::gateway::objects::message, void>(aegis::rest::request_params)::{lambda()#1})::{lambda()#1}>, std::allocator<void>, asio::detail::scheduler_operation>::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned long) ()
#2  0x00005555555ff62f in asio::detail::scheduler::run(std::error_code&) ()
#3  0x0000555555601cda in asio::io_context::run() ()
#4  0x00005555555d5f8e in aegis::core::_thread_track(aegis::thread_state*) ()
#5  0x00007ffff79c9ef0 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/libstdc++.so.6
#6  0x00007ffff7e63f5e in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff781c1ef in clone () from /lib64/libc.so.6

sometimes, this is the function where the SEGV is received:

0x00005555555b8297 in void spdlog::logger::log_<fmt::v7::basic_string_view<char>, int>(spdlog::source_loc, spdlog::level::level_enum, fmt::v7::basic_string_view<char> const&, int const&) ()

Workaround

Refrain from using custom loggers or running into rate limits (good luck with that^^) for now.

Additional remarks

From the following excerpt I had the assumption that aegis might be implicitly depending on the logger being named aegis:

if (bot_config._log)
log = bot_config._log;
else
setup_logging();

spdlog::get("aegis")->debug("Ratelimit almost hit: {}({}) - waiting {}ms", rest::rest_controller::get_method(params.method), params.path, waitfor.count());

(note the spdlog::get)

However, without having searched the rest of the codebase, I assume that spdlog::get will be called elsewhere as well. Furthermore, giving the custom logger the name aegis like in the aegis codebase (see below) doesn't resolve the problem.

log = std::make_shared<spdlog::async_logger>("aegis", sinks.begin(), sinks.end(), spdlog::thread_pool(), spdlog::async_overflow_policy::block);

Another explanation might be that the file logger is too slow for the rate limit manager and somehow causes the rate limit manager to access memory which (possibly from another thread) has been deallocated by the time the memory is accessed.

Lastly, it still might be possible that the rate limit manager is the only one accessing the logger using spdlog::get() which I'd find a bit odd. But then, doing the registration of the logger wrong in a way that spdlog::get() returns an invalid pointer (possibly nullptr) could explain the symptoms. But it doesn't explain the debugger sometimes telling me that SEGV is received during an internal spdlog function call (for which I assume the logger object must have been accessed correctly).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant