Segfault and Crash in CollectionManager::load on startup
Steps to reproduce
Unfortunately I don't have an easy repro (without giving you my data), but I believe I have identified the problem in the code.
I believe this is caused by thread-unsafe use of std::map.
See discussion below.
Expected Behavior
Typesense should not crash on startup.
Actual Behavior
Crash from segfault:
# previous lines redacted
I20240415 21:55:57.330691 501 collection_manager.cpp:2276] Indexed 471/471 documents into collection search_references_v1-54f31749-93c6-43d9-8d65-0b5c9f066d26
E20240415 21:56:26.465981 502 backward.hpp:4200] Stack trace (most recent call last) in thread 502:
E20240415 21:56:26.466260 502 backward.hpp:4200] #11 Object "", at 0xffffffffffffffff, in
E20240415 21:56:26.466315 502 backward.hpp:4200] #10 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x2aaaab509a03, in __clone
E20240415 21:56:26.466341 502 backward.hpp:4200] #9 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x2aaaab478ac2, in
E20240415 21:56:26.466358 502 backward.hpp:4200] #8 Object "/opt/typesense-server", at 0x55555b0a6843, in execute_native_thread_routine
E20240415 21:56:26.466377 502 backward.hpp:4200] #7 | Source "include/threadpool.h", line 57, in operator()
E20240415 21:56:26.466392 502 backward.hpp:4200] Source "/usr/include/c++/10/future", line 1592, in ThreadPool [0x555558586b1c]
E20240415 21:56:26.466593 502 backward.hpp:4200] #6 | Source "/usr/include/c++/10/future", line 1459, in _M_set_result
E20240415 21:56:26.466655 502 backward.hpp:4200] | Source "/usr/include/c++/10/future", line 412, in call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()>*, bool*>
E20240415 21:56:26.466679 502 backward.hpp:4200] | Source "/usr/include/c++/10/mutex", line 729, in __gthread_once
E20240415 21:56:26.466706 502 backward.hpp:4200] Source "/usr/include/x86_64-linux-gnu/c++/10/bits/gthr-default.h", line 700, in _M_run [0x555558641a43]
E20240415 21:56:26.466722 502 backward.hpp:4200] #5 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x2aaaab47dee7, in
E20240415 21:56:26.466743 502 backward.hpp:4200] #4 | Source "/usr/include/c++/10/future", line 572, in operator()
E20240415 21:56:26.466773 502 backward.hpp:4200] Source "/usr/include/c++/10/bits/std_function.h", line 622, in _M_do_set [0x555558585d32]
E20240415 21:56:26.466799 502 backward.hpp:4200] #3 | Source "/usr/include/c++/10/bits/std_function.h", line 292, in __invoke_r<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<_Fn, _Alloc, _Res(_Args ...)>::_M_run<std::_Bind<CollectionManager::load(size_t, size_t)::<lambda()>()>, std::allocator<int>, void, {}>::<lambda()>, void>&>
E20240415 21:56:26.466886 502 backward.hpp:4200] | Source "/usr/include/c++/10/bits/invoke.h", line 115, in __invoke_impl<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<_Fn, _Alloc, _Res(_Args ...)>::_M_run<std::_Bind<CollectionManager::load(size_t, size_t)::<lambda()>()>, std::allocator<int>, void, {}>::<lambda()>, void>&>
E20240415 21:56:26.466912 502 backward.hpp:4200] | Source "/usr/include/c++/10/bits/invoke.h", line 60, in operator()
E20240415 21:56:26.466941 502 backward.hpp:4200] | Source "/usr/include/c++/10/future", line 1397, in operator()
E20240415 21:56:26.466964 502 backward.hpp:4200] | Source "/usr/include/c++/10/future", line 1456, in __invoke_r<void, std::_Bind<CollectionManager::load(size_t, size_t)::<lambda()>()>&>
E20240415 21:56:26.466981 502 backward.hpp:4200] | Source "/usr/include/c++/10/bits/invoke.h", line 110, in __invoke_impl<void, std::_Bind<CollectionManager::load(size_t, size_t)::<lambda()>()>&>
E20240415 21:56:26.467000 502 backward.hpp:4200] | Source "/usr/include/c++/10/bits/invoke.h", line 60, in operator()<>
E20240415 21:56:26.467015 502 backward.hpp:4200] | Source "/usr/include/c++/10/functional", line 499, in __call<void>
E20240415 21:56:26.467034 502 backward.hpp:4200] | Source "/usr/include/c++/10/functional", line 416, in __invoke<CollectionManager::load(size_t, size_t)::<lambda()>&>
E20240415 21:56:26.467051 502 backward.hpp:4200] | Source "/usr/include/c++/10/bits/invoke.h", line 95, in __invoke_impl<void, CollectionManager::load(size_t, size_t)::<lambda()>&>
E20240415 21:56:26.467067 502 backward.hpp:4200] Source "/usr/include/c++/10/bits/invoke.h", line 60, in _M_invoke [0x55555865d7f8]
E20240415 21:56:26.467088 502 backward.hpp:4200] #2 | Source "src/collection_manager.cpp", line 340, in operator[]
E20240415 21:56:26.467104 502 backward.hpp:4200] Source "/usr/include/c++/10/bits/stl_map.h", line 501, in operator() [0x55555865d56f]
E20240415 21:56:26.467119 502 backward.hpp:4200] #1 | Source "/usr/include/c++/10/bits/stl_tree.h", line 2473, in _M_insert_node
E20240415 21:56:26.467134 502 backward.hpp:4200] Source "/usr/include/c++/10/bits/stl_tree.h", line 2372, in _M_emplace_hint_unique<const std::piecewise_construct_t&, std::tuple<const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>, std::tuple<> > [0x55555864611a]
E20240415 21:56:26.467156 502 backward.hpp:4200] #0 Object "/opt/typesense-server", at 0x55555b02a02a, in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)
Segmentation fault (Address not mapped to object [0x10])
I20240415 21:56:54.869499 362 conversation_manager.cpp:187] Cleared 0 expired conversations
I20240415 21:56:56.146819 360 batched_indexer.cpp:415] Running GC for aborted requests, req map size: 0
I20240415 21:57:54.870014 362 conversation_manager.cpp:187] Cleared 0 expired conversations
I20240415 21:57:57.394960 360 batched_indexer.cpp:415] Running GC for aborted requests, req map size: 0
E20240415 21:58:17.142200 502 typesense_server.cpp:137] Typesense 26.0 is terminating abruptly.
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
In the code, referenced_ins is a std::map, which is not thread safe.
This map is captured by reference in the lambda below, which is passed to loading_pool.enqueue. Inside the lambda, referenced_ins is accessed with operator[] using the key collection_name. When that key is missing, operator[] inserts a value-initialized, empty spp::sparse_hash_map at that key. Since this insertion happens concurrently with other accesses to referenced_ins from other pool threads, it is a data race, and this triggers the segfault.
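To make the failure mode concrete, here is a minimal single-threaded sketch of why the subscript lookup is really a write. It is not the Typesense code: `ReferenceInfo` is a hypothetical stand-in for the real spp::sparse_hash_map value type, and both helper functions are names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins: the real value type is spp::sparse_hash_map;
// a plain std::map keeps this sketch dependency-free.
using ReferenceInfo = std::map<std::string, std::string>;
using ReferencedIns = std::map<std::string, ReferenceInfo>;

// Looks like a read, but operator[] inserts an empty value when the key is
// absent. That insertion rebalances the red-black tree, which is the
// _Rb_tree_insert_and_rebalance frame at the top of the stack trace.
std::size_t size_after_subscript(ReferencedIns& referenced_ins,
                                 const std::string& collection_name) {
    referenced_ins[collection_name];
    assert(referenced_ins.count(collection_name) == 1);  // key now exists
    return referenced_ins.size();
}

// A true read: find() is a const member and never modifies the map.
bool contains_without_insert(const ReferencedIns& referenced_ins,
                             const std::string& collection_name) {
    return referenced_ins.find(collection_name) != referenced_ins.end();
}
```

Calling `size_after_subscript` on an empty map with a missing key grows the map to one entry, while `contains_without_insert` leaves it untouched. When several pool threads perform the subscript form on the same map without synchronization, the tree is modified concurrently, which is undefined behavior.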
Possible solutions are:
Capture referenced_ins by value instead of reference so that there is no sharing (I would make it const to prevent this sort of access)
Protect referenced_ins with a mutex (complicated, but wouldn't require copying since the referenced_in ref is not invalidated by the insertion)
Check if collection_name is in referenced_ins before accessing to avoid mutation.
Something else?
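Two of the options above (checking for the key before accessing, and giving each task its own copy) could be sketched roughly as follows. This is an illustration under assumptions, not a patch: `ReferenceInfo`, `lookup_or_empty`, and `make_load_task` are hypothetical names, a plain std::map stands in for spp::sparse_hash_map, and a std::function stands in for whatever loading_pool.enqueue actually accepts.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real types.
using ReferenceInfo = std::map<std::string, std::string>;
using ReferencedIns = std::map<std::string, ReferenceInfo>;

// Read-only lookup. Taking the map by const reference means operator[]
// would not even compile here, so an accidental insert is impossible.
ReferenceInfo lookup_or_empty(const ReferencedIns& referenced_ins,
                              const std::string& collection_name) {
    auto it = referenced_ins.find(collection_name);
    return it != referenced_ins.end() ? it->second : ReferenceInfo{};
}

// Resolve the entry on the enqueuing thread and move a private copy into
// the task, so the task never touches the shared map.
std::function<std::size_t()> make_load_task(
        const ReferencedIns& referenced_ins,
        const std::string& collection_name) {
    ReferenceInfo referenced_in = lookup_or_empty(referenced_ins, collection_name);
    return [referenced_in = std::move(referenced_in)]() {
        // Placeholder body: a real task would load the collection here.
        return referenced_in.size();
    };
}
```

With this shape, the shared map is only ever read, and concurrent reads of a std::map through const member functions are safe as long as nothing writes to it at the same time. The cost is one copy of the per-collection entry per task.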
Thanks for reporting this: we'll have a fix shortly.
kishorenc changed the title from "Segfault in CollectionManager::load on startup" to "Segfault due to reference loading in CollectionManager::load on startup" on Apr 18, 2024
mgraczyk added a commit to Quilt-AI/typesense that referenced this issue on May 8, 2024
Metadata
Typesense Version: typesense 26.0 (Docker hash sha256:f8a9d59c8ceaf67e547bac03a1df74db9b806abfcd497a6d7d6c8d9d8eef5f20)
OS: macOS, running under Docker
Discussion
I looked at the code where the crash occurs.
The crash happens at this line:
https://github.com/typesense/typesense/blob/main/src/collection_manager.cpp#L339-L340