Segfault due to reference loading in CollectionManager::load on startup #1676

Open
mgraczyk opened this issue Apr 15, 2024 · 1 comment · May be fixed by #1716
mgraczyk commented Apr 15, 2024

Description

Segfault and Crash in CollectionManager::load on startup

Steps to reproduce

Unfortunately I don't have an easy repro (without giving you my data), but I believe I have identified the problem in the code.
I believe this is caused by thread-unsafe use of std::map.

See discussion below.

Expected Behavior

Typesense should not crash on startup.

Actual Behavior

Crash from segfault:

# previous lines redacted
I20240415 21:55:57.330691   501 collection_manager.cpp:2276] Indexed 471/471 documents into collection search_references_v1-54f31749-93c6-43d9-8d65-0b5c9f066d26
E20240415 21:56:26.465981   502 backward.hpp:4200] Stack trace (most recent call last) in thread 502:
E20240415 21:56:26.466260   502 backward.hpp:4200] #11   Object "", at 0xffffffffffffffff, in
E20240415 21:56:26.466315   502 backward.hpp:4200] #10   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x2aaaab509a03, in __clone
E20240415 21:56:26.466341   502 backward.hpp:4200] #9    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x2aaaab478ac2, in
E20240415 21:56:26.466358   502 backward.hpp:4200] #8    Object "/opt/typesense-server", at 0x55555b0a6843, in execute_native_thread_routine
E20240415 21:56:26.466377   502 backward.hpp:4200] #7  | Source "include/threadpool.h", line 57, in operator()
E20240415 21:56:26.466392   502 backward.hpp:4200]       Source "/usr/include/c++/10/future", line 1592, in ThreadPool [0x555558586b1c]
E20240415 21:56:26.466593   502 backward.hpp:4200] #6  | Source "/usr/include/c++/10/future", line 1459, in _M_set_result
E20240415 21:56:26.466655   502 backward.hpp:4200]     | Source "/usr/include/c++/10/future", line 412, in call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()>*, bool*>
E20240415 21:56:26.466679   502 backward.hpp:4200]     | Source "/usr/include/c++/10/mutex", line 729, in __gthread_once
E20240415 21:56:26.466706   502 backward.hpp:4200]       Source "/usr/include/x86_64-linux-gnu/c++/10/bits/gthr-default.h", line 700, in _M_run [0x555558641a43]
E20240415 21:56:26.466722   502 backward.hpp:4200] #5    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x2aaaab47dee7, in
E20240415 21:56:26.466743   502 backward.hpp:4200] #4  | Source "/usr/include/c++/10/future", line 572, in operator()
E20240415 21:56:26.466773   502 backward.hpp:4200]       Source "/usr/include/c++/10/bits/std_function.h", line 622, in _M_do_set [0x555558585d32]
E20240415 21:56:26.466799   502 backward.hpp:4200] #3  | Source "/usr/include/c++/10/bits/std_function.h", line 292, in __invoke_r<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<_Fn, _Alloc, _Res(_Args ...)>::_M_run<std::_Bind<CollectionManager::load(size_t, size_t)::<lambda()>()>, std::allocator<int>, void, {}>::<lambda()>, void>&>
E20240415 21:56:26.466886   502 backward.hpp:4200]     | Source "/usr/include/c++/10/bits/invoke.h", line 115, in __invoke_impl<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<_Fn, _Alloc, _Res(_Args ...)>::_M_run<std::_Bind<CollectionManager::load(size_t, size_t)::<lambda()>()>, std::allocator<int>, void, {}>::<lambda()>, void>&>
E20240415 21:56:26.466912   502 backward.hpp:4200]     | Source "/usr/include/c++/10/bits/invoke.h", line 60, in operator()
E20240415 21:56:26.466941   502 backward.hpp:4200]     | Source "/usr/include/c++/10/future", line 1397, in operator()
E20240415 21:56:26.466964   502 backward.hpp:4200]     | Source "/usr/include/c++/10/future", line 1456, in __invoke_r<void, std::_Bind<CollectionManager::load(size_t, size_t)::<lambda()>()>&>
E20240415 21:56:26.466981   502 backward.hpp:4200]     | Source "/usr/include/c++/10/bits/invoke.h", line 110, in __invoke_impl<void, std::_Bind<CollectionManager::load(size_t, size_t)::<lambda()>()>&>
E20240415 21:56:26.467000   502 backward.hpp:4200]     | Source "/usr/include/c++/10/bits/invoke.h", line 60, in operator()<>
E20240415 21:56:26.467015   502 backward.hpp:4200]     | Source "/usr/include/c++/10/functional", line 499, in __call<void>
E20240415 21:56:26.467034   502 backward.hpp:4200]     | Source "/usr/include/c++/10/functional", line 416, in __invoke<CollectionManager::load(size_t, size_t)::<lambda()>&>
E20240415 21:56:26.467051   502 backward.hpp:4200]     | Source "/usr/include/c++/10/bits/invoke.h", line 95, in __invoke_impl<void, CollectionManager::load(size_t, size_t)::<lambda()>&>
E20240415 21:56:26.467067   502 backward.hpp:4200]       Source "/usr/include/c++/10/bits/invoke.h", line 60, in _M_invoke [0x55555865d7f8]
E20240415 21:56:26.467088   502 backward.hpp:4200] #2  | Source "src/collection_manager.cpp", line 340, in operator[]
E20240415 21:56:26.467104   502 backward.hpp:4200]       Source "/usr/include/c++/10/bits/stl_map.h", line 501, in operator() [0x55555865d56f]
E20240415 21:56:26.467119   502 backward.hpp:4200] #1  | Source "/usr/include/c++/10/bits/stl_tree.h", line 2473, in _M_insert_node
E20240415 21:56:26.467134   502 backward.hpp:4200]       Source "/usr/include/c++/10/bits/stl_tree.h", line 2372, in _M_emplace_hint_unique<const std::piecewise_construct_t&, std::tuple<const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>, std::tuple<> > [0x55555864611a]
E20240415 21:56:26.467156   502 backward.hpp:4200] #0    Object "/opt/typesense-server", at 0x55555b02a02a, in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)
Segmentation fault (Address not mapped to object [0x10])
I20240415 21:56:54.869499   362 conversation_manager.cpp:187] Cleared 0 expired conversations
I20240415 21:56:56.146819   360 batched_indexer.cpp:415] Running GC for aborted requests, req map size: 0
I20240415 21:57:54.870014   362 conversation_manager.cpp:187] Cleared 0 expired conversations
I20240415 21:57:57.394960   360 batched_indexer.cpp:415] Running GC for aborted requests, req map size: 0
E20240415 21:58:17.142200   502 typesense_server.cpp:137] Typesense 26.0 is terminating abruptly.
qemu: uncaught target signal 11 (Segmentation fault) - core dumped

Metadata

Typesense Version:

typesense 26.0
Docker hash sha256:f8a9d59c8ceaf67e547bac03a1df74db9b806abfcd497a6d7d6c8d9d8eef5f20

OS:
macOS, running under docker

Discussion

I looked at the code where the crash occurs.

The crash happens at this line:
https://github.com/typesense/typesense/blob/main/src/collection_manager.cpp#L339-L340

In the code, referenced_ins is a std::map, which is not safe for concurrent access when any thread modifies it.
This map is captured by reference in the lambda below, which is passed to loading_pool.enqueue. Inside the lambda, referenced_ins is accessed with the key collection_name via operator[]. When that key is missing, operator[] inserts a value-initialized, empty spp::sparse_hash_map at that key. Since this insertion happens concurrently with other accesses to referenced_ins, it is a data race, which triggers the segfault.
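
To illustrate why the lookup is a hidden write, here is a minimal single-threaded sketch. It is not the Typesense code: `RefMap` is a hypothetical stand-in for the spp::sparse_hash_map value type, and the helper names are made up for this example. It only shows that std::map::operator[] inserts a missing key, while find leaves the map untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the real value type (spp::sparse_hash_map in the issue).
using RefMap = std::map<std::string, std::string>;

// Number of entries added to the map by a lookup through operator[].
std::size_t entries_added_by_subscript(std::map<std::string, RefMap>& m,
                                       const std::string& key) {
    const std::size_t before = m.size();
    (void) m[key];  // inserts a value-initialized RefMap if key is absent
    return m.size() - before;
}

// Number of entries added by a lookup through find. Taking the map by
// const reference means the compiler rejects any accidental mutation.
std::size_t entries_added_by_find(const std::map<std::string, RefMap>& m,
                                  const std::string& key) {
    const std::size_t before = m.size();
    (void) m.find(key);  // never modifies the map
    return m.size() - before;
}

// Small self-check showing the difference between the two lookups.
void check_lookup_side_effects() {
    std::map<std::string, RefMap> m;
    assert(entries_added_by_find(m, "missing") == 0);
    assert(entries_added_by_subscript(m, "missing") == 1);
}
```

When several threads share the map, the insertion performed by operator[] rebalances the underlying red-black tree, which matches the _Rb_tree_insert_and_rebalance frame at the top of the stack trace.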

Possible solutions are:

  1. Capture referenced_ins by value instead of reference so that there is no sharing (I would make it const to prevent this sort of access)
  2. Protect referenced_ins with a mutex (complicated, but wouldn't require copying since the referenced_in ref is not invalidated by the insertion)
  3. Check if collection_name is in referenced_ins before accessing to avoid mutation.
  4. Something else?
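
As a rough sketch of options 2 and 3 (not the actual fix, and not the Typesense types), the two approaches could look like the following. `RefMap`, `ReferencedIns`, `find_refs`, and `GuardedReferencedIns` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-ins for the types in the issue.
using RefMap = std::map<std::string, std::string>;
using ReferencedIns = std::map<std::string, RefMap>;

// Option 3: look up without mutating. Returns nullptr when the key is absent.
// Concurrent calls are safe as long as no thread modifies the map.
const RefMap* find_refs(const ReferencedIns& referenced_ins,
                        const std::string& collection_name) {
    auto it = referenced_ins.find(collection_name);
    return it == referenced_ins.end() ? nullptr : &it->second;
}

// Option 2: serialize every access through a mutex.
class GuardedReferencedIns {
public:
    // Returns a copy of the entry, inserting an empty one if needed.
    RefMap get_or_create(const std::string& collection_name) {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_[collection_name];
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.size();
    }

private:
    mutable std::mutex mutex_;
    ReferencedIns map_;
};

// Small self-check for both helpers.
void check_options() {
    ReferencedIns refs;
    assert(find_refs(refs, "missing") == nullptr);
    assert(refs.empty());  // the lookup did not insert anything

    GuardedReferencedIns guarded;
    assert(guarded.get_or_create("books").empty());
    assert(guarded.size() == 1);
}
```

The guarded version returns a copy as a conservative choice. Because std::map insertions do not invalidate references to existing elements, returning a reference would also work, provided the referenced value is not modified by another thread afterwards.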

kishorenc added the bug label Apr 16, 2024

kishorenc (Member) commented

Thanks for reporting this: we'll have a fix shortly.

kishorenc changed the title from "Segfault in CollectionManager::load on startup" to "Segfault due to reference loading in CollectionManager::load on startup" Apr 18, 2024
mgraczyk added a commit to Quilt-AI/typesense that referenced this issue May 8, 2024
mgraczyk linked a pull request May 8, 2024 that will close this issue