Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault when building with ENABLE_OVERRIDE=1 and linking it statically #297

Open
mxmlnkn opened this issue Feb 5, 2023 · 15 comments
Assignees

Comments

@mxmlnkn
Copy link
Contributor

mxmlnkn commented Feb 5, 2023

I thought I had a pretty simple setup. I'm building rpmalloc with ENABLE_OVERRIDE and then link it into my program. My program does not even include rpmalloc.h or calls from rpmalloc. It only contains a simple main that returns 0. And I still get this SIGSEV error:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00005555555feb28 in _rpmalloc_heap_extract_new_span.part.0 ()
#0  0x00005555555feb28 in _rpmalloc_heap_extract_new_span.part.0 ()
#1  0x00005555556006f8 in rpmalloc ()
#2  0x00007ffff7e1997a in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff7fc947e in call_init (l=<optimized out>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd808, env=env@entry=0x7fffffffd838)
    at ./elf/dl-init.c:70
#4  0x00007ffff7fc9568 in call_init (env=0x7fffffffd838, argv=0x7fffffffd808, argc=5, l=<optimized out>) at ./elf/dl-init.c:33
#5  _dl_init (main_map=0x7ffff7ffe2e0, argc=5, argv=0x7fffffffd808, env=0x7fffffffd838) at ./elf/dl-init.c:117
#6  0x00007ffff7fe32ea in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#7  0x0000000000000005 in ?? ()
#8  0x00007fffffffdc2d in ?? ()
#9  0x00007fffffffdc89 in ?? ()
#10 0x00007fffffffdc8c in ?? ()
#11 0x00007fffffffdc8f in ?? ()
#12 0x00007fffffffdc99 in ?? ()
#13 0x0000000000000000 in ?? ()

CMake excerpt for rpmalloc:

project(rpmalloc C)

add_library(rpmalloc STATIC)
set(RPMALLOC_HOME "${CMAKE_CURRENT_SOURCE_DIR}/external/rpmalloc/rpmalloc")
target_include_directories(rpmalloc SYSTEM INTERFACE ${RPMALLOC_HOME})
target_sources(rpmalloc PRIVATE
    ${RPMALLOC_HOME}/rpmalloc.c
    ${RPMALLOC_HOME}/rpmalloc.h
    ${RPMALLOC_HOME}/rpnew.h
)
set_target_properties(rpmalloc PROPERTIES LINKER_LANGUAGE C)
target_compile_definitions(rpmalloc PUBLIC ENABLE_OVERRIDE=1)

I'm not sure whether I'm using it wrong or whether it is a bug. Some example setups would have been helpful.

I'm making heavy use of static initialization. I'm not sure whether that might be triggering this problem. I can try to minimize the example if the backtrace doesn't ring a bell for you.

@therealthingy
Copy link

Did you build w/ ENABLE_PRELOAD=1 (which seems to be required when using ENABLE_OVERRIDE=1)?

@mxmlnkn
Copy link
Contributor Author

mxmlnkn commented Apr 5, 2023

Did you build w/ ENABLE_PRELOAD=1 (which seems to be required when using ENABLE_OVERRIDE=1)?

No, as you can see from the CMake excerpt. It makes no sense semantically to have to add it because I am not preloading it but statically linking it. I tested it and adding ENABLE_PRELOAD=1 does not help. I get the same segfault and backtrace. Else, I would have wanted for it to be set automatically if it is always required.

(Fortunately, memory allocation was only performance-critical at one location and I could make use of rpmalloc by defining a std::vector with a custom memory allocator: mxmlnkn/indexed_bzip2@dd678c7. Also interesting is the other performance bug that was introduced by rpmalloc and could be fixed with this commit: mxmlnkn/indexed_bzip2@71a193a)

@mjansson mjansson self-assigned this Apr 11, 2023
@mjansson
Copy link
Owner

I can't reproduce this in either develop nor mjansson/rewrite branches. Is this still an issue? Closing this for now.

@mxmlnkn
Copy link
Contributor Author

mxmlnkn commented Dec 22, 2023

I can still reproduce the same segfault like this:

git clone --recursive git@github.com:mxmlnkn/rapidgzip.git
cd rapidgzip
git checkout rapidgzip-v0.11.1
sed -i 's|ENABLE_OVERRIDE=0|ENABLE_OVERRIDE=1|' src/CMakeLists.txt
mkdir build
cd build
cmake ..
make rapidgzip
gdb -ex r -ex bt --args src/tools/rapidgzip --help

I'm currently using it with ENABLE_OVERRIDE=0, so it works fine for my use case, but others might stumble into the same issue.

@mjansson
Copy link
Owner

You can try the mjansson/rewrite branch, it has a different global init flow and probably works better in this case (and you don't really need to call rpmalloc_initialize and rpmalloc_thread_initialize/rpmalloc_thread_finalize either in this branch)

@mjansson
Copy link
Owner

The issue in your case is that the global c++ constructors tries to new/delete - and that rpmalloc is not yet initialized.

@mxmlnkn
Copy link
Contributor Author

mxmlnkn commented Dec 23, 2023

Thank you for figuring that out. The rewrite branch does not segfault on a simple startup!

It seems I've also head similar initialization order issues without ENABLE_OVERRIDE=1 because my current implementation has commented this:

class RpmallocInit
{
public:
    RpmallocInit()
    {
        rpmalloc_initialize();
    }

    ~RpmallocInit()
    {
        rpmalloc_finalize();
    }
};


/* It must be the very first static variable so that rpmalloc is initialized before any usage of malloc
 * when overriding operator new (when including rpnew.h). And, this only works if everything is header-only
 * because else the static variable initialization order becomes undefined across different compile units.
 * That's why we avoid overriding operators new and delete and simply use it as a custom allocator in the
 * few places we know to be performance-critical */
inline static const RpmallocInit rpmallocInit{};

@mjansson
Copy link
Owner

Yeah, unfortunately the order of global constructors like that is not well defined, there is no guarantee that it will be the first thing called.

Give the rewrite a try, I am running some additional test and will promote it to the develop branch shortly if all looks good, heading for a 2.0 release early next year.

@mxmlnkn
Copy link
Contributor Author

mxmlnkn commented Dec 23, 2023

Some questions for the proper upgrade procedure:

  • rpmalloc_thread_finalize( /* release caches */ true ); The argument is gone. What to replace it with?
  • rpmalloc_initialize() now takes a required argument with lots of configuration options. Some defaults for some of the options would be nice, especially to make it future-proof, else any addition of new members would be a breaking change because they would be uninitialized after the upgrade without any changes in the user code. Some function like rpmalloc_default_initialize_interface( rpmalloc_interface_t* memory_interface ) might be helpful.
  • I saw in rpmalloc.c that rpmalloc_initialize( 0 ) is used, why not make that a possible default argument? rpmalloc_initialize(rpmalloc_interface_t* memory_interface = NULL);
  • Why is the pointer argument non-const? I think it should be const to convey that it is an input argument. Currently, rpmalloc_initialize( rpmalloc_interface_t* memory_interface ) could be thought of to initialize the memory_interface struct with default values.

I have pushed this change to my CI:

commit 08ef8ca31caf3be97a11effe8de477c0d4f12c2b (HEAD -> zlib-support, origin/zlib-support)
Author: mxmlnkn <mxmlnkn@github.de>
Date:   Sat Dec 23 10:55:15 2023 +0100

    [wip][build] Update to rpmalloc rewrite

diff --git a/src/core/FasterVector.hpp b/src/core/FasterVector.hpp
index 8c4847f4..019d8d94 100644
--- a/src/core/FasterVector.hpp
+++ b/src/core/FasterVector.hpp
@@ -12,7 +12,7 @@ class RpmallocInit
 public:
     RpmallocInit()
     {
-        rpmalloc_initialize();
+        rpmalloc_initialize( nullptr );
     }
 
     ~RpmallocInit()
diff --git a/src/core/JoiningThread.hpp b/src/core/JoiningThread.hpp
index 0040ff6e..78a2a5d9 100644
--- a/src/core/JoiningThread.hpp
+++ b/src/core/JoiningThread.hpp
@@ -19,7 +19,7 @@ public:
 
     ~RpmallocThreadInit()
     {
-        rpmalloc_thread_finalize( /* release caches */ true );
+        rpmalloc_thread_finalize();
     }
 };
 #endif
diff --git a/src/external/rpmalloc b/src/external/rpmalloc
index f4732ee2..395b352a 160000
--- a/src/external/rpmalloc
+++ b/src/external/rpmalloc
@@ -1 +1 @@
-Subproject commit f4732ee2b8a5a838cf52cfcd6bc4a44bdc084ef2
+Subproject commit 395b352a3ebd8d2c4298e20702b40abf7e9942ac

Rpmalloc does not build on the Windows CI anymore: rpmalloc-windows.log

My other CI tests with and without sanitizers run fine on Linux.

@mjansson
Copy link
Owner

C doesn't have default values for arguments, so would require an extra entry point.

In general the rewrite branch doesn't require to explicitly use any of the initialize/finalize entry points unless you want to do it at specific points. It uses thread or fiber local ctor/dtor to do it internally. Can still be used for explicit thread init/fini if you plan to use thread pools or similar and don't want idle thread to hang on to heaps.

Thread finalize does not have any similar concept of cache release as the thread heaps work differently now.

rpmalloc_initialize can as you say just take a null pointer for default values of everything and use OS mmap/unmap to grab memory.

Will look at making the arg const, good point.

@mjansson
Copy link
Owner

The rewrite branch uses the C11 built in atomics, which for Microsoft toolchain requires the compiler flags /std:c17 /experimental:c11atomics

@mxmlnkn
Copy link
Contributor Author

mxmlnkn commented Dec 23, 2023

In general the rewrite branch doesn't require to explicitly use any of the initialize/finalize entry points unless you want to do it at specific points. It uses thread or fiber local ctor/dtor to do it internally. Can still be used for explicit thread init/fini if you plan to use thread pools or similar and don't want idle thread to hang on to heaps.

That sounds awesome! I see that this would make a rpmalloc_thread_initialize() interface kinda redundant. Does this work even without any preprocessor options? Currently, I have my own std::thread-derived class, partly to ensure that each thread calls rpmalloc initialize/finalize. It would be nice if I could drop that. (Then, I'd only require C++20 std::jthread availability in all manylinux containers to drop my custom thread class completely.)

class JoiningThread
{
public:
    template<class Function, class... Args>
    explicit
    JoiningThread( Function&& function, Args&&... args ) :
#ifdef WITH_RPMALLOC
        m_thread( [=] () {
                      static const thread_local RpmallocThreadInit rpmallocThreadInit{};
                      function( std::forward<Args>( args )... );
                  } )
#else
        m_thread( std::forward<Function>( function ), std::forward<Args>( args )... )
#endif
    {}
...
private:
    std::thread m_thread;
};

@mjansson
Copy link
Owner

Yeah, you should be able to remove that and have ENABLE_OVERRIDE=1 (which it is by default on that branch)

@mjansson
Copy link
Owner

mjansson commented Apr 1, 2024

Did you have a chance to try out the rewrite branch?

@mxmlnkn
Copy link
Contributor Author

mxmlnkn commented Apr 1, 2024

Not much further than in my earlier comment #297 (comment). It seemed to work fine, though. I did try to add /experimental:c11atomics' on Windows and it still doesn't build, but this might be an issue with my cumbersome setup.py script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants