
All tests fail with Attempt to free invalid pointer 0x82799f060: this is a suspected memory corruption problem #412

Open
yurivict opened this issue Jul 30, 2022 · 15 comments


@yurivict

yurivict commented Jul 30, 2022

Failure:

$ python3.9 test_arit.py
src/tcmalloc.cc:333] Attempt to free invalid pointer 0x82799f060 
Abort trap

Stack trace:

#0  thr_kill () at thr_kill.S:4
#1  0x000000080072d104 in __raise (s=s@entry=6) at /disk-samsung/freebsd-src/lib/libc/gen/raise.c:52
#2  0x00000008007dddc9 in abort () at /disk-samsung/freebsd-src/lib/libc/stdlib/abort.c:67
#3  0x0000000805025591 in tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem) ()
   from /usr/local/lib/libtcmalloc.so.4
#4  0x0000000805021b85 in ?? () from /usr/local/lib/libtcmalloc.so.4
#5  0x00000008006668a0 in _thr_mutexattr_destroy (attr=0x7fffffffaba0) at /disk-samsung/freebsd-src/lib/libthr/thread/thr_mutexattr.c:180
#6  0x00000008052de489 in std::__1::__libcpp_recursive_mutex_init (__m=<optimized out>) at /disk-samsung/freebsd-src/contrib/llvm-project/libcxx/include/__threading_support:268
#7  std::__1::recursive_mutex::recursive_mutex (this=0x80414a140) at /disk-samsung/freebsd-src/contrib/llvm-project/libcxx/src/mutex.cpp:56
#8  0x0000000802d95f2f in ?? () from /usr/local/lib/libsymengine.so.0.9
#9  0x0000000802d7a744 in ?? () from /usr/local/lib/libsymengine.so.0.9
#10 0x0000000802d7a6b5 in ?? () from /usr/local/lib/libsymengine.so.0.9
#11 0x0000000802cb1d61 in ?? () from /usr/local/lib/libsymengine.so.0.9
#12 0x0000000802ccc161 in ?? () from /usr/local/lib/libsymengine.so.0.9
#13 0x000000080020e1bd in objlist_call_init (list=list@entry=0x7fffffffb610, lockstate=lockstate@entry=0x7fffffffb590) at /disk-samsung/freebsd-src/libexec/rtld-elf/rtld.c:3141
#14 0x0000000800212966 in dlopen_object (name=name@entry=0x801850750 "/usr/local/lib/python3.9/site-packages/symengine/lib/symengine_wrapper.cpython-39.so", fd=fd@entry=-1,
    refobj=<optimized out>, lo_flags=<optimized out>, mode=mode@entry=2, lockstate=0x7fffffffb590, lockstate@entry=0x0) at /disk-samsung/freebsd-src/libexec/rtld-elf/rtld.c:3889
#15 0x000000080020f1ae in rtld_dlopen (name=0x801850750 "/usr/local/lib/python3.9/site-packages/symengine/lib/symengine_wrapper.cpython-39.so", fd=-1, mode=<optimized out>)
    at /disk-samsung/freebsd-src/libexec/rtld-elf/rtld.c:3749
#16 0x0000000800506b21 in ?? () from /usr/local/lib/libpython3.9.so.1.0
#17 0x00000008004d1a6f in ?? () from /usr/local/lib/libpython3.9.so.1.0
#18 0x00000008004d13b6 in ?? () from /usr/local/lib/libpython3.9.so.1.0
#19 0x000000080040f457 in ?? () from /usr/local/lib/libpython3.9.so.1.0

symengine.py-0.9.2
symengine-0.9.0
Python-3.9
clang-14
FreeBSD 13.1

@rikardn
Contributor

rikardn commented Aug 4, 2022

Thanks for reporting this nasty bug. Since this doesn't show up in the CI on Mac and Linux, I guess this is related to FreeBSD, or possibly clang-14 or the LLVM C++ standard library.

In the stack trace we can't see where in libsymengine the call comes from. Could you possibly compile with debug flags to get a better view of the trace? Do you know if it is failing on the very first test, on some other test, or on all tests?

@yurivict
Author

yurivict commented Aug 7, 2022

#0  thr_kill () at thr_kill.S:4
#1  0x0000000800802104 in __raise (s=s@entry=6) at /disk-samsung/freebsd-src/lib/libc/gen/raise.c:52
#2  0x00000008008b2dc9 in abort () at /disk-samsung/freebsd-src/lib/libc/stdlib/abort.c:67
#3  0x0000000805d8f591 in tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem) ()
   from /usr/local/lib/libtcmalloc.so.4
#4  0x0000000805d8bb85 in ?? () from /usr/local/lib/libtcmalloc.so.4
#5  0x000000080073b8a0 in _thr_mutexattr_destroy (attr=0x7ffffffec7e0) at /disk-samsung/freebsd-src/lib/libthr/thread/thr_mutexattr.c:180
#6  0x0000000806048489 in std::__1::__libcpp_recursive_mutex_init (__m=<optimized out>) at /disk-samsung/freebsd-src/contrib/llvm-project/libcxx/include/__threading_support:268
#7  std::__1::recursive_mutex::recursive_mutex (this=0x804fb9b60 <getManagedStaticMutex()::m>) at /disk-samsung/freebsd-src/contrib/llvm-project/libcxx/src/mutex.cpp:56
#8  0x0000000803b6873f in llvm::ManagedStaticBase::RegisterManagedStatic(void* (*)(), void (*)(void*)) const () from /usr/local/lib/libsymengine.so.0.9
#9  0x0000000803b4cfe4 in llvm::cl::OptionCategory::registerCategory() () from /usr/local/lib/libsymengine.so.0.9
#10 0x0000000803b4cf55 in llvm::cl::getGeneralCategory() () from /usr/local/lib/libsymengine.so.0.9
#11 0x0000000803a84601 in llvm::cl::opt<bool, false, llvm::cl::parser<bool> >::opt<char [42], llvm::cl::desc, llvm::cl::OptionHidden>(char const (&) [42], llvm::cl::desc const&, llvm::cl::OptionHidden const&) () from /usr/local/lib/libsymengine.so.0.9
#12 0x0000000803a9ea01 in _GLOBAL__sub_I_X86AsmParser.cpp () from /usr/local/lib/libsymengine.so.0.9
#13 0x000000080020e1bd in objlist_call_init (list=list@entry=0x7ffffffed250, lockstate=lockstate@entry=0x7ffffffed1d0) at /disk-samsung/freebsd-src/libexec/rtld-elf/rtld.c:3141
#14 0x0000000800212966 in dlopen_object (name=name@entry=0x8018579d0 "/usr/local/lib/python3.9/site-packages/symengine/lib/symengine_wrapper.cpython-39.so", fd=fd@entry=-1, 
    refobj=<optimized out>, lo_flags=<optimized out>, mode=mode@entry=2, lockstate=0x7ffffffed1d0, lockstate@entry=0x0) at /disk-samsung/freebsd-src/libexec/rtld-elf/rtld.c:3889
#15 0x000000080020f1ae in rtld_dlopen (name=0x8018579d0 "/usr/local/lib/python3.9/site-packages/symengine/lib/symengine_wrapper.cpython-39.so", fd=-1, mode=<optimized out>)
    at /disk-samsung/freebsd-src/libexec/rtld-elf/rtld.c:3749
#16 0x00000008005bb418 in _PyImport_FindSharedFuncptr (prefix=0x8002b271c "PyInit", shortname=0x8018bf890 "symengine_wrapper", 
    pathname=0x8018579d0 "/usr/local/lib/python3.9/site-packages/symengine/lib/symengine_wrapper.cpython-39.so", fp=0x0) at ./Python/dynload_shlib.c:100
#17 0x0000000800570e6e in _PyImport_LoadDynamicModuleWithSpec (spec=0x8018c9d30, fp=0x0) at ./Python/importdl.c:137
#18 0x00000008005709ab in _imp_create_dynamic_impl (module=0x801415680, spec=0x8018c9d30, file=0x0) at Python/import.c:2302
#19 0x000000080056feae in _imp_create_dynamic (module=0x801415680, args=0x8018c9898, nargs=1) at Python/clinic/import.c.h:330
#20 0x000000080045b316 in cfunction_vectorcall_FASTCALL (func=0x801428b30, args=0x8018c9898, nargsf=1, kwnames=0x0) at Objects/methodobject.c:430
#21 0x00000008004000e9 in PyVectorcall_Call (callable=0x801428b30, tuple=0x8018c9880, kwargs=0x8018cb8c0) at Objects/call.c:231
#22 0x00000008004001ec in _PyObject_Call (tstate=0x800e4b000, callable=0x801428b30, args=0x8018c9880, kwargs=0x8018cb8c0) at Objects/call.c:266
#23 0x00000008004002e2 in PyObject_Call (callable=0x801428b30, args=0x8018c9880, kwargs=0x8018cb8c0) at Objects/call.c:293
#24 0x000000080053b661 in do_call_core (tstate=0x800e4b000, func=0x801428b30, callargs=0x8018c9880, kwdict=0x8018cb8c0) at Python/ceval.c:5097
#25 0x0000000800537fce in _PyEval_EvalFrameDefault (tstate=0x800e4b000, f=0x801529200, throwflag=0) at Python/ceval.c:3582
#26 0x000000080052c04f in _PyEval_EvalFrame (tstate=0x800e4b000, f=0x801529200, throwflag=0) at ./Include/internal/pycore_ceval.h:40
#27 0x000000080053c4ec in _PyEval_EvalCode (tstate=0x800e4b000, _co=0x801418c90, globals=0x801425f00, locals=0x0, args=0x80180a1d0, argcount=2, kwnames=0x0, kwargs=0x80180a1e0, kwcount=0, 
    kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x801415a30, qualname=0x801415a30) at Python/ceval.c:4329
#28 0x000000080040062d in _PyFunction_Vectorcall (func=0x80142c3a0, stack=0x80180a1d0, nargsf=9223372036854775810, kwnames=0x0) at Objects/call.c:396
#29 0x000000080053de79 in _PyObject_VectorcallTstate (tstate=0x800e4b000, callable=0x80142c3a0, args=0x80180a1d0, nargsf=9223372036854775810, kwnames=0x0)
    at ./Include/cpython/abstract.h:118
#30 0x000000080053b2fa in PyObject_Vectorcall (callable=0x80142c3a0, args=0x80180a1d0, nargsf=9223372036854775810, kwnames=0x0) at ./Include/cpython/abstract.h:127
#31 0x000000080053b3e9 in call_function (tstate=0x800e4b000, pp_stack=0x7fffffff0148, oparg=2, kwnames=0x0) at Python/ceval.c:5077
#32 0x00000008005379d8 in _PyEval_EvalFrameDefault (tstate=0x800e4b000, f=0x80180a040, throwflag=0) at Python/ceval.c:3489
#33 0x000000080040209f in _PyEval_EvalFrame (tstate=0x800e4b000, f=0x80180a040, throwflag=0) at ./Include/internal/pycore_ceval.h:40
#34 0x000000080040070b in function_code_fastcall (tstate=0x800e4b000, co=0x801445b30, args=0x800bc9a40, nargs=2, globals=0x8014354c0) at Objects/call.c:330
#35 0x0000000800400425 in _PyFunction_Vectorcall (func=0x80146e700, stack=0x800bc9a30, nargsf=9223372036854775810, kwnames=0x0) at Objects/call.c:367
#36 0x000000080053de79 in _PyObject_VectorcallTstate (tstate=0x800e4b000, callable=0x80146e700, args=0x800bc9a30, nargsf=9223372036854775810, kwnames=0x0)
    at ./Include/cpython/abstract.h:118
#37 0x000000080053b2fa in PyObject_Vectorcall (callable=0x80146e700, args=0x800bc9a30, nargsf=9223372036854775810, kwnames=0x0) at ./Include/cpython/abstract.h:127
#38 0x000000080053b3e9 in call_function (tstate=0x800e4b000, pp_stack=0x7fffffff1448, oparg=2, kwnames=0x0) at Python/ceval.c:5077
#39 0x0000000800537a21 in _PyEval_EvalFrameDefault (tstate=0x800e4b000, f=0x800bc98b0, throwflag=0) at Python/ceval.c:3506
#40 0x000000080040209f in _PyEval_EvalFrame (tstate=0x800e4b000, f=0x800bc98b0, throwflag=0) at ./Include/internal/pycore_ceval.h:40

@yurivict
Author

yurivict commented Aug 7, 2022

This happens inside of dlopen, and is related to mutexes.
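Because the abort fires inside dlopen while Python loads symengine_wrapper.cpython-39.so, importing the module on its own should be enough to reproduce it, without running any test. A minimal sketch, assuming a POSIX system; `snippet_aborts` is a hypothetical helper name, not part of symengine or its test suite:

```python
import signal
import subprocess
import sys


def snippet_aborts(code: str) -> bool:
    """Run `code` in a fresh interpreter and report whether it died from SIGABRT.

    tcmalloc's "Attempt to free invalid pointer" check ends in abort() (see the
    tcmalloc::Log -> abort frames above), so this crash shows up as SIGABRT.
    """
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    if proc.stderr:
        print(proc.stderr.strip())  # e.g. the tcmalloc message, if any was printed
    # On POSIX, a negative return code -N means the child was killed by signal N.
    return proc.returncode == -signal.SIGABRT


if __name__ == "__main__":
    # The abort happens while `import` dlopen()s symengine_wrapper, so no test code is needed.
    print("import symengine aborts:", snippet_aborts("import symengine"))
```

Running each snippet in a child process keeps the parent alive when the child aborts, so the same helper can be reused for other one-liners.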

@yurivict
Author

yurivict commented Aug 7, 2022

When the LLVM option was turned OFF, the stack changed:

#1  0x0000000800802104 in __raise (s=s@entry=6) at /disk-samsung/freebsd-src/lib/libc/gen/raise.c:52
#2  0x00000008008b2dc9 in abort () at /disk-samsung/freebsd-src/lib/libc/stdlib/abort.c:67
#3  0x0000000803e92591 in tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem) ()
   from /usr/local/lib/libtcmalloc.so.4
#4  0x0000000803e8eb85 in ?? () from /usr/local/lib/libtcmalloc.so.4
#5  0x000000080073cdd9 in _thr_rwlock_destroy (rwlock=<optimized out>) at /disk-samsung/freebsd-src/lib/libthr/thread/thr_rwlock.c:136
#6  0x0000000849653b43 in CRYPTO_THREAD_lock_free (lock=0x6a5a6) at /disk-samsung/freebsd-src/crypto/openssl/crypto/threads_pthread.c:107
#7  0x00000008496d29d2 in DSO_free (dso=0x800e5b050) at /disk-samsung/freebsd-src/crypto/openssl/crypto/dso/dso_lib.c:92
#8  0x0000000849650bb1 in ossl_init_load_crypto_nodelete () at /disk-samsung/freebsd-src/crypto/openssl/crypto/init.c:204
#9  ossl_init_load_crypto_nodelete_ossl_ () at /disk-samsung/freebsd-src/crypto/openssl/crypto/init.c:159
#10 0x000000080073bab3 in _thr_once (once_control=0x849823360, init_routine=0x849650b70 <ossl_init_load_crypto_nodelete_ossl_>) at /disk-samsung/freebsd-src/lib/libthr/thread/thr_once.c:98
#11 0x0000000849653b69 in CRYPTO_THREAD_run_once (once=0x6a5a6, init=0x6) at /disk-samsung/freebsd-src/crypto/openssl/crypto/threads_pthread.c:118
#12 0x0000000849650622 in OPENSSL_init_crypto (opts=opts@entry=8, settings=settings@entry=0x0) at /disk-samsung/freebsd-src/crypto/openssl/crypto/init.c:649
#13 0x0000000849741b8d in EVP_MD_do_all (fn=0x849498430 <_openssl_hash_name_mapper>, arg=0x7ffffffad5a8) at /disk-samsung/freebsd-src/crypto/openssl/crypto/evp/names.c:162
#14 0x0000000849491ee5 in hashlib_md_meth_names (module=0x848ffd4f0) at Modules/_hashopenssl.c:1900
#15 0x0000000849491b97 in PyInit__hashlib () at Modules/_hashopenssl.c:2283
#16 0x0000000800570f31 in _PyImport_LoadDynamicModuleWithSpec (spec=0x848ffcc70, fp=0x0) at ./Python/importdl.c:167
#17 0x00000008005709ab in _imp_create_dynamic_impl (module=0x801415680, spec=0x848ffcc70, file=0x0) at Python/import.c:2302
#18 0x000000080056feae in _imp_create_dynamic (module=0x801415680, args=0x848ffcbc8, nargs=1) at Python/clinic/import.c.h:330
#19 0x000000080045b316 in cfunction_vectorcall_FASTCALL (func=0x801428b30, args=0x848ffcbc8, nargsf=1, kwnames=0x0) at Objects/methodobject.c:430
#20 0x00000008004000e9 in PyVectorcall_Call (callable=0x801428b30, tuple=0x848ffcbb0, kwargs=0x848d3bfc0) at Objects/call.c:231
#21 0x00000008004001ec in _PyObject_Call (tstate=0x800e4b000, callable=0x801428b30, args=0x848ffcbb0, kwargs=0x848d3bfc0) at Objects/call.c:266

Otherwise the error message is the same.
This looks like memory corruption.

@yurivict
Author

yurivict commented Aug 7, 2022

When I ran the Python test case with HEAPCHECK=normal LD_PRELOAD=/usr/local/lib/libtcmalloc.so, the error disappeared.
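The comparison above (plain run vs. tcmalloc preloaded) can be scripted so both cases run side by side. A sketch under stated assumptions: the library path and the HEAPCHECK/LD_PRELOAD values are copied from the comment above, while `run_with_env` and `describe` are hypothetical helper names:

```python
import os
import signal
import subprocess
import sys

# Path taken from the comment above; adjust for other installs.
TCMALLOC = "/usr/local/lib/libtcmalloc.so"


def run_with_env(code: str, extra_env: dict) -> int:
    """Run `code` in a child interpreter with extra environment variables; return its exit status."""
    env = dict(os.environ, **extra_env)
    return subprocess.run([sys.executable, "-c", code], env=env).returncode


def describe(rc: int) -> str:
    """Turn a return code into a short label (negative codes are signals on POSIX)."""
    return "SIGABRT" if rc == -signal.SIGABRT else f"exit {rc}"


if __name__ == "__main__":
    code = "import symengine"
    print("plain:    ", describe(run_with_env(code, {})))
    if os.path.exists(TCMALLOC):
        # Preloading makes tcmalloc the allocator from process start instead of
        # being pulled in later as a dependency during dlopen.
        preload = {"HEAPCHECK": "normal", "LD_PRELOAD": TCMALLOC}
        print("preloaded:", describe(run_with_env(code, preload)))
```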

@yurivict
Author

yurivict commented Aug 7, 2022

Will also run with Valgrind.

@yurivict
Author

yurivict commented Aug 8, 2022

Valgrind didn't find anything specific in SymEngine.

But turning off Google perftools made the failures disappear. IMO this means that this is a memory corruption problem: the behavior changes when the memory layout changes.

@yurivict yurivict changed the title All tests fail with Attempt to free invalid pointer 0x82799f060 All tests fail with Attempt to free invalid pointer 0x82799f060: this is a suspected memory corruption problem Aug 8, 2022
@rikardn
Contributor

rikardn commented Aug 9, 2022

In your trace with LLVM turned off, symengine isn't being called. The call chain is something like Python->openssl->threads->tcmalloc. Is tcmalloc used by default here? Why is openssl being called? I am starting to suspect that something other than symengine is causing this issue.

@yurivict
Author

yurivict commented Aug 9, 2022

Memory corruption could have occurred earlier and it crashed when the Python interpreter called some OpenSSL code later.

@rikardn
Contributor

rikardn commented Aug 9, 2022

Yes, true. Very challenging to debug in that case. We don't know for sure that symengine is causing it. It could also be the compiler, the libraries, Python or the operating system. Perhaps changing some of the other moving parts could give some clues.

@yurivict
Author

yurivict commented Aug 9, 2022

I turned off TCMALLOC in SymEngine in the FreeBSD port as a workaround. perftools changes memory allocation, and this triggers the issue. But IMO the problem is most likely somewhere in SymEngine.

@certik
Contributor

certik commented Aug 9, 2022

@yurivict thanks for the report and for debugging this. Is SymEngine compiled in Debug mode with all checks enabled? Is the bug still there?

Things can fail in Release mode if we have a bug in the code, but usually in Debug build our asserts will catch it.

If you think the bug is in symengine, then we need to figure out how to reproduce it reliably on your machine. Once we have that, we need to "bisect" (manually remove parts of the large reproducer to make it smaller) until we figure out what is going on.
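The "remove parts until it stops failing" loop described above can also be automated. A minimal sketch, assuming the reproducer is a script whose lines can be dropped independently and that the caller supplies a predicate which reruns a candidate; `minimize` is a hypothetical helper, and the strategy is a simplified, greedy form of delta debugging:

```python
from typing import Callable, List


def minimize(lines: List[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines while `still_fails` keeps reporting the bug.

    Chunks start at half the input and are halved down to single lines.
    The result is smaller than the input but not guaranteed to be minimal.
    """
    if not still_fails(lines):
        raise ValueError("the starting reproducer must fail")
    chunk = max(1, len(lines) // 2)
    while True:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate  # smaller reproducer still fails; retry at the same spot
            else:
                i += chunk  # this chunk is needed; move past it
        if chunk == 1:
            return lines
        chunk //= 2
```

In practice `still_fails` would write the candidate lines to a temporary script, run it with python3.9 in a child process, and return True when the child dies with SIGABRT.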

@yurivict
Author

yurivict commented Aug 9, 2022

Is SymEngine compiled in Debug mode with all checks enabled? Is the bug still there?

Yes, the Debug build fails the same way.

If you think the bug is in symengine, then we need to figure out how to reproduce it reliably on your machine.

It can be reliably reproduced on FreeBSD. The math/symengine port should be built with TCMALLOC=ON, and then math/py-symengine reliably fails.

@certik
Contributor

certik commented Aug 9, 2022

Thanks, perfect. I think it's a bug, so let's fix it.

If you have time to work on this a bit, what you can do is to try to "minimize" the reproducer. For example can you reproduce the bug without py-symengine? It would be great to eliminate Python somehow. Do you think the bug is in the Python wrappers or in the C++ symengine?
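One step toward taking py-symengine out of the picture: the second stack trace aborts in libsymengine's static initializers (_GLOBAL__sub_I_X86AsmParser.cpp) during dlopen, so loading libsymengine.so.0.9 directly may be enough to trigger it. A sketch, assuming the library path from the traces; ctypes.CDLL calls dlopen() and so runs those initializers, but this still happens inside the Python interpreter, so a tiny C program calling dlopen() would be the fully Python-free version. `dlopen_aborts` is a hypothetical helper name:

```python
import os
import signal
import subprocess
import sys

# Path as it appears in the stack traces above; adjust for other installs.
LIBSYMENGINE = "/usr/local/lib/libsymengine.so.0.9"


def dlopen_aborts(path: str) -> bool:
    """dlopen() `path` in a child interpreter, bypassing the symengine Python wrapper.

    ctypes.CDLL() calls dlopen(), which runs the library's static initializers
    (and those of dependencies such as libtcmalloc) before returning.
    """
    code = f"import ctypes; ctypes.CDLL({path!r})"
    rc = subprocess.run([sys.executable, "-c", code], capture_output=True).returncode
    # A negative return code -N means the child was killed by signal N (POSIX).
    return rc == -signal.SIGABRT


if __name__ == "__main__":
    if os.path.exists(LIBSYMENGINE):
        print("dlopen(libsymengine) aborts:", dlopen_aborts(LIBSYMENGINE))
```

If this aborts too, the Cython wrapper is unlikely to be the trigger; if it does not, the wrapper (or the order in which Python loads its extension modules) becomes the next suspect.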

@yurivict
Author

yurivict commented Aug 9, 2022

Do you think the bug is in the Python wrappers or in the C++ symengine?

I don't know.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants