[CH] Crash on exit with Poco exception #5744

Closed
zhanglistar opened this issue May 14, 2024 · 2 comments
Labels: bug, triage

Comments

@zhanglistar
Contributor

Backend

CH (ClickHouse)

Bug description

Log is:

2024/05/14 16:27:30,975 INFO [dispatcher-Executor] JniLibLoader: Library /data9/hadoop/yarn/local/usercache/xumingyong/appcache/application_1710462985812_739851/container_e57_1710462985812_739851_01_000015/./libch.so has been loaded using path-loading method
2024-05-14 16:27:31.236 <Information> ClickHouseBackend: Init environment variables.
2024-05-14 16:27:31.238 <Debug> CHUtil: Set settings from config key:spark.gluten.sql.columnar.backend.ch.runtime_config.local_engine.settings.log_processors_profiles value:true
2024-05-14 16:27:31.238 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.input_format_orc_row_batch_size value:10000
2024-05-14 16:27:31.238 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.join_algorithm value:grace_hash
2024-05-14 16:27:31.239 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.max_block_size value:8192
2024-05-14 16:27:31.239 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.max_bytes_before_external_group_by value:1610612736
2024-05-14 16:27:31.239 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.max_bytes_before_external_sort value:1073741824
2024-05-14 16:27:31.239 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.max_bytes_in_join value:1073741824
2024-05-14 16:27:31.239 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.max_memory_usage value:3221225472
2024-05-14 16:27:31.239 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.max_threads value:1
2024-05-14 16:27:31.239 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.output_format_orc_compression_method value:snappy
2024-05-14 16:27:31.239 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.query_plan_enable_optimizations value:false
2024-05-14 16:27:31.239 <Debug> CHUtil: Set settings key:spark.gluten.sql.columnar.backend.ch.runtime_settings.short_circuit_function_evaluation value:force_enable
2024-05-14 16:27:31.239 <Information> ClickHouseBackend: Init settings.
2024-05-14 16:27:31.239 <Debug> Context: Setting up /data9/hadoop/yarn/local/usercache/xumingyong/appcache/application_1710462985812_739851/container_e57_1710462985812_739851_01_000015/tmp/libch/ to store temporary data in it
2024-05-14 16:27:31.239 <Information> ClickHouseBackend: Init shared context and global context.
2024-05-14 16:27:31.239 <Information> ClickHouseBackend: Apply configuration and setting for global context.
2024-05-14 16:27:31.239 <Warning> SignalHandler: LD_PRELOAD is not set, SignalHandler is disabled
2024-05-14 16:27:31.239 <Information> ClickHouseBackend: Register read buffer builders.
2024-05-14 16:27:31.239 <Information> ClickHouseBackend: Register relation parsers.
2024-05-14 16:27:31.248 <Information> ClickHouseBackend: Register all functions.
2024-05-14 16:27:31.248 <Information> ClickHouseBackend: Register all factories.
2024-05-14 16:27:31.248 <Information> ClickHouseBackend: Init compiled expressions cache factory.
2024/05/14 16:27:31,269 INFO [dispatcher-Executor] ExecutorPluginContainer: Initialized executor component for plugin org.apache.gluten.GlutenPlugin.
2024/05/14 16:27:31,272 INFO [dispatcher-GlutenExecutorEndpoint] GlutenExecutorEndpoint: Initialized GlutenExecutorEndpoint.
2024/05/14 16:27:40,979 ERROR [SIGTERM handler] CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
2024/05/14 16:27:41,080 INFO [shutdown-hook-0] MemoryStore: MemoryStore cleared
2024/05/14 16:27:41,081 INFO [shutdown-hook-0] BlockManager: BlockManager stopped
2024/05/14 16:27:41,098 INFO [shutdown-hook-0] ShutdownHookManager: Shutdown hook called
libc++abi: terminating due to uncaught exception of type Poco::SystemException: System exception

Spark version

Spark-3.3.x

Spark configurations

No response

System information

No response

Relevant logs

No response

zhanglistar added the bug and triage labels on May 14, 2024
@zhanglistar
Contributor Author

(gdb) bt
#0 0x00007fab0a920428 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007fab0a92202a in __GI_abort () at abort.c:89
#2 0x00007fab1e28f346 in abort_message () from /home/yarn/libch.so
#3 0x00007fab1e28f56d in demangling_terminate_handler() ()
from /home/yarn/libch.so
#4 0x00007fab1e28f423 in std::__terminate(void (*)()) ()
from /home/yarn/libch.so
#5 0x00007fab1e28eb16 in __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) () from /home/yarn/libch.so
#6 0x00007fab1e28ea8f in __cxa_throw () from /home/yarn/libch.so
#7 0x00007fab118b7a4c in Poco::ScopedLock<Poco::FastMutex>::ScopedLock(Poco::FastMutex&) () from /home/yarn/libch.so
#8 0x00007fab1d8ab9f9 in Poco::ErrorHandler::handle(Poco::Exception const&) () from /home/yarn/libch.so
#9 0x00007fab1d8aa249 in Poco::ThreadImpl::runnableEntry(void*) ()
from /home/yarn/libch.so
#10 0x00007fab0b0d96ba in start_thread (arg=0x7faac877c700)
at pthread_create.c:333
#11 0x00007fab0a9f241d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

@taiyang-li
Contributor

taiyang-li commented May 15, 2024

Guess: when YARN kills the executor process, some global static resources cannot be released in the required order, which causes the core dump.

(gdb) bt 
#0  0x00007fd07b010428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fd07b01202a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fd07968fc15 in __gnu_cxx::__verbose_terminate_handler() () from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#3  0x00007fd07968de36 in __cxxabiv1::__terminate(void (*)()) () from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#4  0x00007fd07968de81 in std::terminate() () from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#5  0x00007fd0796841cf in __cxa_pure_virtual () from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#6  0x00007fd0794a76b0 in outputStream::print_cr(char const*, ...) () from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#7  0x00007fd07965d323 in VMError::report(outputStream*) () from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#8  0x00007fd07965ef1f in VMError::report_and_die() () from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#9  0x00007fd0794a04e5 in JVM_handle_linux_signal () from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#10 0x00007fd079492f48 in signalHandler(int, siginfo*, void*) () from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#11 <signal handler called>
#12 std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry>, void*>*>::__hash[abi:v15000]() const (
    this=this@entry=0x0) at ../contrib/llvm-project/libcxx/include/__hash_table:89
#13 std::__1::__hash_table<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry>, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry> > >::remove (this=0x7fd079f7b3c0, __p=__p@entry=...) at ../contrib/llvm-project/libcxx/include/__hash_table:2465
#14 0x00007fd08df70c52 in std::__1::__hash_table<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry>, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry> > >::erase (this=0x7fd079f7b3c0, __p=...) at ../contrib/llvm-project/libcxx/include/__hash_table:2403
#15 std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, Poco::Logger::LoggerEntry> > >::erase[abi:v15000](std::__1::__hash_map_iterator<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, Poco::Logger::LoggerEntry>, void*>*> >) (this=0x7fd079f7b3c0, __p=...) at ../contrib/llvm-project/libcxx/include/unordered_map:1358
#16 Poco::(anonymous namespace)::LoggerDeleter::operator() (this=<optimized out>, logger=<optimized out>) at ./build_debug/../base/poco/Foundation/src/Logger.cpp:328
#17 0x00007fd08df70d6a in std::__1::__shared_ptr_pointer<Poco::Logger*, Poco::(anonymous namespace)::LoggerDeleter, std::__1::allocator<Poco::Logger> >::__on_zero_shared (this=<optimized out>)
    at ../contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:263
#18 0x00007fd08948772a in std::__1::__shared_count::__release_shared[abi:v15000]() (this=0x7fd04440f180) at ../contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:174
#19 std::__1::__shared_weak_count::__release_shared[abi:v15000]() (this=0x7fd04440f180) at ../contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:215
#20 std::__1::shared_ptr<Poco::Logger>::~shared_ptr[abi:v15000]() (this=0x7fd044472028) at ../contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:702
#21 DB::IAccessStorage::~IAccessStorage (this=0x7fd044472000) at ../src/Access/IAccessStorage.h:45
#22 DB::MultipleAccessStorage::~MultipleAccessStorage (this=0x7fd044472000) at ./build_debug/../src/Access/MultipleAccessStorage.cpp:43
#23 0x00007fd08941d0a9 in DB::AccessControl::~AccessControl (this=0x7fd066583e78) at ./build_debug/../src/Access/AccessControl.cpp:264
#24 0x00007fd089d088f6 in std::__1::default_delete<DB::AccessControl>::operator()[abi:v15000](DB::AccessControl*) const (__ptr=0x7fd044472000, this=<optimized out>) at ../contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:48
#25 std::__1::unique_ptr<DB::AccessControl, std::__1::default_delete<DB::AccessControl> >::reset[abi:v15000](DB::AccessControl*) (__p=0x0, this=<optimized out>) at ../contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:305
#26 DB::ContextSharedPart::shutdown (this=this@entry=0x7fd043524000) at ./build_debug/../src/Interpreters/Context.cpp:726
#27 0x00007fd089d1a0d6 in DB::ContextSharedPart::~ContextSharedPart (this=this@entry=0x7fd043524000) at ./build_debug/../src/Interpreters/Context.cpp:540
#28 0x00007fd089cda058 in std::__1::default_delete<DB::ContextSharedPart>::operator()[abi:v15000](DB::ContextSharedPart*) const (this=<optimized out>, __ptr=0x7fd043524000) at ../contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:48
#29 std::__1::unique_ptr<DB::ContextSharedPart, std::__1::default_delete<DB::ContextSharedPart> >::reset[abi:v15000](DB::ContextSharedPart*) (this=<optimized out>, __p=0x0) at ../contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:305
#30 std::__1::unique_ptr<DB::ContextSharedPart, std::__1::default_delete<DB::ContextSharedPart> >::~unique_ptr[abi:v15000]() (this=<optimized out>) at ../contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:259
#31 DB::SharedContextHolder::~SharedContextHolder (this=<optimized out>) at ./build_debug/../src/Interpreters/Context.cpp:811
#32 0x00007fd07b01536a in __cxa_finalize () from /lib/x86_64-linux-gnu/libc.so.6
#33 0x00007fd0819dc097 in __do_global_dtors_aux () from ./libch.so
#34 0x00007fd066584550 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
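
To illustrate the guess above: in the second backtrace, `DB::SharedContextHolder::~SharedContextHolder` runs from `__cxa_finalize` and ends up in `Poco::(anonymous namespace)::LoggerDeleter::operator()`, which erases from an `unordered_map` and hits a hash node with `this=0x0`. That would be consistent with the logger map having been torn down before the object that still refers to it, although the trace alone does not prove it. Objects with static storage duration are destroyed in reverse order of construction, and that order is hard to control across translation units and shared libraries. One common mitigation, not necessarily the fix adopted for this issue, is a construct-on-first-use registry that is intentionally never destroyed. The sketch below uses hypothetical names (`Registry`, `leakyRegistry`, `Component`) and is not Poco or ClickHouse code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a process-wide logger map (not Poco's actual code).
class Registry {
public:
    void add(const std::string& name) { entries_[name] = 1; }
    void remove(const std::string& name) { entries_.erase(name); }
    std::size_t size() const { return entries_.size(); }

private:
    std::unordered_map<std::string, int> entries_;
};

// Construct on first use and never destroy: the heap object outlives every
// static destructor, so touching it during process exit stays valid.
// The leak is intentional; the OS reclaims the memory when the process ends.
Registry& leakyRegistry() {
    static Registry* instance = new Registry();
    return *instance;
}

// Hypothetical component that unregisters itself in its destructor, similar
// in spirit to an object releasing its logger when it is destroyed.
class Component {
public:
    explicit Component(std::string name) : name_(std::move(name)) {
        leakyRegistry().add(name_);
    }
    ~Component() { leakyRegistry().remove(name_); }

    Component(const Component&) = delete;
    Component& operator=(const Component&) = delete;

private:
    std::string name_;
};
```

With this shape, a `Component` held in a global or static object can still run its destructor safely at exit, because `leakyRegistry()` never returns a reference to a destroyed map. The trade-off is that the registry's own destructor never runs, so it must not own anything that requires an orderly flush.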
