Deadlock when we are doing logcabin unit test #196

Open
jujing opened this issue Oct 12, 2015 · 1 comment

@jujing

jujing commented Oct 12, 2015

Recently, while running the LogCabin unit tests, we hit a deadlock in a forked child process.
I think it is the same kind of problem as issue #121.


(gdb) bt
#0  0x00002accba7dd294 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00002accba7d8619 in _L_lock_1008 () from /lib64/libpthread.so.0
#2  0x00002accba7d842e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000060c7ce in __gthread_mutex_lock (__mutex=0x2accbb2d1080 )
    at /usr/local/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/x86_64-unknown-linux-gnu/bits/gthr-default.h:769
#4  0x000000000060ef57 in std::mutex::lock (this=0x2accbb2d1080 ) at /usr/local/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/mutex:169
#5  0x000000000060f6a1 in std::lock_guard::lock_guard (this=0x7fffc2784c90, __m=...) at /usr/local/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/mutex:448
#6  0x00002accbb04c843 in LogCabin::Core::ThreadId::Internal::assign () at /home/sg/gmdb/dmdb/build/obj.gmdb/logcabin/build/Core/ThreadId.cc:57
#7  0x00002accbb04c8b6 in LogCabin::Core::ThreadId::getId () at /home/sg/gmdb/dmdb/build/obj.gmdb/logcabin/build/Core/ThreadId.cc:75
#8  0x00002accbb04c987 in LogCabin::Core::ThreadId::getName () at /home/sg/gmdb/dmdb/build/obj.gmdb/logcabin/build/Core/ThreadId.cc:95
#9  0x00002accbb0447a5 in LogCabin::Core::Debug::log (level=LogCabin::Core::Debug::ERROR, fileName=0x2accbb6df388 "/home/sg/gmdb/dmdb/build/obj.gmdb/logcabin/build/Storage/SegmentedLog.cc", lineNum=929, 
    functionName=0x2accbb6e0b50  "loadClosedSegment", 
    format=0x2accbb6dfd30 "Segment version read from %s was %u, but this code can only read version 1 Exiting...\n") at /home/sg/gmdb/dmdb/build/obj.gmdb/logcabin/build/Core/Debug.cc:497
#10 0x00002accbb6bed40 in LogCabin::Storage::SegmentedLog::loadClosedSegment (this=0x2accc00015b0, segment=..., logStartIndex=5000) at /home/sg/gmdb/dmdb/build/obj.gmdb/logcabin/build/Storage/SegmentedLog.cc:926
#11 0x00000000007e3aa1 in LogCabin::Storage::(anonymous namespace)::StorageSegmentedLogTest_loadClosedSegment_unknownVersion_Test::TestBody (this=0x2accc00008d0)
    at /home/sg/gmdb/dmdb/test/ut/external_storage/logcabin/Storage/SegmentedLogTest.cc:943
#12 0x00000000008352f6 in testing::HandleSehExceptionsInMethodIfSupported (object=0x2accc00008d0, method=&virtual testing::Test::TestBody(), location=0x896c4b "the test body")
    at /home/sg/gmdb/dmdb/test/gtest/src/gtest.cc:2075
#13 0x0000000000834794 in testing::HandleExceptionsInMethodIfSupported (object=0x2accc00008d0, method=&virtual testing::Test::TestBody(), location=0x896c4b "the test body")
    at /home/sg/gmdb/dmdb/test/gtest/src/gtest.cc:2111
#14 0x0000000000826232 in testing::Test::Run (this=0x2accc00008d0) at /home/sg/gmdb/dmdb/test/gtest/src/gtest.cc:2145
#15 0x0000000000826a25 in testing::TestInfo::Run (this=0xba4d90) at /home/sg/gmdb/dmdb/test/gtest/src/gtest.cc:2318
#16 0x000000000082711d in testing::TestCase::Run (this=0xb972a0) at /home/sg/gmdb/dmdb/test/gtest/src/gtest.cc:2423
#17 0x000000000082ce01 in testing::internal::UnitTestImpl::RunAllTests (this=0xba1590) at /home/sg/gmdb/dmdb/test/gtest/src/gtest.cc:4199
#18 0x0000000000835482 in testing::HandleSehExceptionsInMethodIfSupported (object=0xba1590, method=
    (bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x82cb5e , location=0x8976d0 "auxiliary test code (environments or event listeners)")
    at /home/sg/gmdb/dmdb/test/gtest/src/gtest.cc:2075
#19 0x0000000000834dbe in testing::HandleExceptionsInMethodIfSupported (object=0xba1590, method=
    (bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x82cb5e , location=0x8976d0 "auxiliary test code (environments or event listeners)")
    at /home/sg/gmdb/dmdb/test/gtest/src/gtest.cc:2111
#20 0x000000000082b841 in testing::UnitTest::Run (this=0xb95220 ) at /home/sg/gmdb/dmdb/test/gtest/src/gtest.cc:3838
#21 0x000000000081f3e0 in main (argc=1, argv=0x7fffc2785638) at /home/sg/gmdb/dmdb/test/ut/external_storage/logcabin/test/TestRunner.cc:141
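The most likely reason the child is stuck in frame #0: fork() copies the whole address space but only the thread that called it. If another thread held the ThreadId lock at that moment, the child gets a copy of the mutex that is still marked as locked, and no thread in the child will ever unlock it. The child's first Debug::log call then blocks forever. The C++ sketch below reproduces that condition without hanging. It assumes Linux with glibc. `demoMutex` and `childInheritsLockedMutex()` are hypothetical names standing in for the real mutex and test.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the mutex that ThreadId::Internal::assign() takes.
std::mutex demoMutex;

// Forks while a second thread holds demoMutex and reports whether the child
// found the mutex already locked. Only the forking thread exists in the child,
// so nothing there will ever unlock it; a blocking lock() would hang forever,
// which is why the child only probes with try_lock().
bool childInheritsLockedMutex() {
    std::atomic<bool> held{false};
    std::atomic<bool> release{false};
    std::thread holder([&] {
        std::lock_guard<std::mutex> guard(demoMutex);
        held = true;
        while (!release) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });
    while (!held) {
        std::this_thread::yield();
    }

    pid_t pid = fork();
    if (pid == 0) {
        // Child: demoMutex is a copy of the parent's memory, still marked locked.
        _exit(demoMutex.try_lock() ? 1 : 0);
    }
    release = true;  // let the parent's holder thread unlock and finish
    holder.join();
    if (pid < 0) {
        return false;
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

On Linux, `assert(childInheritsLockedMutex());` is expected to pass. The child deliberately does almost nothing before `_exit()`. POSIX only guarantees async-signal-safe operations in the child of a multithreaded process until it calls exec.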

I wonder whether we could remove the lock from ThreadId::setName and ThreadId::getName to reduce the chance of deadlock when forking a child process.
Debug logging calls ThreadId::getName, so the lock is taken very frequently. That makes it likely to be held by another thread at the moment a child process is forked.
Could we change them as follows?


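void
setName(const std::string& name)
{
    uint64_t id = getId();
    // Proposed: write the map without holding Internal::mutex.
    Internal::threadNames[id] = name;
}

std::string
getName()
{
    uint64_t id = getId();
    // Proposed: read the map without holding Internal::mutex.
    // Note: for a std::map/std::unordered_map, operator[] inserts an empty
    // entry when id is missing, so this lookup can also modify the map.
    std::string name = Internal::threadNames[id];
    if (name.empty()) {
        name = StringUtil::format("thread %lu", id);
    }
    return name;
}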
@ongardie
Member

Nice catch.

Removing that lock has its downsides: the threadNames map could become corrupted under concurrent modification. Instead, we can use pthread_atfork in a manner similar to nhardt@f355a64. The idea is to acquire the lock before the fork and release it in both the parent and the child afterwards.
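Below is a minimal sketch of the pthread_atfork approach. It assumes Linux/glibc and a single process-wide std::mutex. `threadIdMutex`, `prepareFork`, `releaseAfterFork`, `installForkHandlers` and `childCanLockAfterFork` are hypothetical names used for illustration. They are not LogCabin's code or the contents of nhardt@f355a64. The prepare handler takes the lock in the forking thread, so fork() waits until no other thread is inside the critical section. The parent and child handlers then release it, and the child starts with an unlocked mutex.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

namespace {

// Hypothetical stand-in for the lock guarding thread id/name state.
std::mutex threadIdMutex;

// Parent, just before fork(): block until no other thread holds the lock,
// then hold it so the child inherits it in a known state.
void prepareFork() { threadIdMutex.lock(); }

// Parent and child, just after fork(): release the lock from prepareFork().
void releaseAfterFork() { threadIdMutex.unlock(); }

}  // namespace

// Registers the fork handlers exactly once.
void installForkHandlers() {
    static std::once_flag once;
    std::call_once(once, [] {
        int rc = pthread_atfork(prepareFork, releaseAfterFork, releaseAfterFork);
        assert(rc == 0);
        (void)rc;  // avoid an unused-variable warning when NDEBUG is set
    });
}

// Forks while another thread holds threadIdMutex and reports whether the
// child could then acquire it without blocking.
bool childCanLockAfterFork() {
    installForkHandlers();
    std::atomic<bool> held{false};
    std::thread holder([&held] {
        std::lock_guard<std::mutex> guard(threadIdMutex);
        held = true;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    });
    while (!held) {
        std::this_thread::yield();
    }

    pid_t pid = fork();  // prepareFork() waits for holder to release the lock
    if (pid == 0) {
        _exit(threadIdMutex.try_lock() ? 0 : 1);
    }
    holder.join();
    if (pid < 0) {
        return false;
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

`assert(childCanLockAfterFork());` is expected to pass. The helper forks while another thread holds the lock, and prepareFork() blocks until that thread releases it, so the child's try_lock() succeeds. One consequence of this design is that fork() may be briefly delayed while another thread is inside the locked section. Unlike removing the lock, it keeps the map protected.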

BTW, I suggest using "thread apply all bt" in gdb when you're looking at potential deadlocks, so that you can get a more global view of what the threads are doing.

@ongardie ongardie added the bug label Oct 14, 2015