socksqlmasterswings test crashes: bad logic in cleaning early replayed sessions #2011

Open
adizaimi opened this issue Jan 10, 2020 · 0 comments

Comments


adizaimi commented Jan 10, 2020

From last night's master tests at http://comdb2.s3-website-us-east-1.amazonaws.com/tests/d9e884b9/detail.txt:

Core was generated by `/comdb2/build/db/comdb2 socksqlmasterswings97153 --no-global-lrl --lrl /dedicat'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f90692cf428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f905e44f700 (LWP 1813))]
(gdb) #0  0x00007f90692cf428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f90692d102a in __GI_abort () at abort.c:89
#2  0x00000000004f6c0a in _destroy_session (phase=0, prq=0x1ba7cd0)
    at /comdb2/db/osqlsession.c:138
#3  osql_close_session (psess=psess@entry=0x1ba7cd0, 
    is_linked=is_linked@entry=0, 
    func=func@entry=0x8e0f20 <__func__.45533> "osql_sess_set_terminate", 
    callfunc=callfunc@entry=0x0, line=line@entry=707)
    at /comdb2/db/osqlsession.c:110
#4  0x00000000004e2c9c in osql_bplog_free (iq=0x1a8be88, 
    are_sessions_linked=are_sessions_linked@entry=0, 
    func=func@entry=0x8e0f20 <__func__.45533> "osql_sess_set_terminate", 
    callfunc=callfunc@entry=0x0, line=line@entry=707)
    at /comdb2/db/osqlblockproc.c:507
#5  0x00000000004f7ef1 in osql_sess_set_terminate (sess=0x16c9728)
    at /comdb2/db/osqlsession.c:707
#6  osql_sess_try_terminate (sess=sess@entry=0x16c9728)
    at /comdb2/db/osqlsession.c:730
#7  0x00000000004f5ecd in osql_repository_add (sess=0x16b6e70, 
    replaced=replaced@entry=0x7f905e44e724) at /comdb2/db/osqlrepository.c:168
#8  0x00000000004ee6cd in sorese_rcvreq (
    fromhost=0x10bb780 "m1.d9e884b9_primary", dtap=<optimized out>, 
    dtalen=<optimized out>, type=2, nettype=<optimized out>)
    at /comdb2/db/osqlcomm.c:7398
#9  0x00000000004eee21 in net_sosql_req (hndl=<optimized out>, 
    uptr=<optimized out>, fromhost=<optimized out>, usertype=<optimized out>, 
    dtap=<optimized out>, dtalen=<optimized out>, is_tcp=1 '\001')
    at /comdb2/db/osqlcomm.c:6037
#10 0x00000000008182ca in process_user_message (host_node_ptr=0x7f906afc1c98, 
    netinfo_ptr=0x7f906af80048) at /comdb2/net/net.c:3656
#11 reader_thread (arg=<optimized out>) at /comdb2/net/net.c:4406
#12 0x00007f906966b6ba in start_thread (arg=0x7f905e44f700)
    at pthread_create.c:333
#13 0x00007f90693a141d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Coming from osql_close_session():

106         while (ATOMIC_LOAD32(sess->clients) > 0) {
107             poll(NULL, 0, 10);
108         }
109 
110         _destroy_session(psess, 0);
111     }

This looks like a defective call flow:
osql_sess_set_terminate() is called with sess->mtx and sess->completed_lock held.
It calls osql_bplog_free(), which in turn calls osql_close_session(),
which then destroys the same sess->mtx and sess->completed_lock while they are still held.

The bug occurs very rarely because this code path is only hit when osql_repository_add() is called for a sess that already exists in the osql hash due to early replay.
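The lock ordering problem described above can be illustrated with a minimal sketch. This is not the comdb2 code: `struct sess`, `sess_new()`, `sess_destroy()`, `terminate_buggy()`, `terminate_fixed()` and the `mtx_held` bookkeeping flag are all hypothetical names made up for illustration, and the sketch returns -1 where the backtrace above shows an abort. It only assumes that tearing down a session whose mutex is still held is an error, and that releasing the mutex before the teardown avoids it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a session object; only the lock matters here. */
struct sess {
    pthread_mutex_t mtx;
    int mtx_held; /* bookkeeping for this sketch only, not a real field */
};

struct sess *sess_new(void)
{
    struct sess *s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    if (pthread_mutex_init(&s->mtx, NULL) != 0) {
        free(s);
        return NULL;
    }
    return s;
}

/* Plays the role of the destroy step. Refuses to tear down a session whose
 * lock is still held: returns -1 and leaves the session intact in that case,
 * returns 0 after destroying the mutex and freeing the session. */
int sess_destroy(struct sess *s)
{
    assert(s != NULL);
    if (s->mtx_held)
        return -1;
    if (pthread_mutex_destroy(&s->mtx) != 0)
        return -1;
    free(s);
    return 0;
}

/* Defective ordering: the destroy step runs while the lock is held. */
int terminate_buggy(struct sess *s)
{
    pthread_mutex_lock(&s->mtx);
    s->mtx_held = 1;
    int rc = sess_destroy(s); /* fails; the session is still alive */
    s->mtx_held = 0;
    pthread_mutex_unlock(&s->mtx);
    return rc;
}

/* Corrected ordering: finish the work that needs the lock, release it,
 * and only then destroy the session. */
int terminate_fixed(struct sess *s)
{
    pthread_mutex_lock(&s->mtx);
    s->mtx_held = 1;
    /* ... state changes that need the lock would go here ... */
    s->mtx_held = 0;
    pthread_mutex_unlock(&s->mtx);
    return sess_destroy(s);
}
```

In this sketch `terminate_buggy()` reports failure and leaves the session for the caller to clean up, while `terminate_fixed()` returns 0 because the mutex is no longer held when it is destroyed.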

adizaimi changed the title from "socksqlmasterswings test crashes possibly bad accounting with sess->clients or need_clean" to "socksqlmasterswings test crashes: bad logic in cleaning early replayed sessions" on Jan 10, 2020
adizaimi added a commit that referenced this issue Jan 15, 2020
Unlock mtx before calling osql_bplog_free() to avoid crash. Before this PR, we destroy the session with the mutexes held via callstack: osql_bplog_free() -> osql_close_session() -> _destroy_session().
fixes PR #2011