
core: glfsheal encounters a SIGSEGV in __strftime_internal #4240

Open
wants to merge 1 commit into base: devel

Conversation

GeorgeLjz

glfsheal encounters a SIGSEGV in __strftime_internal called from afr_mark_split_brain_source_sinks_by_policy

Root cause: signed/unsigned mismatch when comparing an int with an unsigned int
Solution: perform the comparison between two signed ints

Fixes: #4239
Change-Id: If6a356db60298da39a48c7979abdfbac03521aa7
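
For context, a minimal standalone illustration (my own sketch with made-up values, not the GlusterFS code) of the signed/unsigned comparison pitfall the commit message refers to: when an int is compared with an unsigned int, the int is converted to unsigned, so a negative value becomes a huge positive one and the comparison no longer means what it reads as.

/* Standalone illustration of the pitfall described above (hypothetical values). */
#include <stdio.h>

int main(void)
{
    int fav_child = -2;           /* hypothetical sentinel/error value */
    unsigned int child_count = 2;

    /* -2 is converted to 4294967294U, so this prints 0 (false) */
    printf("fav_child < child_count      : %d\n", fav_child < child_count);

    /* Comparing two signed ints behaves as written and prints 1 (true) */
    printf("fav_child < (int)child_count : %d\n", fav_child < (int)child_count);

    return 0;
}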

@gluster-ant
Collaborator

Can one of the admins verify this patch?

2 similar comments

@gluster-ant
Collaborator

CLANG-FORMAT FAILURE:
Before merging the patch, this diff needs to be considered for passing clang-format

index 1c95411a2..54d9a7186 100644
--- a/xlators/cluster/afr/src/afr-self-heal-common.c
+++ b/xlators/cluster/afr/src/afr-self-heal-common.c
@@ -1299,7 +1299,7 @@ afr_mark_split_brain_source_sinks_by_policy(
                "Invalid child (%d) "
                "selected by policy %s.",
                fav_child, policy_str);
-	return -1;
+        return -1;
     } else if (fav_child >= 0) {
         time = replies[fav_child].poststat.ia_mtime;
         tm_ptr = localtime(&time);

@mohit84
Contributor

mohit84 commented Oct 16, 2023

@GeorgeLjz what is the value of fav_child in crash? Can you please share "thread apply all bt full" data from the coredump here?

@GeorgeLjz
Author

GeorgeLjz commented Oct 16, 2023

thread apply all bt full

(gdb) thread apply all bt full

Thread 10 (Thread 0x7fe32da6e6c0 (LWP 51741)):
#0  0x00007fe332a15189 in __futex_abstimed_wait_common () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332a1a503 in __pthread_clockjoin_ex () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fe332da8bbf in event_dispatch_epoll (event_pool=0x55b044ce7c10) at event-epoll.c:840
        i = <optimized out>
        t_id = 140613682587328
        pollercount = 2
        ret = 0
        ev_data = <optimized out>
        __FUNCTION__ = "event_dispatch_epoll"
#3  0x00007fe332cf1458 in glfs_poller (data=<optimized out>) at glfs.c:728
        fs = <optimized out>
#4  0x00007fe332a18886 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#5  0x00007fe332a9e6e0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Thread 9 (Thread 0x7fe32e2746c0 (LWP 51740)):
#0  0x00007fe332a62293 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332a66d37 in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fe332d526df in gf_timer_proc (data=0x55b044cf4d30) at timer.c:194
        now = 1994564030728
        now_ts = {tv_sec = 1994, tv_nsec = 564030728}
        reg = 0x55b044cf4d30
        sleepts = {tv_sec = 1, tv_nsec = 0}
        event = 0x55b044cf5420
        tmp = <optimized out>
        old_THIS = <optimized out>
#3  0x00007fe332a18886 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fe332a9e6e0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Thread 8 (Thread 0x7fe32ead46c0 (LWP 51739)):
#0  0x00007fe332a15189 in __futex_abstimed_wait_common () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332a17e62 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fe332d839c8 in syncenv_task (proc=proc@entry=0x55b044ced100) at syncop.c:517
        env = 0x55b044cecd20
        task = 0x0
        sleep_till = {tv_sec = 1695708001, tv_nsec = 0}
        ret = <optimized out>
#3  0x00007fe332d84845 in syncenv_processor (thdata=0x55b044ced100) at syncop.c:584
        env = 0x55b044cecd20
        proc = 0x55b044ced100
        task = <optimized out>
#4  0x00007fe332a18886 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#5  0x00007fe332a9e6e0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Thread 7 (Thread 0x7fe32f2d56c0 (LWP 51738)):
#0  0x00007fe332a15189 in __futex_abstimed_wait_common () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332a17e62 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fe332d839c8 in syncenv_task (proc=proc@entry=0x55b044cecd20) at syncop.c:517
        env = 0x55b044cecd20
        task = 0x0
        sleep_till = {tv_sec = 1695708001, tv_nsec = 0}
        ret = <optimized out>
#3  0x00007fe332d84845 in syncenv_processor (thdata=0x55b044cecd20) at syncop.c:584
        env = 0x55b044cecd20
        proc = 0x55b044cecd20
        task = <optimized out>
#4  0x00007fe332a18886 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#5  0x00007fe332a9e6e0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Thread 6 (Thread 0x7fe3307366c0 (LWP 51737)):
#0  0x00007fe332a15189 in __futex_abstimed_wait_common () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332a17e62 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fe332d839c8 in syncenv_task (proc=proc@entry=0x55b044cae790) at syncop.c:517
        env = 0x55b044cae3b0
        task = 0x0
        sleep_till = {tv_sec = 1695708001, tv_nsec = 0}
        ret = <optimized out>
#3  0x00007fe332d84845 in syncenv_processor (thdata=0x55b044cae790) at syncop.c:584
        env = 0x55b044cae3b0
        proc = 0x55b044cae790
        task = <optimized out>
#4  0x00007fe332a18886 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#5  0x00007fe332a9e6e0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Thread 5 (Thread 0x7fe330f376c0 (LWP 51736)):
#0  0x00007fe332a15189 in __futex_abstimed_wait_common () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332a17e62 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fe332d839c8 in syncenv_task (proc=proc@entry=0x55b044cae3b0) at syncop.c:517
        env = 0x55b044cae3b0
        task = 0x0
        sleep_till = {tv_sec = 1695708001, tv_nsec = 0}
        ret = <optimized out>
#3  0x00007fe332d84845 in syncenv_processor (thdata=0x55b044cae3b0) at syncop.c:584
        env = 0x55b044cae3b0
        proc = 0x55b044cae3b0
        task = <optimized out>
#4  0x00007fe332a18886 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#5  0x00007fe332a9e6e0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Thread 4 (Thread 0x7fe3323986c0 (LWP 51735)):
#0  0x00007fe332a62293 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332a66d37 in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fe332a66c63 in sleep () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007fe332d6d63b in pool_sweeper (arg=<optimized out>) at mem-pool.c:446
        state = {death_row = {next = 0x0, prev = 0x0}, cold_lists = {0x0 <repeats 1024 times>}, n_cold_lists = 0}
        pool_list = <optimized out>
        next_pl = <optimized out>
        pt_pool = <optimized out>
        i = <optimized out>
        poisoned = <optimized out>
#4  0x00007fe332a18886 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#5  0x00007fe332a9e6e0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Thread 3 (Thread 0x7fe32d26d6c0 (LWP 51742)):
#0  0x00007fe332a9eae2 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332da98f2 in event_dispatch_epoll_worker (data=0x7fe328000bb0) at event-epoll.c:745
        event = {events = 1, data = {ptr = 0x100000003, fd = 3, u32 = 3, u64 = 4294967299}}
        ret = <optimized out>
        ev_data = 0x7fe328000bb0
        event_pool = 0x55b044ce7c10
        myindex = 1
        timetodie = 0
        gen = <optimized out>
        poller_death_notify = {next = 0x0, prev = 0x0}
        slot = 0x0
        tmp = 0x0
        __FUNCTION__ = "event_dispatch_epoll_worker"
        out = <optimized out>
#2  0x00007fe332a18886 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007fe332a9e6e0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Thread 2 (Thread 0x7fe32c99e6c0 (LWP 51743)):
#0  0x00007fe332a9eae2 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332da98f2 in event_dispatch_epoll_worker (data=0x7fe328000d50) at event-epoll.c:745
        event = {events = 1, data = {ptr = 0x400000001, fd = 1, u32 = 1, u64 = 17179869185}}
        ret = <optimized out>
        ev_data = 0x7fe328000d50
        event_pool = 0x55b044ce7c10
        myindex = 2
        timetodie = 0
        gen = <optimized out>
        poller_death_notify = {next = 0x0, prev = 0x0}
        slot = 0x0
        tmp = 0x0
        __FUNCTION__ = "event_dispatch_epoll_worker"
        out = <optimized out>
#2  0x00007fe332a18886 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007fe332a9e6e0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7fe33239a480 (LWP 51734)):
#0  0x00007fe332a5cf1f in __strftime_internal () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fe332a5f6ed in strftime_l () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fe32c0cefda in afr_mark_split_brain_source_sinks_by_policy (frame=frame@entry=0x55b044d15b18, this=this@entry=0x7fe32000f470, inode=inode@entry=0x7fe3200483b8, sources=sources@entry=0x7ffea65faaa0 "", sinks=sinks@entry=0x7ffea65faa90 "\001\001", healed_sinks=healed_sinks@entry=0x7ffea65faa80 "\001\001", locked_on=0x7ffea65faab0 "\001\001\336\003&\364\360:", replies=0x7ffea65fa250, type=AFR_METADATA_TRANSACTION) at afr-self-heal-common.c:1294
        priv = 0x7fe32004c9e0
        fav_child = 0
        mtime_str = "2023-09-26 13:29:54", '\000' <repeats 86 times>, "\343v\372\256M\316`\000\000\000\000\000\000\000\000\030\362\377\377\377\377\377\377\000\000\000\000\000\000\000\000\220.\324D\260U\000\000\000\000\000\000\000\000\000\000\250>\321D\260U\000\000P\235_\246\376\177\000\000n\220\2422\343\177\000\000P\235_\246\376\177\000\000\365\324\3262\343\177\000\000\300\235_\246\376\177\000\000"...
        ctime_str = "`\376\004 \343\177\000\000\000\000\000\000\376\177", '\000' <repeats 58 times>, "\360\235_\246\376\177\000\000\000\000\000\000\000\000\000\000\220\006\r,\343\177", '\000' <repeats 18 times>, "\377\027\000\000\000\000\000\000\001\000\000\000\000\000\000\000q\375", '\000' <repeats 15 times>, "\020\000\000\000\000\000\000\016", '\000' <repeats 12 times>, "\020\000\000\b", '\000' <repeats 15 times>, "Rl\022e\000\000\000\000\000)6\300K\314\354\210"...
        policy_str = 0x7fe32c104438 "MTIME"
        tm_ptr = <optimized out>
        time = -8580258564328249088
        __FUNCTION__ = "afr_mark_split_brain_source_sinks_by_policy"
#3  0x00007fe32c0d0968 in afr_mark_split_brain_source_sinks (frame=frame@entry=0x55b044d15b18, this=this@entry=0x7fe32000f470, inode=inode@entry=0x7fe3200483b8, sources=sources@entry=0x7ffea65faaa0 "", sinks=0x7ffea65faa90 "\001\001", healed_sinks=healed_sinks@entry=0x7ffea65faa80 "\001\001", locked_on=0x7ffea65faab0 "\001\001\336\003&\364\360:", replies=0x7ffea65fa250, type=AFR_METADATA_TRANSACTION) at afr-self-heal-common.c:1426
        local = <optimized out>
        priv = 0x7fe32004c9e0
        xdata_req = 0x0
        heal_op = -1
        ret = <optimized out>
        source = -1
#4  0x00007fe32c0ddded in __afr_selfheal_metadata_finalize_source (frame=frame@entry=0x55b044d15b18, this=this@entry=0x7fe32000f470, inode=inode@entry=0x7fe3200483b8, sources=sources@entry=0x7ffea65faaa0 "", sinks=sinks@entry=0x7ffea65faa90 "\001\001", healed_sinks=healed_sinks@entry=0x7ffea65faa80 "\001\001", undid_pending=0x7ffea65faa70 "", locked_on=0x7ffea65faab0 "\001\001\336\003&\364\360:", replies=0x7ffea65fa250) at afr-self-heal-metadata.c:220
        i = 0
        priv = 0x7fe32004c9e0
        srcstat = {ia_flags = 94215557302744, ia_ino = 140613673356280, ia_dev = 140613471436800, ia_rdev = 140613673356156, ia_size = 140731689664769, ia_nlink = 739052289, ia_uid = 32739, ia_gid = 257, ia_blksize = 0, ia_blocks = 140613673355929, ia_atime = 0, ia_mtime = 140613471488480, ia_ctime = 140731689705632, ia_btime = 140731689705568, ia_atime_nsec = 2791284896, ia_mtime_nsec = 32766, ia_ctime_nsec = 2791284832, ia_btime_nsec = 32766, ia_attributes = 140731689705632, ia_attributes_mask = 140731689705648, ia_gfid = "\260\252_\246\376\177\000\000`\240_\246\376\177\000", ia_type = 2791287859, ia_prot = {suid = 0 '\000', sgid = 1 '\001', sticky = 1 '\001', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_fuse_nlink = 2791287440, ia_fuse_ctime = 32766}
        source = -1
        sources_count = <optimized out>
        __FUNCTION__ = "__afr_selfheal_metadata_finalize_source"
#5  0x00007fe32c0de723 in __afr_selfheal_metadata_prepare (frame=frame@entry=0x55b044d15b18, this=this@entry=0x7fe32000f470, inode=inode@entry=0x7fe3200483b8, locked_on=locked_on@entry=0x7ffea65faab0 "\001\001\336\003&\364\360:", sources=sources@entry=0x7ffea65faaa0 "", sinks=sinks@entry=0x7ffea65faa90 "\001\001", healed_sinks=<optimized out>, undid_pending=<optimized out>, replies=<optimized out>, pflag=<optimized out>) at afr-self-heal-metadata.c:364
        ret = <optimized out>
        source = <optimized out>
        priv = 0x7fe32004c9e0
        i = <optimized out>
        witness = 0x7ffea65fa180
#6  0x00007fe32c0f71ea in afr_selfheal_locked_metadata_inspect (frame=frame@entry=0x55b044d15b18, this=this@entry=0x7fe32000f470, inode=0x7fe3200483b8, msh=msh@entry=0x7ffea65fab8e, pending=pending@entry=0x7ffea65fac33 "\003\f") at /usr/src/debug/glusterfs-7.0-1.wos10.wf38.x86_64/xlators/cluster/afr/src/afr-common.c:6002
        ret = <optimized out>
        locked_on = <optimized out>
        sources = <optimized out>
        sinks = <optimized out>
        healed_sinks = <optimized out>
        undid_pending = <optimized out>
        locked_replies = <optimized out>
        priv = 0x7fe32004c9e0
#7  0x00007fe32c0f7c06 in afr_selfheal_locked_inspect (frame=frame@entry=0x55b044d15b18, this=this@entry=0x7fe32000f470, gfid=gfid@entry=0x55b044d34e38 "", inode=inode@entry=0x7ffea65fac40, entry_selfheal=entry_selfheal@entry=0x7ffea65fac32, data_selfheal=data_selfheal@entry=0x7ffea65fac30, metadata_selfheal=0x7ffea65fac31, pending=0x7ffea65fac33 "\003\f") at /usr/src/debug/glusterfs-7.0-1.wos10.wf38.x86_64/xlators/cluster/afr/src/afr-common.c:6165
        ret = <optimized out>
        fd = 0x0
        dsh = false
        msh = true
        esh = false
        __FUNCTION__ = "afr_selfheal_locked_inspect"
#8  0x00007fe32c0f7d7e in afr_get_heal_info (frame=frame@entry=0x55b044d14188, this=this@entry=0x7fe32000f470, loc=loc@entry=0x55b044d34e18) at /usr/src/debug/glusterfs-7.0-1.wos10.wf38.x86_64/xlators/cluster/afr/src/afr-common.c:6246
        data_selfheal = false
        metadata_selfheal = false
        entry_selfheal = false
        pending = 3 '\003'
        dict = 0x0
        ret = -1
        op_errno = 12
        inode = 0x7fe3200483b8
        substr = 0x0
        status = 0x0
        heal_frame = 0x55b044d15b18
        heal_local = 0x55b044d3d518
        __FUNCTION__ = "afr_get_heal_info"
#9  0x00007fe32c0aa6ff in afr_handle_heal_xattrs (heal_op=0x55b043e68209 "glusterfs.heal-info", loc=0x55b044d34e18, this=0x7fe32000f470, frame=0x55b044d14188) at afr-inode-read.c:1507
        ret = -1
        data = 0x0
        out = <optimized out>
        ret = <optimized out>
        data = <optimized out>
        out = <optimized out>
        __FUNCTION__ = "afr_handle_heal_xattrs"
        __local = <optimized out>
        __this = <optimized out>
        __op_ret = <optimized out>
        __op_errno = <optimized out>
        fn = <optimized out>
        _parent = <optimized out>
        old_THIS = <optimized out>
#10 afr_getxattr (frame=frame@entry=0x55b044d14188, this=this@entry=0x7fe32000f470, loc=loc@entry=0x7ffea65fbb40, name=name@entry=0x55b043e68209 "glusterfs.heal-info", xdata=xdata@entry=0x0) at afr-inode-read.c:1597
        priv = 0x7fe32004c9e0
        local = 0x55b044d349d8
        children = 0x7fe32004cd40
        i = 0
        op_errno = 0
        ret = -1
        cbk = 0x0
        out = <optimized out>
        __FUNCTION__ = "afr_getxattr"
#11 0x00007fe332d895ba in syncop_getxattr (subvol=subvol@entry=0x7fe32000f470, loc=loc@entry=0x7ffea65fbb40, dict=dict@entry=0x7ffea65fbb38, key=key@entry=0x55b043e68209 "glusterfs.heal-info", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1574
        _new = 0x55b044d14188
        old_THIS = 0x7fe32000f470
        next_xl_fn = 0x7fe32c0a9cd0 <afr_getxattr>
        tmp_cbk = 0x7fe332d81480 <syncop_getxattr_cbk>
        task = 0x0
        frame = 0x55b044cfd888
        args = {op_ret = 0, op_errno = 0, iatt1 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_fuse_nlink = 0, ia_fuse_ctime = 0}, iatt2 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_fuse_nlink = 0, ia_fuse_ctime = 0}, iatt3 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_--Type <RET> for more, q to quit, c to continue without paging--
attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_fuse_nlink = 0, ia_fuse_ctime = 0}, xattr = 0x0, statvfs_buf = {f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0, f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, vector = 0x0, count = 0, iobref = 0x0, buffer = 0x0, xdata = 0x0, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, lease = {cmd = 0, lease_type = NONE, lease_id = '\000' <repeats 15 times>, lease_flags = 0}, dict_out = 0x0, uuid = '\000' <repeats 15 times>, errstr = 0x0, dict = 0x0, lock_dict = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, barrier = {initialized = false, guard = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__wseq = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 0, __wrefs = 0, __g_signals = {0, 0}}, __size = '\000' <repeats 47 times>, __align = 0}, waitq = {next = 0x0, prev = 0x0}, count = 0, waitfor = 0}, task = 0x0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__wseq = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 0, __wrefs = 0, __g_signals = {0, 0}}, __size = '\000' <repeats 47 times>, __align = 0}, done = 0, entries = {{list = {next = 0x0, prev = 0x0}, {next = 0x0, prev = 0x0}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_fuse_nlink = 0, ia_fuse_ctime = 0}, dict = 0x0, inode = 0x0, d_name = 0x7ffea65fb660 ""}, offset = 0, locklist = {list = {next = 0x0, prev = 0x0}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, client_uid = 0x0, lk_flags = 0}}
        __FUNCTION__ = "syncop_getxattr"
#12 0x000055b043e65786 in glfsh_process_entries (xl=xl@entry=0x7fe320009800, fd=fd@entry=0x55b044d01a88, entries=entries@entry=0x7ffea65fbca0, offset=offset@entry=0x7ffea65fbc98, num_entries=num_entries@entry=0x7ffea65fbf10, glfsh_print_status=0x55b043e65260 <glfsh_print_heal_status>, ignore_dirty=false, mode=GLFSH_MODE_CONTINUE_ON_ERROR) at glfs-heal.c:808
        entry = 0x7fe31c00baf0
        tmp = 0x7ffea65fbca0
        ret = <optimized out>
        print_status = <optimized out>
        path = 0x0
        gfid = '\000' <repeats 15 times>, "\001"
        this = 0x7fe32000f470
        dict = 0x0
        loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, "\001", pargfid = '\000' <repeats 15 times>}
#13 0x000055b043e66772 in glfsh_crawl_directory (fs=fs@entry=0x55b044c713e0, top_subvol=top_subvol@entry=0x7fe320025330, rootloc=rootloc@entry=0x7ffea65fc060, readdir_xl=readdir_xl@entry=0x7fe320009800, fd=<optimized out>, xattr_req=xattr_req@entry=0x55b044d153d8, num_entries=0x7ffea65fbf10, ignore=false, loc=0x7ffea65fbe50) at glfs-heal.c:896
        ret = 0
        heal_op = 3
        offset = 9223372036854775806
        entries = {{list = {next = 0x7fe31c00b890, prev = 0x7fe31c00baf0}, {next = 0x7fe31c00b890, prev = 0x7fe31c00baf0}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_fuse_nlink = 0, ia_fuse_ctime = 0}, dict = 0x0, inode = 0x0, d_name = 0x7ffea65fbd78 ""}
        free_entries = true
        mode = <optimized out>
        out = <optimized out>
#14 0x000055b043e66a97 in glfsh_print_pending_heals_type (fs=fs@entry=0x55b044c713e0, top_subvol=top_subvol@entry=0x7fe320025330, rootloc=rootloc@entry=0x7ffea65fc060, xl=xl@entry=0x7fe320009800, heal_op=heal_op@entry=GF_SHD_OP_INDEX_SUMMARY, xattr_req=xattr_req@entry=0x55b044d153d8, vgfid=0x55b043e68337 "glusterfs.xattrop_index_gfid", num_entries=0x7ffea65fbf10) at glfs-heal.c:968
        ret = 0
        dirloc = {path = 0x55b044d142b0 "<gfid:34e10f27-9590-4e64-9c6d-16249fdbce23>", name = 0x0, inode = 0x55b044d12018, parent = 0x0, gfid = "4\341\017'\225\220Nd\234m\026$\237\333\316#", pargfid = '\000' <repeats 15 times>}
        fd = 0x55b044d01a88
        op_errno = 0
        ignore = false
#15 0x000055b043e66cff in glfsh_print_pending_heals (fs=fs@entry=0x55b044c713e0, top_subvol=top_subvol@entry=0x7fe320025330, rootloc=rootloc@entry=0x7ffea65fc060, xl=xl@entry=0x7fe320009800, heal_op=heal_op@entry=GF_SHD_OP_INDEX_SUMMARY, is_parent_replicate=<optimized out>) at glfs-heal.c:1010
        ret = 0
        num_entries = {num_entries = 0, pending_entries = 0, spb_entries = 0, possibly_healing_entries = 0}
        total = {num_entries = 0, pending_entries = 0, spb_entries = 0, possibly_healing_entries = 0}
        xattr_req = 0x55b044d153d8
#16 0x000055b043e66e79 in glfsh_gather_heal_info (fs=fs@entry=0x55b044c713e0, top_subvol=top_subvol@entry=0x7fe320025330, rootloc=rootloc@entry=0x7ffea65fc060, heal_op=heal_op@entry=GF_SHD_OP_INDEX_SUMMARY) at glfs-heal.c:1153
        ret = <optimized out>
        xl = 0x7fe320009800
        heal_xl = 0x7fe32000f470
        old_THIS = 0x7fe332e25360 <global_xlator>
#17 0x000055b043e64372 in main (argc=<optimized out>, argv=<optimized out>) at glfs-heal.c:1750
        fs = 0x55b044c713e0
        ret = <optimized out>
        volname = <optimized out>
        top_subvol = 0x7fe320025330
        rootloc = {path = 0x55b044d15870 "/", name = 0x0, inode = 0x7fe320061038, parent = 0x0, gfid = '\000' <repeats 15 times>, "\001", pargfid = '\000' <repeats 15 times>}
        logfilepath = "/var/log/glusterfs/glfsheal-mstate.log", '\000' <repeats 4057 times>
        hostname = <optimized out>
        path = <optimized out>
        file = <optimized out>
        op_errstr = 0x0
        socket_filepath = <optimized out>
        heal_op = <optimized out>
        log_level = <optimized out>
        flags = <optimized out>
        out = <optimized out>

@GeorgeLjz
Author

@GeorgeLjz what is the value of fav_child in crash? Can you please share "thread apply all bt full" data from the coredump here?

I remember that the value of fav_child was something larger than 80000000 and priv->child_count was 0 when I began debugging this issue, so I thought I had found the root cause of this crash. But after I re-opened the coredump file, the value of fav_child turned out to be a normal 0 and priv->child_count was 2, so I need to re-investigate this issue. Sorry about that.

@GeorgeLjz
Author

From the latest investigation, the negative value replies[fav_child].poststat.ia_ctime = -8580258564328249088
leads to a "Segmentation fault" in the strftime call, as illustrated by the sketch below.
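
A minimal standalone sketch (my own illustration, not GlusterFS code) of the suspected mechanism: for such a wildly out-of-range time_t, glibc's localtime() is expected to fail and return NULL, and passing that NULL struct tm to strftime() then crashes inside __strftime_internal, matching the backtrace above.

/* Minimal sketch of the suspected failure mode (not GlusterFS code). */
#include <stdio.h>
#include <time.h>

int main(void)
{
    char buf[64];
    time_t bad = (time_t)-8580258564328249088LL; /* value seen in the coredump */
    struct tm *tm_ptr = localtime(&bad);         /* glibc returns NULL here (EOVERFLOW) */

    printf("localtime returned %p\n", (void *)tm_ptr);
    /* Same call pattern as in afr_mark_split_brain_source_sinks_by_policy:
     * with tm_ptr == NULL this dereferences NULL and dies with SIGSEGV
     * inside __strftime_internal. */
    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", tm_ptr);
    printf("%s\n", buf);
    return 0;
}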

glfsheal encounters a SIGSEGV in __strftime_internal called from afr_mark_split_brain_source_sinks_by_policy

Root cause: ctime is negative
Solution: set ctime to 0 when it is negative, before calling strftime

Fixes: gluster#4239
Change-Id: If6a356db60298da39a48c7979abdfbac03521aa7
    } else if (fav_child >= 0) {
        time = replies[fav_child].poststat.ia_mtime;
        if (time < 0) {
Contributor

I think instead of putting a check on the time value we should try to figure out why ia_mtime < 0. The glfsheal you are using is old (release-7) and the latest code has changed completely, so I am not sure this is the right way to fix the issue.

@pranithk @karthik-us Can you please share your view on the same.

Author

Figuring out why ia_mtime or ia_ctime < 0 may not be so easy, but when one of those values is negative it causes a "Segmentation fault" in the strftime call, and the latest glusterfs code has no protection against it either, so I suggest this protection as a work-around for the issue (see the sketch below).
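
A standalone sketch of how such a guard could look (my own illustration; format_mtime_safe and its signature are hypothetical, not GlusterFS code): clamp a negative timestamp to 0 and additionally check localtime()'s return value so strftime() never sees a NULL struct tm.

/* Hypothetical helper sketching the suggested work-around (not GlusterFS code). */
#include <stdio.h>
#include <time.h>

static void format_mtime_safe(time_t t, char *buf, size_t len)
{
    struct tm *tm_ptr;

    if (t < 0)
        t = 0; /* work-around: treat a bogus negative mtime/ctime as the epoch */

    tm_ptr = localtime(&t);
    if (tm_ptr)
        strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm_ptr);
    else
        snprintf(buf, len, "unknown"); /* extra safety for any other out-of-range value */
}

int main(void)
{
    char mtime_str[64];

    format_mtime_safe((time_t)-8580258564328249088LL, mtime_str, sizeof(mtime_str));
    printf("%s\n", mtime_str); /* prints the epoch timestamp instead of crashing */
    return 0;
}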

Successfully merging this pull request may close these issues.

glfsheal coredump occasionally