Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault on sending large message #796

Open
spaghetti2514 opened this issue Nov 30, 2017 · 47 comments
Open

Segfault on sending large message #796

spaghetti2514 opened this issue Nov 30, 2017 · 47 comments

Comments

@spaghetti2514
Copy link

When I try to print the entire christian bible to a channel (with flood protection disabled on the server and client), irssi segfaults part of the way through.

If I use /exec -out cat bible.txt to print the bible, irssi segfaults. If I use a different command that breaks the file into ~10 chunks and sleeps for two seconds between printing chunks, the entire thing is successfully sent.

Steps to reproduce:

  • run the above command with a file of approximately the same size as the entire bible.

Expected behavior:

  • Entire bible is sent to channel
  • irssi does not segfault

Actual behavior:

  • Part of the bible is sent to channel
  • irssi segfaults

I am using irssi 1.0.5-1 (20171020 1715) from the debian testing repositories, but this issue has persisted across several versions.

Please let me know if there is any additional information that would be useful for me to provide.

@dequis
Copy link
Member

dequis commented Nov 30, 2017

Works for me using ftp://ftp.cs.princeton.edu/pub/cs226/textfiles/bible.txt (3.9mb)

Get a backtrace https://wiki.debian.org/HowToGetABacktrace

@spaghetti2514
Copy link
Author

@dequis

#0  flood_newmsg (server=0x555555b96a10, level=4, nick=0x5555559b6c41 "iled", host=0x5555559b6c4a "", target=0x555556f25600 "#test") at flood.c:196
#1  0x00005555555e053f in flood_privmsg (server=0x555555b96a10, data=<optimized out>, nick=0x5555559b6c41 "iled", addr=0x5555559b6c4a "") at flood.c:255
#2  0x00005555555fa0b6 in signal_emit_real (rec=rec@entry=0x555555870350, params=params@entry=4, va=va@entry=0x7fffffffe1d0, first_hook=<optimized out>) at signals.c:242
#3  0x00005555555fa60d in signal_emit (signal=<optimized out>, params=params@entry=4) at signals.c:286
#4  0x00005555555c0858 in irc_server_event (server=0x555555b96a10, line=0x5555559b6c6c "", nick=0x5555559b6c41 "iled", address=0x5555559b6c4a "") at irc.c:308
#5  0x00005555555fa0b6 in signal_emit_real (rec=rec@entry=0x5555558743d0, params=params@entry=4, va=va@entry=0x7fffffffe390, first_hook=<optimized out>) at signals.c:242
#6  0x00005555555fa6e5 in signal_emit_id (signal_id=<optimized out>, params=4) at signals.c:304
#7  0x00005555555fa0b6 in signal_emit_real (rec=rec@entry=0x55555586ed00, params=params@entry=2, va=va@entry=0x7fffffffe500, first_hook=<optimized out>) at signals.c:242
#8  0x00005555555fa6e5 in signal_emit_id (signal_id=<optimized out>, params=params@entry=2) at signals.c:304
#9  0x00005555555c09b5 in irc_parse_incoming (server=0x555555b96a10) at irc.c:383
#10 0x00005555555eb84d in irssi_io_invoke (source=<optimized out>, condition=<optimized out>, data=<optimized out>) at misc.c:55
#11 0x00007ffff6ba8dd5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007ffff6ba91a0 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x00007ffff6ba922c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#14 0x000055555557701f in main (argc=<optimized out>, argv=<optimized out>) at irssi.c:335

Got to 1 Kings 11:43 before the crash

@dequis
Copy link
Member

dequis commented Nov 30, 2017

What quit message do other people see in the channel?

Does this also happen with a clean profile? irssi --home=~/.irssi-test/

@spaghetti2514
Copy link
Author

What quit message do other people see in the channel?

A few seconds after the messages stop, irssi segfaults and -!- test [test@test.host] has quit [Connection closed] is sent to the channel

Does this also happen with a clean profile?

With a completely clean profile, flood protection stops me from trying this. After turning off flood protection, I send the entire thing, but only so much of it actually shows up to others in the channel (same as was happening before) but a few seconds later instead of segfaulting, the connection is just closed. Irssi says "Connection lost" and the channel displays the same quit message that it was when segfaulting occurred (has quit [Connection closed]).

I'm surprised the behavior changed, since the only real changes to my config besides turning off flood protection are colorschemes and aliases.

@spaghetti2514
Copy link
Author

Okay this is interesting. For reference, I'm testing this with a "test" client where I actually send the bible text from, and another client (the "viewing" client) joined to the same channel where I observe externally.

When I send the bible text in the test client, I see it all get sent to the channel within a second or two.
From the viewing client, I see the text streaming in rapidly for several seconds, before abruptly halting at some point well before the end of the text.

At this point, I usually wait for a few seconds and then the client segfaults or gets disconnected (depending on config), but I just realized that this always coincides with someone else in the channel saying or doing something that gets sent to the test client. In the case of the original config where segfaulting occurs, anyone saying anything in the channel or quitting the channel after the text has halted triggers the segfault. With the fresh config, it triggers a disconnect.

@spaghetti2514
Copy link
Author

Here's another backtrace

#0  flood_newmsg (server=0x555555b96c40, level=4, nick=0x5555559b6c41 "", host=0x5555559b6c4b "table_lookup: assertion 'hash_table != NULL' failed", 
    target=0x555555d6bf80 "#test") at flood.c:196
#1  0x00005555555e053f in flood_privmsg (server=0x555555b96c40, data=<optimized out>, nick=0x5555559b6c41 "", 
    addr=0x5555559b6c4b "table_lookup: assertion 'hash_table != NULL' failed") at flood.c:255
#2  0x00005555555fa0b6 in signal_emit_real (rec=rec@entry=0x555555870350, params=params@entry=4, va=va@entry=0x7fffffffe1d0, first_hook=<optimized out>) at signals.c:242
#3  0x00005555555fa60d in signal_emit (signal=<optimized out>, params=params@entry=4) at signals.c:286
#4  0x00005555555c0858 in irc_server_event (server=0x555555b96c40, line=0x5555559b6c6b "ble != NULL' failed", nick=0x5555559b6c41 "", 
    address=0x5555559b6c4b "table_lookup: assertion 'hash_table != NULL' failed") at irc.c:308
#5  0x00005555555fa0b6 in signal_emit_real (rec=rec@entry=0x5555558743d0, params=params@entry=4, va=va@entry=0x7fffffffe390, first_hook=<optimized out>) at signals.c:242
#6  0x00005555555fa6e5 in signal_emit_id (signal_id=<optimized out>, params=4) at signals.c:304
#7  0x00005555555fa0b6 in signal_emit_real (rec=rec@entry=0x55555586ed00, params=params@entry=2, va=va@entry=0x7fffffffe500, first_hook=<optimized out>) at signals.c:242
#8  0x00005555555fa6e5 in signal_emit_id (signal_id=<optimized out>, params=params@entry=2) at signals.c:304
#9  0x00005555555c09b5 in irc_parse_incoming (server=0x555555b96c40) at irc.c:383
#10 0x00005555555eb84d in irssi_io_invoke (source=<optimized out>, condition=<optimized out>, data=<optimized out>) at misc.c:55
#11 0x00007ffff6ba8dd5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007ffff6ba91a0 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x00007ffff6ba922c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#14 0x000055555557701f in main (argc=<optimized out>, argv=<optimized out>) at irssi.c:335

@ailin-nemui
Copy link
Contributor

are there any scripts loaded which could explain the difference between the configs, or any other config differences?

which client crashes, the one who sends the bible?

@spaghetti2514
Copy link
Author

are there any scripts loaded which could explain the difference between the configs, or any other config differences?

The only scripts I have loaded are a few simple things that register a command. Nothing that listens to all lines, incoming or outgoing, like I would expect something to do if it causes a segfault from high volume text. The only config differences are also very minor - just a few aliases an a colorscheme. When I have time later I'll try enabling things in my normal config bit by bit in a fresh config and see at what point it starts crashing.

which client crashes, the one who sends the bible?

Yes

@ailin-nemui
Copy link
Contributor

ailin-nemui commented Dec 5, 2017

the odd thing in your backtrace is that it crashes in some incoming message, so it should be unrelated to your sending of the bible. (but maybe that sending causes some memory corruption somewhere? just random speculation from me..)

are you trying it with the bible dequis linked?

do you have the possibility to get a wireshark capture of the network traffic before it crashes?

#4  0x00005555555c0858 in irc_server_event (server=0x555555b96c40, line=0x5555559b6c6b "ble != NULL' failed", nick=0x5555559b6c41 "", 
    address=0x5555559b6c4b "table_lookup: assertion 'hash_table != NULL' failed") at irc.c:308

already does not look like a real line that should be coming in from the IRC server

if you go to the first frame (f 0) and p *server, is the server record intact? you can also try to show us a bt full

@spaghetti2514
Copy link
Author

the odd thing in your backtrace is that it crashes in some incoming message

Yeah, it's always the first incoming message after the bible text stops getting sent.

are you trying it with the bible dequis linked?

I don't think they're exactly the same bible, but they are nearly the same size and it's all just normal ascii text.

if you go to the first frame (f 0) and p *server, is the server record intact?

(gdb) p *server
$1 = {type = 716, chat_type = 66, refcount = 1, connrec = 0x555555ab3680, connect_time = 1512491572, real_connect_time = 1512491574, tag = 0x555555c86b30 "test", 
  nick = 0x555555ccdde0 "test", connected = 1, disconnected = 1, connection_lost = 1, session_reconnect = 0, no_reconnect = 0, handle = 0x0, readtag = -1, connect_pipe = {0x0, 0x0}, 
  connect_tag = -1, connect_pid = 2281, rawlog = 0x555555c858e0, module_data = 0x555555a6a060, version = 0x555555ab4570 "InspIRCd-2.2", away_reason = 0x0, last_invite = 0x0, 
  server_operator = 1, usermode_away = 0, banned = 0, dns_error = 0, lag_sent = {tv_sec = 0, tv_usec = 0}, lag_last_check = 1512491575, lag = 132, channels = 0x0, queries = 0x0, 
  channels_join = 0x5555555c1980 <irc_channels_join>, isnickflag = 0x5555555c8670 <isnickflag_func>, ischannel = 0x5555555c9410 <ischannel_func>, 
  get_nick_flags = 0x5555555c6db0 <get_nick_flags>, send_message = 0x5555555c90e0 <send_message>, split_message = 0x5555555c8950 <split_message>, 
  channel_find_func = 0x5555555c1740 <irc_channel_find_server>, query_find_func = 0x5555555c8490 <irc_query_find>, mask_match_func = 0x0, nick_match_msg = 0x0, redirects = 0x0, 
  redirect_queue = 0x0, redirect_next = 0x0, redirect_active = 0x0, last_nick = 0x555555d6bcc0 "+i", real_address = 0x0, usermode = 0x0, wanted_usermode = 0x0, userhost = 0x0, 
  channels_formed = 30, whois_found = 0, whowas_found = 0, emode_known = 1, no_multi_mode = 0, no_multi_who = 0, one_endofwho = 0, disable_lag = 0, nick_collision = 0, motd_got = 1, 
  isupport_sent = 1, cap_complete = 1, sasl_success = 0, max_kicks_in_cmd = 1, max_modes_in_cmd = 20, max_whois_in_cmd = 4, max_msgs_in_cmd = 20, cap_supported = 0x0, 
  cap_active = 0x0, cap_queue = 0x0, sasl_buffer = 0x0, sasl_timeout = 0, cmdcount = 57070, cmdqueue = 0x0, wait_cmd = {tv_sec = 0, tv_usec = 346171}, last_cmd = {
    tv_sec = 1512491613, tv_usec = 92250}, max_cmds_at_once = 5, cmd_queue_speed = 0, max_query_chans = 1, idles = 0x0, ctcpqueue = 0x0, knockoutlist = 0x0, splits = 0x0, 
  split_servers = 0x0, rejoin_channels = 0x0, chanqueries = 0x0, isupport = 0x0, modes = {{func = 0x0, prefix = 0 '\000'} <repeats 65 times>, {func = 0x5555555cee90 <modes_type_d>, 
      prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {func = 0x0, 
      prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x5555555ce3f0 <modes_type_a>, 
      prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x5555555cee90 <modes_type_d>, 
      prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {func = 0x0, 
      prefix = 0 '\000'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, 
      prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x5555555cebb0 <modes_type_prefix>, 
      prefix = 33 '!'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, 
      prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x5555555ce3f0 <modes_type_a>, 
      prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x5555555ce3f0 <modes_type_a>, prefix = 0 '\000'}, {func = 0x0, 
      prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x5555555cebb0 <modes_type_prefix>, prefix = 37 '%'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {
      func = 0x0, prefix = 0 '\000'}, {func = 0x5555555ce950 <modes_type_b>, prefix = 0 '\000'}, {func = 0x5555555ce8d0 <modes_type_c>, prefix = 0 '\000'}, {
      func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {func = 0x5555555cebb0 <modes_type_prefix>, 
      prefix = 64 '@'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {
      func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, 
    {func = 0x5555555cebb0 <modes_type_prefix>, prefix = 43 '+'}, {func = 0x5555555ce3f0 <modes_type_a>, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'}, {func = 0x0, 
      prefix = 0 '\000'}, {func = 0x5555555cee90 <modes_type_d>, prefix = 0 '\000'}, {func = 0x0, prefix = 0 '\000'} <repeats 133 times>}, 
  prefix = '\000' <repeats 33 times>, "Y\000\000\000h\000\000\000\000\000v", '\000' <repeats 20 times>, "o", '\000' <repeats 190 times>, 
  nick_comp_func = 0x5555555c7ec0 <irc_nickcmp_rfc1459>}

you can also try to show us a bt full

(gdb) bt full
#0  flood_newmsg (server=0x555555b96ae0, level=4, nick=0x5555559b6c81 "le_lookup: assertion 'hash_table != NULL' failed", 
    host=0x5555559b6c8c "assertion 'hash_table != NULL' failed", target=0x555556b0d0c0 "#test") at flood.c:196
        mserver = <optimized out>
        flood = <optimized out>
        rec = <optimized out>
        now = <optimized out>
        ttime = <optimized out>
        times = <optimized out>
        tnext = <optimized out>
        __func__ = "flood_newmsg"
#1  0x00005555555e053f in flood_privmsg (server=0x555555b96ae0, data=<optimized out>, nick=0x5555559b6c81 "le_lookup: assertion 'hash_table != NULL' failed", 
    addr=0x5555559b6c8c "assertion 'hash_table != NULL' failed") at flood.c:255
        params = 0x555556b0d0c0 "#test"
        target = 0x555556b0d0c0 "#test"
        text = 0x555556b0d0cc "no one will see this"
        level = 4
        nick = 0x5555559b6c81 "le_lookup: assertion 'hash_table != NULL' failed"
        server = 0x555555b96ae0
        data = <optimized out>
        addr = 0x5555559b6c8c "assertion 'hash_table != NULL' failed"
#2  0x00005555555fa0b6 in signal_emit_real (rec=rec@entry=0x555555870350, params=params@entry=4, va=va@entry=0x7fffffffe1f0, first_hook=<optimized out>) at signals.c:242
        arglist = {0x555555b96ae0, 0x555555d7c0ce, 0x5555559b6c81, 0x5555559b6c8c, 0x0, 0x0}
        prev_emitted_signal = 0x5555558743d0
        hook = 0x555555877000
        prev_emitted_hook = 0x55555586e470
        i = <optimized out>
        stopped = 0
        stop_emit_count = 0
        continue_emit_count = 0
        __func__ = "signal_emit_real"
#3  0x00005555555fa60d in signal_emit (signal=<optimized out>, params=params@entry=4) at signals.c:286
        rec = 0x555555870350
        va = {{gp_offset = 48, fp_offset = 48, overflow_arg_area = 0x7fffffffe2e0, reg_save_area = 0x7fffffffe210}}
        signal_id = <optimized out>
        __func__ = "signal_emit"
#4  0x00005555555c0858 in irc_server_event (server=0x555555b96ae0, line=0x5555559b6cb5 "[", nick=0x5555559b6c81 "le_lookup: assertion 'hash_table != NULL' failed", 
    address=0x5555559b6c8c "assertion 'hash_table != NULL' failed") at irc.c:308
        signal = <optimized out>
        event = 0x555555d7c0c0 "event privmsg"
        args = <optimized out>
        __func__ = "irc_server_event"
#5  0x00005555555fa0b6 in signal_emit_real (rec=rec@entry=0x5555558743d0, params=params@entry=4, va=va@entry=0x7fffffffe3b0, first_hook=<optimized out>) at signals.c:242
        arglist = {0x555555b96ae0, 0x5555559b6cb5, 0x5555559b6c81, 0x5555559b6c8c, 0x0, 0x0}
        prev_emitted_signal = 0x55555586ed00
        hook = 0x55555586e470
        prev_emitted_hook = 0x55555586ed30
        i = <optimized out>
        stopped = 0
        stop_emit_count = 0
        continue_emit_count = 0
        __func__ = "signal_emit_real"
#6  0x00005555555fa6e5 in signal_emit_id (signal_id=<optimized out>, params=4) at signals.c:304
        rec = 0x5555558743d0
        va = {{gp_offset = 48, fp_offset = 48, overflow_arg_area = 0x7fffffffe4a0, reg_save_area = 0x7fffffffe3d0}}
        __func__ = "signal_emit_id"
#7  0x00005555555fa0b6 in signal_emit_real (rec=rec@entry=0x55555586ed00, params=params@entry=2, va=va@entry=0x7fffffffe520, first_hook=<optimized out>) at signals.c:242
        arglist = {0x555555b96ae0, 0x5555559b6c80, 0x0, 0x0, 0x0, 0x0}
        prev_emitted_signal = 0x0
        hook = 0x55555586ed30
        prev_emitted_hook = 0x0
        i = <optimized out>
        stopped = 0
        stop_emit_count = 0
        continue_emit_count = 0
        __func__ = "signal_emit_real"
#8  0x00005555555fa6e5 in signal_emit_id (signal_id=<optimized out>, params=params@entry=2) at signals.c:304
        rec = 0x55555586ed00
        va = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7fffffffe610, reg_save_area = 0x7fffffffe540}}
        __func__ = "signal_emit_id"
#9  0x00005555555c09b5 in irc_parse_incoming (server=0x555555b96ae0) at irc.c:383
        str = 0x5555559b6c80 "ble_lookup: assertion 'hash_table != NULL' failed"
        count = 0
        ret = 0
        server = 0x555555b96ae0
        count = 0
        ret = 0
#10 0x00005555555eb84d in irssi_io_invoke (source=<optimized out>, condition=<optimized out>, data=<optimized out>) at misc.c:55
        rec = <optimized out>
        icond = <optimized out>
#11 0x00007ffff6ba8dd5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#12 0x00007ffff6ba91a0 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#13 0x00007ffff6ba922c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#14 0x000055555557701f in main (argc=<optimized out>, argv=<optimized out>) at irssi.c:335
        version = 0
        options = {{long_name = 0x55555560ef4e "version", short_name = 118 'v', flags = 0, arg = G_OPTION_ARG_NONE, arg_data = 0x555555854a60 <version>, 
            description = 0x55555560c515 "Display irssi version", arg_description = 0x0}, {long_name = 0x0, short_name = 0 '\000', flags = 0, arg = G_OPTION_ARG_NONE, 
            arg_data = 0x0, description = 0x0, arg_description = 0x0}}
        loglev = 5

@spaghetti2514
Copy link
Author

Oh, here's a new development.

I only crash when I send the text to a channel that has other people in it. What seems to be happening is that while the text is flooding into the channel and my client is busy sending it, some other client in the channel can't keep up and crashes, and the message from the server about their disconnect gets sent to me and my bible flood immediately stops. The next message to me then segfaults.

If I do this in an empty channel or with other users I know won't say anything or crash for the duration of the flood, I do not crash and the entire text is sent.

@ailin-nemui
Copy link
Contributor

I can't really track it down even with all the information you provided. are you comfortable with compiling irssi? you could apply this patch https://github.com/ailin-nemui/irssi/commit/4e56675d2fea8aed59d91875f317ec228c94da85.patch and record a signal emission trace like so: irssi 2>signal_trace.log

@spaghetti2514
Copy link
Author

@ailin-nemui Sure, give me a few minutes

@spaghetti2514
Copy link
Author

Okay so with the version of irssi I compiled with the patch, it never crashes. I have no idea why.

Also the signal_trace.log file is 645MB.

@ailin-nemui
Copy link
Contributor

well.. thanks for trying.. meh. I wonder if you could run it under valgrind memory checker (without the signal trace patch); but maybe it's also be too slow to make it crash. or with some other methods @josephbisch do you have some suggestion how to best triage this issue?

@josephbisch
Copy link
Member

I second @ailin-nemui suggestion about trying to capture network traffic if possible with Wireshark to see what the server is sending to the crashing client.

Then we might be able to create a "fake" server that sends the same commands with netcat to avoid needing a real server and client(s) setup to reproduce the issue on our end (hopefully).

@ailin-nemui
Copy link
Contributor

f wireshark is too hard, can you try to /connect -rawlog file and if we're lucky it captures/writes enough before it crashes

@spaghetti2514
Copy link
Author

So irssi doesn't crash either with the signal tracing patch or when running under valgrind. I guess both make it too slow?

I don't think wireshark is feasible. The server I run irssi on has many users causing a lot of network traffic (including other irc clients). It's not so much traffic that I think it could have anything to do with the crashes, but it's enough that it would probably be a pain to extract anything from the log unless wireshark has some really good filtering tools that I'm not aware of.

can you try to /connect -rawlog file

Sure. What does that do exactly? I just tried using it and it didn't seem to create any file.

@spaghetti2514
Copy link
Author

spaghetti2514 commented Dec 12, 2017

For the sake of reproduction, I'll mention again that this only happens if the ircd sends a message to your bible-sending client while it's in the middle of sending the bible. If you guys were trying to reproduce it in an empty channel, it probably won't work. Having a bot in the same channel that says something arbitrary every second or so should work, as should having a second client connected where you just manually send some messages while the first client is sending the bible.

You guys can probably figure out what's wrong a lot more easily than I can, if only it could be reproduced.

@josephbisch
Copy link
Member

Are you sure that /connect -rawlog file doesn't create a file named file in the same directory as you run irssi from? It should record the raw IRC messages being sent.

I think that if we can get the rawlog from you, we can reproduce it without needing to try to get the same conditions with respect to bots or other clients that you have.

Ultimately the issue here is that irssi is crashing in response to something being received from the server, even if that message from the server is being sent to the crashing irssi as a result of a message from some other client. So we really need the messages between the server and the crashing client and then we should be able to reproduce the bug.

@josephbisch
Copy link
Member

I can't reproduce the bug, even with the instructions in your latest comment. I think it is more complex than just having a bot or second client sending messages at the same time as the bible is being sent.

@ailin-nemui
Copy link
Contributor

by the way @spaghetti2514 what exactly did you do when you say disable flood protection?

@spaghetti2514
Copy link
Author

what exactly did you do when you say disable flood protection?

I set cmd_queue_speed to 0

@spaghetti2514
Copy link
Author

I got rawlog to work (I was giving it a path instead of just a filename because I wanted the file somewhere else). The results aren't illuminating. It's just a bunch of outgoing PRIVMSGs from me to the server containing bible passages followed by the single message that kills me, whether that's a message from a bot or whatever. There's nothing after it.

Example:

...
<< PRIVMSG #test :13:27 So the servants of the householder came and said unto him, Sir,
<< PRIVMSG #test :didst not thou sow good seed in thy field? from whence then hath it
>> :bot!bot@bot.host PRIVMSG #test :aaa

@spaghetti2514
Copy link
Author

Maybe you guys can't reproduce it because you can't fill up some kind of buffer fast enough? If disk IO is bottlenecking you there's a kernel module that you can compile and load to get a character device that you can read the bible straight out of.

@ailin-nemui
Copy link
Contributor

I don't think disk IO should be a bottle neck here. You said on one config it didn't crash but only disconnect, did you have a chance to investigate the config difference?

@spaghetti2514
Copy link
Author

Okay, I finally figured out what the profile difference is. If nickcolor.pl is present in scripts/autorun, irssi segfaults, otherwise it doesn't.

@horgh
Copy link
Member

horgh commented Dec 16, 2017

Would it be possible to get a copy of your ~/.irssi directory (including nickcolor.pl) that you use to cause the crash please?

As well, could you point me to a copy of the bible.txt you're using?

Would it be possible for us to test on the IRC server you're using as well? If so, could you tell us the information to connect?

Maybe we can reproduce it if we copy everything exactly!

Thanks for your persistence in helping us figure it out!

@spaghetti2514
Copy link
Author

Here's a zip containing the minimal .irssi directory and the bible.txt

I'm testing this on irc.oppaiti.me, which is running inspircd 2.2.0. If you're gonna join be warned that the motd is the entire bee movie script.

@dequis
Copy link
Member

dequis commented Dec 16, 2017

Ah, yes, a christian server.

I managed to reproduce the crash. Thank you!

The short version (breakpoints on server_connect_init, server->connection_lost = TRUE, glog_func, server_disconnect), pretty much tells the whole story already. Bigger log with backtraces: log.txt

0x00007f095e233f30 in _start () from /lib64/ld-linux-x86-64.so.2
(rr) c
Continuing.

Breakpoint 9, server_connect_init (server=0x558ecd4099a0) at servers.c:357
357             g_return_if_fail(server != NULL);
(rr)
Continuing.
[New Thread 29950.29980]

Thread 1 hit Breakpoint 12, glog_func (log_domain=0x0, log_level=G_LOG_LEVEL_WARNING, message=0x558ecd88a9d0
    "Dropping some data on an outgoing connection") at fe-common-core.c:253
253             switch (log_level) {
(rr)
Continuing.

Thread 1 hit Breakpoint 11, irc_server_send_data (server=0x558ecd4099a0, data=<optimized out>, len=94) at irc-servers.c:529
529                     server->connection_lost = TRUE;
(rr)
Continuing.

Thread 1 hit Breakpoint 4, server_disconnect (server=0x558ecd4099a0) at servers.c:462
462     {
(rr)
Continuing.

Thread 1 hit Breakpoint 12, glog_func (log_domain=0x7f095d55efc5 "GLib", log_level=G_LOG_LEVEL_CRITICAL, message=0x558ecd5876b0
    "g_hash_table_lookup: assertion 'hash_table != NULL' failed") at fe-common-core.c:253
253             switch (log_level) {
(rr)
Continuing.

Thread 1 hit Breakpoint 12, glog_func (log_domain=0x7f095d55efc5 "GLib", log_level=G_LOG_LEVEL_CRITICAL, message=0x558ecd9f0200
    "g_hash_table_lookup: assertion 'hash_table != NULL' failed") at fe-common-core.c:253
253             switch (log_level) {
(rr)
Continuing.

[...]

Thread 1 received signal SIGSEGV, Segmentation fault.
flood_newmsg (server=0x558ecd4099a0, level=4, nick=0x558ecd40acd1 "|\237U",
    host=0x558ecd40acd9 "ble_lookup: assertion 'hash_table != NULL' failed",
    target=0x558ecd594720 "#christianity") at flood.c:196
196             flood = g_hash_table_lookup(mserver->floodlist, nick);
(rr)

The server->connection_lost = TRUE happens a few stack frames above the same path as the Dropping some data on an outgoing connection

The incoming message sent from the second irssi is handled by nickcolor.pl's sig_public which does $server->command(...), which calls parse_command which notices server->connection_lost and disconnects.

server_disconnect doesn't kill it yet, but it emits the "server disconnected" signal that clears stuff like server->isupport which results in those NULL hash table messages and calls flood_deinit_server which removes the module data of that flood.

So when it gets to flood_newmsg it does mserver = MODULE_DATA(server);, fails to find it and so it returns NULL, and mserver->floodlist crashes.

I think there's no use after free involved here, which is pretty cool. By the time it crashes the refcount of the server is 1, everything is good there. It's just a handful of things getting called after the irc part of the server is full of NULLs.

@ailin-nemui
Copy link
Contributor

@spaghetti2514 can you test #803 and see if that fixes the issue?

@spaghetti2514
Copy link
Author

I would be happy to

@spaghetti2514
Copy link
Author

@ailin-nemui
Surprisingly it did not. Irssi still segfaults after following the same steps.

@spaghetti2514
Copy link
Author

spaghetti2514 commented Jan 5, 2018

gdb info and backtrace with #803 applied

Thread 1 "irssi" received signal SIGSEGV, Segmentation fault.
flood_newmsg (server=0x5555559b60d0, level=4, 
    nick=0x5555559af021 "B\"\366\377\177", host=0x5555559af027 "", 
    target=0x555555b32000 "#test") at flood.c:196
196             flood = g_hash_table_lookup(mserver->floodlist, nick);

(gdb) bt
#0  flood_newmsg (server=0x5555559b60d0, level=4, 
    nick=0x5555559af021 "B\"\366\377\177", host=0x5555559af027 "", 
    target=0x555555b32000 "#test") at flood.c:196
#1  0x00005555555ddfff in flood_privmsg (server=0x5555559b60d0, 
    data=<optimized out>, nick=0x5555559af021 "B\"\366\377\177", 
    addr=0x5555559af027 "") at flood.c:255
#2  0x00005555555f7136 in signal_emit_real (rec=rec@entry=0x555555872340, 
    params=params@entry=4, va=va@entry=0x7fffffffe138, 
    first_hook=<optimized out>) at signals.c:242
#3  0x00005555555f7660 in signal_emit (signal=<optimized out>, 
    params=params@entry=4) at signals.c:286
#4  0x00005555555bf388 in irc_server_event (server=0x5555559b60d0, 
    line=0x5555559af040 "PRIVMSG #test :And all the prophets prophesied so, saying, Go up to Ramothgilead, and prosper: for the anime tiddies shall deliver it into the hand of the king.", nick=0x5555559af021 "B\"\366\377\177", 
    address=0x5555559af027 "") at irc.c:338
#5  0x00005555555f7136 in signal_emit_real (rec=rec@entry=0x555555876360, 
    params=params@entry=4, va=va@entry=0x7fffffffe2e8, 
    first_hook=<optimized out>) at signals.c:242
#6  0x00005555555f7738 in signal_emit_id (signal_id=<optimized out>, 
    params=4) at signals.c:304
#7  0x00005555555f7136 in signal_emit_real (rec=rec@entry=0x555555881210, 
    params=params@entry=2, va=va@entry=0x7fffffffe448,

@ailin-nemui
Copy link
Contributor

@spaghetti2514 :(

can you try with this script? how much does it print before crashing?

sub print_signal {
Irssi::active_win->print(Irssi::signal_get_emitted) ;
} 
for ("server connected", "server disconnected", "server destroyed") {
Irssi::signal_add_first ( $_, "print_signal" ) ;
} 

@spaghetti2514
Copy link
Author

@ailin-nemui What does it print to? I don't see anything extra printed with this script loaded.

@ailin-nemui
Copy link
Contributor

@spaghetti2514 odd. it's supposed to print to the active window. you can change the code to

sub print_signal {
print STDERR Irssi::signal_get_emitted() . "\n";
} 

to print to stderr instead and run irssi 2>stderr.log

first, test it without the bible:

/connect irc.oppaiti.me
/disconnect

it should print something like this, if everything is working

 -!- Irssi: Looking up irc.oppaiti.me
 -!- Irssi: server connected
 -!- Irssi: Connection to irc.oppaiti.me established
 !irc.oppaiti.me *** Looking up your hostname...
 -!- Your host is irc.oppaiti.me, running version InspIRCd-2.2
...
 -!- Irssi: Disconnecting from server irc.oppaiti.me: [leaving]
 -!- Irssi: server disconnected
 -!- Irssi: Connection lost to irc.oppaiti.me
 -!- Irssi: server destroyed

the 3 messages "Irssi: server..." are from this little debugging script

ailin-nemui added a commit to irssi/scripts.irssi.org that referenced this issue Jan 5, 2018
fixes an odd crash when $server unexpectedly becomes invalid during command execution.
cf irssi/irssi#796
@horgh
Copy link
Member

horgh commented Jan 7, 2018

I tested the changes in #803 as well. From master at d85f867, I merged the branch in. Before merging I reliably saw the segfault. After applying the branch there is no segfault.

One interesting point: I see this in the status window (even before Irssi updates to show there is no connection):

19:39 -!- Irssi: warning Dropping some data on an outgoing connection
19:39 -!- Irssi: process 0 (cat /will/t/t/bible.txt) terminated with return code 0

While there's no crash, the behaviour afterwards is weird. Irssi sits there looking like it is connected until another client in the channel does something (sends a message). Then Irssi quits. Doing anything else in Irssi, including sending a message to the channel, does not cause it to quit, but it appears disconnected in that if I try to send a message to the channel it looks like it worked yet nothing shows in the channel in the other client. This is the same thing @spaghetti2514 found in that something must happen externally after flooding, but now we do not crash, just disconnect.

By the way, I am able to reproduce the crash reliably using a server I put up for testing (I didn't want to repeatedly test on the server @spaghetti2514 mentioned).

@ailin-nemui
Copy link
Contributor

thanks for the detailed analysis @horgh , sounds like we should further investigate what's going on there and why it doesn't disconnect (could it be a bug/side effect of my fix?)

@spaghetti2514
Copy link
Author

@horgh

Before merging I reliably saw the segfault. After applying the branch there is no segfault.

Weird. Maybe I applied the patch wrong

@ailin-nemui

why it doesn't disconnect (could it be a bug/side effect of my fix?)

I don't think it's a side-effect of your fix. It's the same behavior I was seeing before your patch whenever nickcolor.pl isn't loaded. Loading nickcolor.pl only replaces the disconnect with a segfault. Either one is still triggered by the server sending something after the client ghost-disconnects.

Also sorry if I'm slow to try things or help out, I'm currently traveling and am very busy.

@horgh
Copy link
Member

horgh commented Jan 8, 2018

Yeah, I think the weird semi-connected state is not caused by the fix since we saw the delay before.

About the delayed disconnection:

We end up setting server->connection_lost true (in irc_server_send_data()) after we see the warning about "Dropping some data". So at that point (or ASAP) we should probably be fully disconnecting the server. But we don't do it yet.

The disconnect happens in irc_parse_incoming() it looks like: When a message (or anything from the server) comes in and we see server->connection_lost is true, we call server_disconnect().

This changes in this commit make us call server_disconnect() immediately: horgh/irssi@9ce8404

It works okay for this one scenario. If we want to do this, we should check any signal that uses a SERVER_REC* that could lead to us calling that server_disconnect(), as without holding a reference it can get freed and the SERVER_REC* can become invalid (and we'll crash in a new way). So there might be a better way to cause the disconnect.

I also tried having a timer that disconnected any servers flagged ->connection_lost. A timer function avoids worrying about the SERVER_REC* becoming invalid mid-event, but there would be a window of time where the server is in this partially connected state (between when we get ->connection_lost set and when the timer fires), so it does not seem ideal.

Edit: We probably can't use that approach (killing the SERVER_REC* at that point) as I don't think it will be possible to protect the SERVER_REC* sufficiently (e.g., Perl scripts). Calling server_disconnect() somewhere does seem like the right thing to do, but I don't know where!

@ailin-nemui
Copy link
Contributor

@spaghetti2514 maybe you can just retry the patch whenever you get the time. verify that the patch is working first, with the simple case I showed you in #796 (comment)

@ailin-nemui
Copy link
Contributor

@horgh I think calling server_disconnect seems to be the right (and only) solution. The only other idea I have would be to add the check to the mainloop. I once had this plan to change the signal (event) emitter model to default to "after event", i.e. you put in some events that are going to run after the current signal stack finishes...

@horgh
Copy link
Member

horgh commented Jan 10, 2018

Your idea to have a way to run things after the current signal processing finishes is a neat one! I was looking around to see if there was an API kind of like that. It would definitely be useful here, and it sounds like there have been other situations where it would have been handy.

Maybe something as simple as registering callback functions for the main loop to fire would work. I'll take a look at that loop!

horgh added a commit to horgh/irssi that referenced this issue Mar 17, 2018
This fixes the delayed segfault/disconnection aspect of irssi#796.

When sending fails, the server is marked as disconnected and no longer
useful. However it was not yet fully cleaned up. That cleanup happened in
irc_parse_incoming() when we receive traffic from the server. This meant
there could be a noticeable delay until we disconnect. With this change
we clean it up in the next tick.
@horgh
Copy link
Member

horgh commented Mar 17, 2018

I have a solution to the delayed disconnect/segfault aspect of this in horgh/irssi@f8bdc01. It causes us to clean up the server the next time through the event loop. The interval is very short as you can see. The idea was to be effectively instant. I've tested it together with the changes in #803 and it seems to work fine. No segfault and Irssi shows the server is disconnected right away.

I've not been able to come up with a better solution despite continuing to poking at this.

#803 seems to solve the segfault part nicely.

@ailin-nemui
Copy link
Contributor

I think we should probably implement our own event source to queue events to instead of the timeout,1 hack :-)

@horgh
Copy link
Member

horgh commented Jun 22, 2018

Yeah, it's not too nice. Short though!

I'm not sure what such a queue would look like. Do you have a rough idea of how you think it should look? I'm not clear on how the events work in general 😳

@ailin-nemui ailin-nemui added the WIP label Sep 4, 2018
@ailin-nemui ailin-nemui added this to the 1.2.0 milestone Jun 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants