compilation failure on v850-elf target. #1

Open
monaka opened this issue May 14, 2014 · 2 comments


monaka commented May 14, 2014

/Volumes/git/binutils-gdb/build/./gcc/xgcc -B/Volumes/git/binutils-gdb/build/./gcc/ -B/usr/local/v850-elf/bin/ -B/usr/local/v850-elf/lib/ -isystem /usr/local/v850-elf/include -isystem /usr/local/v850-elf/sys-include -L/Volumes/git/binutils-gdb/build/./ld    -g -O2 -mv850e -O2  -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE  -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include -mno-app-regs -msmall-sld -Wa,-mwarn-signed-overflow -Wa,-mwarn-unsigned-overflow  -g  -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -fno-stack-protector -Dinhibit_libc  -I. -I. -I../../.././gcc -I../../../../libgcc -I../../../../libgcc/. -I../../../../libgcc/../gcc -I../../../../libgcc/../include   -DUSE_EMUTLS -o _absvsi2.o -MT _absvsi2.o -MD -MP -MF _absvsi2.dep -DL_absvsi2 -c ../../../../libgcc/../gcc/libgcc2.c \

/var/folders/wr/mrhq4cgx2b15l4xwgj12jchm0000gn/T//cc98iEJS.s: Assembler messages:
/var/folders/wr/mrhq4cgx2b15l4xwgj12jchm0000gn/T//cc98iEJS.s:88: Internal error, aborting at ../../gas/config/tc-v850.c line 3248 in void md_assemble(char *)
Please report this bug.
make[3]: *** [_absvsi2.o] Error 1
make[2]: *** [multi-do] Error 1
make[1]: *** [all-multi] Error 2
make: *** [all-target-libgcc] Error 2

monaka commented May 14, 2014

It seems this issue is caused by Clang. When I changed the compiler's optimization level to -O0, as-new crashed.


monaka commented May 14, 2014

Not only Clang but also Apple-gcc42: segmentation fault (signal 11) on Apple-gcc42.

monaka pushed a commit that referenced this issue May 15, 2014
PR tui/14880 shows a reproducer that triggers this assertion:

  int
  value_available_contents_eq (const struct value *val1, int offset1,
  			     const struct value *val2, int offset2,
  			     int length)
  {
    int idx1 = 0, idx2 = 0;

    /* This routine is used by printing routines, where we should
       already have read the value.  Note that we only know whether a
       value chunk is available if we've tried to read it.  */
    gdb_assert (!val1->lazy && !val2->lazy);

(top-gdb) bt
#0  internal_error (file=0x88a26c "../../src/gdb/value.c", line=549, string=0x88a220 "%s: Assertion `%s' failed.") at ../../src/gdb/utils.c:844
#1  0x000000000057b9cd in value_available_contents_eq (val1=0x10fa900, offset1=0, val2=0x10f9e10, offset2=0, length=8) at ../../src/gdb/value.c:549
#2  0x00000000004fd756 in tui_get_register (frame=0xd5c430, data=0x109a548, regnum=0, changedp=0x109a560) at ../../src/gdb/tui/tui-regs.c:736
#3  0x00000000004fd111 in tui_check_register_values (frame=0xd5c430) at ../../src/gdb/tui/tui-regs.c:521
#4  0x0000000000501884 in tui_check_data_values (frame=0xd5c430) at ../../src/gdb/tui/tui-windata.c:234
#5  0x00000000004f976f in tui_selected_frame_level_changed_hook (level=1) at ../../src/gdb/tui/tui-hooks.c:222
#6  0x00000000006f0681 in select_frame (fi=0xd5c430) at ../../src/gdb/frame.c:1490
#7  0x00000000005dd94b in up_silently_base (count_exp=0x0) at ../../src/gdb/stack.c:2268
#8  0x00000000005dd985 in up_command (count_exp=0x0, from_tty=1) at ../../src/gdb/stack.c:2280
#9  0x00000000004dc5cf in do_cfunc (c=0xd3f720, args=0x0, from_tty=1) at ../../src/gdb/cli/cli-decode.c:113
#10 0x00000000004df664 in cmd_func (cmd=0xd3f720, args=0x0, from_tty=1) at ../../src/gdb/cli/cli-decode.c:1888
#11 0x00000000006e43e1 in execute_command (p=0xc7e6c2 "", from_tty=1) at ../../src/gdb/top.c:489

The fix is to fetch the value before comparing the contents.  The
comment additions to value.h explain why it can't be
value_available_contents_eq itself that fetches the contents.

Tested on x86_64 Fedora 17.

gdb/
2013-06-28  Pedro Alves  <palves@redhat.com>

	PR tui/14880
	* tui/tui-regs.c (tui_get_register): Fetch register value contents
	before checking whether they're available.
	* value.c (value_available_contents_eq): Change comment.
	* value.h (value_available_contents_eq): Expand comment.
monaka pushed a commit that referenced this issue May 15, 2014
In entry-values.exp, we have a test where the entry value of 'j' is
unavailable, so it is expected that printing j@entry yields
"<unavailable>".  However, the actual output is:

 (gdb) frame
 #0  0x0000000000400540 in foo (i=0, i@entry=2, j=2, j@entry=<error reading variable: Cannot access memory at address 0x6009e8>)

The error is thrown here:

#0  throw_it (reason=RETURN_ERROR, error=MEMORY_ERROR, fmt=0x8cd550 "Cannot access memory at address %s", ap=0x7fffffffc8e8) at ../../src/gdb/exceptions.c:373
#1  0x00000000005e2f9c in throw_error (error=MEMORY_ERROR, fmt=0x8cd550 "Cannot access memory at address %s") at ../../src/gdb/exceptions.c:422
#2  0x0000000000673a5f in memory_error (status=5, memaddr=6293992) at ../../src/gdb/corefile.c:204
#3  0x0000000000673aea in read_memory (memaddr=6293992, myaddr=0x7fffffffca60 "\200\316\377\377\377\177", len=4) at ../../src/gdb/corefile.c:223
#4  0x00000000006784d1 in dwarf_expr_read_mem (baton=0x7fffffffcd50, buf=0x7fffffffca60 "\200\316\377\377\377\177", addr=6293992, len=4) at ../../src/gdb/dwarf2loc.c:334
#5  0x000000000067645e in execute_stack_op (ctx=0x1409480, op_ptr=0x7fffffffce87 "\237<\005@", op_end=0x7fffffffce88 "<\005@") at ../../src/gdb/dwarf2expr.c:1045
#6  0x0000000000674e29 in dwarf_expr_eval (ctx=0x1409480, addr=0x7fffffffce80 "\003\350\t`", len=8) at ../../src/gdb/dwarf2expr.c:364
#7  0x000000000067c5b2 in dwarf2_evaluate_loc_desc_full (type=0x10876d0, frame=0xd8ecc0, data=0x7fffffffce80 "\003\350\t`", size=8, per_cu=0xf24c40, byte_offset=0)
    at ../../src/gdb/dwarf2loc.c:2236
#8  0x000000000067cc65 in dwarf2_evaluate_loc_desc (type=0x10876d0, frame=0xd8ecc0, data=0x7fffffffce80 "\003\350\t`", size=8, per_cu=0xf24c40)
    at ../../src/gdb/dwarf2loc.c:2407
#9  0x000000000067a5d4 in dwarf_entry_parameter_to_value (parameter=0x13a7960, deref_size=18446744073709551615, type=0x10876d0, caller_frame=0xd8ecc0, per_cu=0xf24c40)
    at ../../src/gdb/dwarf2loc.c:1160
#10 0x000000000067a962 in value_of_dwarf_reg_entry (type=0x10876d0, frame=0xd8de70, kind=CALL_SITE_PARAMETER_DWARF_REG, kind_u=...) at ../../src/gdb/dwarf2loc.c:1310
#11 0x000000000067aaca in value_of_dwarf_block_entry (type=0x10876d0, frame=0xd8de70, block=0xf1c2d4 "Q", block_len=1) at ../../src/gdb/dwarf2loc.c:1363
#12 0x000000000067e7c9 in locexpr_read_variable_at_entry (symbol=0x13a7540, frame=0xd8de70) at ../../src/gdb/dwarf2loc.c:3326
#13 0x00000000005daab6 in read_frame_arg (sym=0x13a7540, frame=0xd8de70, argp=0x7fffffffd0e0, entryargp=0x7fffffffd100) at ../../src/gdb/stack.c:362
#14 0x00000000005db384 in print_frame_args (func=0x13a7470, frame=0xd8de70, num=-1, stream=0xea3890) at ../../src/gdb/stack.c:669
#15 0x00000000005dc338 in print_frame (frame=0xd8de70, print_level=1, print_what=SRC_AND_LOC, print_args=1, sal=...) at ../../src/gdb/stack.c:1199
#16 0x00000000005db8ee in print_frame_info (frame=0xd8de70, print_level=1, print_what=SRC_AND_LOC, print_args=1) at ../../src/gdb/stack.c:851
#17 0x00000000005da2bb in print_stack_frame (frame=0xd8de70, print_level=1, print_what=SRC_AND_LOC) at ../../src/gdb/stack.c:169
#18 0x00000000005de236 in frame_command (level_exp=0x0, from_tty=1) at ../../src/gdb/stack.c:2265

dwarf2_evaluate_loc_desc_full (frame #7) knows to handle
NOT_AVAILABLE_ERROR errors, but read_memory always throws
a generic error.

Presently, only the value machinery knows to handle unavailable
memory.  We need to push the awareness down to the target_xfer layer,
making it return a finer grained error indication.  We can only return
a generic -1 nowadays, which leaves the upper layers with no clue on
why the xfer failed.  Use target_xfer_partial directly, rather than
propagating the error through target_read_memory so as to get a better
address to display in the error message.
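
As a rough, self-contained illustration of that idea (not the actual patch; the names below, such as xfer_error and xfer_read, are made up for the example), distinguishing the two failure kinds might look like:

#include <stdio.h>
#include <string.h>

/* Hypothetical finer-grained xfer result, instead of a bare -1.  */
enum xfer_error
{
  XFER_OK = 0,
  XFER_E_IO = -1,          /* generic "cannot access" failure */
  XFER_E_UNAVAILABLE = -2  /* memory exists but was not collected */
};

/* Stand-in for a target read; pretend everything at or above 0x600000
   is traced-but-unavailable memory.  */
static enum xfer_error
xfer_read (unsigned long addr, unsigned char *buf, int len)
{
  if (addr >= 0x600000UL)
    return XFER_E_UNAVAILABLE;
  memset (buf, 0, len);
  return XFER_OK;
}

/* The caller can now tell "cannot access" apart from "unavailable",
   which is what a consumer like dwarf2_evaluate_loc_desc_full needs in
   order to print <unavailable> instead of erroring out.  */
static void
read_memory_report (unsigned long addr)
{
  unsigned char buf[4];

  switch (xfer_read (addr, buf, sizeof buf))
    {
    case XFER_OK:
      printf ("0x%lx: read ok\n", addr);
      break;
    case XFER_E_UNAVAILABLE:
      printf ("0x%lx: <unavailable>\n", addr);
      break;
    default:
      printf ("0x%lx: Cannot access memory\n", addr);
      break;
    }
}

int
main (void)
{
  read_memory_report (0x400540UL);
  read_memory_report (0x6009e8UL);
  return 0;
}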

(target_read_memory & friends build on top of target_read (thus the
target_xfer machinery), but turn all errors to EIO, an errno value.  I
think this is a mistake, and we'd better convert all these to return a
target_xfer_error too, but that can be done separately.  I looked
around a bit over memory_error calls, and the need to handle random
errno values, other than the EIOs gdb itself hardcodes, probably comes
(only) from deprecated_xfer_memory, which uses errno for error
indication, but I didn't look exhaustively.  We should really get rid
of deprecated_xfer_memory and of passing down errno values as error
indication in target_read & friends methods).

Tested on x86_64 Fedora 17, native and gdbserver.  Fixes the test in
the PR, which will be added to the testsuite later.

gdb/
2013-08-22  Pedro Alves  <palves@redhat.com>

	PR gdb/15871
	* corefile.c (target_xfer_memory_error): New function.
	(memory_error): Defer EIO to target_memory_error.
	(read_memory): Use target_xfer_partial, and handle finer-grained
	target xfer errors.
	* target.c (target_xfer_error_to_string): New function.
	(memory_xfer_partial_1): If memory is known to be
	unavailable, return TARGET_XFER_E_UNAVAILABLE instead of -1.
	(target_xfer_partial): Make extern.
	* target.h (enum target_xfer_error): New enum.
	(target_xfer_error_to_string): Declare function.
	(target_xfer_partial): Declare function.
	(struct target_ops) <xfer_partial>: Adjust describing comment.
monaka pushed a commit that referenced this issue May 15, 2014
… "break", "list")

"info threads" changes the default source for "break" and "list", to
whatever the location of the first/bottom thread in the thread list
is...

 (gdb) b start
 (gdb) c
 ...
 (gdb) list
 *lists "start"*
 (gdb) b 23
 Breakpoint 3 at 0x400614: file test.c, line 23.
 (gdb) info threads
   Id   Target Id         Frame
 * 2    Thread 0x7ffff7fcb700 (LWP 1760) "test" start (arg=0x0) at test.c:23
   1    Thread 0x7ffff7fcc740 (LWP 1748) "test" 0x000000323dc08e60 in pthread_join (threadid=140737353922304, thread_return=0x0) at pthread_join.c:93
 (gdb) b 23
 Breakpoint 4 at 0x323dc08d90: file pthread_join.c, line 23.
                                    ^^^^^^^^^^^^^^^
 (gdb) list
 93          lll_wait_tid (pd->tid);
 94
 95
 96        /* Restore cancellation mode.  */
 97        CANCEL_RESET (oldtype);
 98
 99        /* Remove the handler.  */
 100       pthread_cleanup_pop (0);
 101
 102

The issue is that print_stack_frame always sets the current sal to the
frame's sal.  print_frame_info (which print_stack_frame calls to do
most of the work) also sets the last displayed sal, but only if
print_what isn't LOCATION.  Now the call in question, from within
thread.c:print_thread_info, does pass in LOCATION as print_what, but
print_stack_frame doesn't have the same check print_frame_info has.
We could consider adding it, but setting these globals depending on
print_what isn't very clean, IMO.  What we have is two logically
distinct operations mixed in the same function(s):

  #1 - print frame, in the format specified by {print_what,
    print_level and print_args}.

  #2 - We're displaying a frame to the user, and I want the default
    sal to point here, because the program stopped here, or the user
    did some context-changing command (up, down, etc.).

So I added a new parameter to print_stack_frame & friends for point
#2, and went through all calls in the tree adjusting as necessary.
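
A tiny standalone sketch of that split (hypothetical, simplified types; not the real GDB interfaces) could look like this: printing a frame and recording it as the default source location become two independently requested operations.

#include <stdio.h>

struct sal { const char *file; int line; };

/* The "default source" that later "list"/"break"-style commands use.  */
static struct sal current_sal;

/* Operation #1: print the frame.  Operation #2 only happens when the
   caller explicitly asks for it, not based on the print style.  */
static void
print_stack_frame (struct sal frame_sal, int set_current_sal)
{
  printf ("%s:%d\n", frame_sal.file, frame_sal.line);
  if (set_current_sal)
    current_sal = frame_sal;
}

int
main (void)
{
  struct sal stop = { "test.c", 23 };
  struct sal other = { "pthread_join.c", 93 };

  print_stack_frame (stop, 1);   /* normal stop: update the default sal */
  print_stack_frame (other, 0);  /* "info threads"-style display: don't */
  printf ("default source is still %s:%d\n",
          current_sal.file, current_sal.line);
  return 0;
}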

Tested on x86_64 Fedora 17.

gdb/
2013-09-17  Pedro Alves  <palves@redhat.com>

	PR gdb/15911
	* ada-tasks.c (task_command_1): Adjust call to print_stack_frame.
	* bsd-kvm.c (bsd_kvm_open, bsd_kvm_proc_cmd, bsd_kvm_pcb_cmd):
	* corelow.c (core_open):
	* frame.h (print_stack_frame, print_frame_info): New
	'set_current_sal' parameter.
	* infcmd.c (finish_command, kill_command): Adjust call to
	print_stack_frame.
	* inferior.c (inferior_command): Likewise.
	* infrun.c (normal_stop): Likewise.
	* linux-fork.c (linux_fork_context): Likewise.
	* record-full.c (record_full_goto_entry, record_full_restore):
	Likewise.
	* remote-mips.c (common_open): Likewise.
	* stack.c (print_stack_frame): New 'set_current_sal' parameter.
	Use it.
	(print_frame_info): New 'set_current_sal' parameter.  Set the last
	displayed sal depending on the new parameter instead of looking at
	print_what.
	(backtrace_command_1, select_and_print_frame, frame_command)
	(current_frame_command, up_command, down_command): Adjust call to
	print_stack_frame.
	* thread.c (print_thread_info, restore_selected_frame)
	(do_captured_thread_select): Adjust call to print_stack_frame.
	* tracepoint.c (tfind_1): Likewise.
	* mi/mi-cmd-stack.c (mi_cmd_stack_list_frames)
	(mi_cmd_stack_info_frame): Likewise.
	* mi/mi-interp.c (mi_on_normal_stop): Likewise.
	* mi/mi-main.c (mi_cmd_exec_return, mi_cmd_trace_find): Likewise.

	gdb/testsuite/
	* gdb.threads/info-threads-cur-sal-2.c: New file.
	* gdb.threads/info-threads-cur-sal.c: New file.
	* gdb.threads/info-threads-cur-sal.exp: New file.
monaka pushed a commit that referenced this issue May 15, 2014
Hi,

This FIXME goes into my eyes, when I am about to modify something here,

  /* Name is allocated by name_of_child.  */
  /* FIXME: xstrdup should not be here.  */

This FIXME was introduced in the python pretty-pretter patches.

  Python pretty-printing [6/6]
  https://sourceware.org/ml/gdb-patches/2009-05/msg00467.html

create_child_with_value is called in two paths,

 1. varobj_list_children -> create_child -> create_child_with_value,
 2. update_dynamic_varobj_children -> install_dynamic_child -> varobj_add_child
    -> create_child_with_value

In path #1, 'name' is allocated by name_of_child; as the original
comment says, we don't have to duplicate NAME in
create_child_with_value.  In path #2, 'name' is obtained from
PyArg_ParseTuple, and we have to duplicate NAME.

This patch removes the call to xstrdup in create_child_with_value
and instead calls xstrdup in update_dynamic_varobj_children (path #2).
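
As a standalone illustration of the resulting ownership convention (made-up names, not the actual varobj code): the callee keeps the pointer it is given, and only a caller that holds a borrowed string duplicates it first.

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

struct child { char *name; };

/* Takes ownership of NAME; it does not duplicate it (like the patched
   create_child_with_value).  */
static struct child *
create_child (char *name)
{
  struct child *c = malloc (sizeof *c);
  c->name = name;
  return c;
}

int
main (void)
{
  /* Path 1: the caller already allocated the string; hand it over as-is.  */
  struct child *a = create_child (strdup ("allocated_by_name_of_child"));

  /* Path 2: the string is borrowed (e.g. from PyArg_ParseTuple), so the
     caller must duplicate it before passing it down.  */
  const char *borrowed = "from_PyArg_ParseTuple";
  struct child *b = create_child (strdup (borrowed));

  printf ("%s %s\n", a->name, b->name);
  free (a->name); free (a);
  free (b->name); free (b);
  return 0;
}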

gdb:

2013-10-04  Yao Qi  <yao@codesourcery.com>

	* varobj.c (create_child_with_value): Remove 'const' from the
	type of parameter 'name'.
	(varobj_add_child): Likewise.
	(install_dynamic_child): Remove 'const' from the type of
	parameter 'name'.
	(varobj_add_child): Likewise.
	(create_child_with_value): Likewise.  Update comments.  Don't
	duplicate 'name'.
	(update_dynamic_varobj_children): Duplicate 'name'
	and pass it to install_dynamic_child.
monaka pushed a commit that referenced this issue May 15, 2014
The UNWIND_SAME_ID check is done between THIS_FRAME and the next frame
when we go try to unwind the previous frame.  But at this point, it's
already too late -- we ended up with two frames with the same ID in
the frame chain.  Each frame having its own ID is an invariant assumed
throughout GDB.  This patch applies the UNWIND_SAME_ID detection
earlier, right after the previous frame is unwound, discarding the dup
frame if a cycle is detected.
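
A self-contained sketch of that ordering (hypothetical names and a fake unwinder, not GDB's frame.c): the duplicate frame is rejected as soon as it is produced, before it can enter the chain.

#include <stdio.h>

struct frame { unsigned long id; struct frame *next_outer; };

/* Fake unwinder: in this corrupted-stack demo the outermost frame keeps
   reproducing its own id forever.  */
static struct frame *
unwind_prev (struct frame *this_frame, struct frame *storage)
{
  storage->id = this_frame->id >= 4 ? 4 : this_frame->id + 1;
  storage->next_outer = NULL;
  return storage;
}

int
main (void)
{
  struct frame frames[16];
  struct frame *this_frame = &frames[0];
  this_frame->id = 1;

  for (int depth = 1; depth < 16; depth++)
    {
      struct frame *prev = unwind_prev (this_frame, &frames[depth]);

      /* Do the check right after the previous frame is computed: if it
         has the same id as THIS frame, drop it and stop, instead of
         letting the duplicate enter the frame chain.  */
      if (prev->id == this_frame->id)
        {
          printf ("previous frame identical to this frame (corrupt stack?)\n");
          break;
        }
      this_frame->next_outer = prev;
      this_frame = prev;
      printf ("#%d frame_id_%lu\n", depth, prev->id);
    }
  return 0;
}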

The patch includes a new test that fails before the change.  Before
the patch, the test causes an infinite loop in GDB; after the patch,
the UNWIND_SAME_ID logic kicks in and makes the backtrace stop with:

  Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The test uses dwarf CFI to emulate a corrupted stack with a cycle.  It
has a function with registers marked DW_CFA_same_value (most
importantly RSP/RIP), so that GDB computes the same ID for that frame
and its caller.  IOW, something like this:

 #0 - frame_id_1
 #1 - frame_id_2
 #2 - frame_id_3
 #3 - frame_id_4
 #4 - frame_id_4  <<<< outermost (UNWIND_SAME_ID).

(The test's code is just a copy of dw2-reg-undefined.S /
dw2-reg-undefined.c, adjusted to use DW_CFA_same_value instead of
DW_CFA_undefined, and to mark a different set of registers.)

The infinite loop is here, in value_fetch_lazy:

      while (VALUE_LVAL (new_val) == lval_register && value_lazy (new_val))
	{
	  frame = frame_find_by_id (VALUE_FRAME_ID (new_val));
...
	  new_val = get_frame_register_value (frame, regnum);
	}

get_frame_register_value can return a lazy register value pointing to
the next frame.  This means that the register wasn't clobbered by
FRAME; the debugger should therefore retrieve its value from the next
frame.

To be clear, get_frame_register_value unwinds the value in question
from the next frame:

 struct value *
 get_frame_register_value (struct frame_info *frame, int regnum)
 {
   return frame_unwind_register_value (frame->next, regnum);
                                       ^^^^^^^^^^^
 }

In other words, if we get a lazy lval_register, it should have the
frame ID of the _next_ frame, never of FRAME.

At this point in value_fetch_lazy, the whole relevant chunk of the
stack up to frame #4 has already been unwound.  The loop always
"unlazies" lval_registers in the "next/innermost" direction, not in
the "prev/unwind further/outermost" direction.

So say we're looking at frame #4.  get_frame_register_value in frame
#4 can return a lazy register value of frame #3.  So the next
iteration, frame_find_by_id tries to read the register from frame #3.
But, since frame #4 happens to have the same id as frame #3,
frame_find_by_id returns frame #4 instead.  Rinse, repeat, and we have
an infinite loop.

This is an old latent problem, exposed by the recent addition of the
frame stash.  Before we had a stash, frame_find_by_id(frame_id_4)
would walk over all frames starting at the current frame, and would
always find #3 first.  The stash happens to return #4 instead:

struct frame_info *
frame_find_by_id (struct frame_id id)
{
  struct frame_info *frame, *prev_frame;

...
  /* Try using the frame stash first.  Finding it there removes the need
     to perform the search by looping over all frames, which can be very
     CPU-intensive if the number of frames is very high (the loop is O(n)
     and get_prev_frame performs a series of checks that are relatively
     expensive).  This optimization is particularly useful when this function
     is called from another function (such as value_fetch_lazy, case
     VALUE_LVAL (val) == lval_register) which already loops over all frames,
     making the overall behavior O(n^2).  */
  frame = frame_stash_find (id);
  if (frame)
    return frame;

  for (frame = get_current_frame (); ; frame = prev_frame)
    {

gdb/
2013-11-22  Pedro Alves  <palves@redhat.com>

	PR 16155
	* frame.c (get_prev_frame_1): Do the UNWIND_SAME_ID check between
	this frame and the new previous frame, not between this frame and
	the next frame.

gdb/testsuite/
2013-11-22  Pedro Alves  <palves@redhat.com>

	PR 16155
	* gdb.dwarf2/dw2-dup-frame.S: New file.
	* gdb.dwarf2/dw2-dup-frame.c: New file.
	* gdb.dwarf2/dw2-dup-frame.exp: New file.
monaka pushed a commit that referenced this issue May 15, 2014
…sniffer (move dwarf2_tailcall_sniffer_first elsewhere).

Two rationales, same patch.

TL;DR 1:

 dwarf2_frame_cache recursion is evil.  dwarf2_frame_cache calls
 dwarf2_tailcall_sniffer_first which then recurses into
 dwarf2_frame_cache.

TL;DR 2:

 An unwinder trying to unwind is evil.  dwarf2_frame_sniffer calls
 dwarf2_frame_cache which calls dwarf2_tailcall_sniffer_first which
 then tries to unwind the PC of the previous frame.

Avoid all that by deferring dwarf2_tailcall_sniffer_first until it's
really necessary.

Rationale 1
===========

A frame sniffer should not try to unwind, because that bypasses all
the validation checks done by get_prev_frame.  The UNWIND_SAME_ID
scenario is one such case where GDB is currently broken because (in
part) of this (the next patch adds a test that would fail without
this).

GDB goes into an infinite loop in value_fetch_lazy, here:

      while (VALUE_LVAL (new_val) == lval_register && value_lazy (new_val))
	{
	  frame = frame_find_by_id (VALUE_FRAME_ID (new_val));
...
	  new_val = get_frame_register_value (frame, regnum);
	}

(top-gdb) bt
#0  value_fetch_lazy (val=0x11516d0) at ../../src/gdb/value.c:3510
#1  0x0000000000584bd8 in value_optimized_out (value=0x11516d0) at ../../src/gdb/value.c:1096
#2  0x00000000006fe7a1 in frame_register_unwind (frame=0x1492600, regnum=16, optimizedp=0x7fffffffcdec, unavailablep=0x7fffffffcde8, lvalp=0x7fffffffcdd8, addrp=
    0x7fffffffcde0, realnump=0x7fffffffcddc, bufferp=0x7fffffffce10 "@\316\377\377\377\177") at ../../src/gdb/frame.c:940
#3  0x00000000006fea3a in frame_unwind_register (frame=0x1492600, regnum=16, buf=0x7fffffffce10 "@\316\377\377\377\177") at ../../src/gdb/frame.c:990
#4  0x0000000000473b9b in i386_unwind_pc (gdbarch=0xf54660, next_frame=0x1492600) at ../../src/gdb/i386-tdep.c:1771
#5  0x0000000000601dfa in gdbarch_unwind_pc (gdbarch=0xf54660, next_frame=0x1492600) at ../../src/gdb/gdbarch.c:2870
#6  0x0000000000693db5 in dwarf2_tailcall_sniffer_first (this_frame=0x1492600, tailcall_cachep=0x14926f0, entry_cfa_sp_offsetp=0x7fffffffcf00)
    at ../../src/gdb/dwarf2-frame-tailcall.c:389
#7  0x0000000000690928 in dwarf2_frame_cache (this_frame=0x1492600, this_cache=0x1492618) at ../../src/gdb/dwarf2-frame.c:1245
#8  0x0000000000690f46 in dwarf2_frame_sniffer (self=0x8e4980, this_frame=0x1492600, this_cache=0x1492618) at ../../src/gdb/dwarf2-frame.c:1423
#9  0x000000000070203b in frame_unwind_find_by_frame (this_frame=0x1492600, this_cache=0x1492618) at ../../src/gdb/frame-unwind.c:112
#10 0x00000000006fd681 in get_frame_id (fi=0x1492600) at ../../src/gdb/frame.c:408
#11 0x00000000007006c2 in get_prev_frame_1 (this_frame=0xdc1860) at ../../src/gdb/frame.c:1826
#12 0x0000000000700b7a in get_prev_frame (this_frame=0xdc1860) at ../../src/gdb/frame.c:2056
#13 0x0000000000514588 in frame_info_to_frame_object (frame=0xdc1860) at ../../src/gdb/python/py-frame.c:322
#14 0x000000000051784c in bootstrap_python_frame_filters (frame=0xdc1860, frame_low=0, frame_high=-1) at ../../src/gdb/python/py-framefilter.c:1396
#15 0x0000000000517a6f in apply_frame_filter (frame=0xdc1860, flags=7, args_type=CLI_SCALAR_VALUES, out=0xed7a90, frame_low=0, frame_high=-1)
    at ../../src/gdb/python/py-framefilter.c:1492
#16 0x00000000005e77b0 in backtrace_command_1 (count_exp=0x0, show_locals=0, no_filters=0, from_tty=1) at ../../src/gdb/stack.c:1777
#17 0x00000000005e7c0f in backtrace_command (arg=0x0, from_tty=1) at ../../src/gdb/stack.c:1891
#18 0x00000000004e37a7 in do_cfunc (c=0xda4fa0, args=0x0, from_tty=1) at ../../src/gdb/cli/cli-decode.c:107
#19 0x00000000004e683c in cmd_func (cmd=0xda4fa0, args=0x0, from_tty=1) at ../../src/gdb/cli/cli-decode.c:1882
#20 0x00000000006f35ed in execute_command (p=0xcc66c2 "", from_tty=1) at ../../src/gdb/top.c:468
#21 0x00000000005f8853 in command_handler (command=0xcc66c0 "bt") at ../../src/gdb/event-top.c:435
#22 0x00000000005f8e12 in command_line_handler (rl=0xfe05f0 "@") at ../../src/gdb/event-top.c:632
#23 0x000000000074d2c6 in rl_callback_read_char () at ../../src/readline/callback.c:220
#24 0x00000000005f8375 in rl_callback_read_char_wrapper (client_data=0x0) at ../../src/gdb/event-top.c:164
#25 0x00000000005f876a in stdin_event_handler (error=0, client_data=0x0) at ../../src/gdb/event-top.c:375
#26 0x00000000005f72fa in handle_file_event (data=...) at ../../src/gdb/event-loop.c:768
#27 0x00000000005f67a3 in process_event () at ../../src/gdb/event-loop.c:342
#28 0x00000000005f686a in gdb_do_one_event () at ../../src/gdb/event-loop.c:406
#29 0x00000000005f68bb in start_event_loop () at ../../src/gdb/event-loop.c:431
#30 0x00000000005f83a7 in cli_command_loop (data=0x0) at ../../src/gdb/event-top.c:179
#31 0x00000000005eeed3 in current_interp_command_loop () at ../../src/gdb/interps.c:327
#32 0x00000000005ef8ff in captured_command_loop (data=0x0) at ../../src/gdb/main.c:267
#33 0x00000000005ed2f6 in catch_errors (func=0x5ef8e4 <captured_command_loop>, func_args=0x0, errstring=0x8b6554 "", mask=RETURN_MASK_ALL)
    at ../../src/gdb/exceptions.c:524
#34 0x00000000005f0d21 in captured_main (data=0x7fffffffd9e0) at ../../src/gdb/main.c:1067
#35 0x00000000005ed2f6 in catch_errors (func=0x5efb9b <captured_main>, func_args=0x7fffffffd9e0, errstring=0x8b6554 "", mask=RETURN_MASK_ALL)
    at ../../src/gdb/exceptions.c:524
#36 0x00000000005f0d57 in gdb_main (args=0x7fffffffd9e0) at ../../src/gdb/main.c:1076
#37 0x000000000045bb6a in main (argc=4, argv=0x7fffffffdae8) at ../../src/gdb/gdb.c:34
(top-gdb)

GDB is trying to unwind the PC register of the previous frame (frame
#5 above), starting from the frame being sniffed (the THIS frame).
But the THIS frame's unwinder says the PC of the previous frame is
actually the same as the previous frame's next frame (which is the
same frame we started with, the THIS frame), therefore it returns an
lval_register lazy value with frame set to THIS frame.  And so the
value_fetch_lazy loop never ends.


Rationale 2
===========

As an experiment, I tried making dwarf2-frame.c:read_addr_from_reg use
address_from_register.  That caused a bunch of regressions, but it
actually took me a long while to figure out what was going on.  Turns
out dwarf2-frame.c:read_addr_from_reg is called while computing the
frame's CFA, from within dwarf2_frame_cache.  address_from_register
wants to create a register with frame_id set to the frame being
constructed.  To create the frame id, we again call dwarf2_frame_cache,
which given:

static struct dwarf2_frame_cache *
dwarf2_frame_cache (struct frame_info *this_frame, void **this_cache)
{
...
  if (*this_cache)
    return *this_cache;

returns an incomplete object to the caller:
static void
dwarf2_frame_this_id (struct frame_info *this_frame, void **this_cache,
		      struct frame_id *this_id)
{
  struct dwarf2_frame_cache *cache =
    dwarf2_frame_cache (this_frame, this_cache);
...
 (*this_id) = frame_id_build (cache->cfa, get_frame_func (this_frame));
}

As cache->cfa is still 0 (we were trying to compute it!), and
get_frame_id recalls this id from here on, we end up with a broken
frame id recorded for this frame.  Later, when inspecting locals,
the dwarf machinery needs to know the selected frame's base, which
calls get_frame_base:

CORE_ADDR
get_frame_base (struct frame_info *fi)
{
  return get_frame_id (fi).stack_addr;
}

which as seen above then returns 0 ...

So I gave up using address_from_register.

But, the pain of investigating this made me want to have GDB itself
assert that recursion never happens here.  So I wrote a patch to do
that.  But, it triggers on current mainline, because
dwarf2_tailcall_sniffer_first, called from dwarf2_frame_cache, unwinds
the this_frame.

A sniffer shouldn't be trying to unwind, exactly because of this sort
of tricky issue.  The patch defers calling
dwarf2_tailcall_sniffer_first until it's really necessary, in
dwarf2_frame_prev_register (thus actually outside the sniffer path).
As this makes the call to dwarf2_frame_sniffer in dwarf2_frame_cache
unnecessary again, the patch removes that too.
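
The shape of that change, as a minimal standalone sketch (the checked_tailcall_bottom name mirrors the ChangeLog below; everything else is made up): the expensive, potentially recursive detection is kept off the sniffer path and is run at most once, lazily, guarded by a flag in the cache.

#include <stdio.h>

struct cache
{
  int checked_tailcall_bottom;  /* have we run the expensive check yet?  */
  int tailcall_info;            /* whatever that check produced */
};

/* Stand-in for dwarf2_tailcall_sniffer_first: expensive and may recurse
   into the unwinder, so we do not want it on the sniffer path.  */
static int
tailcall_sniff (void)
{
  printf ("running tail-call detection\n");
  return 42;
}

/* Sniffer path: only decide "this unwinder applies"; no unwinding here.  */
static int
frame_sniffer (struct cache *c)
{
  (void) c;
  return 1;
}

/* prev_register path: first real use, so run the deferred check once.  */
static int
frame_prev_register (struct cache *c)
{
  if (!c->checked_tailcall_bottom)
    {
      c->checked_tailcall_bottom = 1;
      c->tailcall_info = tailcall_sniff ();
    }
  return c->tailcall_info;
}

int
main (void)
{
  struct cache c = { 0, 0 };
  frame_sniffer (&c);
  printf ("%d\n", frame_prev_register (&c));  /* detection runs here, once */
  printf ("%d\n", frame_prev_register (&c));  /* cached on later calls */
  return 0;
}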

Tested on x86_64 Fedora 17.

gdb/
2013-11-22  Pedro Alves  <palves@redhat.com>

	PR 16155
	* dwarf2-frame.c (struct dwarf2_frame_cache)
	<checked_tailcall_bottom, entry_cfa_sp_offset,
	entry_cfa_sp_offset_p>: New fields.
	(dwarf2_frame_cache): Adjust to use the new cache fields instead
	of locals.  Don't call dwarf2_tailcall_sniffer_first here.
	(dwarf2_frame_prev_register): Call it here, but only once.
monaka pushed a commit that referenced this issue May 15, 2014
Given we already have the frame id stash, which holds the ids of all
frames in the chain, detecting corrupted stacks with wide stack cycles
with non-consecutive dup frame ids is just as cheap as just detecting
cycles in consecutive frames:

 #0 frame_id1
 #1 frame_id2
 #2 frame_id3
 #3 frame_id1
 #4 frame_id2
 #5 frame_id3
 #6 frame_id1
 ... forever ...

We just need to check whether the stash already knows about a given
frame id instead of comparing the ids of the previous/this frames.
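
A minimal standalone sketch of that check (made-up names; the real code lives in frame.c's stash and the new get_prev_frame_if_no_cycle mentioned below):

#include <stdio.h>

#define MAX_FRAMES 64

static unsigned long stash[MAX_FRAMES];
static int stash_len;

/* Record ID in the stash; return 1 if it was already there (a cycle).  */
static int
stash_add (unsigned long id)
{
  for (int i = 0; i < stash_len; i++)
    if (stash[i] == id)
      return 1;
  stash[stash_len++] = id;
  return 0;
}

int
main (void)
{
  /* A corrupted chain with a non-consecutive cycle: 1 2 3 1 2 3 ...  */
  unsigned long ids[] = { 1, 2, 3, 1, 2, 3, 1 };

  for (int i = 0; i < 7; i++)
    {
      if (stash_add (ids[i]))
        {
          printf ("frame #%d repeats frame_id%lu: stopping backtrace\n",
                  i, ids[i]);
          break;
        }
      printf ("#%d frame_id%lu\n", i, ids[i]);
    }
  return 0;
}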

Tested on x86_64 Fedora 17.

gdb/
2013-11-22  Pedro Alves  <palves@redhat.com>
	    Tom Tromey  <tromey@redhat.com>

	* frame.c (frame_stash_add): Now returns whether a frame with the
	same ID was already known.
	(compute_frame_id): New function, factored out from get_frame_id.
	(get_frame_id): No longer lazily compute the frame id here.
	(get_prev_frame_if_no_cycle): New function.  Detects wider stack
	cycles.
	(get_prev_frame_1): Use it instead of get_prev_frame_raw directly,
	and checking for stack cycles here.
monaka pushed a commit that referenced this issue May 15, 2014
Debugging PR 16155 further, I found that the DWARF unwinder found the
function in question, but thought it had no registers saved
(fs->regs.num_regs == 0).

It seems to me that if a frame does not specify the return address
column, or if the return address column is explicitly marked as
DWARF2_FRAME_REG_UNSPECIFIED, then we should set the
"undefined_retaddr" flag and let the DWARF unwinder gracefully stop.

This patch implements that idea.
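
A small self-contained sketch of that rule (simplified, hypothetical types; not the actual dwarf2-frame.c code): if the FDE never describes the return-address column, treat the frame as outermost instead of unwinding garbage.

#include <stdio.h>

enum reg_rule { REG_UNSPECIFIED, REG_SAME_VALUE, REG_SAVED_OFFSET };

struct fde_regs
{
  int num_regs;
  enum reg_rule rule[32];
};

static int
undefined_retaddr (const struct fde_regs *fs, int retaddr_column)
{
  if (retaddr_column >= fs->num_regs
      || fs->rule[retaddr_column] == REG_UNSPECIFIED)
    return 1;
  return 0;
}

int
main (void)
{
  struct fde_regs fs = { 0 };  /* like PR 16155: no registers described */
  int retaddr_column = 16;     /* e.g. the RA column on this arch */

  if (undefined_retaddr (&fs, retaddr_column))
    printf ("return address unspecified: stop unwinding here\n");
  else
    printf ("keep unwinding\n");
  return 0;
}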

With this patch the backtrace works properly:

    (gdb) bt
    #0  0x0000007fb7ed485c in nanosleep () from /lib64/libc.so.6
    #1  0x0000007fb7ed4508 in sleep () from /lib64/libc.so.6
    #2  0x00000000004008bc in thread_function (arg=0x4) at threadapply.c:73
    #3  0x0000007fb7fad950 in start_thread () from /lib64/libpthread.so.0
    #4  0x0000007fb7f0956c in clone () from /lib64/libc.so.6

2013-11-22  Tom Tromey  <tromey@redhat.com>

	PR backtrace/16155:
	* dwarf2-frame.c (dwarf2_frame_cache): Set undefined_retaddr if
	the return address column is unspecified.

2013-11-22  Tom Tromey  <tromey@redhat.com>

	* gdb.dwarf2/dw2-bad-cfi.c: New file.
	* gdb.dwarf2/dw2-bad-cfi.exp: New file.
	* gdb.dwarf2/dw2-bad-cfi.S: New file.
monaka pushed a commit that referenced this issue May 15, 2014
Remote servers may cut the connection abruptly since they are not
required to reply to a 'k' (Kill) packet sent from GDB.

This patch addresses any issues arising from such a scenario, which
leads to a GDB internal error due to an attempt to pop the target more
than once.  With the patch, this failure is handled gracefully.
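
A minimal sketch of the resulting behavior (made-up names and plain return codes instead of GDB's exception machinery): an abrupt connection close in response to the kill packet is treated like a successful kill rather than an error that would later pop the target a second time.

#include <stdio.h>

enum pkt_status { PKT_OK, PKT_ERROR, PKT_CLOSE_ERROR };

/* Stand-in for sending the 'k' packet; the stub is allowed to just drop
   the connection instead of replying.  */
static enum pkt_status
putpkt_kill (void)
{
  return PKT_CLOSE_ERROR;
}

static void
remote_kill (void)
{
  switch (putpkt_kill ())
    {
    case PKT_OK:
    case PKT_CLOSE_ERROR:
      /* Either a reply or an abrupt close means the inferior is gone;
         mourn it exactly once and don't treat this as an error.  */
      printf ("target killed, mourning inferior\n");
      break;
    case PKT_ERROR:
      printf ("error: could not kill the inferior\n");
      break;
    }
}

int
main (void)
{
  remote_kill ();
  return 0;
}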

Here's the GDB backtrace Maciej got running the testsuite against
QEMU.  Full paths edited out for brevity.

#0  0x55573430 in __kernel_vsyscall ()
#1  0x557a2951 in raise () from /lib32/libc.so.6
#2  0x557a5d82 in abort () from /lib32/libc.so.6
#3  0x0826e2e4 in dump_core ()
    at .../gdb/utils.c:635
#4  0x0826e5b6 in internal_vproblem (problem=0x85200c0,
    file=0x8416be8 ".../gdb/target.c", line=2861,
    fmt=0x84174ac "could not find a target to follow mourn inferior",
    ap=0xffa4796c "\f")
    at .../gdb/utils.c:804
#5  0x0826e5fb in internal_verror (
    file=0x8416be8 ".../gdb/target.c", line=2861,
    fmt=0x84174ac "could not find a target to follow mourn inferior",
    ap=0xffa4796c "\f")
    at .../gdb/utils.c:820
#6  0x0826e633 in internal_error (
    file=0x8416be8 ".../gdb/target.c", line=2861,
    string=0x84174ac "could not find a target to follow mourn inferior")
    at .../gdb/utils.c:830
#7  0x081b4ad0 in target_mourn_inferior ()
    at .../gdb/target.c:2861
#8  0x08082283 in remote_kill (ops=0x85245e0)
    at .../gdb/remote.c:7840
#9  0x081b06d1 in target_kill ()
    at .../gdb/target.c:486
#10 0x081b42f6 in dispose_inferior (inf=0xa501c60, args=0x0)
    at .../gdb/target.c:2570
#11 0x08290cfc in iterate_over_inferiors (
    callback=0x81b42af <dispose_inferior>, data=0x0)
    at .../gdb/inferior.c:396
#12 0x081b435a in target_preopen (from_tty=1)
    at .../gdb/target.c:2591
#13 0x0807c2c6 in remote_open_1 (name=0xa5538b6 "localhost:1237", from_tty=1,
    target=0x85245e0, extended_p=0)
    at .../gdb/remote.c:4292
#14 0x0807b7a8 in remote_open (name=0xa5538b6 "localhost:1237", from_tty=1)
    at .../gdb/remote.c:3655
#15 0x080a23d4 in do_cfunc (c=0xa464f30, args=0xa5538b6 "localhost:1237",
    from_tty=1)
    at .../gdb/cli/cli-decode.c:107
#16 0x080a4c3b in cmd_func (cmd=0xa464f30, args=0xa5538b6 "localhost:1237",
    from_tty=1)
    at .../gdb/cli/cli-decode.c:1882
#17 0x0826bebf in execute_command (p=0xa5538c3 "7", from_tty=1)
    at .../gdb/top.c:467
#18 0x08193f2d in command_handler (command=0xa5538a8 "")
    at .../gdb/event-top.c:435
#19 0x08194463 in command_line_handler (
    rl=0xa778198 "target remote localhost:1237")
    at .../gdb/event-top.c:633
#20 0x082ba92b in rl_callback_read_char ()
    at .../readline/callback.c:220
#21 0x08193adf in rl_callback_read_char_wrapper (client_data=0x0)
    at .../gdb/event-top.c:164
#22 0x08193e57 in stdin_event_handler (error=0, client_data=0x0)
    at .../gdb/event-top.c:375
#23 0x08192f29 in handle_file_event (data=...)
    at .../gdb/event-loop.c:768
#24 0x0819266a in process_event ()
    at .../gdb/event-loop.c:342
#25 0x08192708 in gdb_do_one_event ()
    at .../gdb/event-loop.c:394
#26 0x08192781 in start_event_loop ()
    at .../gdb/event-loop.c:431
#27 0x08193b08 in cli_command_loop (data=0x0)
    at .../gdb/event-top.c:179
#28 0x0818bc26 in current_interp_command_loop ()
    at .../gdb/interps.c:327
#29 0x0818c4e5 in captured_command_loop (data=0x0)
    at .../gdb/main.c:267
#30 0x0818a37f in catch_errors (func=0x818c4d0 <captured_command_loop>,
    func_args=0x0, errstring=0x8402108 "", mask=RETURN_MASK_ALL)
    at .../gdb/exceptions.c:524
#31 0x0818d736 in captured_main (data=0xffa47f10)
    at .../gdb/main.c:1067
#32 0x0818a37f in catch_errors (func=0x818c723 <captured_main>,
    func_args=0xffa47f10, errstring=0x8402108 "", mask=RETURN_MASK_ALL)
    at .../gdb/exceptions.c:524
#33 0x0818d76c in gdb_main (args=0xffa47f10)
    at .../gdb/main.c:1076
#34 0x0804dd1b in main (argc=5, argv=0xffa47fd4)
    at .../gdb/gdb.c:34

The corresponding gdb.log excerpt:

(gdb) PASS: gdb.base/bitfields.exp: bitfield uniqueness (u9)
cont
Continuing.

Breakpoint 1, break1 () at .../gdb/testsuite/gdb.base/bitfields.c:44
44	}
(gdb) PASS: gdb.base/bitfields.exp: continuing to break1 #9
print flags
$10 = {uc = 0 '\000', s1 = 0, u1 = 0, s2 = 0, u2 = 0, s3 = 0, u3 = 0, s9 = 0, u9 = 0, sc = 1 '\001'}
(gdb) PASS: gdb.base/bitfields.exp: bitfield uniqueness (sc)
delete breakpoints
Delete all breakpoints? (y or n) y
(gdb) info breakpoints
No breakpoints or watchpoints.
(gdb) delete breakpoints
(gdb) info breakpoints
No breakpoints or watchpoints.
(gdb) break break2
Breakpoint 2 at 0x85f8: file .../gdb/testsuite/gdb.base/bitfields.c, line 48.
(gdb) entering gdb_reload
target remote localhost:1235
A program is being debugged already.  Kill it? (y or n) y
Remote connection closed
.../gdb/target.c:2861: internal-error: could not find a target to follow mourn inferior
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) ^Ccontinue
Please answer y or n.
.../gdb/target.c:2861: internal-error: could not find a target to follow mourn inferior
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) Resyncing due to internal error.
n
.../gdb/target.c:2861: internal-error: could not find a target to follow mourn inferior
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) y
Command aborted.
(gdb) print/x flags
$11 = {uc = 0x0, s1 = 0x0, u1 = 0x0, s2 = 0x0, u2 = 0x0, s3 = 0x0, u3 = 0x0, s9 = 0x0, u9 = 0x0, sc = 0x0}
(gdb) FAIL: gdb.base/bitfields.exp: bitfield containment #1
cont
The program is not being run.
(gdb) FAIL: gdb.base/bitfields.exp: continuing to break2 (the program is no longer running)
print/x flags
$12 = {uc = 0x0, s1 = 0x0, u1 = 0x0, s2 = 0x0, u2 = 0x0, s3 = 0x0, u3 = 0x0, s9 = 0x0, u9 = 0x0, sc = 0x0}
(gdb) FAIL: gdb.base/bitfields.exp: bitfield containment #2
delete breakpoints
Delete all breakpoints? (y or n) y
(gdb) info breakpoints
No breakpoints or watchpoints.
(gdb) delete breakpoints
(gdb) info breakpoints
No breakpoints or watchpoints.
(gdb) break break3
Breakpoint 3 at 0x8604: file .../gdb/testsuite/gdb.base/bitfields.c, line 52.
(gdb) entering gdb_reload
target remote localhost:1236
Remote debugging using localhost:1236
Reading symbols from .../lib/ld-linux.so.3...done.
Loaded symbols for .../lib/ld-linux.so.3
0x41001b80 in _start () from .../lib/ld-linux.so.3
(gdb) continue
Continuing.

Breakpoint 3, break3 () at .../gdb/testsuite/gdb.base/bitfields.c:52
52	}
(gdb) print flags
$13 = {uc = 0 '\000', s1 = 0, u1 = 1, s2 = 0, u2 = 3, s3 = 0, u3 = 7, s9 = 0, u9 = 511, sc = 0 '\000'}
(gdb) PASS: gdb.base/bitfields.exp: unsigned bitfield ranges

gdb/
2013-12-02  Pedro Alves  <pedro@codesourcery.com>
            Maciej W. Rozycki  <macro@codesourcery.com>

	* remote.c (putpkt_for_catch_errors): Remove function.
	(remote_kill): Handle TARGET_CLOSE_ERROR from the kill packet
	gracefully.
monaka pushed a commit that referenced this issue May 15, 2014
With a simple Ada program where I have 3 functions, one just calling
the next, the backtrace is currently broken when GDB is compiled
at -O2:

   #0  hello.first () at hello.adb:5
   #1  0x0000000100001475 in hello.second () at hello.adb:10
   Backtrace stopped: previous frame inner to this frame (corrupt stack?)

It turns out that a recent patch deleted the assignment of variable
this_id, making it an uninitialized variable:

        * frame-unwind.c (default_frame_unwind_stop_reason): Return
        UNWIND_OUTERMOST if the frame's ID is outer_frame_id.
        * frame.c (get_prev_frame_1): Remove outer_frame_id check.

The hunk in question starts with:

-  /* Check that this frame is not the outermost.  If it is, don't try
-     to unwind to the prev frame.  */
-  this_id = get_frame_id (this_frame);
-  if (frame_id_eq (this_id, outer_frame_id))

(the code was removed as redundant - but removing the assignment
was in fact not intentional).

There is no other code in this function that sets the variable.
Instead of re-adding the statement in the lone section where it is
actually used, I inlined it, and then got rid of the variable
altogether.  This way, and until we start needing this frame ID
in another location within that function, we don't have to worry
about the variable's validity/lifetime.
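
For illustration only (simplified stand-ins, not the real frame.c): after the fix there is no local "this_id" that can silently become uninitialized when unrelated code is deleted; the one remaining user just calls get_frame_id directly.

#include <stdio.h>

struct frame { unsigned long stack_addr; };

static unsigned long
get_frame_id (const struct frame *f)
{
  return f->stack_addr;
}

static void
describe (const struct frame *this_frame)
{
  /* Inlined call at the single use instead of a separate variable.  */
  printf ("frame id = 0x%lx\n", get_frame_id (this_frame));
}

int
main (void)
{
  struct frame f = { 0x7fffffffd9e0UL };
  describe (&f);
  return 0;
}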

gdb/ChangeLog:

        * frame.c (get_prev_frame_1): Delete variable "this_id".
        Replace its use by a call to get_frame_id.
monaka pushed a commit that referenced this issue May 15, 2014
Doing "info frame" in the outermost frame, when that was indicated by
the next frame saying the unwound PC is undefined/not saved, results
in error and incomplete output:

 (gdb) bt
 #0  thread_function0 (arg=0x0) at threads.c:63
 #1  0x00000034cf407d14 in start_thread (arg=0x7ffff7fcb700) at pthread_create.c:309
 #2  0x000000323d4f168d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

 (gdb) frame 2
 #2  0x000000323d4f168d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
 115             call    *%rax

 (gdb) info frame
 Stack level 2, frame at 0x0:
  rip = 0x323d4f168d in clone (../sysdeps/unix/sysv/linux/x86_64/clone.S:115); saved rip Register 16 was not saved
 (gdb)

Not saved register values are treated as optimized out values
internally throughout.  stack.c:frame_info is handling unavailable
values, but not optimized out ones.  The patch deletes the
frame_unwind_caller_pc_if_available wrapper function and instead lets
errors propagate to frame_info (its only user).

As frame_unwind_pc now needs to be able to handle and cache two
different error scenarios, the prev_pc.p variable is replaced with an
enumeration.
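
A small standalone sketch of caching the "why it failed" alongside the value (hypothetical simplified types; only the enum name mirrors the ChangeLog below):

#include <stdio.h>

enum cached_copy_status
{
  CC_UNKNOWN,      /* not computed yet */
  CC_VALUE,        /* prev_pc holds a valid value */
  CC_NOT_SAVED,    /* caller's pc was not saved (outermost frame) */
  CC_UNAVAILABLE   /* pc lives in memory/registers we could not read */
};

struct frame
{
  unsigned long prev_pc;
  enum cached_copy_status prev_pc_status;
};

static void
print_saved_pc (struct frame *f)
{
  if (f->prev_pc_status == CC_UNKNOWN)
    {
      /* Pretend the unwind failed because the register was not saved;
         remember *why*, so later queries neither retry nor guess.  */
      f->prev_pc_status = CC_NOT_SAVED;
    }

  switch (f->prev_pc_status)
    {
    case CC_VALUE:
      printf ("saved rip = 0x%lx\n", f->prev_pc);
      break;
    case CC_NOT_SAVED:
      printf ("saved rip = <not saved>\n");
      break;
    case CC_UNAVAILABLE:
      printf ("saved rip = <unavailable>\n");
      break;
    default:
      break;
    }
}

int
main (void)
{
  struct frame outermost = { 0, CC_UNKNOWN };
  print_saved_pc (&outermost);
  return 0;
}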

(FWIW, I looked into making gdbarch_unwind_pc or a variant return
struct value's instead, but it results in lots of boxing and unboxing
for no real gain -- e.g., the mips and arm implementations need to do
computation on the unboxed PC value.  Might as well throw an error on
first attempt to get at invalid contents.)

After the patch, we get:

 (gdb) info frame
 Stack level 2, frame at 0x0:
  rip = 0x323d4f168d in clone (../sysdeps/unix/sysv/linux/x86_64/clone.S:115); saved rip = <not saved>
  Outermost frame: outermost
  caller of frame at 0x7ffff7fcafc0
  source language asm.
  Arglist at 0x7ffff7fcafb8, args:
  Locals at 0x7ffff7fcafb8, Previous frame's sp is 0x7ffff7fcafc8
 (gdb)

A new test is added.  It's based off dw2-reg-undefined.exp, and tweaked to
mark the return address (rip) of "stop_frame" as undefined.

Tested on x86_64 Fedora 17.

gdb/
2013-12-06  Pedro Alves  <palves@redhat.com>

	* frame.c (enum cached_copy_status): New enum.
	(struct frame_info) <prev_pc.p>: Change type to enum
	cached_copy_status.
	(fprint_frame): Handle not saved and unavailable prev_pc values.
	(frame_unwind_pc_if_available): Delete and merge contents into ...
	(frame_unwind_pc): ... here.  Handle OPTIMIZED_OUT_ERROR.  Adjust
	to use enum cached_copy_status.
	(frame_unwind_caller_pc_if_available): Delete.
	(create_new_frame): Adjust.
	* frame.h (frame_unwind_caller_pc_if_available): Delete
	declaration.
	* stack.c (frame_info): Use frame_unwind_caller_pc instead of
	frame_unwind_caller_pc_if_available, and handle
	NOT_AVAILABLE_ERROR and OPTIMIZED_OUT_ERROR errors.
	* valprint.c (val_print_optimized_out): Use val_print_not_saved.
	(val_print_not_saved): New function.
	* valprint.h (val_print_not_saved): Declare.

gdb/testsuite/
2013-12-06  Pedro Alves  <palves@redhat.com>

	* gdb.dwarf2/dw2-undefined-ret-addr.S: New file.
	* gdb.dwarf2/dw2-undefined-ret-addr.c: New file.
	* gdb.dwarf2/dw2-undefined-ret-addr.exp: New file.
monaka pushed a commit that referenced this issue May 15, 2014
We observed on Windows 2012 that we were unable to unwind past
exception handlers. For instance, with any Ada program raising
an exception that does not get handled:

    % gnatmake -g a -bargs -shared
    % gdb a
    (gdb) start
    (gdb) catch exception unhandled
    Catchpoint 2: unhandled Ada exceptions
    (gdb) c
    Catchpoint 2, unhandled CONSTRAINT_ERROR at <__gnat_unhandled_exception> (
        e=0x645ff820 <constraint_error>) at s-excdeb.adb:53
    53      s-excdeb.adb: No such file or directory.

At this point, we can already see that something went wrong, since
the frame selected by the debugger corresponds to a runtime function
rather than the function in the user code that caused the exception
to be raised (in our case procedure A).

This is further confirmed by the fact that we are unable to unwind
all the way to procedure A:

    (gdb) bt
    #0  <__gnat_unhandled_exception> (e=0x645ff820 <constraint_error>)
        at s-excdeb.adb:53
    #1  0x000000006444e9a3 in <__gnat_notify_unhandled_exception> (excep=0x284d2
+0)
        at a-exextr.adb:144
    #2  0x00000000645f106a in __gnat_personality_imp ()
       from C:\[...]\libgnat-7.3.dll
    #3  0x000000006144d1b7 in _GCC_specific_handler (ms_exc=0x242fab0,
        this_frame=0x242fe60, ms_orig_context=0x242f5c0, ms_disp=0x242ef70,
        gcc_per=0x645f0960 <__gnat_personality_imp>)
        at ../../../src/libgcc/unwind-seh.c:289
    #4  0x00000000645f1211 in __gnat_personality_seh0 ()
       from C:\[...]\libgnat-7.3.dll
    #5  0x000007fad3879f4d in ?? ()
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

It turns out that the unwinder has been doing its job flawlessly
up until frame #5. The address in frame #5 is correct, but GDB
is not able to associate it with any symbol or unwind record.

And this is because this address is inside ntdll.dll, and when
we received the LOAD_DLL_DEBUG_EVENT for that DLL, the system
was not able to tell us the name of the library, thus causing us
to silently ignore the event. Because GDB does not know about
ntdll.dll, it is unable to access the unwind information from it.
And because the function at that address does not use a frame
pointer, the unwinding becomes impossible.

This patch helps recover ntdll.dll at the end of the "run/attach"
phase, simply by trying to locate that specific DLL again.
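
As an illustration of the idea (a sketch under assumptions, not the actual
windows-nat.c code; every name other than the Win32 APIs is made up), the
inferior's module list can be walked again after inferior creation to locate
ntdll.dll when its LOAD_DLL_DEBUG_EVENT was missed:

#include <windows.h>
#include <psapi.h>
#include <string.h>

/* Sketch: look for ntdll.dll among the modules already mapped into
   PROCESS; the real code would then register it with GDB's shared
   library machinery so its symbols and unwind info become usable.  */
static void
ensure_ntdll_loaded (HANDLE process)
{
  HMODULE modules[1024];
  DWORD needed, count;

  if (!EnumProcessModules (process, modules, sizeof (modules), &needed))
    return;

  count = needed / sizeof (HMODULE);
  if (count > 1024)
    count = 1024;

  for (DWORD i = 0; i < count; i++)
    {
      char name[MAX_PATH];

      if (GetModuleBaseNameA (process, modules[i], name, sizeof (name))
          && _stricmp (name, "ntdll.dll") == 0)
        {
          /* Found it: record the base address modules[i] and load its
             unwind information here.  */
          break;
        }
    }
}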

In terms of our medium to long term planning, it seems to me that
we should be able to simplify the code by ignoring LOAD_DLL_DEBUG_EVENT
during the startup phase, and modify windows_ensure_ntdll_loaded
to then detect and report all shared libraries after we've finished
inferior creation.  But for a change just before 7.7 branch creation,
I thought it was safest to just handle ntdll.dll specifically. This
is less intrusive, and ntdll is the only DLL affected by this problem
that I know of so far.

gdb/ChangeLog:

	* windows-nat.c (handle_load_dll): Add comments.
        (windows_ensure_ntdll_loaded): New function.
	(do_initial_windows_stuff): Use windows_ensure_ntdll_loaded.
        Add FIXME comment.
monaka pushed a commit that referenced this issue May 15, 2014
abfd->section_count unexpectedly changes between 218 and 248 in:

150 bfd_simple_get_relocated_section_contents (bfd *abfd,
[...]
218   saved_offsets = malloc (sizeof (struct saved_output_info)
219                           * abfd->section_count);
[...]
230	  _bfd_generic_link_add_symbols (abfd, &link_info);
[...]
248   bfd_map_over_sections (abfd, simple_restore_output_info, saved_offsets);

_bfd_generic_link_add_symbols increases section_count, and
simple_restore_output_info later reads an unallocated part of saved_offsets.

READ of size 8 at 0x601c0000c5c0 thread T0
    #0 0x1124770 in simple_restore_output_info (.../gdb/gdb+0x1124770)
    #1 0x10ecd51 in bfd_map_over_sections (.../gdb/gdb+0x10ecd51)
    #2 0x1125150 in bfd_simple_get_relocated_section_contents (.../gdb/gdb+0x1125150)
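
A rough sketch of the shape of the fix described in the ChangeLog below
(illustrative only, not the exact bfd/simple.c code; the struct layout is
assumed): the offsets array records how many sections it was sized for, and
the restore callback skips sections created after the allocation.

#include "sysdep.h"
#include "bfd.h"

struct saved_output_info;   /* already defined in simple.c */

/* Illustrative layout: the saved count plus the per-section array.  */
struct saved_offsets
{
  unsigned int section_count;           /* abfd->section_count at the time
                                           the array below was allocated.  */
  struct saved_output_info *sections;   /* one entry per original section */
};

static void
simple_restore_output_info (bfd *abfd ATTRIBUTE_UNUSED,
                            asection *section, void *ptr)
{
  struct saved_offsets *saved = (struct saved_offsets *) ptr;

  /* Sections added later by _bfd_generic_link_add_symbols have indices
     beyond the saved array; leave those untouched.  */
  if (section->index >= saved->section_count)
    return;

  /* ... restore section->output_section / output_offset from
     saved->sections[section->index] here ...  */
}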

bfd/
2014-02-17  Jan Kratochvil  <jan.kratochvil@redhat.com>

	PR binutils/16595
	* simple.c (struct saved_offsets): New.
	(simple_save_output_info): Use it for ptr.
	(simple_restore_output_info): Use it for ptr.  Check section_count.
	(bfd_simple_get_relocated_section_contents): Use it for saved_offsets.
monaka pushed a commit that referenced this issue May 15, 2014
binutils readelf -wi:
 <4><a2>: Abbrev Number: 26 (DW_TAG_inlined_subroutine)
    <a3>   DW_AT_abstract_origin: <0x5a>
    <a7>   DW_AT_low_pc      : 0x400590
    <ab>   DW_AT_high_pc     : 0x4
    <af>   DW_AT_call_file   : 1
    <b0>   DW_AT_call_line   : 20
    <b1>   DW_AT_sibling     : <0xb8>
 <2><b8>: Abbrev Number: 35 (DW_TAG_inlined_subroutine)
    <b9>   DW_AT_abstract_origin: <0x5a>
    <bd>   DW_AT_low_pc      : 0x400590
    <c1>   DW_AT_high_pc     : 0x4
    <c5>   DW_AT_call_file   : 1
    <c6>   DW_AT_call_line   : 29

<b1> DW_AT_sibling points to the next DIE - but that DIE is two levels
up, so it is definitely not a sibling.  This confuses GDB to the point of crashing:

==32143== ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6024000198ac at pc 0xb4d104 bp 0x7fff63e96e70 sp
0x7fff63e96e60
READ of size 1 at 0x6024000198ac thread T0
    #0 0xb4d103 in read_unsigned_leb128 (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xb4d103)
    #1 0xb15f3c in peek_die_abbrev (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xb15f3c)
    #2 0xb46185 in load_partial_dies (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xb46185)
    #3 0xb103fb in process_psymtab_comp_unit_reader (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xb103fb)
    #4 0xb0d2a9 in init_cutu_and_read_dies (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xb0d2a9)
    #5 0xb1115f in process_psymtab_comp_unit (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xb1115f)
    #6 0xb1235f in dwarf2_build_psymtabs_hard (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xb1235f)
    #7 0xb05536 in dwarf2_build_psymtabs (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xb05536)
    #8 0x86d5a5 in read_psyms (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x86d5a5)
    #9 0x9b1c37 in require_partial_symbols (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x9b1c37)
    #10 0x9bf2d0 in read_symbols (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x9bf2d0)
    #11 0x9c014c in syms_from_objfile_1 (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x9c014c)

gdb/testsuite/
2014-02-25  Jan Kratochvil  <jan.kratochvil@redhat.com>

	Fix dw2-icycle.exp -fsanitize=address GDB crash.
	* gdb.dwarf2/dw2-icycle.S: Remove all DW_AT_sibling.

Message-ID: <20140224201011.GA28926@host2.jankratochvil.net>
monaka pushed a commit that referenced this issue May 15, 2014
…T_WAITKIND_NO_RESUMED).

GDBserver currently hangs forever in waitpid if the leader thread
exits before other threads, or if all resumed threads exit - e.g.,
next over a thread exit with sched-locking on.  This is exposed by
leader-exit.exp.  leader-exit.exp is part of a series of tests for a
set of related problems.  See
<http://www.sourceware.org/ml/gdb-patches/2011-10/msg00704.html>:

 "
 To recap, on the Linux kernel, ptrace/waitpid don't allow reaping the
 leader thread until all other threads in the group are reaped.  When
 the leader exits, it goes zombie, but waitpid will not return an exit
 status until the other threads are gone.  This is presently exercised
 by the gdb.threads/leader-exit.exp test.  The fix for that test, in
 linux-nat.c:wait_lwp, handles the case where we see the leader gone
 when we're stopping all threads to report an event to some other
 thread to the core.

 (...)

 The latter bit about not blocking if there are no resumed threads in the
 process also applies to some other thread exiting, not just the main
 thread.  E.g., this test starts a thread, and runs to a breakpoint in
 that thread:

 ...
 (gdb) c
 Continuing.
 [New Thread 0x7ffff75a4700 (LWP 23397)]
 [Switching to Thread 0x7ffff75a4700 (LWP 23397)]

 Breakpoint 2, thread_a (arg=0x0) at ../../../src/gdb/testsuite/gdb.threads/no-unwaited-for-left.c:28
 28        return 0; /* break-here */
 (gdb) info threads
 * 2 Thread 0x7ffff75a4700 (LWP 23397)  thread_a (arg=0x0) at ../../../src/gdb/testsuite/gdb.threads/no-unwaited-for-left.c:28
   1 Thread 0x7ffff7fcb720 (LWP 23391)  0x00007ffff7bc606d in pthread_join (threadid=140737343276800, thread_return=0x0) at pthread_join.c:89

 The thread will exit as soon as we resume it.  But if we only resume
 that thread, leaving the rest of the threads stopped:

 (gdb) set scheduler-locking on
 (gdb) c
 Continuing.
 ^C^C^C^C^C^C^C^C
 "

This patch fixes the issues by implementing TARGET_WAITKIND_NO_RESUMED
on GDBserver, similarly to what the patch above did for native
Linux GDB.

gdb.threads/leader-exit.exp now passes.

gdb.threads/no-unwaited-for-left.exp now at least errors out instead
of hanging:

 continue
 Continuing.
 warning: Remote failure reply: E.No unwaited-for children left.

 [Thread 15454] #1 stopped.
 0x00000034cf408e60 in pthread_join (threadid=140737353922368, thread_return=0x0) at pthread_join.c:93
 93          lll_wait_tid (pd->tid);
 (gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when the main thread exits

The gdb.threads/non-ldr-exc-*.exp tests are skipped because GDBserver
unfortunately doesn't support fork/exec yet, but I'm confident this
fixes the related issues.

I'm leaving modeling TARGET_WAITKIND_NO_RESUMED in the RSP for a
separate pass.

(BTW, in case of error in response to a vCont, it would be better for
GDB to query the target for the current thread, or re-select one,
instead of assuming current inferior_ptid is still the selected
thread.)

This implementation is a little different from GDB's, because I'm
avoiding bringing in more of this broken use of waitpid(PID) into
GDBserver.  Specifically, this avoids waitpid(PID) when stopping all
threads.  There's really no need for wait_for_sigstop to wait for each
LWP in turn.  Instead, with some refactoring, we make it reuse
linux_wait_for_event.
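
A hedged sketch of the waiting style named in the ChangeLog below (not the
actual gdbserver code): poll every child with waitpid (-1, ..., WNOHANG),
and when nothing is pending, block in sigsuspend until a SIGCHLD arrives,
instead of blocking in waitpid (PID) on a single LWP that may never report
anything.

#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/wait.h>

/* PREV_MASK is the caller's signal mask with SIGCHLD unblocked; the
   caller must have SIGCHLD blocked and a (possibly empty) SIGCHLD
   handler installed, so sigsuspend returns when a child changes
   state.  Filtering and __WALL are omitted from this sketch.  */
static int
wait_for_any_child (int *status, const sigset_t *prev_mask)
{
  for (;;)
    {
      int pid = waitpid (-1, status, WNOHANG);

      if (pid > 0)
        return pid;                  /* Some child reported an event.  */
      if (pid < 0 && errno != EINTR)
        return -1;                   /* Real error, e.g. no children.  */

      /* Nothing pending yet: sleep until SIGCHLD is delivered.  */
      sigsuspend (prev_mask);
    }
}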

gdb/gdbserver/
2014-02-27  Pedro Alves  <palves@redhat.com>

	PR 12702
	* inferiors.h (A_I_NEXT, ALL_INFERIORS_TYPE, ALL_PROCESSES): New
	macros.
	* linux-low.c (delete_lwp, handle_extended_wait): Add debug
	output.
	(last_thread_of_process_p): Take a PID argument instead of a
	thread pointer.
	(linux_wait_for_lwp): Delete.
	(num_lwps, check_zombie_leaders, not_stopped_callback): New
	functions.
	(linux_low_filter_event): New function, partly factored out from
	linux_wait_for_event.
	(linux_wait_for_event): Rename to ...
	(linux_wait_for_event_filtered): ... this.  Add new filter ptid
	argument.  Partly rewrite.  Always use waitpid(-1, WNOHANG) and
	sigsuspend.  Check for zombie leaders.
	(linux_wait_for_event): Reimplement as wrapper around
	linux_wait_for_event_filtered.
	(linux_wait_1): Handle TARGET_WAITKIND_NO_RESUMED.  Assume that if
	a normal or signal exit is seen, it's the whole process exiting.
	(wait_for_sigstop): No longer a for_each_inferior callback.
	Rewrite on top of linux_wait_for_event_filtered.
	(stop_all_lwps): Call wait_for_sigstop directly.
	* server.c (resume, handle_target_event): Handle
	TARGET_WAITKIND_NO_RESUMED.
monaka pushed a commit that referenced this issue May 15, 2014
runtest gdb.base/corefile.exp

==23174== ERROR: AddressSanitizer: heap-use-after-free on address 0x604400008c88 at pc 0x68f0be bp 0x7fffae9d7490 sp
0x7fffae9d7480
READ of size 8 at 0x604400008c88 thread T0
    #0 0x68f0bd in svr4_read_so_list (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x68f0bd)
    #1 0x68f64e in svr4_current_sos_direct (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x68f64e)
    #2 0x68f757 in svr4_current_sos (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x68f757)
    #3 0xcebbff in update_solib_list (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xcebbff)
0x604400008c88 is located 8 bytes inside of 1104-byte region [0x604400008c80,0x6044000090d0)
freed by thread T0 here:
    #0 0x7f52677500f9 (/lib64/libasan.so.0+0x160f9)
    #1 0xd2c68a in xfree (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xd2c68a)
    #2 0xceb364 in free_so (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xceb364)
    #3 0xca59f8 in do_free_so (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0xca59f8)
    #4 0x93432a in do_my_cleanups (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x93432a)
    #5 0x934406 in do_cleanups (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x934406)
    #6 0x68efa9 in svr4_read_so_list (/home/jkratoch/redhat/gdb-clean/gdb/gdb+0x68efa9)

I did not notice it during my review in:
	Re: [PATCH v2] Skip vDSO when reading SO list (PR 8882)
	https://sourceware.org/ml/gdb-patches/2013-09/msg00888.html
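
A generic, self-contained illustration of the pattern named in the ChangeLog
below (not the actual solib-svr4.c code; types, values, and names besides
first_l_name are made up): keep the plain value needed for a later
comparison, rather than a pointer into a heap object that a cleanup may free
in the meantime.

#include <stdlib.h>
#include <stdint.h>

struct so_entry
{
  uintptr_t l_name_addr;   /* address of the name in the inferior */
};

int
main (void)
{
  struct so_entry *so = calloc (1, sizeof *so);
  so->l_name_addr = 0x7ffff7ffd000;          /* placeholder value */

  /* Buggy pattern: struct so_entry *first = so; ... free (so); ...
     first->l_name_addr  -> heap-use-after-free.
     Fixed pattern: copy the value out before SO can be freed.  */
  uintptr_t first_l_name = so->l_name_addr;

  free (so);                                 /* SO is now gone */

  return first_l_name == 0 ? 1 : 0;          /* later use stays valid */
}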

gdb/
2014-02-27  Jan Kratochvil  <jan.kratochvil@redhat.com>

	Additional PR 8882 fix.
	* solib-svr4.c (svr4_read_so_list): Change first to first_l_name.

Message-ID: <20140226220918.GA10431@host2.jankratochvil.net>
monaka pushed a commit that referenced this issue May 15, 2014
With target async enabled, py-finish-breakpoint.exp triggers an
assertion failure.

The failure occurs because execute_command re-enters the event loop in
some circumstances, and in this case resets the sync_execution flag.
Then later GDB reaches this assertion in normal_stop:

      gdb_assert (sync_execution || !target_can_async_p ());

In detail:

#1 - A synchronous execution command is run.  sync_execution is set.

#2 - A python breakpoint is hit (TARGET_WAITKIND_STOPPED), and the
     corresponding Python breakpoint's stop method is executed.  While
     Python commands are executed, interpreter_async is forced to 0.

#3 - The Python stop method happens to execute a not-execution-related
     gdb command.  In this case, "where 1".

#4 - Seeing that sync_execution is set, execute_command nests a new
     event loop (although that wasn't necessary; this is the problem).

#5 - The linux-nat target's pipe in the event loop happens to be
     marked.  That's normal, due to this in linux_nat_wait:

     /* If we requested any event, and something came out, assume there
        may be more.  If we requested a specific lwp or process, also
        assume there may be more.  */

     The nested event loop thus immediately wakes up and calls
     target_wait.  No thread is actually executing in the inferior, so
     the target returns TARGET_WAITKIND_NO_RESUMED.

#6 - normal_stop is reached.  GDB prints "No unwaited-for children
     left.", and resets the sync_execution flag (IOW, there are no
     resumed threads left, so the synchronous command is considered
     completed.)  This is already bogus.  We were handling a
     breakpoint!

#7 - the nested event loop unwinds/ends.  GDB is now back to handling
     the python stop method (TARGET_WAITKIND_STOPPED), which decides
     the breakpoint should stop.  normal_stop is called for this
     event.  However, normal_stop actually works with the _last_
     reported target status:

   void
   normal_stop (void)
   {
    struct target_waitstatus last;
    ptid_t last_ptid;
    struct cleanup *old_chain = make_cleanup (null_cleanup, NULL);

    ...
    get_last_target_status (&last_ptid, &last);

    ...
    if (last.kind == TARGET_WAITKIND_NO_RESUMED)
      {
        gdb_assert (sync_execution || !target_can_async_p ());

        target_terminal_ours_for_output ();
        printf_filtered (_("No unwaited-for children left.\n"));
      }

   And due to the nesting in execute_command, the last event is now
   TARGET_WAITKIND_NO_RESUMED, not the actual breakpoint event being
   handled.  This could be seen to be broken in itself, but we can
   leave fixing that for another pass.  The assertion is reached, and
   fails.

execute_command has a comment explaining when it should synchronously
wait for events:

      /* If the interpreter is in sync mode (we're running a user
	 command's list, running command hooks or similars), and we
	 just ran a synchronous command that started the target, wait
	 for that command to end.  */

However, the code did not follow this comment -- it didn't check to
see if the command actually started the target, just whether the
target was executing a sync command at this point.

This patch fixes the problem by noting whether the target was
executing in sync_execution mode before running the command, and then
augmenting the condition to test this as well.
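
In rough outline, the shape of the change looks like the sketch below (not
the exact top.c code; run_the_command is a made-up stand-in for the actual
command dispatch, and the externs stand in for gdb's own flags and event
loop):

extern int sync_execution;      /* set while a sync execution command runs */
extern int interpreter_async;   /* 0 while running command lists/hooks */
extern int gdb_do_one_event (void);
extern void run_the_command (char *p, int from_tty);   /* illustrative */

void
execute_command_sketch (char *p, int from_tty)
{
  /* Note whether a synchronous execution was already in flight before
     this command ran.  */
  int was_sync = sync_execution;

  run_the_command (p, from_tty);

  /* Nest an event loop only if the interpreter is in sync mode and THIS
     command is the one that started the target, i.e. sync_execution
     went from clear to set.  */
  if (!interpreter_async && !was_sync && sync_execution)
    while (gdb_do_one_event () >= 0)
      if (!sync_execution)
        break;
}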

2014-03-20  Tom Tromey  <tromey@redhat.com>

	PR gdb/14135
	* top.c (execute_command): Only dispatch events if the command
	started the target.