Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save and restore frame pointer across Iextcall on ARM64 #13079

Merged
merged 1 commit into from
Apr 11, 2024

Conversation

tmcgilchrist
Copy link
Contributor

This PR restores full backtraces on macOS ARM64 with lldb.

In OCaml 5.1.1 lldb looses the full backtrace once it reaches OCaml code as shown in this lldb session using this example program:

(* fib.ml *)
let rec fib n =
  if n = 0 then 0
  else if n = 1 then 1
  else fib (n-1) + fib (n-2)

let main () =
  let a = 10 in
  let r = fib a in
  Printf.printf "fib(%d) = %d" a r

let _ = main ()
(lldb) br list
Current breakpoints:
1: name = 'caml_program', locations = 1, resolved = 1, hit count = 1
  1.1: where = fib-5.1.1.exe`caml_program, address = 0x0000000100003c10, resolved, hit count = 1 

3: address = fib-5.1.1.exe[0x0000000100006930], locations = 1
  3.1: where = fib-5.1.1.exe`camlFib.main_271, address = 0x0000000100006930, unresolved, hit count = 0 

(lldb) run
Process 64477 launched: '/Users/tsmc/projects/ocaml/fib-5.1.1.exe' (arm64)
Process 64477 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003c10 fib-5.1.1.exe`caml_program
fib-5.1.1.exe`caml_program:
->  0x100003c10 <+0>:  ldr    x16, [x28, #0x28]
    0x100003c14 <+4>:  add    x16, x16, #0x148
    0x100003c18 <+8>:  cmp    sp, x16
    0x100003c1c <+12>: b.lo   0x100003c00               ; caml_startup.code_begin + 8
Target 0: (fib-5.1.1.exe) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100003c10 fib-5.1.1.exe`caml_program
    frame #1: 0x0000000100060290 fib-5.1.1.exe`caml_start_program + 132
    frame #2: 0x000000010005fb2c fib-5.1.1.exe`caml_startup_common(argv=0x000000016fdff330, pooling=<unavailable>) at startup_nat.c:132:9 [opt]
    frame #3: 0x000000010005fb90 fib-5.1.1.exe`caml_main [inlined] caml_startup_exn(argv=<unavailable>) at startup_nat.c:139:10 [opt]
    frame #4: 0x000000010005fb88 fib-5.1.1.exe`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:144:15 [opt]
    frame #5: 0x000000010005fb88 fib-5.1.1.exe`caml_main(argv=<unavailable>) at startup_nat.c:151:3 [opt]
    frame #6: 0x000000010004ebd4 fib-5.1.1.exe`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
    frame #7: 0x0000000197c860e0 dyld`start + 2360
(lldb) c
Process 64477 resuming
Process 64477 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
    frame #0: 0x0000000100006930 fib-5.1.1.exe`camlFib.main_271
fib-5.1.1.exe`camlFib.main_271:
->  0x100006930 <+0>:  ldr    x16, [x28, #0x28]
    0x100006934 <+4>:  add    x16, x16, #0x158
    0x100006938 <+8>:  cmp    sp, x16
    0x10000693c <+12>: b.lo   0x100006920               ; camlFib.fib_269 + 128
Target 0: (fib-5.1.1.exe) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
  * frame #0: 0x0000000100006930 fib-5.1.1.exe`camlFib.main_271
    frame #1: 0x0000000100006a1c fib-5.1.1.exe`camlFib.entry + 108
    frame #2: 0x0000000100003dec fib-5.1.1.exe`caml_program + 476
    frame #3: 0x0000000100060290 fib-5.1.1.exe`caml_start_program + 132
    frame #4: 0x0000000100003dec fib-5.1.1.exe`caml_program + 476

With this change the full backtrace is present:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
  * frame #0: 0x0000000100004420 fib.exe`camlFib.fib_270
    frame #1: 0x00000001000044dc fib.exe`camlFib.main_272 + 44
    frame #2: 0x00000001000045ac fib.exe`camlFib.entry + 124
    frame #3: 0x000000010000196c fib.exe`caml_program + 476
    frame #4: 0x00000001000640b4 fib.exe`caml_start_program + 132
    frame #5: 0x0000000100063948 fib.exe`caml_startup_common(argv=0x000000016fdff340, pooling=<unavailable>) at startup_nat.c:128:9 [opt]
    frame #6: 0x00000001000639ac fib.exe`caml_main [inlined] caml_startup_exn(argv=<unavailable>) at startup_nat.c:135:10 [opt]
    frame #7: 0x00000001000639a4 fib.exe`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:140:15 [opt]
    frame #8: 0x00000001000639a4 fib.exe`caml_main(argv=<unavailable>) at startup_nat.c:147:3 [opt]
    frame #9: 0x000000010004e640 fib.exe`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
    frame #10: 0x0000000197c860e0 dyld`start + 2360

Additionally on linux arm64 using gdb, adding CFI_SIGNAL_FRAME to caml_start_program replaces that frame with <signal handler called> :

(gdb) bt
#0  camlFib.main_272 () at fib.ml:7
#1  0x0000aaaaaaaf1584 in camlFib.entry () at fib.ml:12
#2  0x0000aaaaaaaeea44 in caml_program ()
#3  <signal handler called>
#4  0x0000aaaaaab4a178 in caml_startup_common (pooling=-1430712296, argv=0x10) at runtime/startup_nat.c:128
#5  caml_startup_common (argv=0x10, pooling=-1430712296) at runtime/startup_nat.c:87
#6  0x0000aaaaaab4a1f0 in caml_startup_exn (argv=<optimized out>) at runtime/startup_nat.c:135
#7  caml_startup (argv=<optimized out>) at runtime/startup_nat.c:140
#8  caml_main (argv=<optimized out>) at runtime/startup_nat.c:147
#9  0x0000aaaaaaaee6d0 in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37

This matches the backtraces generated on linux x86_64 but is not necessary to provide a full backtrace. Linux LLDB is unaffected by this change, it already had working backtraces.

asmcomp/arm64/emit.mlp Outdated Show resolved Hide resolved
asmcomp/arm64/emit.mlp Outdated Show resolved Hide resolved
@tmcgilchrist
Copy link
Contributor Author

More context on how this is working and why I ended up with this implementation.

By default macOS has frame pointers enabled for clang/Xcode on ARM64, except for leaf functions. I haven't found a quotable documentation link for this but it can be determined experimentally by either compiling a simple C program or running lldb on the initial C parts of the runtime (startup_nat.c).

Given this backtrace:

(lldb) br s -n caml_program
Breakpoint 3: where = fib.exe`caml_program, address = 0x0000000100001790
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001790 fib.exe`caml_program
    frame #1: 0x00000001000640b4 fib.exe`caml_start_program + 132
    frame #2: 0x0000000100063948 fib.exe`caml_startup_common(argv=0x000000016fdff340, pooling=<unavailable>) at startup_nat.c:128:9 [opt]
    frame #3: 0x00000001000639ac fib.exe`caml_main [inlined] caml_startup_exn(argv=<unavailable>) at startup_nat.c:135:10 [opt]
    frame #4: 0x00000001000639a4 fib.exe`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:140:15 [opt]
    frame #5: 0x00000001000639a4 fib.exe`caml_main(argv=<unavailable>) at startup_nat.c:147:3 [opt]
    frame #6: 0x000000010004e640 fib.exe`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
    frame #7: 0x0000000197c860e0 dyld`start + 2360

What is happening in caml_program? Caml_program corresponds to the entry_point function in cmm_helpers.ml which appears to run the intialisation code for things like camlCamlinternalFormatBasics.entry camlStdlib.entry etc. The assembly generated for each has a sequence where it copies sp to x29, then copies x16 into sp and then expects to restore sp from x29. Excerpt below:

mov      x29, sp
ldr      x16, [x28, #0x40]
mov      sp, x16
bl       0x100064440               ; symbol stub for: caml_system__code_end + 168
mov      sp, x29

At this point lldb unwinding breaks because it can't find the previous x29 value corresponding to frame #2. log enable lldb unwind provides verbose diagnotics of how unwinding and frames are calculated. Included below is the failing section from those logs:

    th1/fr3 CFA is 0x140030010: Register fp (29) contents are 0x140030000, offset is 16
    th1/fr3 supplying caller's saved pc (32)'s location using arm64-apple-darwin default unwind plan UnwindPlan
    th1/fr3 supplying caller's register pc (32) from the stack, saved at CFA plus offset -8 [saved at 0x140030008]
    th1/fr3 trying to unwind from this function with the UnwindPlan 'arm64-apple-darwin default unwind plan' because UnwindPlan 'EmulateInstructionARM64' failed.
    th1/fr3 supplying caller's saved fp (29)'s location using arm64-apple-darwin default unwind plan UnwindPlan
    th1/fr3 supplying caller's register fp (29) from the stack, saved at CFA plus offset -16 [saved at 0x140030000]
    th1/fr3 supplying caller's saved pc (32)'s location, cached
    th1/fr3 supplying caller's saved pc (32)'s location, cached
     th1/fr4 pc = 0x100002fdc
    th1/fr3 supplying caller's saved fp (29)'s location, cached
     th1/fr4 fp = 0x0
    th1/fr3 supplying caller's saved sp (31)'s location using arm64-apple-darwin default unwind plan UnwindPlan
    th1/fr3 did not supply reg location for sp (31) because it is volatile
     th1/fr4 with pc value of 0x100002fdc, symbol name is 'caml_program'
     th1/fr4 Backing up the pc value of 0x100002fdc by 1 and re-doing symbol lookup; old symbol was caml_program
     th1/fr4 Symbol is now caml_program
     th1/fr4 Using full unwind plan 'eh_frame CFI'
     th1/fr4 active row: 0x0000000100002fd4: CFA=sp+16 => lr=[CFA-8]
    th1/fr3 supplying caller's saved sp (31)'s location using arm64-apple-darwin default unwind plan UnwindPlan
    th1/fr3 did not supply reg location for sp (31) because it is volatile
     th1/fr4 failed to get cfa
     Frame 4 invalid RegisterContext for this frame, stopping stack walk
     th1/fr4 no save location for fp (29) via 'eh_frame CFI'
    th1/fr3 supplying caller's saved fp (29)'s location, cached
     th1/fr4 requested caller's saved PC but this UnwindPlan uses a RA reg; getting lr (30) instead
     th1/fr4 supplying caller's saved lr (30)'s location using eh_frame CFI UnwindPlan
     th1/fr4 supplying caller's register lr (30) from the stack, saved at CFA plus offset -8 [saved at 0x1400300a8]
     th1/fr4 requested caller's saved PC but this UnwindPlan uses a RA reg; getting lr (30) instead
     th1/fr4 supplying caller's saved lr (30)'s location using eh_frame CFI UnwindPlan
     th1/fr4 supplying caller's register lr (30) from the stack, saved at CFA plus offset -8 [saved at 0x1400300a8]
      th1/fr5 pc = 0x0
     th1/fr4 no save location for fp (29) via 'eh_frame CFI'
    th1/fr3 supplying caller's saved fp (29)'s location, cached
      th1/fr5 fp = 0x0
     th1/fr4 supplying caller's saved sp (31)'s location, cached
      th1/fr5 sp = 0x1400300b0
      th1/fr5 this frame has a pc of 0x0
      Frame 5 invalid RegisterContext for this frame, stopping stack walk
 th1 Unwind of this thread is complete.
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
  * frame #0: 0x0000000100064440 fib.exe`caml_system__code_end + 168
    frame #1: 0x0000000100006f34 fib.exe`camlCamlinternalFormatBasics.entry + 44
    frame #2: 0x0000000100002fdc fib.exe`caml_program + 28
    frame #3: 0x00000001000640b4 fib.exe`caml_start_program + 132
    frame #4: 0x0000000100002fdc fib.exe`caml_program + 28

At this point I am not 100% certain which Unwind plan is failing. But you can clearly see th1/fr4 no save location for fp (29) via 'eh_frame CFI' and failure to setup a RegisterContext. At this point I could have either, 1. debugged further into what CFI information needs to be emitted or 2 saved the value x29. I did try the former option, which seems to be what the amd64/emit.mlp is essentially doing by stashing the frame pointer into %rbx and restoring it later. I wasn't able to get this to work. So I went with the option of saving / restoring x29 which works on macOS arm64 and doesn't break linux arm64 at the cost of a few extra instructions.

@kayceesrk
Copy link
Contributor

I had a conversation with @tmcgilchrist on a different channel. I learnt a few things about this PR, which others may appreciate.

This work is part of the larger effort to improve lldb debugging support on macOS Arm64 platform. Note that gdb does not work on macOS Arm64 and lldb is the only native debugger. The experience of lldb on macOS Arm64 is considerably worse than debugging with gdb on Linux. Given that macOS with Apple Silicon is a popular developer machine, it is useful to put effort into improving the lldb debugging support on macOS Arm64 backend.

Reading the PR message, which mentions the frame pointer register, I assumed that this PR improves backtraces on Arm64 in frame pointer mode. In fact, Arm64 does not yet have support for frame pointers and the PR improves backtraces without them. What is interesting is that, unlike Amd64 backend, on arm64, the frame pointer register x29 is not used as a general purpose register by OCaml code in the non-frame-pointer mode (the only model on Arm64 backend). The only place in the Arm64 codegen where we use the frame pointer register x29 is used is in the fast path of the Iextcall (no alloc, no stack arguments). Given that x29 is unused by the OCaml codegen and is a callee-saved register (in C), x29 is used to store the OCaml stack pointer.

This PR saves and restores x29 around the fast Iextcall. The slower paths caml_c_call and caml_c_call_stack_args already save and restore x29.

What I still find surprising is that, lldb is able to produce better backtraces now for the nested recursive calls of fib.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
  * frame #0: 0x0000000100004424 a.out`camlFib.fib_270 + 4
    frame #1: 0x0000000100004474 a.out`camlFib.fib_270 + 84
    frame #2: 0x0000000100004484 a.out`camlFib.fib_270 + 100
    frame #3: 0x0000000100004474 a.out`camlFib.fib_270 + 84
    frame #4: 0x0000000100004474 a.out`camlFib.fib_270 + 84
    frame #5: 0x0000000100004474 a.out`camlFib.fib_270 + 84
    frame #6: 0x00000001000044dc a.out`camlFib.main_272 + 44
    frame #7: 0x00000001000045ac a.out`camlFib.entry + 124
    frame #8: 0x000000010000196c a.out`caml_program + 476
    frame #9: 0x00000001000640b4 a.out`caml_start_program + 132
    frame #10: 0x0000000100063948 a.out`caml_startup_common(argv=0x000000016fdff3f8, pooling=<unavailable>) at startup_nat.c:128:9 [opt]
    frame #11: 0x00000001000639ac a.out`caml_main [inlined] caml_startup_exn(argv=<unavailable>) at startup_nat.c:135:10 [opt]
    frame #12: 0x00000001000639a4 a.out`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:140:15 [opt]
    frame #13: 0x00000001000639a4 a.out`caml_main(argv=<unavailable>) at startup_nat.c:147:3 [opt]
    frame #14: 0x000000010004e640 a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
    frame #15: 0x0000000184cb20e0 dyld`start + 2360

On trunk, I see

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
  * frame #0: 0x0000000100003810 a.out`camlFib__code_begin + 8
    frame #1: 0x000000010000384c a.out`camlFib__code_begin + 68
    frame #2: 0x000000010000384c a.out`camlFib__code_begin + 68
    frame #3: 0x000000010000385c a.out`camlFib__code_begin + 84
    frame #4: 0x000000010000384c a.out`camlFib__code_begin + 68
    frame #5: 0x0000000100003894 a.out`camlFib__main_269 + 28
    frame #6: 0x0000000100003908 a.out`camlFib__entry + 48
    frame #7: 0x0000000100000e04 a.out`caml_program + 396
    frame #8: 0x000000010004df50 a.out`caml_start_program + 104
    frame #9: 0x0000000100029c34 a.out`caml_startup_common(argv=0x000000000000000f, pooling=<unavailable>) at startup_nat.c:160:9 [opt]
    frame #10: 0x0000000100029ca8 a.out`caml_main [inlined] caml_startup_exn(argv=<unavailable>) at startup_nat.c:167:10 [opt]
    frame #11: 0x0000000100029ca0 a.out`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:172:15 [opt]
    frame #12: 0x0000000100029ca0 a.out`caml_main(argv=<unavailable>) at startup_nat.c:179:3 [opt]
    frame #13: 0x0000000100029d08 a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
    frame #14: 0x0000000184cb20e0 dyld`start + 2360
(lldb) 

I suspect that lldb is able to do a best-effort work at producing backtraces with the available DWARF info and now the correctly saved and restored x29.

Copy link
Contributor

@kayceesrk kayceesrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the change is correct. Marking caml_start_program as a signal frame seems fine to me for consistency of backtraces across platforms.

@jmid
Copy link
Contributor

jmid commented Apr 9, 2024

By default macOS has frame pointers enabled for clang/Xcode on ARM64, except for leaf functions. I haven't found a quotable documentation link for this but it can be determined experimentally by either compiling a simple C program or running lldb on the initial C parts of the runtime (startup_nat.c).

Not clang documentation, but this seems to align with https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms and this in particular:

  • The frame pointer register (x29) must always address a valid frame record. Some functions — such as leaf functions or tail calls — may opt not to create an entry in this list. As a result, stack traces are always meaningful, even without debug information.

@kayceesrk
Copy link
Contributor

@tmcgilchrist can you rebase the changes against the latest updates on trunk?

@tmcgilchrist tmcgilchrist force-pushed the arm_backtraces branch 2 times, most recently from 00ef6be to 3158a07 Compare April 10, 2024 00:11
runtime/arm64.S Outdated Show resolved Hide resolved
Copy link
Contributor

@kayceesrk kayceesrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kayceesrk
Copy link
Contributor

There is an unrelated assertion failure in the CI: https://github.com/ocaml/ocaml/actions/runs/8623935626/job/23638056562?pr=13079#step:7:77. I'll open a separate issue to track this.

Restores full backtraces on macOS ARM64 where the ABI expects the
frame pointer register (x29) to always address a valid frame record.
Copy link
Contributor

@kayceesrk kayceesrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now. Thanks @tmcgilchrist.

@kayceesrk kayceesrk merged commit c39c3f9 into ocaml:trunk Apr 11, 2024
14 checks passed
@anmonteiro
Copy link
Contributor

Should this be considered for 5.2?

@tmcgilchrist
Copy link
Contributor Author

Some context on the general state of macOS LLDB debugging to help make an informed decision.

With this change you have backtraces plus you can set breakpoints based on addresses in lldb.
Using image lookup -r -n <caml_name> to find the right address and br s -a <0x00001> to set a breakpoint.

If you combine this change with #13050 then you get proper backtraces, setting breakpoints based on addresses and the ability to set breakpoints via symbols. eg br s -n camlFib$main_271.

Setting breakpoints based on source file and line number is still broken but I am working on it.

@gasche
Copy link
Member

gasche commented Apr 11, 2024

Three remarks on this change:

  • We are unconditionally adding a small amount of work for something that (if I understand correctly) only makes sense on one of the arm64 systems. I wonder if we should have made the new instructions conditional on the platform.
  • I think it would be nice if the code explained why we care to save the frame pointer, even outside our own frame-pointers mode. Otherwise it is easy for someone working on the code in the future to assume that this was meant for the frame-pointers mode, and make it conditional on Config.with_frame_pointers.
  • As far as I know, we don't have testsuite coverage for the behavior. I wonder if we could have a test, to detect regressions in the future. (We would want record a filtered output to not depend too much on the details of lldb's formatting, but maybe we could grep for a known symbol for example.)

@Octachron
Copy link
Member

If someone can document in this discussion that the performance impact is negligible, it sounds like a good idea to cherry-pick the change to 5.2 as a "bugfix for a feature we didn't know we nearly had".

@kayceesrk
Copy link
Contributor

kayceesrk commented Apr 11, 2024

We are unconditionally adding a small amount of work for something that (if I understand correctly) only makes sense on one of the arm64 systems. I wonder if we should have made the new instructions conditional on the platform.

x29 is saved and restored unconditionally in other Iextcall cases, caml_c_call and caml_c_call_stack_args. Hence, it seems fine to leave this as an unconditional change.

I think it would be nice if the code explained why we care to save the frame pointer, even outside our own frame-pointers mode. Otherwise it is easy for someone working on the code in the future to assume that this was meant for the frame-pointers mode, and make it conditional on Config.with_frame_pointers.

Is it useful to link to the recommendation in #13079 (comment)? Not saving and restoring x29 will make it point to an invalid frame record; after a noalloc C call, x29 may eventually point outside the stack if not for this change. With this change, it points to some valid frame record, not the most recent one in the call stack.

As far as I know, we don't have testsuite coverage for the behavior. I wonder if we could have a test, to detect regressions in the future. (We would want record a filtered output to not depend too much on the details of lldb's formatting, but maybe we could grep for a known symbol for example.)

IINM, we generally don't have coverage in the testsuite for native debugger behaviours. For gdb, one could automate this by scripting gdb to produce backtraces and compare it against a reference one. Unsure about lldb. Either way, do you think this PR should address this?

Octachron pushed a commit that referenced this pull request Apr 15, 2024
Save and restore frame pointer across Iextcall on ARM64

(cherry picked from commit c39c3f9)
@Octachron
Copy link
Member

Cherry-picked to 5.2.0 as a small bug fix yielding noticeable developer experience improvement in 5f767ec .

@tmcgilchrist tmcgilchrist deleted the arm_backtraces branch May 1, 2024 05:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants