Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken stack (with "1 instruction trace errors" warning) on non-tailrecursive code #246

Open
copy opened this issue Jul 13, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@copy
Copy link

copy commented Jul 13, 2022

For example, running magic-trace run -full-execution on this code:

let rec go dir =
  if (Unix.lstat dir).st_kind = Unix.S_DIR then
    let rec handle_dir d =
      match Unix.readdir d with
      | exception End_of_file -> Unix.closedir d; ()
      | "." | ".." -> handle_dir d
      | subdir -> go (Filename.concat dir subdir); handle_dir d
    in
    handle_dir (Unix.opendir dir)
  else
    ()

(* a large directory structure, for example the ocaml source code checked out via git *)
let dir = "repositories/ocaml"
let () = ignore (go dir)

Prints Warning: 1 instruction trace errors and shows a cut-off stack as shown below (if a bigger directory is chosen, it happens more than once):

Screenshot_2022-07-13_21-06-00

The issue happens in similar code that doesn't use exceptions, so I don't think irregular control flow is the issue. Just for reference, this is in a 4.14 flambda switch and the executable is built with dune --profile=release.

@copy copy added the bug Something isn't working label Jul 13, 2022
@cgaebel
Copy link
Contributor

cgaebel commented Jul 13, 2022

First of all, thanks for reporting this. A broken stack given only one trace error certainly sounds like it could be tickling a bug. I'll dig into this properly this weekend.

In case you just want a usable trace, there are two remedies for trace errors that tend to work for most people:

  • Try -timer-resolution Low. Your timestamps will be less accurate (there is only a timer update every 1us-ish). But, in exchange, you will be far less likely to see trace errors.
  • Switch to a later generation Intel CPU. Intel has been actively improving IPT, and each generation tends to be much better than the previous one about this.

Out of curiosity, what CPU are you using?

@copy
Copy link
Author

copy commented Jul 13, 2022

Thanks! -timer-resolution Low doesn't seem to make much of a difference. It looks like the first broken stack happens later, but after that a similar breakage happens. If you can't reproduce it with my example above, try pointing it at a larger directory. The number of trace errors goes up the deeper the stack.

Out of curiosity, what CPU are you using?

Intel i9-9900KS

@cgaebel
Copy link
Contributor

cgaebel commented Aug 6, 2022

I find that on my machine (Intel i5-1145G7) running magic-trace with the environment variable MAGIC_TRACE_NO_DLFILTER=1 seems to help. There are still lots of decode errors and exceptional control flow (not in your code, but in core startup code when I adapted your example to use Core_unix), but those are known problems.

For fellow magic-trace devs:

It looks like the dlfilter is eating trace errors. Here's a relevant excerpt when running without dlfilter:

1539481/1539481 2418006.848063682:          1 branches:uH:   call                       556ff0af70b8 unix_inet_addr_of_string+0x38 (/tmp/issue_246) =>     7fc9ca8563e0 inet_pton+0x0 (/usr/lib/libc.so.6)
1539481/1539481 2418006.848063890:          1 branches:uH:   tr end                     7fc9ca8563e0 inet_pton+0x0 (/usr/lib/libc.so.6) =>                0 [unknown] ([unknown])
1539481/1539481 2418006.848065765:          1 branches:uH:   tr strt                               0 [unknown] ([unknown]) =>     7fc9ca8563e0 inet_pton+0x0 (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065765:          1 branches:uH:   call                       7fc9ca8563f3 inet_pton+0x13 (/usr/lib/libc.so.6) =>     7fc9ca74c2b0 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065765:          1 branches:uH:   jmp                        7fc9ca74c2b4 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8964c0 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065765:          1 branches:uH:   return                     7fc9ca8964ef [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8563f8 inet_pton+0x18 (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065765:          1 branches:uH:   jcc                        7fc9ca8563fb inet_pton+0x1b (/usr/lib/libc.so.6) =>     7fc9ca856420 inet_pton+0x40 (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065765:          1 branches:uH:   jmp                        7fc9ca85642e inet_pton+0x4e (/usr/lib/libc.so.6) =>     7fc9ca856090 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jmp                        7fc9ca8560c3 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8560f8 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jcc                        7fc9ca856105 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8560c8 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jcc                        7fc9ca856126 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8560f8 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jcc                        7fc9ca856105 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8560c8 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jcc                        7fc9ca856126 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8560f8 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jcc                        7fc9ca856105 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8560c8 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jcc                        7fc9ca856126 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8560f8 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jcc                        7fc9ca856105 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca8560c8 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jcc                        7fc9ca8560f6 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca856128 [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   jmp                        7fc9ca856139 [unknown] (/usr/lib/libc.so.6) =>     7fc9ca85615a [unknown] (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   return                     7fc9ca85616e [unknown] (/usr/lib/libc.so.6) =>     556ff0af70be unix_inet_addr_of_string+0x3e (/tmp/issue_246)
1539481/1539481 2418006.848065973:          1 branches:uH:   jcc                        556ff0af70c0 unix_inet_addr_of_string+0x40 (/tmp/issue_246) =>     556ff0af7100 unix_inet_addr_of_string+0x80 (/tmp/issue_246)
1539481/1539481 2418006.848065973:          1 branches:uH:   call                       556ff0af7103 unix_inet_addr_of_string+0x83 (/tmp/issue_246) =>     556ff0afc0b0 alloc_inet_addr+0x0 (/tmp/issue_246)

And here it is with the dlfilter:

1539481/1539481 2418006.848063682:          1 branches:uH:   call                       556ff0af70b8 unix_inet_addr_of_string+0x38 (/tmp/issue_246) =>     7fc9ca8563e0 inet_pton+0x0 (/usr/lib/libc.so.6)
1539481/1539481 2418006.848065973:          1 branches:uH:   call                       556ff0af7103 unix_inet_addr_of_string+0x83 (/tmp/issue_246) =>     556ff0afc0b0 alloc_inet_addr+0x0 (/tmp/issue_246)

Both these excerpts are from the same spot in the same trace file.

@cgaebel
Copy link
Contributor

cgaebel commented Aug 6, 2022

Hmm we only added dlfilter recently, so it's just in master. Were you experiencing this issue using the latest release of magic-trace?

@cgaebel
Copy link
Contributor

cgaebel commented Aug 6, 2022

To be clear, the trace I'm seeing without dlfilter doesn't look very good. But all the brokenness I see is due to decode errors, and only Intel can do anything about those. When a big tower of stackframes ends abruptly, see if there's a little triangle at the bottom. If you click that and it says [Decode error: Overflow error], that's a decode error.

cgaebel added a commit to cgaebel/magic-trace that referenced this issue Aug 7, 2022
Per janestreet#246, I think it's busted right now in the face of trace errors.
Lets turn it off by default until we ship a solution.
Xyene pushed a commit that referenced this issue Sep 25, 2022
Per #246, I think it's busted right now in the face of trace errors.
Lets turn it off by default until we ship a solution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants