Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0007 CRASH detected in debug_framepc due to a fault at or near 0x000000000000004e leading to SIGSEGV #1196

Open
pwnhacker0x18 opened this issue Apr 27, 2024 · 3 comments

Comments

@pwnhacker0x18
Copy link

                    53: BCPos debug_framepc(L = (lua_State *)0x7ffff7e53380, fn = (GCfunc *)0x7ffff7e557d0, nextframe = (cTValue *)0x7ffff748ded8) {

                   |||:

                   |||: /* Local reference: BCPos pos = <optimized out>; */

                   |||: /* Local reference: GCproto * pt = 0x7ffff7e5ee48; */

                   |||: /* Local reference: const BCIns * ins = 0x52; */

                   102: #if LJ_HASJIT

                   103:   if (pos > pt->sizebc) {  /* Undo the effects of lj_trace_exit for JLOOP. */

                   104:     if (bc_isret(bc_op(ins[-1]))) {

                   |||:

                   ---: }

                   at lj_debug.c:104

poc.txt

@corsix
Copy link

corsix commented May 9, 2024

Simpler POC:

function bisect(f, a, b, fa, fb)
  local c = tonumber("", nil, nil)
  local d, e, g, h
  solve(f, a, b)
  local i, j, k, l, m, n
end

function solve(f, a, b)
  bisect(f, a, b, f(a), f(b))
end

function f() end

solve(f)

Key points seem to be a stack overflow, combined with a trace stitch (for the dubious-looking tonumber call).

I've not yet properly analysed the cause, but a temporary triage is:

diff --git a/src/lj_debug.c b/src/lj_debug.c
index 8d8b9eb5..bfae7615 100644
--- a/src/lj_debug.c
+++ b/src/lj_debug.c
@@ -64,7 +64,7 @@ static BCPos debug_framepc(lua_State *L, GCfunc *fn, cTValue *nextframe)
     if (cf == NULL || (char *)cframe_pc(cf) == (char *)cframe_L(cf))
       return NO_BCPOS;
     ins = cframe_pc(cf);  /* Only happens during error/hook handling. */
-    if (!ins) return NO_BCPOS;
+    if (!ins || ((ptrdiff_t)ins & FRAME_TYPE) != FRAME_LUA) return NO_BCPOS;
   } else {
     if (frame_islua(nextframe)) {
       ins = frame_pc(nextframe);
@@ -95,7 +95,7 @@ static BCPos debug_framepc(lua_State *L, GCfunc *fn, cTValue *nextframe)
        }
       }
       ins = cframe_pc(cf);
-      if (!ins) return NO_BCPOS;
+      if (!ins || ((ptrdiff_t)ins & FRAME_TYPE) != FRAME_LUA) return NO_BCPOS;
     }
   }
   pt = funcproto(fn);

@MikePall
Copy link
Member

IMHO that workaround is covering up the real issue. It shouldn't end up there with a non-Lua frame and NULL nextframe -- I should probably add an assert.

There's a first silent stack overflow in lj_snap_restore. The frame chain still seems to be OK at that point. The caught overflow causes the trace to exit with an error. Then another stack overflow happens with an improper frame chain, which makes it crash in debug_framepc.

It have run out of time to dissect the frames in both cases. My gut feeling says to check the order of operations in lj_snap_restore and/or add a possible fixup step in case of stack overflow or OOM.

@corsix
Copy link

corsix commented May 29, 2024

Indeed, the first interesting event is lj_snap_restore failing to grow the stack:

lj_vm_exit_handler
  lj_trace_exit
    lj_vm_cpcall
      trace_exit_cp
        lj_snap_restore
          lj_state_growstack
            lj_err_stkov

lj_snap_restore will have done very little at this point (though if we'd hit an OOM rather than a STKOV then lj_snap_restore could instead have partially restored from a snapshot). Notably, lj_snap_restore will have done setcframe_pc(cframe_raw(L->cframe), pc+1);, and it won't have done L->base += (map[nent+LJ_BE] & 0xff) (though in !LJ_FR2, it might have done some L->base = o+1;).

The error is caught by the lj_vm_cpcall, which does very little other than remove the L->cframe it added.

lj_trace_exit then does not do:

  pc = exd.pc;
  cf = cframe_raw(L->cframe);
  setcframe_pc(cf, pc);

Instead, all lj_trace_exit does is:

    return -errcode;

The logic in lj_vm_exit_handler continues, doing very little other than calling lj_err_trace, which leads us to:

lj_vm_exit_handler
  lj_err_trace
    lj_err_run
      lj_state_growstack
        lj_err_stkov

We proceed to segfault in debug_framepc because cframe_pc(cf) is 0x52.


The logic to restore the outer cframe's pc could be moved from lj_trace_exit to lj_snap_restore:

diff --git a/src/lj_snap.c b/src/lj_snap.c
index 6fda08ba..27f9c8e5 100644
--- a/src/lj_snap.c
+++ b/src/lj_snap.c
@@ -957,7 +957,8 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
   lua_State *L = J->L;
 
   /* Set interpreter PC to the next PC to get correct error messages. */
-  setcframe_pc(cframe_raw(L->cframe), pc+1);
+  setcframe_pc(L->cframe, pc+1);
+  setcframe_pc(cframe_raw(cframe_prev(L->cframe)), pc);
 
   /* Make sure the stack is big enough for the slots from the snapshot. */
   if (LJ_UNLIKELY(L->base + snap->topslot >= tvref(L->maxstack))) {

Alternatively, lj_trace_exit could set the outer cframe pc to something semi-valid in this case:

diff --git a/src/lj_trace.c b/src/lj_trace.c
index a5e316e1..f2a69cc5 100644
--- a/src/lj_trace.c
+++ b/src/lj_trace.c
@@ -905,8 +905,10 @@ int LJ_FASTCALL lj_trace_exit(jit_State *J, void *exptr)
   exd.J = J;
   exd.exptr = exptr;
   errcode = lj_vm_cpcall(L, NULL, &exd, trace_exit_cp);
-  if (errcode)
+  if (errcode) {
+    setcframe_pc(cframe_raw(L->cframe), L);
     return -errcode;  /* Return negated error code. */
+  }
 
   if (exitcode) copyTV(L, L->top++, &exiterr);  /* Anchor the error object. */

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants