Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storing unsafe references to L #503

Open
mwild1 opened this issue Jun 30, 2020 · 6 comments
Open

Storing unsafe references to L #503

mwild1 opened this issue Jun 30, 2020 · 6 comments
Labels

Comments

@mwild1
Copy link

mwild1 commented Jun 30, 2020

In a number of places luv stores a long-lived reference to L such as src/luv.c:712.

This is not safe, however - L may refer to the current thread (coroutine) that the code is being called in, and threads are subject to garbage collection. After the thread (coroutine) is collected, the stored L value points to freed memory.

This materialized as a segfault in a complex application. It's hard to reduce to a simple test case that reliably segfaults, but running the below sample under valgrind will show the invalid memory accesses that are happening:

local uv

local f = coroutine.wrap(function ()
	uv = require "luv"
	uv.new_timer();
end);
f()
f = nil;

for i = 1, 10 do
	collectgarbage("collect")
end

uv = nil

for i = 1, 10 do
	collectgarbage("collect")
end

Valgrind output:

$ valgrind lua5.2 luv-crash.lua
==31681== Memcheck, a memory error detector
==31681== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31681== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==31681== Command: lua5.2 luv-crash.lua
==31681== 
==31681== Invalid read of size 8
==31681==    at 0x10FAE3: lua_settop (lapi.c:165)
==31681==    by 0x64718E4: luv_close_cb (handle.c:81)
==31681==    by 0x6471D88: luv_gc_cb (handle.c:108)
==31681==    by 0x6695D7F: uv_run (in /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0)
==31681==    by 0x6479361: loop_gc (luv.c:690)
==31681==    by 0x11300C: luaD_precall (ldo.c:319)
==31681==    by 0x113371: luaD_call (ldo.c:401)
==31681==    by 0x11292E: luaD_rawrunprotected (ldo.c:131)
==31681==    by 0x1135DE: luaD_pcall (ldo.c:603)
==31681==    by 0x11487E: GCTM (lgc.c:824)
==31681==    by 0x115E1D: callallpendingfinalizers (lgc.c:978)
==31681==    by 0x115E1D: luaC_freeallobjects (lgc.c:988)
==31681==    by 0x11A04D: close_state (lstate.c:226)
(...truncated...)
@mwild1
Copy link
Author

mwild1 commented Jun 30, 2020

Some possible solutions:

  1. Don't (if possible) hold onto L, because the lifetime isn't guaranteed
  2. Resolve it to the main thread (5.2+ stores it in the registry at index LUA_RIDX_MAINTHREAD)
  3. Update it when possible (e.g. if __gc is called with L that differs to what is in the struct, update the struct)
  4. Store a reference to the current thread to prevent it being GC'd (over the top if we don't really need anything in that thread) (not a solution, see following comment)
  5. Throw error if L is not the main thread (would be annoying as a user)

@zhaozg zhaozg added the bug label Jun 30, 2020
@mwild1
Copy link
Author

mwild1 commented Jun 30, 2020

This is a second test case related to this issue:

local thread = coroutine.create(function ()
	local uv = require "luv"
	coroutine.yield();
end);
coroutine.resume(thread);

local uv = require "luv";
local t = uv.new_timer();
t:start(200, 0, function ()
	t:start(200); -- throws an error
end);
uv.run("default");

The initial thread is not garbage collected, but is suspended. Because the timer callback is using the L of a suspended thread to call into the Lua API, some things are in an inconsistent state. The error gets raised inside the suspended thread, and Lua accesses uninitialized memory while trying to build the stack trace (again, doesn't always segfault but valgrind shows the invalid access).

This demonstrates that option 4 in my previous comment would not be a valid solution to the problem.

@squeek502
Copy link
Member

Thank you for the detailed report!

@mwild1
Copy link
Author

mwild1 commented Oct 18, 2020

I hear rumours of a release. Is a fix for this something we could get into it? I can try to take a look this week if it helps.

I think from my list option 3 is the most robust solution.

@squeek502
Copy link
Member

Yeah, I'd be interested in seeing what a solution might look like.

(side note: I can reproduce the Valgrind error with the first test case, but not with the second test case [tried with Lua 5.1, Lua 5.4, and LuaJIT])

@SinisterRectus SinisterRectus mentioned this issue Oct 22, 2020
@dibyendumajumdar
Copy link

Hi - I think the proper solution is to add the lua context to userdata value field, using lua_setuservalue(). Then the Lua garbage collector won't collect the lua_State if the context is alive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants