Threads.@spawn can fatally crash PyCall @async task #1006

Open
the-noble-argon opened this issue Sep 16, 2022 · 1 comment

Comments

the-noble-argon commented Sep 16, 2022

I'm trying to use PyCall in a Julia solution for various IO tasks (such as handling Parquet files and interacting with various Azure resources). I would like these PyCall tasks to run in the background on one thread, doing a bunch of IO work through a fancy Python SDK while I do multithreaded number-crunching in Julia. I'm running into a weird issue though. With the script below, I get a fatal error if the @async Python tasks run while a Threads.@spawn task is running, but everything is fine if I use Threads.@threads. Is there any reason why @threads works while @spawn doesn't? And is there a way to make this safe with Threads.@spawn? We can't know, unless we read the code, whether an external library uses Threads.@spawn under the hood.

I've been going through the PyCall docs and the existing multithreading issues (several discuss this, such as #882 and #883), but we need a better understanding of what we can and can't do in Julia while PyCall is doing something. My script puts a lock around every Python call and only runs Python inside an @async task started from the main thread, so it should always execute on the main thread, as suggested in #882 (I think?). Even so, I get fatal errors on Windows (and segfaults on Linux) when a Threads.@spawn runs while PyCall is running; it's fine if I use Threads.@threads instead.
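
The assumption that @async tasks started from the main thread stay on that thread can be checked without PyCall. This is only a minimal sketch, assuming Julia 1.3 or later started with more than one thread (for example `julia -t 4`); the variable names are illustrative and not part of the script in this issue. As far as I understand the scheduler, @async tasks are sticky to the thread of the task that created them, while a Threads.@spawn task may run on any thread.

```julia
# Record the thread the top-level task is running on.
main_id = Threads.threadid()

# @async tasks: expected to report the same thread id as the parent task.
async_tasks = [@async Threads.threadid() for _ in 1:8]
async_ids = fetch.(async_tasks)

# Threads.@spawn task: may report any thread id, including the main one.
spawn_id = fetch(Threads.@spawn Threads.threadid())

println("main thread: ", main_id)
println("@async ran on: ", unique(async_ids))
println("Threads.@spawn ran on: ", spawn_id)
```

This only shows where the tasks execute; it says nothing about what other threads (or the garbage collector) are doing while a Python call is in progress.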

Anyway, here's the script:

using PyCall
const PY_LOCK = ReentrantLock()  # unused; the lock actually acquired is PYLOCK[] below
const PY_JSON = pyimport("json")


const PYLOCK = Ref{ReentrantLock}()
PYLOCK[] = ReentrantLock()

# acquire the lock before any code calls Python
pylock(f::Function) = Base.lock(f, PYLOCK[])


function write_json_file(fileName::String, outputData::Dict)
    pylock() do
        open(fileName, "w") do outputFile
            PY_JSON.dump(deepcopy(outputData), outputFile)
        end
    end
    return nothing
end

function file_operation(fileName::String)
    outputData = Dict("a"=>randn(), "b"=>randn())
    write_json_file(fileName, outputData)
    return outputData
end

function multithread_calc(x::Vector)
    y = zeros(Float64, length(x))
    Threads.@threads for ii in eachindex(y)
        y[ii] = log(exp(x[ii]))
    end
    return y
end

function calc_task(x::Vector)
    t = Threads.@spawn log.(exp.(x))
    return fetch(t)
end

calcInput = randn(10000000)
for ii in 1:100
    display(ii)
    file_operation("testfile.json")
    fileTasks   = [@async file_operation("testfile$(ii).json") for ii in 1:30]
    #calcResults =  multithread_calc(calcInput)
    calcResults = calc_task(calcInput)
    wait.(fileTasks)
end

Now, if I modify this script so that the multithreaded calculation uses Threads.@threads in multithread_calc(), I get no issue:

calcInput = randn(10000000)
for ii in 1:100
    display(ii)
    file_operation("testfile.json")
    fileTasks   = [@async file_operation("testfile$(ii).json") for ii in 1:30]
    calcResults =  multithread_calc(calcInput)
    #calcResults = calc_task(calcInput)
    wait.(fileTasks)
end

Otherwise, if I use the calc_task() function, which uses Threads.@spawn, while the Python tasks are running, I get the following error:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa4e385adb -- PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
in expression starting at g:\My Drive\tests\julia_pylock_json.jl:42
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyLong_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_DecodeUTF8Stateful at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_FromId at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_FinalizeEx at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_FinalizeEx at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_Finalize at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyinit.jl:125
unknown function (ip: 00000000618a7a23)
_atexit at .\initdefs.jl:372
unknown function (ip: 00000000618a7013)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
ijl_atexit_hook at /cygdrive/c/buildbot/worker/package_win64/build/src\init.c:219
ijl_exit at /cygdrive/c/buildbot/worker/package_win64/build/src\jl_uv.c:640
jl_exception_handler at /cygdrive/c/buildbot/worker/package_win64/build/src\signals-win.c:322
__julia_personality at /cygdrive/c/buildbot/worker/package_win64/build/src\win32_ucontext.c:28
_chkstk at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
RtlRaiseException at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
KiUserExceptionDispatcher at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyLong_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicodeWriter_WriteASCIIString at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_FromFormatV at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyErr_Format at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyObject_GetBuffer at C:\Users\user\Miniconda3\python39.dll (unknown line)
isbuftype! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pybuffer.jl:134 [inlined]
isbuftype at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pybuffer.jl:148 [inlined]
pysequence_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:759
pytype_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:773
#36 at .\none:0 [inlined]
iterate at .\generator.jl:47
unknown function (ip: 000000006189dc50)
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:703
typetuple at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:745
unknown function (ip: 000000006189cd9a)
pysequence_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:754
pytype_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:773
pytype_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:806 [inlined]
convert at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:831
julia_args at C:\Users\user\.julia\packages\PyCall\ygXW2\src\callback.jl:18 [inlined]
_pyjlwrap_call at C:\Users\user\.julia\packages\PyCall\ygXW2\src\callback.jl:24
unknown function (ip: 000000006189cc0a)
pyjlwrap_call at C:\Users\user\.julia\packages\PyCall\ygXW2\src\callback.jl:44
unknown function (ip: 000000006187c398)
PyObject_Call at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyEval_EvalFrameDefault at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_NewReference at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyEval_EvalFrameDefault at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyVectorcall_Call at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyObject_Call at C:\Users\user\Miniconda3\python39.dll (unknown line)
macro expansion at C:\Users\user\.julia\packages\PyCall\ygXW2\src\exception.jl:95 [inlined]
#107 at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 [inlined]
disable_sigint at .\c.jl:473 [inlined]
__pycall! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:42 [inlined]
_pycall! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:29
_pycall! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:11
unknown function (ip: 000000006188c5f5)
#_#114 at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:730
PyObject at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
#2 at g:\My Drive\tests\julia_pylock_json.jl:16484
#open#378 at .\io.jl:384
open at .\io.jl:381 [inlined]
#1 at g:\My Drive\tests\julia_pylock_json.jl:15 [inlined]
lock at .\lock.jl:185
pylock at g:\My Drive\tests\julia_pylock_json.jl:10 [inlined]
write_json_file at g:\My Drive\tests\julia_pylock_json.jl:14 [inlined]
file_operation at g:\My Drive\tests\julia_pylock_json.jl:24
#11 at .\task.jl:484
unknown function (ip: 00000000618a5de3)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:931
Allocations: 10783290 (Pool: 10776630; Big: 6660); GC: 14
the-noble-argon changed the title from "Need better docs on understanding PyCall and multithreading" to "Threads.@spawn fatally crashes PyCall @async task, but Threads.@threads works fine" Sep 17, 2022
the-noble-argon changed the title from "Threads.@spawn fatally crashes PyCall @async task, but Threads.@threads works fine" to "Threads.@spawn can fatally crash PyCall @async task, but Threads.@threads works fine" Sep 17, 2022
the-noble-argon commented Sep 19, 2022

I managed to find a temporary workaround for this issue. I had to modify the pylock() function so that it not only locks and unlocks but also disables garbage collection while the Python code is running (which is probably not ideal).

pylock(f::Function) = Base.lock(PYLOCK[]) do
    prev_gc = GC.enable(false)
    try 
        return f()
    finally
        GC.enable(prev_gc) # recover previous state
    end
end
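
The same lock-plus-GC-pause pattern can be exercised on its own, without Python, to confirm that the previous GC state comes back even when the wrapped function throws. This is a minimal sketch: `DEMO_LOCK` and `with_gc_paused` are hypothetical names used only for illustration, and the sketch relies on `GC.enable` returning the previous GC state.

```julia
# Stand-alone sketch of the workaround pattern; no PyCall involved.
const DEMO_LOCK = ReentrantLock()

function with_gc_paused(f::Function)
    lock(DEMO_LOCK) do
        prev_gc = GC.enable(false)   # returns whether GC was enabled before
        try
            return f()
        finally
            GC.enable(prev_gc)       # runs on normal return and on exceptions
        end
    end
end

result = with_gc_paused(() -> sum(1:10))

# An exception inside the block still propagates to the caller.
threw = try
    with_gc_paused(() -> error("boom"))
    false
catch
    true
end

# GC.enable returns the previous state, so this reports whether GC was on again.
gc_restored = GC.enable(true)
```

Note that nothing can be collected while GC is paused, so long-running or allocation-heavy Python calls inside the block will let memory grow until it returns.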

This highlights the importance of #883 in making sure that PyCall tasks can't be corrupted by garbage collections triggered by other threads, because I don't think this drastic kludge of disabling and re-enabling the garbage collector is the kind of solution Julia devs would encourage.

the-noble-argon changed the title from "Threads.@spawn can fatally crash PyCall @async task, but Threads.@threads works fine" to "Threads.@spawn can fatally crash PyCall @async task" Sep 19, 2022