
thread safety #882

Open
stevengj opened this issue Feb 25, 2021 · 30 comments · May be fixed by #883

@stevengj
Member

stevengj commented Feb 25, 2021

If Julia's Threads.nthreads() > 1, we may want to do some more work to ensure thread safety. In particular:

  • Call PyEval_InitThreads() in __init__ (this is only needed for Python ≤ 3.6, and is a no-op in Python ≥ 3.7).
  • Acquire the GIL in the PyObject finalizer, by calling PyGILState_Ensure (returns an enum, i.e. a Cint in Julia) and PyGILState_Release. This is because the Julia GC may be called from threads other than the main thread, so we need to ensure that we hold Python's GIL before decref-ing.
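The ensure/release pairing in the second bullet can be exercised from Python itself via ctypes. This is purely illustrative (the real calls would happen in C inside the finalizer), but it shows the shape of the dance: take the GIL state, do the C-API work, restore it.

```python
import ctypes

# Minimal sketch of the PyGILState_Ensure/Release pairing described above,
# driven through ctypes against the running CPython interpreter. The calls
# are safe to make even on a thread that already holds the GIL (re-entrant).
api = ctypes.pythonapi
api.PyGILState_Ensure.restype = ctypes.c_int    # returns an enum (a Cint in Julia)
api.PyGILState_Release.argtypes = [ctypes.c_int]

state = api.PyGILState_Ensure()   # ensure we hold the GIL before C-API work
try:
    pass  # ... Py_DECREF / other C-API calls would go here ...
finally:
    api.PyGILState_Release(state)  # restore the previous GIL state

print(type(state).__name__)  # → int
```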

We might also expose an API to acquire the GIL, but in general I would recommend that user code should only call Python from the main thread.

See this forum post and #881 for a possible test case.

@marius311
Contributor

Thanks, yeah, this sounds like it could be the cause. I tried your suggestion and it deadlocks. My guess is you have to do something like what's done here. I'm still playing with it, although I'm not an expert.

@marius311 marius311 linked a pull request Feb 25, 2021 that will close this issue
@stevengj
Member Author

I hope we don't have to do something as complicated as this code in FFTW.jl.

See the discussion in JuliaMath/FFTW.jl#141 where we ran into deadlocks during GC because FFTW was trying to acquire a mutex lock during plan destruction.

@stevengj
Member Author

Hmm, unfortunately I think we may be in the same situation as FFTW, because Python can call back to Julia. That is, the following may occur:

  1. We call a Python function, which acquires the GIL.
  2. Python calls back to Julia.
  3. During the Julia call, Julia decides to run GC, which tries to finalize some PyObject.
  4. The PyObject finalizer tries to acquire the GIL. Deadlock ensues.

So, we may need something like the FFTW.jl solution after all.
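The four steps above can be sketched with an ordinary non-reentrant lock standing in for the GIL. This is a toy model, not PyCall code; the timeout only serves to cut the hang short so the sketch terminates.

```python
import threading

# Illustrative stand-in for the scenario above: the GIL behaves like a
# non-reentrant lock, so a finalizer that blocks on it from the thread that
# already holds it can never proceed.
gil = threading.Lock()                  # stand-in for CPython's GIL

gil.acquire()                           # step 1: a Python call takes the GIL
# steps 2-3: Python calls back into Julia; Julia GC runs a PyObject finalizer
reacquired = gil.acquire(timeout=0.1)   # step 4: finalizer tries to take the GIL
print(reacquired)                       # False: the deadlock, cut short by the timeout
gil.release()
```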

@marius311
Contributor

Thanks, I need to think about this and read that code, but one thing I don't totally follow: in the solution in my PR, the PyObject finalizer never tries to acquire the GIL. So are we fine?

@PallHaraldsson
Contributor

PallHaraldsson commented Feb 26, 2021

For some uses, PythonCall.jl [EDIT: renamed from Python.jl] may be preferable? I'm not yet clear on the pros and cons, or why that new package is needed, but at least it has some GIL-related code, so you could either use that package or perhaps look at its code to fix things here:

https://github.com/cjdoris/PythonCall.jl/blob/6bf7575247b0354009ae8f97387817f60bf45442/src/gil.jl

@secondaryfox

I'm trying to run PyTorch code from Julia's PyCall. I know the PyTorch code does not require the GIL, so I would like to run multiple instances of it with Julia threads -> PyCall -> without the GIL. Is this achieved automatically, or do I need to do anything extra?

@mirkobunse

mirkobunse commented Jul 15, 2022

Users of PyCall can use a global lock that every call to Python has to acquire.

module MyModule

const PYLOCK = Ref{ReentrantLock}()

function __init__() # instantiate the lock
    PYLOCK[] = ReentrantLock()
end

# acquire the lock before any code calls Python
pylock(f::Function) = Base.lock(f, PYLOCK[])

function example_function()
    pylock() do
       # any code that calls Python
    end
end

end # module

This approach works at least if Julia calls Python without any call-back to Julia. I assume that it also works with a call-back, as long as this call-back does not call Python again.


EDIT

I have verified the above approach with the following example, where we intend to fill the large array a with random numbers from numpy.random.rand().

using MyModule, PyCall

a = zeros(10000)
np_random = pyimport("numpy.random")

The following code, which does not acquire a lock, segfaults:

Threads.@threads for i in 1:length(a)
    a[i] = np_random.rand()
end

However, we can acquire the lock to achieve the desired effect.

Threads.@threads for i in 1:length(a)
    MyModule.pylock() do
        a[i] = np_random.rand()
    end
end

Of course, this locking mechanism requires all Python calls to acquire the lock. And acquiring the lock limits the degree to which our program parallelizes. In the above example, there is indeed no parallelism at all, because the lock has to be held for the entirety of each iteration of the loop.
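The same trade-off can be seen in a pure-Python sketch of this locking discipline (names like pylock are illustrative): the shared state stays consistent, but the locked sections run strictly one at a time.

```python
import threading

# Sketch of the global-lock discipline in Python. Every "call into Python"
# goes through pylock, so the shared state stays consistent, but the locked
# sections execute one at a time, not in parallel.
PYLOCK = threading.RLock()

def pylock(f):
    with PYLOCK:
        return f()

counter = 0

def work():
    for _ in range(10_000):
        def bump():
            global counter
            counter += 1
        pylock(bump)  # each "Python call" takes the global lock

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # → 40000: the lock serialized every update
```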

@the-noble-argon

@mirkobunse

I recently tried doing what you suggested with a simple JSON-writing exercise (I know Julia has the same feature; this is just a minimal example):

using PyCall
const PY_LOCK = ReentrantLock()
const PY_JSON = pyimport("json")


const PYLOCK = Ref{ReentrantLock}()
PYLOCK[] = ReentrantLock()

# acquire the lock before any code calls Python
pylock(f::Function) = Base.lock(f, PYLOCK[])

function example_function()
    pylock() do
       # any code that calls Python
    end
end

function write_json_file(fileName::String, outputData::Dict)
    
    pylock() do
        open(fileName, "w") do outputFile
            PY_JSON.dump(deepcopy(outputData), outputFile)
        end
    end

    return nothing
end


function file_operation(fileName::String)
    outputData = Dict("a"=>randn(), "b"=>randn())

    write_json_file(fileName, outputData)
    return outputData
end

file_operation("testfile.json")
firstTask = Threads.@spawn log.(exp.(randn(10000000)))
taskList = [Threads.@spawn file_operation("testfile$(ii).json") for ii in 1:30]

However, I get this error:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa6e17001c -- PyDict_GetItem at C:\Users\user\Miniconda3\python39.dll (unknown line)
in expression starting at none:0
PyDict_GetItem at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyObject_GetAttrString at C:\Users\user\Miniconda3\python39.dll (unknown line)
_getproperty at C:\Users\user\.julia\packages\PyCall\ygXW2\src\PyCall.jl:300
__getproperty at C:\Users\user\.julia\packages\PyCall\ygXW2\src\PyCall.jl:312 [inlined]
getproperty at C:\Users\user\.julia\packages\PyCall\ygXW2\src\PyCall.jl:318
#4 at g:\My Drive\julia_pycall_threadjson.jl:22
#open#378 at .\io.jl:384
open at .\io.jl:381 [inlined]
#3 at g:\My Drive\julia_pycall_threadjson.jl:21 [inlined]
lock at .\lock.jl:185
pylock at g:\My Drive\julia_pycall_threadjson.jl:10 [inlined]
write_json_file at g:\My Drive\julia_pycall_threadjson.jl:20 [inlined]
file_operation at g:\My Drive\julia_pycall_threadjson.jl:33
#10 at .\threadingconstructs.jl:258
unknown function (ip: 0000000061883ea3)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:931
Allocations: 10553387 (Pool: 10546750; Big: 6637); GC: 11

@mirkobunse

mirkobunse commented Sep 14, 2022

@the-noble-argon

Thanks for reporting! I figured that the error occurs because your threads are still running when Julia exits (EDIT: to be more clear, my pylock is not the cause of this error). A solution is to add two lines to your script:

wait(firstTask)
wait.(taskList)

This will make your code wait for all threads before Julia exits.

@the-noble-argon

@mirkobunse
I think that solves it (which is nice, because I was sure the pylock solution should work). Interestingly enough, this script runs fine on Linux even without the wait at the end. When I saw some .dll files buried in the error message, I wondered whether it was a Python-on-Windows problem; sure enough, when I ran it on Ubuntu, everything was fine. I can't vouch for Mac, but I'd assume it would run. For me this was actually fine, because I plan to deploy my solution to a Linux machine anyway; it's just that my work dev machine is a Windows box.

Anyway, this is a huge leap in usability from when I tried to use PyCall in a multithreaded environment a couple of years ago. Whoever did this, thanks a lot!

@the-noble-argon

the-noble-argon commented Sep 16, 2022

Unfortunately, I spoke too soon. I realized that in the Linux command-line version, I forgot to enable multiple threads. I'm really trying to do this in a way that is safe: I'm putting all my Python objects in global constants so they don't get unintentionally garbage collected, and I'm implementing the lock functionality. Here is my latest script:

using PyCall
const PY_LOCK = ReentrantLock()
const PY_JSON = pyimport("json")

const PYLOCK = Ref{ReentrantLock}()
PYLOCK[] = ReentrantLock()

# acquire the lock before any code calls Python
pylock(f::Function) = Base.lock(f, PYLOCK[])


function write_json_file(fileName::String, outputData::Dict)
    pylock() do
        open(fileName, "w") do outputFile
            PY_JSON.dump(deepcopy(outputData), outputFile)
        end
    end
    return nothing
end


function file_operation(fileName::String)
    outputData = Dict("a"=>randn(), "b"=>randn())

    write_json_file(fileName, outputData)
    return outputData
end

for ii in 1:100
    file_operation("testfile.json")
    firstTask = Threads.@spawn log.(exp.(randn(10000000)))
    taskList = [Threads.@spawn file_operation("testfile$(ii).json") for ii in 1:30]

    wait(firstTask)
    wait.(taskList)
end

I'm able to get crashes on both Windows and Linux. On Windows:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa4db45adb -- PyUnicode_New at C:\user\Miniconda3\python39.dll (unknown line)
in expression starting at G:\My Drive\testing\julia_pylock_json.jl:31
PyUnicode_New at C:\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\user\Miniconda3\python39.dll (unknown line)
Py_NewReference at C:\user\Miniconda3\python39.dll (unknown line)
PyEval_EvalFrameDefault at C:\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\user\Miniconda3\python39.dll (unknown line)
Py_NewReference at C:\user\Miniconda3\python39.dll (unknown line)
PyEval_EvalFrameDefault at C:\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\user\Miniconda3\python39.dll (unknown line)
PyVectorcall_Call at C:\user\Miniconda3\python39.dll (unknown line)
PyObject_Call at C:\user\Miniconda3\python39.dll (unknown line)
macro expansion at C:\user\.julia\packages\PyCall\ygXW2\src\exception.jl:95 [inlined]
#107 at C:\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 [inlined]
disable_sigint at .\c.jl:473 [inlined]
__pycall! at C:\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:42 [inlined]
_pycall! at C:\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:29
_pycall! at C:\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:11
unknown function (ip: 0000000062bab9e5)
#_#114 at C:\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:730
PyObject at C:\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
#2 at G:\My Drive\testing\julia_pylock_json.jl:17
#open#378 at .\io.jl:384
open at .\io.jl:381 [inlined]
#1 at G:\My Drive\testing\julia_pylock_json.jl:16 [inlined]
lock at .\lock.jl:185
pylock at G:\My Drive\testing\julia_pylock_json.jl:11 [inlined]
write_json_file at G:\My Drive\testing\julia_pylock_json.jl:15 [inlined]
file_operation at G:\My Drive\testing\julia_pylock_json.jl:27
#7 at .\threadingconstructs.jl:258
unknown function (ip: 0000000062bc6f53)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:931
Allocations: 11028177 (Pool: 11021443; Big: 6734); GC: 14

On Linux, I get:

signal (11): Segmentation fault
in expression starting at /home/user/julia_pylock_json.jl:31
pymalloc_alloc.isra.1 at /opt/conda/conda-bld/python-split_1649141344976/work/Objects/obmalloc.c:1621 [inlined]
_PyObject_Malloc at /opt/conda/conda-bld/python-split_1649141344976/work/Objects/obmalloc.c:1640 [inlined]
PyObject_Malloc at /opt/conda/conda-bld/python-split_1649141344976/work/Objects/obmalloc.c:685
_PyObject_GC_Alloc at /opt/conda/conda-bld/python-split_1649141344976/work/Modules/gcmodule.c:2261 [inlined]
_PyObject_GC_Malloc at /opt/conda/conda-bld/python-split_1649141344976/work/Modules/gcmodule.c:2288
_PyObject_GC_New at /opt/conda/conda-bld/python-split_1649141344976/work/Modules/gcmodule.c:2300
PyCell_New at /opt/conda/conda-bld/python-split_1649141344976/work/Objects/cellobject.c:11
_PyEval_EvalCode at /opt/conda/conda-bld/python-split_1649141344976/work/Python/ceval.c:4283
_PyFunction_Vectorcall at /opt/conda/conda-bld/python-split_1649141344976/work/Objects/call.c:396
_PyObject_VectorcallTstate at /opt/conda/conda-bld/python-split_1649141344976/work/Include/cpython/abstract.h:118 [inlined]
PyObject_Vectorcall at /opt/conda/conda-bld/python-split_1649141344976/work/Include/cpython/abstract.h:127 [inlined]
call_function at /opt/conda/conda-bld/python-split_1649141344976/work/Python/ceval.c:5077 [inlined]
_PyEval_EvalFrameDefault at /opt/conda/conda-bld/python-split_1649141344976/work/Python/ceval.c:3520
_PyEval_EvalFrame at /opt/conda/conda-bld/python-split_1649141344976/work/Include/internal/pycore_ceval.h:40 [inlined]
_PyEval_EvalCode at /opt/conda/conda-bld/python-split_1649141344976/work/Python/ceval.c:4329
_PyFunction_Vectorcall at /opt/conda/conda-bld/python-split_1649141344976/work/Objects/call.c:396
_PyObject_VectorcallTstate at /opt/conda/conda-bld/python-split_1649141344976/work/Include/cpython/abstract.h:118 [inlined]
PyObject_Vectorcall at /opt/conda/conda-bld/python-split_1649141344976/work/Include/cpython/abstract.h:127 [inlined]
call_function at /opt/conda/conda-bld/python-split_1649141344976/work/Python/ceval.c:5077 [inlined]
_PyEval_EvalFrameDefault at /opt/conda/conda-bld/python-split_1649141344976/work/Python/ceval.c:3506
_PyEval_EvalFrame at /opt/conda/conda-bld/python-split_1649141344976/work/Include/internal/pycore_ceval.h:40 [inlined]
_PyEval_EvalCode at /opt/conda/conda-bld/python-split_1649141344976/work/Python/ceval.c:4329
_PyFunction_Vectorcall at /opt/conda/conda-bld/python-split_1649141344976/work/Objects/call.c:396
PyVectorcall_Call at /opt/conda/conda-bld/python-split_1649141344976/work/Objects/call.c:231
macro expansion at /home/user/.julia/packages/PyCall/ygXW2/src/exception.jl:95 [inlined]
#107 at /home/user/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 [inlined]
disable_sigint at ./c.jl:473 [inlined]
__pycall! at /home/user/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:42 [inlined]
_pycall! at /home/user/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:29
_pycall! at /home/user/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:11
unknown function (ip: 0x7ff6d9b1a223)
_jl_invoke at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/gf.c:2549
#_#114 at /home/user/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:86
_jl_invoke at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/julia.h:1838 [inlined]
do_apply at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/builtins.c:730
PyObject at /home/user/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:86
_jl_invoke at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/gf.c:2549
#2 at /home/user/julia_pylock_json.jl:17
#open#377 at ./io.jl:384
open at ./io.jl:381 [inlined]
#1 at /home/user/julia_pylock_json.jl:16 [inlined]
lock at ./lock.jl:185
pylock at /home/user/julia_pylock_json.jl:11 [inlined]
write_json_file at /home/user/julia_pylock_json.jl:15 [inlined]
file_operation at /home/user/julia_pylock_json.jl:27
#7 at ./threadingconstructs.jl:258
unknown function (ip: 0x7ff6d9b369bf)
_jl_invoke at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/julia.h:1838 [inlined]
start_task at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-8/src/task.c:931
Allocations: 11530031 (Pool: 11523201; Big: 6830); GC: 49
Segmentation fault (core dumped)

@mirkobunse

Sorry, I'm not able to fix this example. Using @sync and replacing wait with fetch seems to bring some stability, i.e., the example works more often than it breaks. Since these changes are not definite fixes, however, I will not elaborate on them further.

@the-noble-argon

Yeah, this is a tough one that's hard to track down. I got it to a point in #1006 where I could still use @async on PyCall and @threads for the multithreaded Julia code, and I wasn't able to crash that. I think there is something about Threads.@spawn that really trips up a running PyCall task. Only being able to use @threads while PyCall isn't running isn't ideal, but at least that's better than Python, and @threads got better with Julia 1.8. Honestly, I mostly use PyCall for IO-related tasks on Azure, so doing an @async on that will likely be just as good as multithreading. I'd prefer to use Threads.@spawn with impunity, but I guess I can settle for @threads.

@the-noble-argon

the-noble-argon commented Sep 19, 2022

@mirkobunse I found a solution that might interest you. I think garbage collection was being triggered on another thread (which is why the segfault happened nondeterministically) and probably nuked something running in PyCall. The solution is a bit kludgy, but I fixed this by modifying the pylock() function to temporarily disable GC, in the same manner that the lock/unlock is performed under the hood:

function pylock(f::Function)
    GC.enable(false)
    lock(PYLOCK[])
    try 
        return f()
    finally
        unlock(PYLOCK[])
        GC.enable(true)
    end
end

This validates, yet again, the importance of this issue: PyCall tasks need protection from GC runs triggered on other threads.

@mirkobunse

This solution is amazing, thanks!

The GC.enable should come after the lock, though. Otherwise, the GC is likely disabled most of the time, because some thread is always waiting at that point before the GC is enabled again.

I have also replaced the wait with @sync and @async. This change makes the execution more dynamic, in the sense that individual iterations of the loop can be interleaved. The speedup is considerable: I achieve 20 s execution time instead of 50 s with 4 threads.

This is your example with the changes I propose:

using PyCall
const PY_JSON = pyimport("json")

const PYLOCK = Ref{ReentrantLock}()
PYLOCK[] = ReentrantLock()

# acquire the lock before any code calls Python
pylock(f::Function) = Base.lock(PYLOCK[]) do
    prev_gc = GC.enable(false)
    try 
        return f()
    finally
        GC.enable(prev_gc) # recover previous state
    end
end

function write_json_file(fileName::String, outputData::Dict)
    pylock() do
        open(fileName, "w") do outputFile
            PY_JSON.dump(deepcopy(outputData), outputFile)
        end
    end
end

@sync @async for i in 1:100
    write_json_file("testfile.json", Dict("a"=>randn()))
    Threads.@spawn log.(exp.(randn(10000000)))
    @async for ii in 1:30
        Threads.@spawn write_json_file("testfile$(i)_$(ii).json", Dict("a"=>randn()))
    end
end
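The save-and-restore of the collector state has a direct Python analogue via gc.isenabled(). This is a sketch with illustrative names, not PyCall internals: take a global lock, remember the prior GC state, disable the collector for the critical section, and restore it afterwards.

```python
import gc
import threading

# Python analogue of the pylock pattern above: lock, remember whether the
# collector was enabled, disable it for the critical section, then restore
# the previous state (not unconditionally re-enable).
PYLOCK = threading.RLock()

def pylock(f):
    with PYLOCK:
        prev = gc.isenabled()   # remember the prior GC state
        gc.disable()
        try:
            return f()
        finally:
            if prev:
                gc.enable()     # only re-enable if it was enabled before

result = pylock(lambda: gc.isenabled())  # the collector is off inside the section
print(result, gc.isenabled())            # → False True (GC restored afterwards)
```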

@the-noble-argon

the-noble-argon commented Sep 20, 2022

Right, put the GC.enable inside the lock. That makes sense. Moreover, your new pylock() function is more general, in case the GC was already turned off. I mean, right now this is a kludge until we figure out how to make the Python code safe against garbage collection from other threads. However, as you showed, it is a viable workaround, as it allows us to really weave the Python calls in with all the other multithreaded tasks. You have no idea how excited I am about this solution; I've been suffering from this problem for a couple of years!

@mirkobunse

mirkobunse commented Sep 20, 2022

Same for me :D Thanks so much for the GC.enable fix!

@martinmestre

martinmestre commented Aug 16, 2023

Hi, I am getting a segmentation fault when trying to use multithreading with NOMAD.jl, where my loss functions are in Python and I am using PyCall.jl. The error says:
Caught seg fault in thread 0

Has anyone had a similar problem?

@the-noble-argon

Hi, I am getting a segmentation fault when trying to use multithreading with NOMAD.jl, where my loss functions are in Python and I am using PyCall.jl. The error says: Caught seg fault in thread 0

Has anyone had a similar problem?

Multiple times. When calling Python in a multithreaded environment, you have to use the pylock() pattern we figured out in the discussion above. Issue #1006 also has this workaround. First, build a pylock function that serializes the Python calls and disables the garbage collector while the Python code is running:

const PYLOCK = Ref{ReentrantLock}()
PYLOCK[] = ReentrantLock()

# acquire the lock before any code calls Python
pylock(f::Function) = Base.lock(PYLOCK[]) do
    prev_gc = GC.enable(false)
    try 
        return f()
    finally
        GC.enable(prev_gc) # recover previous state
    end
end

Then wrap your PyCall code in pylock:

    pylock() do
        #your python code here
    end

@martinmestre

Thank you! I will try as you suggest.

@martinmestre

Hi @the-noble-argon, when I do

pylock() do
    χ² = stream.chi2_full(θ, ω, β, m, ic, r☼)
end
return χ²

the editor marks the first occurrence of χ² as unused. Does pylock introduce a new scope?

@stevengj
Member Author

stevengj commented Aug 17, 2023

Yes, because it's a standard do construct in Julia, which is sugar for an anonymous function. Try

χ² = pylock() do
    stream.chi2_full(θ, ω, β, m, ic, r☼)
end
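For comparison, the same scoping rule appears in any language where the locked section is an anonymous function: assignments inside it are local, so the fix is to return the value and bind it in the caller. A minimal Python sketch (pylock and chi2_full are hypothetical stand-ins):

```python
import threading

# The do-block is an anonymous function, so assignments inside it do not
# leak out. The fix: return the value from the locked section and assign
# the result in the caller's scope.
PYLOCK = threading.RLock()

def pylock(f):
    with PYLOCK:
        return f()            # hand the result back to the caller

def chi2_full():              # hypothetical stand-in for stream.chi2_full(...)
    return 6 * 7

chi2 = pylock(chi2_full)      # bind the returned value in the caller's scope
print(chi2)                   # → 42
```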

@martinmestre

Thanks @stevengj, now it starts running, but it doesn't use the 4 threads I set with julia -t 4, and it stops with the following error:

Caught seg fault in thread 0
Caught seg fault in thread 0
terminate called after throwing an instance of 'NOMAD_4_3::Exception'
terminate called recursively
  what():  NOMAD::Exception thrown (/workspace/srcdir/nomad/src/Algos/Step.cpp, 106) Caught seg fault

[110481] signal (6.-6): Aborted

Is that a known problem?

@the-noble-argon

the-noble-argon commented Aug 17, 2023

This doesn't look like the kind of error I get when PyCall segfaults on me; in those cases I get very long messages. I'm not sure this is a PyCall.jl issue. Can you try calling your loss function repeatedly in a multithreaded loop without NOMAD.jl and see if you get any more informative errors? Is there a way to track what NOMAD.jl uses as input to your loss function and build a multithreaded for loop around that?

Before trying this, however, we may need to take a step backward. You should be aware that a Python process can't run Python code on multiple threads at once (blame Python's GIL, which I hear they're trying to remove yet again). If the pylock() pattern is properly used, it will prevent any Python code from running in parallel and prevent any garbage collection while Python code is running. So if most of your computational effort is spent running your Python loss function, multithreading will achieve little more than causing you anguish. You may need to rewrite your loss function in Julia if you want it to multithread.

@martinmestre

martinmestre commented Aug 17, 2023

Thanks! Good to know; I'll stop trying this path. The Python loss function is, by orders of magnitude, the most time-consuming part of my code. For a future project I will rewrite it in Julia. If I use distributed processes, will I have the same problem with Python's GIL?
I still don't understand why Python can't do real multithreading; I have used the standard library within Python.

@the-noble-argon

the-noble-argon commented Aug 17, 2023

I think that should work. Distributed processes have their own memory spaces managed by the OS, so you don't run into the nasty problems of multiple threads mutating the same process memory space. You can run as many Python processes on one machine as you want (within reason).

Python was never meant for high-performance computing; it was meant to glue many (potentially high-performance) packages together. Many Python packages do multithreading under the hood because they are written in high-performance multithreaded languages. It may surprise you, but most data science isn't actually executed in Python; it's executed by Python libraries written in languages like C++. The core aim of many Python developers is to spend as little compute time in Python as possible, and to do most compute in high-performance libraries.

I use a lot of Python, but what I DON'T use it for is optimization, especially if I have to write the objective/loss function myself. These functions get called over and over again, so there is a huge benefit to having the loss function compiled (which Python doesn't do unless you dabble in Numba/Cython, but those are finicky about the data types you can use). Because of Julia's flexibility, ease of writing, and JIT-compiled nature, it is hands-down the best language I've ever used for optimizing custom objective functions. If this weren't enough, Julia also has some really good automatic differentiation tools that analyze your code and write the derivatives for you (if your loss function is differentiable and entirely written in Julia). When I write Julia/Python combinations, I use PyCall to call the third-party APIs/SDKs (because everyone writes APIs/SDKs for Python) and Julia for the heavy number crunching.

@martinmestre

Yes, you are right. The loss function is in Python because of a long story, but at present I write everything in Julia. Anyway, I will have to call some coordinate transformations from Astropy (a Python API). Can I do automatic differentiation through Julia code that calls PyCall or PythonCall?

@the-noble-argon

the-noble-argon commented Aug 17, 2023

No, you can't differentiate through anything that isn't Julia, you would have to write your own differentiation rule for anything that leaves the Julia environment. I learned that the hard way when trying to do a maximum-likelihood estimate of a gamma distribution, which called the incomplete gamma function which called an R function written in Fortran under the hood. But I COULD maximize a normal distribution, because the erf implementation was in Julia. Go figure.

@martinmestre

Thanks a lot!

@mirkobunse

If differentiation is your main concern, you might want to try Python's JAX package. It has a numpy wrapper, which can be used just like regular numpy, and which makes most numpy functions automatically differentiable. You just need to import this wrapper

import jax.numpy as jnp

instead of importing the original numpy and use automatic differentiation

import jax
def f(x): # custom function
    return jnp.tanh(jnp.sqrt(x) + 1)
g = jax.grad(f)
g(0.2) # = 0.22217956

to differentiate (almost) any numpy-based function.

Translating your original numpy code to JAX code might be easier than translating everything to Julia.
