Allow statically linked Python like PythonCall? #496

Open
marius311 opened this issue May 8, 2022 · 14 comments

Comments

marius311 (Contributor) commented May 8, 2022

I'm not an expert and don't know the internals, but is there a reason PyCall can't do whatever PythonCall / juliacall does that lets the user use any Python executable, including ones with a statically linked libpython? Is there anything preventing their approach from being used here? A probably related question is posted here: JuliaPy/PyCall.jl#988

oschulz commented Jul 17, 2022

That would be so awesome!

mkitti (Member) commented Nov 8, 2022

Here is cjdoris's response to marius311 on the topic:
https://discourse.julialang.org/t/ann-pythoncall-and-juliacall/76778/16

I’ve encountered that issue before in pyjulia but don’t actually know its cause.

I imagine the difference is in how the packages load libpython. In JuliaCall, we pass ctypes.pythonapi._handle to PythonCall, which is a pointer to an already-open libpython. I assume PyJulia/PyCall opens libpython itself.

Indeed, he's right:
https://docs.python.org/3/library/ctypes.html

ctypes.pythonapi
An instance of PyDLL that exposes Python C API functions as attributes. Note that all these functions are assumed to return C int, which is of course not always the truth, so you have to assign the correct restype attribute to use these functions.

$ ldd `which python`
	linux-vdso.so.1 (0x00007ffe269cb000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fcf5c547000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fcf5c53f000)
	libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fcf5c537000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fcf5c52f000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fcf5c447000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcf5c21f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fcf5c917000)

$ `which python`
Python 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:06:46) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> ctypes.pythonapi
<PyDLL 'None', handle 7f106618a2e0 at 0x7f10655c2980>
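
The restype note in the quoted documentation matters as soon as a function returns something other than a C int. A minimal sketch, using only documented ctypes attributes and two CPython C API functions, of declaring the return types before calling through ctypes.pythonapi:

```python
import ctypes
import sys

# ctypes assumes every foreign function returns a C int unless told otherwise,
# so declare the real signatures before calling.
api = ctypes.pythonapi

api.Py_IsInitialized.restype = ctypes.c_int
api.Py_IsInitialized.argtypes = []

# Py_GetVersion returns `const char *`; c_char_p converts the result to bytes.
api.Py_GetVersion.restype = ctypes.c_char_p
api.Py_GetVersion.argtypes = []

initialized = api.Py_IsInitialized()
version = api.Py_GetVersion().decode("utf-8")

print(initialized)
print(version)
print(version == sys.version)
```

Without the c_char_p declaration, the returned pointer would be truncated to an int and could not be read back as a string.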

oschulz commented Nov 8, 2022

@mkitti so PyCall/pyjulia could do that as well?

mkitti (Member) commented Nov 8, 2022

I think so. We technically just need the pointer.

oschulz commented Nov 8, 2022

Oh that would be awesome! I guess packages like PySR (@MilesCranmer), diffeqpy (@ChrisRackauckas) and so on would profit a lot from that as well.

mkitti (Member) commented Nov 23, 2022

I would like to review the situation here.

Part of the issue is that pyjulia is only half of the equation here. The other half is PyCall.jl.

In JuliaPy/PyCall.jl#612, they were trying to load the python executable as libpython due to PIE (Position Independent Executables).

In the linked comment above, @cjdoris demonstrates that we do not need to load the python executable or libpython, since we could just reuse ctypes.pythonapi._handle as is done in juliacall / PythonCall. In juliacall, the pointer is passed through an environment variable.
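
To make the hand-off concrete, here is a rough Python-only sketch of passing an already-open handle through an environment variable and wrapping it again without a second dlopen. The variable name EXAMPLE_LIBPYTHON_HANDLE and both helper functions are made up for illustration and are not juliacall's actual names. The round trip is only meaningful inside one process, since the handle is a process-local pointer.

```python
import ctypes
import os

# Hypothetical variable name, chosen for this example only.
HANDLE_ENV_VAR = "EXAMPLE_LIBPYTHON_HANDLE"


def publish_handle() -> None:
    """Store the already-open libpython handle as a decimal string."""
    os.environ[HANDLE_ENV_VAR] = str(ctypes.pythonapi._handle)


def attach_from_env() -> ctypes.PyDLL:
    """Wrap the published handle without opening the library again."""
    handle = int(os.environ[HANDLE_ENV_VAR])
    # When `handle=` is given, ctypes reuses it instead of loading `name`;
    # the name here is only a descriptive placeholder.
    return ctypes.PyDLL("already-open libpython", handle=handle)


publish_handle()
lib = attach_from_env()
lib.Py_IsInitialized.restype = ctypes.c_int
print(lib.Py_IsInitialized())
```

The consumer side never needs to know whether libpython is a shared library or part of the executable, which is the property that makes this attractive for statically linked Python.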

How is ctypes.pythonapi._handle loaded when Python is statically linked to libpython?

Looking into ctypes, we see that pythonapi is set to PyDLL(None). The name argument and the _name field of PyDLL, a subclass of CDLL, are set to None.

>>> import ctypes
>>> ctypes.pythonapi
<PyDLL 'None', handle 7f084713e2e0 at 0x7f08464d3e10>
>>> ctypes.pythonapi._name
>>> ctypes.pythonapi._name == None
True

_name is subsequently passed to _dlopen, which on POSIX systems is just the libdl C routine dlopen.

If we look at the man page for dlopen(3) we see this call to dlopen will return a handle to the executable.

If filename is NULL, then the returned handle is for the main program.
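
The same dlopen(NULL) behaviour can be checked from Python alone. This is a small sketch, assuming a POSIX system where ctypes forwards a name of None to dlopen as NULL; the helper name main_program_handle is made up for this example.

```python
import ctypes
import os


def main_program_handle():
    """Return the handle from dlopen(NULL), or None where ctypes cannot pass NULL."""
    if os.name != "posix":
        return None  # on Windows, ctypes needs a real library name
    # A name of None is forwarded to dlopen as NULL.
    return ctypes.PyDLL(None)._handle


handle = main_program_handle()
if handle is not None:
    print(hex(handle))
    print(handle == ctypes.pythonapi._handle)
```

On Linux the two values are expected to match, because both come from dlopen(NULL) in the same process.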

Can we obtain the pythonapi pointer handle with dlopen in Julia?

This suggests that we can use dlopen from Julia to obtain the same pointer. While there are a few layers of indirection involved, passing an empty string to Julia's Libdl.dlopen appears to work.

# Start from ipython
In [1]: import ctypes

In [2]: hex(ctypes.pythonapi._handle)
Out[2]: '0x7f054fcae2e0'

In [3]: from julia.api import LibJulia

In [4]: api = LibJulia.load()

In [5]: api.init_julia()

In [6]: api
Out[6]: <julia.libjulia.LibJulia at 0x7f054c735fd0>

# Launch Julia REPL from Python
In [7]: api.jl_eval_string(b"""
   ...: import REPL;
   ...: term = REPL.Terminals.TTYTerminal("dumb", stdin, stdout, stderr);
   ...: repl = REPL.LineEditREPL(term, true);
   ...: REPL.run_repl(repl);
   ...: """)
julia> using Libdl

julia> python_ptr = dlopen("")
Ptr{Nothing} @0x00007f054fcae2e0

We see above that the pointer from ctypes.pythonapi._handle is exactly the same pointer we obtain by invoking Libdl.dlopen("") in Julia.

Can we obtain symbols from this pointer?

julia> Py_IsInitialized = dlsym(python_ptr, :Py_IsInitialized)
Ptr{Nothing} @0x0000555d21fd3890

julia> ccall(Py_IsInitialized, Cint, ())
1

julia> Py_GetVersion = dlsym(python_ptr, :Py_GetVersion)
Ptr{Nothing} @0x0000555d21fe0580

julia> ccall(Py_GetVersion, Cstring, ()) |> unsafe_string
"3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:40) [GCC 10.4.0]"

Concluding statements

We can obtain ctypes.pythonapi._handle by calling dlopen("") in Julia when started from Python. For juliacall, an environment variable may not need to be used to transmit the pointer. For pyjulia and PyCall.jl, this simplifies the method of obtaining the pythonapi pointer.

cjdoris commented Nov 23, 2022

That's cool!

I just took a quick look from JuliaCall and it's true dlopen("") returns the same handle on Linux, but it throws an error on Windows:

could not load library ""
The parameter is incorrect.

Plus the behaviour of dlopen("") is undocumented, so personally I'm steering clear of it.

mkitti (Member) commented Nov 23, 2022

While I agree that dlopen("") is undocumented at the Julia API level, it does correspond to the documented behavior at the C API level.

The use of ctypes.pythonapi._handle is also equally undocumented. The underlying mechanism basically depends on the same behavior.

cjdoris commented Nov 23, 2022

Actually ctypes.pythonapi is documented to be a PyDLL and PyDLL._handle is documented to be the system handle - in this case the underscore is not indicating an internal attribute, but is to avoid name clashes with symbols in the DLL.

mkitti (Member) commented Nov 23, 2022

You're right, I concede the point.

https://docs.python.org/3/library/ctypes.html#ctypes.PyDLL._handle

Also, dlopen("") does not work on macOS; it really should be dlopen(C_NULL), which doesn't work either. See JuliaLang/julia#22318. One would have to do

ccall(:jl_load_dynamic_library, Ptr{Cvoid}, (Ptr{Nothing},UInt32,Cint), C_NULL, RTLD_GLOBAL, Cint(1))

That does work.

xref: JuliaLang/julia#22318

mkitti (Member) commented Nov 23, 2022

On macOS ctypes.pythonapi._handle is 0xfffffffffffffffe.

In [1]: import ctypes

In [2]: ctypes.pythonapi._handle
Out[2]: 18446744073709551614

In [3]: hex(ctypes.pythonapi._handle)
Out[3]: '0xfffffffffffffffe'

This is actually the value of RTLD_DEFAULT: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/dlsym.3.html

If dlsym() is called with the special handle RTLD_DEFAULT, then all mach-o
images in the process (except those loaded with dlopen(xxx,
RTLD_LOCAL)) are searched in the order they were loaded. This can be a
costly search and should be avoided.
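
For anyone checking the arithmetic: Apple's dlfcn.h defines RTLD_DEFAULT as ((void *) -2), and the large integer reported by ctypes is that same value read as an unsigned 64-bit pointer. A short sketch (the helper name as_unsigned_pointer is made up for this example):

```python
import ctypes

# Apple's <dlfcn.h> defines RTLD_DEFAULT as ((void *) -2).
RTLD_DEFAULT_SIGNED = -2


def as_unsigned_pointer(value: int, bits: int) -> int:
    """Reinterpret a signed integer as an unsigned pointer of the given width."""
    return value % (1 << bits)


# Width of a pointer in the running interpreter.
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8
print(hex(as_unsigned_pointer(RTLD_DEFAULT_SIGNED, pointer_bits)))
print(as_unsigned_pointer(RTLD_DEFAULT_SIGNED, 64) == 18446744073709551614)
```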

cjdoris commented Nov 23, 2022

🤯

I've never actually tried JuliaCall on Mac. I wonder if it works. I should really set up tests and CI.

Edit: It works fine! And indeed the handle is that special value.

That very last sentence ("this can be a costly search") may explain why loading in ~100 symbols takes so long in PythonCall (~1sec), one reason why PyCall is much faster to load.

mkitti (Member) commented Nov 23, 2022

On macOS, you can just dlopen the executable. At the moment the timing does not look terrible.

In [1]: from julia.api import LibJulia

In [2]: api = LibJulia.load()

In [3]: api.init_julia()

In [4]: api.jl_eval_string(b"""
   ...: import REPL;
   ...: term = REPL.Terminals.TTYTerminal("dumb", stdin, stdout, stderr);
   ...: repl = REPL.LineEditREPL(term, true);
   ...: REPL.run_repl(repl);
   ...: """)

julia> python_path = ccall(:_dyld_get_image_name, Cstring, (UInt32,), 0) |> unsafe_string
"~/miniforge3-x86_64/envs/pyjulia_test_x86_64/bin/python3.11"

julia> python_handle = dlopen(python_path)
Ptr{Nothing} @0x000000021ba297e0

julia> Py_IsInitialized = dlsym(python_handle, :Py_IsInitialized)
Ptr{Nothing} @0x0000000104d78020

julia> ccall(Py_IsInitialized, Cint, ())
1

julia> @btime dlsym(python_handle, :Py_IsInitialized)
  253.612 ns (1 allocation: 16 bytes)
Ptr{Nothing} @0x0000000104d78020

julia> RTLD_DEFAULT = Ptr{Nothing}(0xfffffffffffffffe)
Ptr{Nothing} @0xfffffffffffffffe

julia> @btime dlsym(RTLD_DEFAULT, :Py_IsInitialized)
  270.565 ns (1 allocation: 16 bytes)
Ptr{Nothing} @0x0000000104d78020

PyCall.jl does a lot of symbol loading during precompilation. That will make it difficult to use this pointer, and it is also why PyCall.jl doesn't work with a statically linked python executable unless compiled_modules = false (i.e. no precompilation).

My thought is that this could benefit from a lazy symbol loading scheme such as the one I put into GR.jl:
https://github.com/jheinen/GR.jl/blob/db3e5f53738be892b23317d673179a32b0e50910/src/funcptrs.jl#L74-L86
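
To illustrate the general idea only (this is not the GR.jl code, and the LazySymbols class is a made-up name), a lazy scheme defers each symbol lookup until first use and caches the resulting address, so nothing has to be resolved ahead of time. A rough Python sketch on top of ctypes:

```python
import ctypes


class LazySymbols:
    """Resolve symbol addresses on first use and cache them afterwards."""

    def __init__(self, library: ctypes.CDLL) -> None:
        self._library = library
        self._addresses = {}

    def address(self, name: str) -> int:
        if name not in self._addresses:
            # getattr on a ctypes library performs the symbol lookup and
            # raises AttributeError if the symbol is missing.
            function = getattr(self._library, name)
            self._addresses[name] = ctypes.cast(function, ctypes.c_void_p).value
        return self._addresses[name]

    def resolved(self):
        """Names that have been looked up so far, in sorted order."""
        return sorted(self._addresses)


symbols = LazySymbols(ctypes.pythonapi)
print(symbols.resolved())  # nothing has been looked up yet
symbols.address("Py_IsInitialized")
print(symbols.resolved())
```

Because addresses are only resolved at run time, nothing process-specific would need to be stored in a precompiled cache.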

cjdoris commented Nov 24, 2022

My thought is that this could benefit from a lazy symbol loading scheme such as the one I put into GR.jl:
https://github.com/jheinen/GR.jl/blob/db3e5f53738be892b23317d673179a32b0e50910/src/funcptrs.jl#L74-L86

Yeah thanks, I've got something similar in a branch somewhere....
