Nuitka extension file loader needs to load extension modules in "create_module" not only "exec_module" was (Pymongo lazy import) #2833

Open
sullivan50909 opened this issue May 2, 2024 · 17 comments

@sullivan50909

  • Nuitka version, full Python version, flavor, OS, etc. as output by this exact command.

python3 -m nuitka --version
2.2
Commercial: None
Python: 3.9.18 (main, Jan 24 2024, 00:00:00)
Flavor: Fedora Python
Executable: /usr/bin/python3
OS: Linux
Arch: x86_64
Distribution: Rhel (based on Fedora) 9.3
Version C compiler: /usr/bin/gcc (gcc 11)

  • How did you install Nuitka and Python
    Started from clean ubi9 docker container which has python pre-installed. No virtualenv.
    yum install gcc python3-devel
    pip3 install nuitka
    pip3 install patchelf

  • The specific PyPI names and versions
    The problem is with pymongo.
    This works:
    pymongo==4.6.3
    Any newer version fails (4.7, 4.7.1 and their factory branch)

  • Also supply a Short, Self Contained, Correct, Example

Start from a clean ubi9 Docker container (I don't think there is anything special about ubi9 or Docker, but that is how I tested it).
Install pip using yum.
Option 1, working version:
pip install pymongo==4.6.3
Option 2, non-working version:
pip install pymongo==4.7.0
create hello.py with the following content:

from pymongo import MongoClient
print("hello from pymongo test")

Demonstrate that it works without compilation:
python3 hello.py
Compile:

yum install gcc python3-devel
pip3 install nuitka
pip3 install patchelf
python3 -m nuitka --onefile hello.py
./hello.bin

At this point, what happens depends on the version of pymongo. With version 4.6.3, hello.bin prints the expected output. With version 4.7, it errors when attempting to run hello.bin. With pymongo's factory branch, it errors out during compilation.

I know that this example makes it appear that the problem is with pymongo, but I think there are a few issues going on. I'm working on the pymongo root-cause issues separately. Note that before compilation, both versions of pymongo work.
The reason I am contacting you is that I think Nuitka is having trouble with the lazy import features that pymongo added in version 4.7. I think that Nuitka is failing to include a module, and then pymongo is having trouble reporting the error.

  • Note if this is a regression

No

@sullivan50909
Author

I have the nuitka-crash-report.xml file, but GitHub gives me an error when I try to attach it.
GitHub's error when I try to attach the XML file:
We don’t support that file type.
Try again with GIF, JPEG, JPG, MOV, MP4, PNG, SVG, WEBM, CPUPROFILE, CSV, DMP, DOCX, FODG, FODP, FODS, FODT, GZ, JSON, JSONC, LOG, MD, ODF, ODG, ODP, ODS, ODT, PATCH, PDF, PPTX, TGZ, TXT, XLS, XLSX or ZIP.

@KRRT7 KRRT7 self-assigned this May 2, 2024
@blink1073

blink1073 commented May 2, 2024

pymongo developer here. As best I can tell, Nuitka does not fully support lazy imports.

Example code:

import importlib.util
import sys
def lazy_import(name):
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module
lazy_zlib = lazy_import("zlib")
print(lazy_zlib.crc32)

$ python3 hello.py
<built-in function crc32>
$ python3 -m nuitka --onefile hello.py
$ ./hello.bin
Traceback (most recent call last):
  File "/tmp/onefile_6892_1714611657_437491/hello.py", line 12, in <module>
    print(lazy_zlib.crc32)
  File "/tmp/onefile_6892_1714611657_437491/importlib/util.py", line 250, in __getattribute__
ValueError: module object for 'zlib' substituted in sys.modules during a lazy load

Is there a way we can detect that we are running under Nuitka and eagerly import instead?

@blink1073

Another aspect of the problem is the handling of ModuleNotFoundError.

$ cat hello.py
e = ModuleNotFoundError(name="foo")
print(e.name)
$ python3 hello.py
foo
$ python3 -m nuitka --onefile hello.py
$ ./hello.bin
Traceback (most recent call last):
  File "/tmp/onefile_7078_1714612025_862116/hello.py", line 1, in <module>
    e = ModuleNotFoundError(name="foo")
TypeError: exceptions.ModuleNotFoundError does not take keyword arguments
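
For comparison, here is a minimal sketch of what CPython accepts: ModuleNotFoundError inherits the keyword-only name and path arguments from ImportError. The helper name below is hypothetical, and the fallback branch assumes the attributes can still be assigned after construction on a runtime that rejects the keywords, which has not been verified against Nuitka 2.2.

```python
def make_module_not_found(name, path=None):
    """Build a ModuleNotFoundError carrying ``name`` and ``path``.

    Hypothetical helper, not part of pymongo or Nuitka.
    """
    message = f"No module named {name!r}"
    try:
        # CPython accepts ``name`` and ``path`` as keyword-only arguments.
        return ModuleNotFoundError(message, name=name, path=path)
    except TypeError:
        # Assumed fallback for runtimes that reject the keywords:
        # construct positionally, then set the attributes by hand.
        exc = ModuleNotFoundError(message)
        exc.name = name
        exc.path = path
        return exc


err = make_module_not_found("foo")
print(err.name)  # foo
```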

@KRRT7
Contributor

KRRT7 commented May 2, 2024

@blink1073 if you'd like to add Nuitka support from within pymongo, it would look something like this:

    if spec is None:
        # "__compiled__" is present in module globals when compiled with Nuitka
        if "__compiled__" in globals():
            sys.exit(f"Nuitka: module {name!r} was not included; compile with --include-module={name} to include it")
        raise ModuleNotFoundError(name=name)
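
To show the fragment in context, here is a minimal sketch of a lookup helper built around it. The function name find_spec_or_hint is hypothetical, and the sketch raises ImportError with a hint instead of calling sys.exit; that is a deliberate swap so callers can still catch the error. The `__compiled__` check is the same one the fragment above uses, and everything else is standard importlib.

```python
import importlib.util


def find_spec_or_hint(name):
    """Return the spec for ``name`` or raise with a Nuitka-specific hint.

    Hypothetical helper for illustration only.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        if "__compiled__" in globals():
            # Compiled program: the module was most likely not bundled.
            # No keyword arguments here, since compiled code rejected them
            # for ModuleNotFoundError in this thread.
            raise ImportError(
                f"{name!r} was not included; "
                f"recompile with --include-module={name}"
            )
        # Plain CPython: keyword arguments are accepted.
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    return spec
```

Raising instead of exiting keeps optional dependencies optional: the caller can catch ImportError and carry on without the module.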

@KRRT7
Contributor

KRRT7 commented May 2, 2024

@kayhayen
the signature says

class ModuleNotFoundError(
    *args: object,
    name: str | None = ...,
    path: str | None = ...
)

so it needs to accept those.

@kayhayen
Member

kayhayen commented May 2, 2024

There is special handling for ImportError and its args only; I don't think I had noticed ModuleNotFoundError yet.

I have also never seen importlib.util.LazyLoader before; I will check it out. It also seems old, since you saw this on 3.9; given the function you need to use there, I wonder.

Of course, I don't want code changes to be needed for Nuitka. Compiled modules interact with sys.modules themselves and put themselves there. Somehow LazyLoader seems to dislike that, but we will find ways to cooperate, I guess.

@kayhayen kayhayen added the bug label May 2, 2024
@kayhayen kayhayen self-assigned this May 2, 2024
@kayhayen kayhayen added this to the 2.2 milestone May 2, 2024
@kayhayen
Member

kayhayen commented May 2, 2024

I will start by adding support for the ModuleNotFoundError keyword arguments. Since we have a good precedent with ImportError, that surely cannot be too hard. It is beside the point, though: that issue will no longer be visible once we fix the other bug.

@blink1073

Thanks all, here is what I have for now as a workaround:

import importlib
import importlib.util
from types import ModuleType

def lazy_import(name: str) -> ModuleType:
    """Lazily import a module by name

    From https://docs.python.org/3/library/importlib.html#implementing-lazy-imports
    """
    # Eagerly import on Nuitka
    if "__compiled__" in globals():
        return importlib.import_module(name)

    try:
        spec = importlib.util.find_spec(name)
    except ValueError:
        raise ImportError(name) from None  # use ImportError instead of ModuleNotFoundError
    if spec is None:
        raise ImportError(name)
    ...

@sullivan50909, I'll update my PR accordingly
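
Putting the pieces from this thread together, a self-contained sketch of the whole helper might look like the following. The part after the spec check is taken from the reproducer earlier in the thread and from the importlib documentation recipe, not from the pymongo commit, so treat it as an assumption about what the elided part does. The extra `spec.loader is None` guard is also an addition for illustration.

```python
import importlib
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Lazily import ``name``, falling back to an eager import under Nuitka."""
    # Nuitka-compiled modules have "__compiled__" in their globals.
    if "__compiled__" in globals():
        return importlib.import_module(name)

    try:
        spec = importlib.util.find_spec(name)
    except ValueError:
        # Plain ImportError avoids the ModuleNotFoundError keyword arguments.
        raise ImportError(name) from None
    if spec is None or spec.loader is None:
        raise ImportError(name)

    # Assumed continuation: the standard LazyLoader recipe.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only arms the lazy hook, nothing runs yet
    return module


colorsys = lazy_import("colorsys")          # module body has not run yet
print(colorsys.rgb_to_hls(0.0, 0.0, 0.0))  # first attribute access loads it
```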

@kayhayen
Member

kayhayen commented May 2, 2024

Thanks for your report. This is being worked on in the factory branch, which is a development version under rapid development. You can try it out by going here: https://nuitka.net/doc/factory.html

Feedback on whether this is working is very welcome. Please do not share plans to do it; only confirm or deny that it is working.

@kayhayen
Member

kayhayen commented May 2, 2024

So, it seems the lazy loader is working as expected for the example code. For lazy loading, I would normally recommend using the lazy package, which works out of the box. If you have a minimal reproducer for this in pymongo, that would be great. The example you gave does not reproduce it for me, maybe because zlib is built-in, or because of some difference between Fedora Python and my Ubuntu Python; I have to check.

@kayhayen
Member

kayhayen commented May 2, 2024

Yes, being built-in means Nuitka doesn't step in. I am trying with my badly self-compiled 3.9 now; that one probably doesn't do that, since zlib is an extension module there. Extension module loading will also be different from compiled module loading.

@kayhayen
Member

kayhayen commented May 2, 2024

Yeah, I managed to reproduce it. Checking the source code now, I came across this:

    def __getattribute__(self, attr):
        """Trigger the load of the module and return the attribute."""
        # All module metadata must be garnered from __spec__ in order to avoid
        # using mutated values.
        # Stop triggering this method.
        self.__class__ = types.ModuleType

Seriously?
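
For readers unfamiliar with the trick in that excerpt: the lazy module is an instance of a ModuleType subclass, and the first attribute access swaps the instance's class back to types.ModuleType so the hook never fires again. Below is a toy sketch of just that mechanism, with a hypothetical class name and a placeholder where the real loader would execute the module body.

```python
import types


class LazyDemoModule(types.ModuleType):
    """Toy stand-in for the lazy module class; not the importlib one."""

    def __getattribute__(self, attr):
        # Swap the class first, so the lookups below use the plain
        # ModuleType behaviour and never re-enter this method.
        self.__class__ = types.ModuleType
        # Placeholder for "execute the real module body now".
        self.__dict__["loaded"] = True
        return getattr(self, attr)


mod = LazyDemoModule("demo")
print(type(mod).__name__)  # LazyDemoModule
print(mod.__name__)        # demo (this access triggers the swap)
print(type(mod).__name__)  # module
```

Because the identity of the module object matters for this swap, the real implementation also checks that sys.modules still holds the same object, which is where the ValueError in the original traceback comes from.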

@KRRT7
Contributor

KRRT7 commented May 2, 2024

@blink1073 if you keep the snippet I posted, it will be true lazy loading and will also help debloat the executable. I found snappy to be a large dependency, which is why I did it that way.

@kayhayen
Member

kayhayen commented May 2, 2024

So, it seems there is an incompatibility between Nuitka's meta path-based loader and the one in Python 3.9 or higher. Nuitka loads the extension module, and executes a "def" if one is found (not the case for zlib), during exec_module. That changes the module in sys.modules, because zlib does that during its execution. In the ExtensionFileLoader, nothing happens at that point anymore: there is no "def" left to execute, because that already happened when create_module was called, and there it seems to re-use the value in sys.modules. Somehow it doesn't do that for Nuitka in exec_module, so I don't know yet what to do.

Somehow, imp.create_dynamic must be capable of enforcing load of the module within the existing value.

@kayhayen
Member

kayhayen commented May 2, 2024

So, this can be distilled down to this:

import importlib.util
import sys
spec = importlib.util.find_spec("zlib")
module = importlib.util.module_from_spec(spec)
sys.modules["zlib"].crc32(b"a")

Basically, module_from_spec only calls create_module, and depending on the extension module, exec_module may or may not need to be called afterwards. module_from_spec should therefore already populate sys.modules, but in Nuitka that never happens until exec_module is called.
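
A small probe can make that difference observable. The sketch below, with a hypothetical function name, calls module_from_spec and reports whether the name ended up in sys.modules without exec_module ever running. The result depends on the loader: for a pure Python module nothing is registered, while for extension modules it may depend on how the module is initialized, so no output is assumed for zlib here.

```python
import importlib.util
import sys


def probe_create_module(name):
    """Report what module_from_spec alone does to sys.modules for ``name``.

    Hypothetical diagnostic helper; exec_module is deliberately not called.
    """
    preloaded = name in sys.modules
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(name)
    module = importlib.util.module_from_spec(spec)
    return {
        "preloaded": preloaded,
        "registered": name in sys.modules,
        "same_object": sys.modules.get(name) is module,
    }
```

Running probe_create_module("zlib") once under plain CPython and once in the compiled binary would show whether the two loaders agree at the create_module stage.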

@kayhayen
Member

kayhayen commented May 2, 2024

This is an invasive change, but not too bad. I will try to tackle it for the next round of hotfixes; 2.2.1 has to go out now, and 2.2.2 will likely follow a week later.

@kayhayen kayhayen changed the title Pymongo lazy import Nuitka extension file loader needs to load extension modules in "create_module" not only "exec_module" was (Pymongo lazy import) May 2, 2024
@sullivan50909
Author

@blink1073 created a workaround: mongodb/mongo-python-driver@ff950f0
The workaround solves my immediate issue, so there is no rush on the Nuitka fix. I do think it will be good to have it fixed in Nuitka in case this happens again with a different third-party package whose maintainers aren't as responsive.
Thanks @kayhayen, @KRRT7 and @blink1073 for working on this so quickly.

4 participants