Issue with non-ascii characters in the JVM path on Windows #1111

Open
villares opened this issue Nov 14, 2022 · 16 comments

villares commented Nov 14, 2022

Context: in Brazil, and around the world, many users have names, usernames and various directories containing non-ASCII characters. I found this issue because some students couldn't run the same project/tools that other students (and myself) could run.

A minimal way to reproduce is to put the JDK in a non-ASCII path location and then try the REPL steps below:

Python 3.10.6 (C:\Users\abav\OneDrive\Desktop-non-ascii-path\python.exe)
>>> import jpype
>>> jpype.getDefaultJVMPath()
'C:\\Users\\abav\\OneDrive\\Desktop\\é-non-ascii-path\\user_data\\jdk-17\\bin\\server\\jvm.dll'
>>> jpype.startJVM(jvmpath='C:\Users\abav\OneDrive\Desktop\é-non-ascii-path\user_data\jdk-17\bin\server\jvm.dll')
  File "<stdin>", line 1
    jpype.startJVM(jvmpath='C:\Users\abav\OneDrive\Desktop\é-non-ascii-path\user_data\jdk-17\bin\server\jvm.dll')
                                                                                                                ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
>>> jpype.startJVM()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\abav\OneDrive\Desktop\é-non-ascii-path\lib\site-packages\jpype\_core.py", line 218, in startJVM
    _jpype.startup(jvmpath, tuple(args),
OSError: [WinError 126] JVM DLL not found: C:\Users\abav\OneDrive\Desktop-non-ascii-path\user_data\jdk-17\bin\server\jvm.dll

Removing the é character from the folder name allows the JVM to start.

This seems to be a Windows issue, tested on Windows 10 21H1, and it does not seem to be present on Linux.
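
A side note on the SyntaxError in the transcript above: it comes from Python's string-literal parsing, not from JPype. In a normal string literal, \U starts an eight-digit Unicode escape, so the call fails before startJVM is ever reached. A minimal sketch of two equivalent ways to write the same path (the path is the one from the report; the variable names are only illustrative):

```python
from pathlib import PureWindowsPath

# Raw string literal: backslashes are kept as-is, so "\U" is not an escape.
raw_path = r'C:\Users\abav\OneDrive\Desktop\é-non-ascii-path\user_data\jdk-17\bin\server\jvm.dll'

# Doubled backslashes: the same text, written with explicit escapes.
escaped_path = 'C:\\Users\\abav\\OneDrive\\Desktop\\é-non-ascii-path\\user_data\\jdk-17\\bin\\server\\jvm.dll'

# Both spellings produce the identical str object contents.
same_text = (raw_path == escaped_path)

# PureWindowsPath parses Windows separators on any platform.
jvm_path = PureWindowsPath(raw_path)
print(same_text, jvm_path.name, raw_path.isascii())
```

Either spelling lets the path reach startJVM, which is where the actual OSError from this issue is raised.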

@Thrameos
Contributor

I am going to have to depend on you to find a way to address the issue, as my local system likely won't produce the correct result. Given that it is an OSError rather than a Java error of some kind, I am not sure it is actually getting to the Java code. Most likely some encoding converter, when passing the string from Python to Java (or in this case to the C library), forgets to specify the encoding.

The specific code path is

Raw char* gets passed to Windows LoadLibrary() call

        virtual void loadLibrary(const char* path) override
        {
                JP_TRACE_IN("Win32PlatformAdapter::loadLibrary");
                jvmLibrary = LoadLibrary(path); // <=====
                if (jvmLibrary == NULL)
                {
                        JP_RAISE_OS_ERROR_WINDOWS( GetLastError(), path);
                }
                JP_TRACE_OUT;
        }

Which was called from JPContext::loadEntryPoints()

void JPContext::loadEntryPoints(const string& path)
{
        JP_TRACE_IN("JPContext::loadEntryPoints");
        JPPlatformAdapter *platform = JPPlatformAdapter::getAdapter();
        // Load symbols from the shared library
        platform->loadLibrary((char*) path.c_str());  // <===
        CreateJVM_Method = (jint(JNICALL *)(JavaVM **, void **, void *) )platform->getSymbol("JNI_CreateJavaVM");
        GetCreatedJVMs_Method = (jint(JNICALL *)(JavaVM **, jsize, jsize*))platform->getSymbol("JNI_GetCreatedJavaVMs");
        JP_TRACE_OUT;
}

Which was called from JPContext::startJVM

void JPContext::startJVM(const string& vmPath, const StringVector& args,
                bool ignoreUnrecognized, bool convertStrings, bool interrupt)
{
        JP_TRACE_IN("JPContext::startJVM");

        JP_TRACE("Convert strings", convertStrings);
        m_ConvertStrings = convertStrings;

        // Get the entry points in the shared library
        try
        {
                JP_TRACE("Load entry points");
                loadEntryPoints(vmPath);  // <====
        } catch (JPypeException& ex)
        {
                ex.getMessage();
                throw;
        }
   ....
}

Which came through PyJPModule_startup

static PyObject* PyJPModule_startup(PyObject* module, PyObject* pyargs)
{
        JP_PY_TRY("PyJPModule_startup");

        PyObject* vmOpt;
        PyObject* vmPath;
        char ignoreUnrecognized = true;
        char convertStrings = false;
        char interrupt = false;

        if (!PyArg_ParseTuple(pyargs, "OO!bbb", &vmPath, &PyTuple_Type, &vmOpt,
                        &ignoreUnrecognized, &convertStrings, &interrupt))
                return NULL;

        if (!(JPPyString::check(vmPath)))
        {
                PyErr_SetString(PyExc_TypeError, "Java JVM path must be a string");
                return NULL;
        }

        string cVmPath = JPPyString::asStringUTF8(vmPath);
        JP_TRACE("vmpath", cVmPath);

        StringVector args;
        JPPySequence seq = JPPySequence::use(vmOpt);

        for (int i = 0; i < seq.size(); i++)
        {
                JPPyObject obj(seq[i]);

                if (JPPyString::check(obj.get()))
                {
                        // TODO support unicode
                        string v = JPPyString::asStringUTF8(obj.get());  // <=====
                        JP_TRACE("arg", v);
                        args.push_back(v);
                } else
                {
                        PyErr_SetString(PyExc_TypeError, "VM Arguments must be strings");
                        return NULL;
                }
        }

        // This section was moved down to make it easier to cover error cases
        if (JPContext_global->isRunning())
        {
                PyErr_SetString(PyExc_OSError, "JVM is already started");
                return NULL;
        }

        // install the gc hook
        PyJPModule_installGC(module);
        PyJPModule_loadResources(module);
        JPContext_global->startJVM(cVmPath, args, ignoreUnrecognized != 0, convertStrings != 0, interrupt != 0);

        Py_RETURN_NONE;
        JP_PY_CATCH(NULL);
}

So we can see where the party responsible for encoding that string was

string JPPyString::asStringUTF8(PyObject* pyobj)
{
        JP_TRACE_IN("JPPyUnicode::asStringUTF8");
        ASSERT_NOT_NULL(pyobj);

        if (PyUnicode_Check(pyobj))
        {
                Py_ssize_t size = 0;
                char *buffer = NULL;
                JPPyObject val = JPPyObject::call(PyUnicode_AsEncodedString(pyobj, "UTF-8", "strict")); // <===== (1)
                PyBytes_AsStringAndSize(val.get(), &buffer, &size);
                JP_PY_CHECK();
                if (buffer != NULL)
                        return string(buffer, size);
                else
                        return string();
        } else if (PyBytes_Check(pyobj))
        {
                Py_ssize_t size = 0;
                char *buffer = NULL;
                PyBytes_AsStringAndSize(pyobj, &buffer, &size);  // <===== (2)
                JP_PY_CHECK();
                return string(buffer, size);
        }
        // GCOVR_EXCL_START
        JP_RAISE(PyExc_TypeError, "Failed to convert to string.");
        return string();
        JP_TRACE_OUT;
        // GCOVR_EXCL_STOP
}

So if it is bytes it would not be converted, but if it was unicode it would be UTF-8 with strict. As you can see the encoder that did the work was Python. So either we need to change the encoding or push the error upstream as the encoding came from a Python call not a JPype one.

My guess is that the é character is stored bare, without encoding, on the file system, using a code above 128, which would be in the UTF range. The UTF-8 encoder rendered it as UTF-8 byte codes, but LoadLibrary then didn't recognize them.
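
To make that guess concrete, here is a small Python sketch of what the mismatch would look like. It assumes the Windows ANSI code page is cp1252 (an assumption; the real code page depends on the system locale) and that the narrow LoadLibrary call interprets the char* bytes in that code page rather than as UTF-8:

```python
folder = "é-non-ascii-path"

# What JPPyString::asStringUTF8 would hand to C++: UTF-8 bytes.
utf8_bytes = folder.encode("utf-8")

# What a narrow (ANSI) Windows API would expect under the cp1252 assumption.
ansi_bytes = folder.encode("cp1252")

# How the UTF-8 bytes read if they are misinterpreted as cp1252.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes)   # b'\xc3\xa9-non-ascii-path'
print(ansi_bytes)   # b'\xe9-non-ascii-path'
print(misread)      # Ã©-non-ascii-path
```

If this is what happens, the system would look for a folder starting with "Ã©" instead of "é", which would be consistent with the "JVM DLL not found" error.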

Looking for similar issues:

https://forums.sketchup.com/t/help-with-win32api-loadlibrary-and-utf-8-paths/95376/4
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/67561296-a414-4364-a917-2c826053f503/loadlibrary-failed-with-use-multibyte-character-set?forum=windowsgeneraldevelopmentissues

So one solution would be to convert to wide characters from UTF-8 then call LoadLibraryW. Or we could try a different converter path on Python encoder rather than calling strict. It should just be a simple matter of trying combinations until it works and submitting a patch to consider for inclusion. (Even if it fixes your problem but breaks something else, I can work from the patch to make this code path special.)
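
The first option (UTF-8 to wide characters, then LoadLibraryW) can be sketched from Python with ctypes. This is only an illustration of the idea, not the JPype patch: utf8_to_utf16_units and try_load_wide are hypothetical helper names, and the Windows call is guarded so the snippet is harmless on other platforms.

```python
import ctypes
import sys


def utf8_to_utf16_units(utf8_path: bytes) -> list:
    """Decode UTF-8 bytes and return the UTF-16 code units a wide API expects."""
    text = utf8_path.decode("utf-8")
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def try_load_wide(path: str):
    """Hypothetical helper: call LoadLibraryW on Windows, return None elsewhere."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.LoadLibraryW.argtypes = [ctypes.c_wchar_p]
    kernel32.LoadLibraryW.restype = ctypes.c_void_p
    # Returns a module handle, or None (NULL) if the library could not be loaded.
    return kernel32.LoadLibraryW(path)


# "é" is two bytes in UTF-8 but a single UTF-16 code unit (0xE9).
units = utf8_to_utf16_units("é".encode("utf-8"))
print([hex(u) for u in units])
```

On a Windows machine, calling try_load_wide with the jvm.dll path from the report would be one quick way to check whether the wide call accepts the non-ASCII path before touching the C++ code.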

Of course just fixing the startJVM may not get you working as you will also need to see if the classpath works as well. Given most of the Java API requires UTF-8 it likely will work.

Hopefully that helps you identify the source of the issue.

@villares
Author

villares commented Nov 14, 2022

Cheers @Thrameos, thank you for looking into it so promptly!

I'm afraid this is well beyond my skills and knowledge. I'd like to ask some help from @vepo and @py5coding ... let's see what happens.

@Thrameos
Contributor

Okay. I will try to see if I can replicate when I get a chance. If you do want to mess with it there are instructions for building JPype from source on the web, and at least for the encoding options it should be as little as changing the string value from "strict" to "ignore" or one of the other options and testing. Not sure if that would be fruitful but if it is then all I need is that info so I can start working on a production patch.

Assuming that changing the encoder doesn't work, the LoadLibraryW route would be harder, as you would need the pattern to convert a UTF-8 char* string into a wide string in C++. I don't recall it either and would have to research, but perhaps others can help you there. Perhaps I can make a proposed patch and you can test it?

Either way once we can replicate it, it should be fixable.

@hx2A

hx2A commented Feb 15, 2024

Hi @Thrameos , I'd like to work on this issue so it can be fixed for our non-english speaking Windows users.

To make progress here I need to be able to fiddle with the code, run the jpype build, and test. Building jpype on Linux is easy and worked on the first try. I can't run the build on Windows though...can you tell me what I am doing wrong? I have installed Microsoft Visual Studio 2022 Community Edition on my machine and java 17.

This is the output when I run the build:

>python -m build .
* Creating venv isolated environment...
* Installing packages in isolated environment... (setuptools)
* Getting build dependencies for sdist...
running egg_info
writing JPype1.egg-info\PKG-INFO
writing dependency_links to JPype1.egg-info\dependency_links.txt
writing entry points to JPype1.egg-info\entry_points.txt
writing requirements to JPype1.egg-info\requires.txt
writing top-level names to JPype1.egg-info\top_level.txt
reading manifest file 'JPype1.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.jar' under directory 'native'
warning: no files found matching '*.class' under directory 'native'
no previously-included directories found matching 'test\classes\*'
adding license file 'LICENSE'
adding license file 'NOTICE'
adding license file 'AUTHORS.rst'
writing manifest file 'JPype1.egg-info\SOURCES.txt'
* Building sdist...
running sdist
running build_ext
Jar cache is missing, using --enable-build-jar to recreate it.
  javac -cp lib/asm-8.0.1.jar -d build\temp.win-amd64-cpython-38\Release\org.jpype\classes -g:none -source 1.8 -target 1.8 -encoding UTF-8 native\java\org\jpype\JPypeContext.java native\java\org\jpype\JPypeKeywords.java native\java\org\jpype\JPypeSignal.java native\java\org\jpype\JPypeUtilities.java native\java\org\jpype\PyExceptionProxy.java native\java\org\jpype\classloader\DynamicClassLoader.java native\java\org\jpype\classloader\JPypeClassLoader.java native\java\org\jpype\html\AttrGrammar.java native\java\org\jpype\html\AttrParser.java native\java\org\jpype\html\Html.java native\java\org\jpype\html\HtmlGrammar.java native\java\org\jpype\html\HtmlHandler.java native\java\org\jpype\html\HtmlParser.java native\java\org\jpype\html\HtmlTreeHandler.java native\java\org\jpype\html\HtmlWriter.java native\java\org\jpype\html\Parser.java native\java\org\jpype\javadoc\DomUtilities.java native\java\org\jpype\javadoc\Javadoc.java native\java\org\jpype\javadoc\JavadocException.java native\java\org\jpype\javadoc\JavadocExtractor.java native\java\org\jpype\javadoc\JavadocRenderer.java native\java\org\jpype\javadoc\JavadocTransformer.java native\java\org\jpype\manager\ClassDescriptor.java native\java\org\jpype\manager\MethodResolution.java native\java\org\jpype\manager\ModifierCode.java native\java\org\jpype\manager\TypeAudit.java native\java\org\jpype\manager\TypeFactory.java native\java\org\jpype\manager\TypeFactoryNative.java native\java\org\jpype\manager\TypeManager.java native\java\org\jpype\pickle\ByteBufferInputStream.java native\java\org\jpype\pickle\Decoder.java native\java\org\jpype\pickle\Encoder.java native\java\org\jpype\pkg\JPypePackage.java native\java\org\jpype\pkg\JPypePackageManager.java native\java\org\jpype\proxy\JPypeProxy.java native\java\org\jpype\ref\JPypeReference.java native\java\org\jpype\ref\JPypeReferenceNative.java native\java\org\jpype\ref\JPypeReferenceQueue.java native\java\org\jpype\ref\JPypeReferenceSet.java
warning: [options] bootstrap class path not set in conjunction with -source 8
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning
Copy file native/java\org\jpype\html\entities.txt build\temp.win-amd64-cpython-38\Release\org.jpype\classes\org\jpype\html\entities.txt
  jar cvf build\lib.win-amd64-cpython-38\org.jpype.jar -C build\temp.win-amd64-cpython-38\Release\org.jpype\classes .
error: [WinError 2] The system cannot find the file specified

ERROR Backend subprocess exited when trying to invoke build_sdist>python -m build .

The directory build\temp.win-amd64-cpython-38\Release\org.jpype\classes exists and has stuff in it. The directory build\lib.win-amd64-cpython-38 exists but is empty. What file is it trying to find there?

On Windows, though, I can install JPype from the tar.gz source distribution, and I can create new source distributions on my Linux machine. If that's a sufficient workflow for what needs to be done here, I can go with that instead.

As to testing, if I can't reproduce it on my machine I'll get help from @villares . Between the two of us, I think we can get to the bottom of this.

@hx2A

hx2A commented Feb 15, 2024

Ooops, I just needed to add C:\Program Files\Java\jdk-17\bin to my PATH so it could find the jar.exe executable. God, I hate software development on Windows. In any case, I can now create a Windows wheel.
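
For anyone hitting the same "[WinError 2] The system cannot find the file specified" during the build, a quick pre-flight check along these lines may save time. It is a small sketch, not part of the JPype build; find_jdk_tools is a hypothetical helper name, and the tool list is taken from the javac and jar commands visible in the build log above:

```python
import shutil


def find_jdk_tools(tools=("javac", "jar")):
    """Return a mapping of tool name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, location in find_jdk_tools().items():
        print(f"{tool}: {location or 'NOT FOUND on PATH'}")
```

If either tool is reported as not found, adding the JDK's bin directory to PATH (as described above) should let the build locate it.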

Next I'll try to reproduce @villares 's issue on my Windows machine and will try making that "strict" change you mentioned earlier. If I can't reproduce here, @villares will be able to help test.

@Thrameos
Contributor

Thrameos commented Feb 15, 2024

Sounds good. This was one I wasn't able to replicate and slipped through the cracks. Thanks for taking it up.

@hx2A

hx2A commented Feb 16, 2024

OK, I have created several JPype builds that change the errors param of PyUnicode_AsEncodedString in JPPyString::asStringUTF8 in the file jp_pythontypes.cpp. I tried 'ignore', 'replace', 'xmlcharrefreplace', and 'backslashreplace', which are the options mentioned in the Python documentation. There's also a baseline build which uses the original value 'strict'.

I emailed a folder with the compiled wheels to @villares for testing. The baseline build should reproduce the issue. Hopefully one of the other builds lets @villares start the JVM, which will be an important clue about what is going on and what we can do to fix it.

@villares
Author

Hi, I tested all the wheels built by @hx2A on Windows 11 today, and got the same results as the baseline build. They work with ASCII paths, and break with non-ASCII paths:

[image: working example]

[image: not_working]

@hx2A

hx2A commented Feb 20, 2024

It seems that changing the encoding didn't work, but the fact that it didn't work is useful information for planning next steps. @Thrameos , what would you like us to test next?
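
One likely reason all the builds behaved the same, sketched at the Python level: the errors argument only matters when a character cannot be encoded, and UTF-8 can encode é without any error, so every handler produces identical bytes. The path below is the one from the report, truncated at the non-ASCII folder for brevity:

```python
path = "C:\\Users\\abav\\OneDrive\\Desktop\\é-non-ascii-path"

handlers = ["strict", "ignore", "replace", "xmlcharrefreplace", "backslashreplace"]

# Encode the same path with each error handler that was tried in the test builds.
encoded = {handler: path.encode("utf-8", errors=handler) for handler in handlers}

# All handlers yield the same bytes because no encoding error ever occurs.
distinct_results = set(encoded.values())
print(len(distinct_results))
```

This suggests the bytes reaching LoadLibrary were the same in every build, so the difference has to be made on the Windows API side rather than in the encoder settings.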

@Thrameos
Contributor

I can give it another shot. My plan was to call the Python char to wchar converter and then LoadLibraryW. After all, this is a Python string we are passing to a system call, and it has nothing really to do with Java's wacky encoding. The problem is that LoadLibraryW is a system call, and I have no idea whether it wants the wide char for the displayed character or perhaps something from the region character set. Thus all I can do is hope Python knows how to get the right encoding.

Not having a region encoded Windows version means I may not get the same behavior.

See
https://docs.python.org/3/c-api/sys.html#c.Py_DecodeLocale
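
Py_DecodeLocale is a C API, so it cannot be called directly from a script, but the encodings Python itself reports can be inspected from the REPL. The following is a small diagnostic sketch (describe_encodings is a hypothetical helper name) that testers on a region-encoded Windows install could run and paste into this thread; the values are machine-dependent, so no expected output is shown:

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for this machine."""
    return {
        # Encoding Python uses to convert str file names for the OS.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding derived from the user's locale settings.
        "locale_preferred": locale.getpreferredencoding(False),
        # Default encoding for str.encode() and bytes.decode().
        "default": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

If the filesystem and locale values differ on the affected machines, that would support the idea that the narrow system call and Python disagree about how the path bytes should be read.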

@hx2A

hx2A commented Feb 21, 2024

@Thrameos , If you'd like to create a branch with some test code for us to try, I can take care of preparing the wheel and providing it to @villares for testing.

@Thrameos
Contributor

Thrameos commented Feb 21, 2024 via email

@hx2A

hx2A commented Apr 21, 2024

@Thrameos , thank you for working on this and creating a potential fix. Unfortunately I overlooked the issue notification and didn't see this until now.

@villares , can you test @Thrameos 's fix? I believe you can install it with the following command:

pip install git+https://github.com/Thrameos/jpype@windows-locale

If for some reason that doesn't work, use the wheel I created on Windows in the folder link I just DM'd you.

@villares
Author

villares commented Apr 23, 2024

Yay! I think it works! I had to use the wheel you sent me via DM (I tried pip but it failed to build the wheel)

[image]

@hx2A

hx2A commented Apr 23, 2024

Wow, excellent!

Building the wheel requires compilation, which I know on Windows is a chore.

@Thrameos , thank you so much for fixing this!

@Thrameos
Contributor

Great, I will push it out in the next release.
