Advice: Threading and JPype - JVM never ends #1169

Open
pelson opened this issue Jan 25, 2024 · 10 comments

@pelson
Contributor

pelson commented Jan 25, 2024

I've read the docs on threading and JPype https://jpype.readthedocs.io/en/latest/userguide.html#threading. In a context where the thread can do arbitrary work (e.g. it is in a pool, or runs an asyncio loop) it isn't clear how to know when to call java.lang.Thread.detach().

To give the most basic example:

import threading

def main():
    import jpype as jp
    jp.startJVM()

t = threading.Thread(target=main, name='jvm-starter')
t.start()

The result will be a never-ending process, even though the main thread finishes the program, and the jvm-starter thread has ended.

As documented, adding the jp.java.lang.Thread.detach() call in the main function results in the process ending correctly. The problem is that it is not always obvious when to actually make this call. Take the following example:

import multiprocessing.pool
import threading

def main():
    import jpype as jp
    if not jp.isJVMStarted():
        jp.startJVM()
    # We should be detaching here

t = multiprocessing.pool.ThreadPool(1)
f1 = t.apply(main)
f2 = t.apply(main)

t.close()
t.join()

In this example, the detach call is fairly easy to do, but in general JPype may be very deeply nested, and the underlying use might not even know that it is going to be called outside of the main thread.

Therefore, is it the case that wherever you use JPype, you must also always call detach when you are done in order for your code to reasonably work in a thread other than the main one?
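For the deeply nested case, one option is to wrap the outermost callable that gets handed to a thread, so the detach happens regardless of where JPype is used underneath. The following is only a sketch: `detach_when_done` and `_jpype_detach` are hypothetical names, not part of JPype, and the only JPype calls used are the ones already shown in this issue. The `detach` parameter exists so the wrapper can be exercised without a JVM.

```python
import functools
import threading


def _jpype_detach():
    # Assumption: JPype is installed. These are the same calls used in the
    # examples in this issue.
    import jpype
    if jpype.isJVMStarted():
        jpype.java.lang.Thread.detach()


def detach_when_done(func=None, *, detach=_jpype_detach):
    """Call detach() after func returns or raises, unless on the main thread."""
    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            finally:
                # The main thread is left alone; only other threads detach.
                if threading.current_thread() is not threading.main_thread():
                    detach()
        return wrapper
    # Support both @detach_when_done and @detach_when_done(detach=...).
    return decorate if func is None else decorate(func)
```

Decorating the `main` function from the examples above with `@detach_when_done` would detach the worker after every call. Whether detaching that often is acceptable (as opposed to once per thread lifetime) is a separate question that this sketch does not answer.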

This behaviour changed a few versions ago (looks like between 1.3 and 1.4) - in the past, I understood that this was automatically handled (and can imagine it was flaky, buggy, and costly).
Some historical docs on threading behaviour https://jpype.sourceforge.net/doc/user-guide/userguide.html#python_threads.

This could also be the problem encountered in #996, though that particular issue lacks sufficient detail to reproduce / know.

What I'm looking for: canonical advice on threading and JPype. Is it really the case that every use of JPype should be tailed by a detach in order to ensure that the process ends cleanly? If not, I would be happy to enable a tracing build to track down what it is that is keeping the process alive.

@Thrameos
Contributor

Thrameos commented Jan 25, 2024 via email

@pelson
Contributor Author

pelson commented Feb 2, 2024

Thanks for all this context - really valuable!

If the thread ends and it wasn’t detached, then it becomes a memory leak. The prevent shutdown can be changed by attaching as daemon.

If you are creating threads and Java is not shutting down, then you need to call attach as daemon when the thread is first created.

Upon re-reading the docs (source):

Rather that crashing randomly, JPype automatically attachs[sic] any
thread that invokes a Java method. These threads are attached automatically as
daemon threads so that will not prevent the JVM from shutting down properly
upon request.

I can see also that the docstring for Thread.attachAsDaemon() is consistent here:

JPype automatically attaches any threads that call Java resources as daemon threads.

I am therefore quite confused...

import multiprocessing.pool
import threading

def main():
    import jpype as jp
    if not jp.isJVMStarted():
        jp.startJVM()
        jp.java.lang.Thread.detach()
        jp.java.lang.Thread.attachAsDaemon()

t = multiprocessing.pool.ThreadPool(1)
f1 = t.apply(main)
f2 = t.apply(main)

t.close()
t.join()

Now does the right thing (seemingly), and exits nicely. This is consistent with your advice 👍.

However, if I detach and then attach as daemon, why does this make any difference if the docs are correct and the thread was already attached as daemon? Are the docs wrong on this?
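With a thread pool, the "when the thread is first created" hook can be the pool's `initializer`, which `multiprocessing.pool.ThreadPool` runs once in each worker before any task. This is a sketch under assumptions: `start_jvm_as_daemon` and `make_pool` are hypothetical helper names, JPype is assumed to be installed, and the detach-then-attachAsDaemon sequence is copied from the example above rather than taken from any documented recipe.

```python
import multiprocessing.pool
import threading

_jvm_lock = threading.Lock()


def start_jvm_as_daemon():
    """Pool initializer: start the JVM once, then re-attach the starter as daemon."""
    import jpype as jp  # assumption: JPype is installed
    # Several workers may run this initializer at the same time.
    with _jvm_lock:
        if not jp.isJVMStarted():
            jp.startJVM()
            # Same sequence as in the example above.
            jp.java.lang.Thread.detach()
            jp.java.lang.Thread.attachAsDaemon()


def make_pool(size, initializer=start_jvm_as_daemon):
    # The initializer runs once per worker thread, before it takes any task.
    return multiprocessing.pool.ThreadPool(size, initializer=initializer)
```

The tasks submitted to the pool then need no JPype bookkeeping of their own. The lock only guards against two workers racing to start the JVM.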

@Thrameos
Contributor

Thrameos commented Feb 2, 2024 via email

@pelson
Contributor Author

pelson commented Feb 5, 2024

I think the point of confusion is that any thread “other than main” is attached as daemon automatically

In the example above, I saw different behaviour on a non-main thread depending on whether I run detach() followed by attachAsDaemon():

import time
import threading

def main():
    import jpype as jp
    jp.startJVM()
    time.sleep(1)
    print('started')


t = threading.Thread(target=main, name='jvm-starter')
t.start()
t.join()
print('done')

That doesn't exit cleanly, whereas the following does:

import time
import threading

def main():
    import jpype as jp
    jp.startJVM()
    jp.java.lang.Thread.detach()
    jp.java.lang.Thread.attachAsDaemon()
    time.sleep(1)
    print('started')


t = threading.Thread(target=main, name='jvm-starter')
t.start()
t.join()
print('done')

Using all that you've told me, this strongly suggests that the non-main thread is being attached as a user/non-daemon thread. (I checked whether making the Python thread daemon or not makes a difference, and it doesn't). In contrast, there appears to be no impact of attach / attachAsDaemon on the main thread (it always exits "cleanly").

Just in case it matters, this is openjdk version "11.0.13" 2021-10-19.

@pelson
Contributor Author

pelson commented Feb 7, 2024

I was just writing some tests for this, and was doing so via a subprocess. Turns out that the behaviour changes if it is forked vs spawned:

def main():
    import threading

    def main():
        import jpype as jp
        jp.startJVM()
        print('started JVM')

    t = threading.Thread(target=main, name='jvm-starter')
    t.start()
    print('finished')


if __name__ == '__main__':
    # main()
    from multiprocessing import Process, set_start_method
    set_start_method('fork')
    p = Process(target=main, )
    p.start()
    p.join()

spawn blocks, whereas fork doesn't.

I don't think this is particularly important, but it is interesting (and I couldn't justify the behaviour to myself).

I note that the method by default on OSX is spawn since Python 3.8, just in case there is an implication for your commentary above.
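The subprocess-based test mentioned at the top of this comment can be sketched roughly as follows: run a snippet in a child interpreter and treat a timeout as "the process never ended". `exits_within` is a hypothetical helper name, and the timeout value is arbitrary.

```python
import subprocess
import sys


def exits_within(source, timeout=10.0):
    """Return True if `python -c source` terminates within `timeout` seconds."""
    try:
        subprocess.run(
            [sys.executable, "-c", source],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return False
    return True
```

Passing the jvm-starter snippet as `source` gives a pass/fail signal for the hang. Note that a child started this way is a fresh interpreter, so it behaves like the spawn case rather than the fork case.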

@Thrameos
Contributor

Thrameos commented Feb 8, 2024

I believe it was mentioned somewhere in the docs that the JVM does not handle fork well.

@pelson
Contributor Author

pelson commented Feb 10, 2024

I believe it was mentioned somewhere in the docs that the JVM does not handle fork well.

Yes, at

processes created with ``fork``. Forks copy all memory including the JVM. The
("JPype cannot be used with processes created with fork"), but it actually works in some contexts, as discussed in #1024.

In the context of this discussion though, the only difference between spawn vs fork is that one results in a non-daemon JVM, whereas the other exits as expected. It was just an observation (mostly in case it results in a clearer understanding of what is going on).

Using all that you've told me, this strongly suggests that the non-main thread is being attached as a user/non-daemon thread.

It is this point that I would value your input on - this is not expected based on what you've said. Is this a bug in JPype, or is it a detail of JNI? Would there be any reason not to attach as daemon when starting the JVM (as seems to be the case on the main thread)?

@Thrameos
Contributor

These are by-design parts of JNI. Java was designed such that it is free to start shutdown once the last user thread is closed. Starting the shutdown has serious implications as to what actions the JVM can take. So, by design, the JVM forces an attach to the thread that starts it. In the case of a fork, when it gets yanked from that thread, this leads to unexpected behavior.

It certainly would not be a good idea for JPype to attach the main thread as daemon by default, as this is really undefined behavior. Unless there is some documentation stating this is a supported action, you are at the mercy of the JVM implementation, and we can't possibly test them all.
Though I doubt any of them will spawn a copy of nethack, we can't guarantee it.

Because it is by design part of the JVM, the best we can do is make some other thread the main thread for Java and have an atexit call on the real main send a signal to let the Java main proceed to shutdown.

Does that help?
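The coordination described here (a side thread acts as the Java main, and an atexit hook on the real main signals it to proceed) can be sketched in plain Python. This is not JPype's implementation and not an existing JPype API: `SideMain`, `start`, and `stop` are hypothetical, and which JVM calls would go into `start` and `stop` is left open. The Python thread is marked daemon because the interpreter joins non-daemon threads before atexit handlers run, which would otherwise deadlock on the event.

```python
import atexit
import threading


class SideMain:
    """Run start() on a side thread, park it, and run stop() there on release."""

    def __init__(self, start, stop):
        self._start = start
        self._stop = stop
        self._error = None
        self._ready = threading.Event()
        self._release = threading.Event()
        # daemon=True: interpreter exit must reach the atexit hook below.
        self._thread = threading.Thread(
            target=self._run, name="java-main", daemon=True)

    def _run(self):
        try:
            self._start()
        except BaseException as exc:
            self._error = exc
            return
        finally:
            self._ready.set()
        self._release.wait()  # parked until the real main is exiting
        self._stop()

    def launch(self):
        self._thread.start()
        self._ready.wait()
        if self._error is not None:
            raise self._error
        atexit.register(self.release)

    def release(self):
        # Safe to call more than once: set() and join() are idempotent.
        self._release.set()
        self._thread.join()
```

A caller would construct `SideMain(start=..., stop=...)` and call `launch()` once early in the program. Everything after that runs on the real main thread as usual.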

@Thrameos
Contributor

Thrameos commented Feb 10, 2024

Btw... this isn't the first time the by-design parts of JNI/JVM have been a problem. By design, the launching thread has no context in the Java callback system, thus no module id. When they made it a requirement that all privileged operations check the module's caller id, it literally broke JPype and every other code that operates a slaved JVM. This still isn't fixed, so we reroute our calls through a redirector in the org.jpype jar. They could allow the main thread to declare its module and attachment type, but they haven't put much thought into JNI since JVM 1.5.

Thus the state of JNI being what it is is likely just an oversight as to how threading and the main thread operate with the JVM being a slave. They haven't focused on it, don't test it, and barely support it, as Java launched from the shell is their main usage.

@Thrameos
Contributor

I looked into this further. The JVM requires the main thread to be user, not daemon. Their intent is for the JVM to be launched on some thread, spawn one or more user threads, then call DestroyJavaVM on that original thread. We can do the same thing by spawning on a side thread, attaching on the Python main, then calling DestroyJavaVM on the side.

It would seem like calling DestroyJavaVM is dangerous, but in fact reading the docs makes it clear that DestroyJavaVM is actually a wait statement.

I made an attempt to make a binary jpython to be shipped out with JPype which will perform the proper sequence plus the visual thread for mac. Unfortunately I ran into bootloader problems with the module security in newer Java that will be difficult to overcome. The only success I had was making a recipe for making executables via setuptools. I will give it another go the next time slot that opens up.
