Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8285893: Hiding dialog and showing new one causes dialog to be frozen #1324

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

Maran23
Copy link
Member

@Maran23 Maran23 commented Jan 9, 2024

This PR fixes the dialog freeze problem once and for all.

This one is a bit tricky to understand, here is how it works:
This bug happens on every platform, although the implementation of nested event loops differs on every platform.
E.g. on Linux we use gtk_main and gtk_main_quit, on Windows and Mac we have an own implementation of a nested event loop (while loop), controlled by a boolean flag.

Funny enough, the reason why this bug happens is always the same: Timing.

  1. When we hide a dialog, _leaveNestedEventLoop is called.
  2. This will call native code to get out of the nested event loop, e.g. on Windows we try to break out of the while loop with a boolean flag, on Linux we call gtk_main_quit.
  3. Now, if we immediately open a new dialog, we enter a new nested event loop via _enterNestedEventLoop, as a consequence we do not break out of the while loop on Windows (the flag is set back again, the while loop is still running), and we do not return from gtk_main on Linux.
  4. And this will result in the Java code never returning and calling notifyLeftNestedEventLoop, which we need to recover the UI.

So it is actually not trivial to fix this problem, and we can not really do anything on the Java side. We may can try to wait until one more frame has run so that things will hopefully be right, but that sounds rather hacky.

I therefore analyzed, if we even need to return from _enterNestedEventLoop. Turns out, we don't need to.
There is a return value which we actually do not use (it does not have any meaning to us, other that it is used inside an assert statement).
Without the need of a return value, we also do not need to care when _enterNestedEventLoop is returning - instead we cleanup and call notifyLeftNestedEventLoop in _leaveNestedEventLoop, after the native code was called.
See below for more information. To recover the UI (and in general nested nested event loops ;) we need to set a flag for the InvokeLaterDispatcher.

Lets see if this is the right approach (for all platforms).
Testing appreciated.

  • Tested on Windows
  • Tested on Linux
  • Tested on Mac
  • Tested on iOS (although the nested event loop code is the same as for Mac) (I would appreciate if someone can do this as I have no access to an iOS device)
  • Adjust copyright
  • Write Systemtest
  • Document the new behaviour - in general, there should be more information what to expect

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8285893: Hiding dialog and showing new one causes dialog to be frozen (Bug - P3)

Contributors

  • Martin Fox <mfox@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/1324/head:pull/1324
$ git checkout pull/1324

Update a local copy of the PR:
$ git checkout pull/1324
$ git pull https://git.openjdk.org/jfx.git pull/1324/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1324

View PR using the GUI difftool:
$ git pr show -t 1324

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/1324.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 9, 2024

👋 Welcome back mhanl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Ready for review label Jan 9, 2024
@Maran23 Maran23 marked this pull request as draft January 9, 2024 21:48
@openjdk openjdk bot removed the rfr Ready for review label Jan 9, 2024
@Maran23 Maran23 force-pushed the JDK-8285893-dialog-freezing- branch from f3a78ce to a5a16a7 Compare February 8, 2024 18:28
@Maran23 Maran23 force-pushed the JDK-8285893-dialog-freezing- branch from a5a16a7 to 98a97b0 Compare February 8, 2024 18:31
@Maran23 Maran23 marked this pull request as ready for review February 18, 2024 19:56
@openjdk openjdk bot added the rfr Ready for review label Feb 18, 2024
@mlbridge
Copy link

mlbridge bot commented Feb 18, 2024

@kevinrushforth
Copy link
Member

Hmm. There was a reason we added a return value to _enterNestedEventLoop, but I'll take a closer look. Have you run all of the headful tests?

/reviewers 2

@openjdk
Copy link

openjdk bot commented Feb 22, 2024

@kevinrushforth
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

@kevinrushforth
Copy link
Member

@Maran23 Please merge in the latest upstream master, since your branch is out of date.

Copy link
Member

@kevinrushforth kevinrushforth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I was afraid of, this change causes many test failures (see below for just one example). I think a different approach is needed, one that preserves the synchronous nature of these methods.

NestedEventLoopTest > testCanEnterAndExitNestedEventLoop FAILED
    java.lang.AssertionError
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertFalse(Assert.java:65)
        at org.junit.Assert.assertFalse(Assert.java:75)
        at test.javafx.stage.NestedEventLoopTest.lambda$testCanEnterAndExitNestedEventLoop$2(NestedEventLoopTest.java:120)
        at test.util.Util.lambda$submit$0(Util.java:108)
        at javafx.graphics@23-ea/com.sun.javafx.application.PlatformImpl.lambda$runLater$10(PlatformImpl.java:456)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:400)
        at javafx.graphics@23-ea/com.sun.javafx.application.PlatformImpl.lambda$runLater$11(PlatformImpl.java:455)
        at javafx.graphics@23-ea/com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
        at javafx.graphics@23-ea/com.sun.glass.ui.mac.MacApplication._enterNestedEventLoopImpl(Native Method)
        at javafx.graphics@23-ea/com.sun.glass.ui.mac.MacApplication._enterNestedEventLoop(MacApplication.java:169)
        at javafx.graphics@23-ea/com.sun.glass.ui.Application.enterNestedEventLoop(Application.java:512)
        at javafx.graphics@23-ea/com.sun.glass.ui.EventLoop.enter(EventLoop.java:107)
        at javafx.graphics@23-ea/com.sun.javafx.tk.quantum.QuantumToolkit.enterNestedEventLoop(QuantumToolkit.java:656)
        at javafx.graphics@23-ea/javafx.application.Platform.enterNestedEventLoop(Platform.java:310)
        at test.javafx.stage.NestedEventLoopTest.lambda$testCanEnterAndExitNestedEventLoop$0(NestedEventLoopTest.java:112)

@kevinrushforth
Copy link
Member

@Maran23 I see from the description that you tested this on mac. How did you test it? Did you run with -PFULL_TEST=true?

@Maran23
Copy link
Member Author

Maran23 commented Feb 22, 2024

@Maran23 I see from the description that you tested this on mac. How did you test it? Did you run with -PFULL_TEST=true?

No, I checked the example in the ticket as well as the handling of dialogs in general on Mac.
Unfortuantely, I always need to ask a colleague to test my JavaFX Builds on Mac, but maybe it's time to setup a VM on my Windows machine.
Are there any recommendations I may should follow for the setup? :-)

Regarding the system tests, I checked the nested event loop ones on Windows and they work for me.

I just pushed a change which may fixes this for Mac as well.
The change is also logical and is synchronized with the other changes I made to nested event loops.
Will ask my colleague as well to test my changes again + run the most obvious (related) system tests on Mac.

I also will do a much bigger testing soon, I have an application that relies heavily on nested event loops for blocking the UI while doing background work.

@kevinrushforth
Copy link
Member

maybe it's time to setup a VM on my Windows machine. Are there any recommendations I may should follow for the setup? :-)

I don't know of any supported way to run macOS in a VM on a Windows host, but perhaps others will be able to help.

Regarding the system tests, I checked the nested event loop ones on Windows and they work for me.

It worked for me on Windows, too (and on Linux in a headful test run). I would want to make sure that we know why they still work on Windows and Linux and fail on Mac, but I only saw the failures on Mac.

I just pushed a change which may fixes this for Mac as well. The change is also logical and is synchronized with the other changes I made to nested event loops. Will ask my colleague as well to test my changes again + run the most obvious (related) system tests on Mac.

I can also fire off a headful test run this afternoon, and post results tomorrow.

I also will do a much bigger testing soon, I have an application that relies heavily on nested event loops for blocking the UI while doing background work.

The results of that testing would be good information.

@kevinrushforth
Copy link
Member

I see the same failure with the most recent patch as I did yesterday. I looked over the changes, and I'm not convinced that this is the right approach.

One comment on your proposal:

we also do not need to care when _enterNestedEventLoop is returning - instead we cleanup and call notifyLeftNestedEventLoop in _leaveNestedEventLoop, after the native code was called.

This changes one of the fundamental aspects of entering and exiting a nested event loop. This will need a lot more analysis to show why this is the right approach.

@Maran23
Copy link
Member Author

Maran23 commented Feb 24, 2024

This changes one of the fundamental aspects of entering and exiting a nested event loop. This will need a lot more analysis to show why this is the right approach.

Yes, this is right and I also thought about documenting that as soon as we think that this is the right approach.
I completely agree that we need to analyse this. This is not a simple change but will affect Dialogs and every custom code that uses nested event loops (e.g. I used it for blocking the UI without freezing it for data loading purposes, where we want to wait until the code returns in a non UI blicking way).

Maybe there is a better way as well, I just don't figured out one.
As described, right now we rely that enterNestedEventLoop returns always, which is not happening immediately for ALL platforms, even though Platform has another implementation. Which makes me think the current approach has a flaw, resulting in this behaviour.

I see the same failure with the most recent patch as I did yesterday. I looked over the changes, and I'm not convinced that this is the right approach.

Thanks for testing. As soon as my colleague is available, I will debug this issue. I don't get why only MacOS is failing, since the low level code is pretty much the same, we just continue right after we left the nested event loop.
I agree that we should find out the root cause why MacOS behaves different.

I don't know of any supported way to run macOS in a VM on a Windows host, but perhaps others will be able to help.

That was also my understanding the last time I checked. AFAIK, there is no official way from Apple, but VirtualBox may offer that. Will check as soon as I have more time.

@beldenfox
Copy link
Contributor

The Mac is failing on the simplest NestedEventLoop test, the one that verifies that we can enter and exit a nested loop and correctly retrieve the return value. I started looking into this but find what's happening in Glass a bit confusing. I'll take a closer look in the next day or two.

I don't understand the original failure case. Events aren't blocked and you can type into the modal's text field, it just doesn't repaint. If you resize the window you will see your edits.

@beldenfox
Copy link
Contributor

I don't think this is a problem with the nested event loop bookkeeping. It looks like a much simpler bug in the invokeLaterDispatcher.

When exitNestedEventLoop is called on the innermost loop the invokeLaterDispatcher suspends operation until the loop finishes. But if you immediately start a new event loop the previous one won't finish and the dispatcher will be wedged. If you're already in a nested loop you can get into the wedged state with just two lines of code:

exitNestedEventLoop(innermost, retVal); enterNestedEventLoop(newLoop);

When the invokeLaterDispatcher is told that the innermost loop is exiting it currently sets leavingNestedEventLoop to true. When the dispatcher is told that a new event loop has started it is not clearing leavingNestedEventLoop but it should. Basically it should follow the same logic used in glass; leaving the innermost loop updates a boolean indicating that the loop should exit but if a new loop is started the boolean is set back to a running state since it now applies to the new loop, not the previous one.

@Maran23
Copy link
Member Author

Maran23 commented Feb 28, 2024

I don't think this is a problem with the nested event loop bookkeeping. It looks like a much simpler bug in the invokeLaterDispatcher.

Well, yes and no. My testing was showing that notifyLeftNestedEventLoop was not even called in the finally block, so there is that. We may still have a problem at the InvokeLaterDispatcher, although not 100% sure since it worked good for me on Windows.
But I will test it more as soon as I have more time, since I want to really check everything, so I do not miss anything here.

@Maran23
Copy link
Member Author

Maran23 commented Mar 1, 2024

So I could finally look into the issue. Unfortunately, the problem is much deeper than I thought.
Consider the following testcase:

Platform.runLater(() -> {
    System.out.println("exit1");
    Platform.exitNestedEventLoop(this, "123");
    
    Platform.runLater(() -> {
        System.out.println("exit2");
    
        Platform.exitNestedEventLoop("this", "456");
    });
    System.out.println("enter2");
    Object o = Platform.enterNestedEventLoop("this");
    System.out.println(o);
    });
System.out.println("enter1");
Object o = Platform.enterNestedEventLoop(this);
System.out.println(o);

What I would expect:

enter1
exit1
123
enter2
exit2
456

New behaviour:

enter1
exit1
enter2
exit2
456

Old behaviour:

enter1
exit1
enter2

Application hangs (Makes sense, as it is the same logic as if we hide a dialog and show a new one)

So it is better now, since we do not stall the application anymore, but we never return back from the first EventLoop, since the native code did not return. And I would like to fix this completely.

Maybe, the approach needs to be a little bit different.
Something like: If we enter a nested event loop WHILE we are exiting one, we should somewhat wait for it to finish.
That is a pretty though one. Ideas welcome.

@beldenfox
Copy link
Contributor

The first runLater block is being invoked by the first nested event loop; that loop is on the call stack. In that block you call exitNestedEventLoop but don't allow the stack to unwind before calling enterNestedEventLoop. So on the stack the new event loop is indeed nested inside the previous one (the previous one is marked as LEAVING but it's still doing something). Since this is a stack there's only one way to unwind it which is in reverse order. The output I would expect would be:

enter1
exit1
enter2
exit2
456
123

There is that bug in the invokeLaterDispatcher which will prevent the second runLater block from executing. And the documentation doesn't really address this case but it should.

By the way, when I grab the code in the original bug report and run it agains this PR the innermost modal is not blocked and everything seems fine. But if I press the Close button exceptions start getting thrown. This is happening on both Mac and Windows.

@Maran23
Copy link
Member Author

Maran23 commented Mar 1, 2024

The output I would expect would be:

enter1
exit1
enter2
exit2
456
123

You are right, this sounds correct.
EDIT: Just thinking about it again, we technically exited the first loop, so not 100% sure.

By the way, when I grab the code in the original bug report and run it agains this PR the innermost modal is not blocked and everything seems fine. But if I press the Close button exceptions start getting thrown. This is happening on both Mac and Windows.

Can you post the exception? I may know which one and I already fixed one locally, testing it currently.
I still don't have an idea how to solve the problem mentioned above, since the Application.enterNestedEventLoop() is not returning.

Technically speaking, we have 2 issues here.

  • Application.enterNestedEventLoop() is not returning
  • InvokeLaterDispatcher mechanism is not perfect as it can lead to stalling the UI.

@beldenfox
Copy link
Contributor

The top of the call stack when the exception is thrown:

Exception in thread "JavaFX Application Thread" java.lang.IllegalStateException: Not in a nested event loop
	at javafx.graphics@23-internal/com.sun.glass.ui.Application.leaveNestedEventLoop(Application.java:534)
	at javafx.graphics@23-internal/com.sun.glass.ui.EventLoop.lambda$enter$0(EventLoop.java:122)
	at javafx.graphics@23-internal/com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
	at javafx.graphics@23-internal/com.sun.glass.ui.mac.MacApplication._enterNestedEventLoopImpl(Native Method)
	at javafx.graphics@23-internal/com.sun.glass.ui.mac.MacApplication._enterNestedEventLoop(MacApplication.java:169)
	at javafx.graphics@23-internal/com.sun.glass.ui.Application.enterNestedEventLoop(Application.java:511)
	at javafx.graphics@23-internal/com.sun.glass.ui.EventLoop.enter(EventLoop.java:107)
	at javafx.graphics@23-internal/com.sun.javafx.tk.quantum.QuantumToolkit.enterNestedEventLoop(QuantumToolkit.java:656)
	at javafx.graphics@23-internal/javafx.stage.Stage.showAndWait(Stage.java:469)
	at ApplicationModalSampleApp2$MainFx.showModal(ApplicationModalSampleApp2.java:42)
	at ApplicationModalSampleApp2$MainFx.lambda$start$0(ApplicationModalSampleApp2.java:25)
	at javafx.base@23-internal/com.sun.javafx.event.CompositeEventHandler.dispatchBubblingEvent(CompositeEventHandler.java:86)
	at javafx.base@23-internal/com.sun.javafx.event.EventHandlerManager.dispatchBubblingEvent(EventHandlerManager.java:232)
	at javafx.base@23-internal/com.sun.javafx.event.EventHandlerManager.dispatchBubblingEvent(EventHandlerManager.java:189)
	at javafx.base@23-internal/com.sun.javafx.event.CompositeEventDispatcher.dispatchBubblingEvent(CompositeEventDispatcher.java:59)
	at javafx.base@23-internal/com.sun.javafx.event.BasicEventDispatcher.dispatchEvent(BasicEventDispatcher.java:58)
	at javafx.base@23-internal/com.sun.javafx.event.EventDispatchChainImpl.dispatchEvent(EventDispatchChainImpl.java:114)
	at javafx.base@23-internal/com.sun.javafx.event.BasicEventDispatcher.dispatchEvent(BasicEventDispatcher.java:56)
	at javafx.base@23-internal/com.sun.javafx.event.EventDispatchChainImpl.dispatchEvent(EventDispatchChainImpl.java:114)
	at javafx.base@23-internal/com.sun.javafx.event.BasicEventDispatcher.dispatchEvent(BasicEventDispatcher.java:56)
	at javafx.base@23-internal/com.sun.javafx.event.EventDispatchChainImpl.dispatchEvent(EventDispatchChainImpl.java:114)
	at javafx.base@23-internal/com.sun.javafx.event.EventUtil.fireEventImpl(EventUtil.java:74)
	at javafx.base@23-internal/com.sun.javafx.event.EventUtil.fireEvent(EventUtil.java:49)

@Maran23
Copy link
Member Author

Maran23 commented Mar 4, 2024

The top of the call stack when the exception is thrown:

Yeah, had the same issue and fixed it locally. Now pushed. Problem as stated above remains though

@Maran23 Maran23 force-pushed the JDK-8285893-dialog-freezing- branch from 508e256 to 0bcda3a Compare March 4, 2024 19:12
@openjdk
Copy link

openjdk bot commented Mar 4, 2024

@Maran23 Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@Maran23
Copy link
Member Author

Maran23 commented Mar 5, 2024

I may found a solution which fixes the dialog bug and passes my test above.
It behaves exactly like @beldenfox said.

enter1
exit1
enter2
exit2
456
123

@beldenfox
Copy link
Contributor

The Mac is still failing the NestedEventLoop test.

At the top of InvokeLaterDispatcher.java there's a long explanation on how it's trying to coordinate with the event loop. To do this it needs to have an accurate picture of what's happening at the platform level. In this PR you're telling the dispatcher that an event loop has exited before the platform has actually exited the loop (before enterNestedEventLoop has returned). This is causing the dispatcher to send runLater runnables to the platform earlier than expected. On Windows the runnables will get deferred to the next event loop cycle but on Mac the runnables might get executed in the current event loop cycle. I think this is one of the "native system limitations" mentioned in the comment in InvokeLaterDispatcher.java.

I've created a branch with my proposed fix for this problem (https://github.com/beldenfox/jfx/tree/eventloopjam). The fix prevents the dispatcher from jamming when the event loop it thought was leaving enters a new loop instead. Over in the Mac Glass code I also added a comment with a few more details on what's going on.

@Maran23
Copy link
Member Author

Maran23 commented Mar 7, 2024

The Mac is still failing the NestedEventLoop test.

Thanks for testing this on Mac. I always need to ask someone in order to test the changes, so really appreciating that.

I've created a branch with my proposed fix for this problem (https://github.com/beldenfox/jfx/tree/eventloopjam). The fix prevents the dispatcher from jamming when the event loop it thought was leaving enters a new loop instead. Over in the Mac Glass code I also added a comment with a few more details on what's going on.

I tested your code and it looks good! So this seems to be the better approach here. While mine works for Windows and maybe Linux, it does not for Mac as you also specified.

The return value of the native nested event loops are still not needed, so we can still think about keeping this change, but we should probably revert the EventLoop changes and take your fix instead.

@Maran23
Copy link
Member Author

Maran23 commented Mar 7, 2024

/contributor add @beldenfox

Thanks! I feel like we are coming closer to the fix of this problem. I hopefully have time to test this much more tomorrow - I have at least "booked" myself a time slot for JavaFX. 😄

@openjdk
Copy link

openjdk bot commented Mar 7, 2024

@Maran23
Contributor Martin Fox <mfox@openjdk.org> successfully added.

@openjdk
Copy link

openjdk bot commented Mar 13, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title JDK-8285893: Hiding dialog and showing new one causes dialog to be frozen 8285893: Hiding dialog and showing new one causes dialog to be frozen Apr 10, 2024
@bridgekeeper
Copy link

bridgekeeper bot commented Apr 10, 2024

@Maran23 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@Undecium
Copy link

Undecium commented May 2, 2024

Please don't close this pull request. I really need this fix.

@beldenfox
Copy link
Contributor

I just submitted my proposed fix for this issue as PR #1449.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfr Ready for review
4 participants