Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Critical nw2 regression: Major (10x) slowdown post-0.42.3 (where nw2 is enabled by default--disabling it restores original performance) #7979

Open
2 of 3 tasks
pd-l2ork opened this issue Oct 12, 2022 · 29 comments
Assignees

Comments

@pd-l2ork
Copy link

pd-l2ork commented Oct 12, 2022

Issue Type

Before opening an issue, please search and see if it has already been raised.

  • Bug Report

  • Feature Request

  • Successfully reproduced against the latest version of NW.js?

Current/Missing Behavior

Most versions (tried everything between 0.42.4 up to 0.70.1, sometimes skipping a few versions) are significantly (10 or more times) slower than 0.42.3 and lower versions when drawing a busy SVG-based canvas using scripted events. Evidence suggests this is due to nw2 features being enabled by default starting with 0.42.4 onward. Disabling nw2 on newer versions makes performance comparable.

Expected/Proposed Behavior

Speed of running scripted events that populate the canvas should not be significantly different.

Additional Info

  • Operating System: any (tested on Linux, OSX, and Windows)
  • NW.js Version: 0.42.4 and newer have problems (0.43.0-beta1 being exception as it was released in proximity to 0.42.3)
  • Repro Link:
    Best thing is to download my software (Pd-L2Ork) ideally on Windows or Linux where swapping nw.js versions is a matter of replacing a single folder inside installed application's folder. The app already ships with nw.js 0.42.3.

You can download it from: http://l2ork.music.vt.edu/main/make-your-own-l2ork/software/ (click on the "Complete (a.k.a. ‘Burrito Supreme’) Installer" section to expand it and download a version for the appropriate platform)

Replacing nw version: On Linux nw version is located in /usr/lib/pd-l2ork/bin/nw folder. On Windows in C:\Program Files (x86)\Pd-L2Ork\bin\nw. Mac is difficult since the files are all mixed, so I discourage testing on that one, even though results are the same. Just replace it with SDK (not runtime) of any other version by renaming that version's folder to nw and replacing this one. Then restart the app.

To open something graphically intense: start Pd-L2Ork (pd-l2ork on command line or double-click on the shortcut). Help (rightmost item in the menubar)->Help Browser->scroll to the bottom and click on L2Ork Tweeter and then again on L2Ork Tweeter (right below the CHANGELOG.txt).

Observe the time it takes to open and populate the window.

As an alternative, one could probably build a simple scripted example that populates the canvas with svg shapes (not sure if the slowdown is tied to strictly SVG, however) and run the same with the latest nw.js with nw2 enabled and disabled and observe the perfromance difference.

  • Code snippet: N/A (I provided in the comments the SVG snapshot of the page, but that does not make the difference--the real difference is scripting that now takes way longer using nw2 than when using nw1)
  • Crash report:No crash, just a major slowdown.
@pd-l2ork
Copy link
Author

Linked are index.html and package.json that is essentially the snapshot of how the screen looks when it loads.

https://l2ork.music.vt.edu/data/pd-l2ork/test.tar.gz

You can start the same using 0.42.3 and compare it to newer versions.

@pd-l2ork
Copy link
Author

pd-l2ork commented Oct 13, 2022

The difference is less obvious since there is no background communication (there is a lot of network code going back and forth to draw all this, but it is all identical in both cases). Still you can see the difference. I also spotted a lot of ssl errors on 0.42.3 being printed out in the console, although they do not appear to be causing any problems program-wise as far as I can tell. Could it be that the 0.42.4 and onward use ssl layer for network proxy? If so, could this be the bottleneck?

@ayushmanchhabra
Copy link
Contributor

I'm seeing a black screen for v0.42.3 and v0.42.4 on Ubuntu 20. Not sure how to proceed

@pd-l2ork
Copy link
Author

I'm seeing a black screen for v0.42.3 and v0.42.4 on Ubuntu 20. Not sure how to proceed

@ayushmXn if you are running 0.42.4 or 0.42.5 (possibly also 0.42.6) on Linux. It indeed is broken and does not show anything. 0.42.3, however, should work just fine. You can also try 0.69.1.

@ayushmanchhabra
Copy link
Contributor

Could you send a screenshot of what a working example looks like? I'll try across different versions.

@ayushmanchhabra
Copy link
Contributor

ayushmanchhabra commented Oct 16, 2022

v0.42.3 does not work for me. Could be a missing shared library which would explain why it works for you.

[123138:123138:1015/205713.710040:ERROR:sandbox_linux.cc(371)] InitializeSandbox() called with multiple threads in process gpu-process.
[123138:123138:1015/205713.969746:ERROR:buffer_manager.cc(488)] [.DisplayCompositor]GL ERROR :GL_INVALID_OPERATION : glBufferData: <- error from previous GL command

Also wasn't able to get it working for a bunch of versions from 0.36...0.69.1.

@ayushmanchhabra ayushmanchhabra added bug has-min-repro Has a minimum reproduction labels Oct 16, 2022
@pd-l2ork
Copy link
Author

Can you try 0.42.2 or earlier? Just about any lower works fine.

@pd-l2ork
Copy link
Author

pd-l2ork commented Oct 16, 2022

Here's how the window should look. This is on Win10 with 0.42.3. Running nw . (from the folder where the two files provided in the aforesaid tarball are located). Fonts are off, but should not affect the outcome of the test.

tweeter

@ayushmanchhabra
Copy link
Contributor

ayushmanchhabra commented Oct 16, 2022

I was able to reproduce! There were quite a few changes made in between 0.42.3 and 0.42.4. I'll try to look into it.. Based on what I've seen, I may have a temp workaround. You could have 2 windows - one which shows a temporary splash screen / circular progress on start up until the main app is loaded.

Putting down the diff for my reference:
nw diff: b8c5658...5c1c6cf
v8 diff: nwjs/v8@00f437a...76eb2bb

@pd-l2ork
Copy link
Author

pd-l2ork commented Oct 16, 2022

@ayushmXn great! Please bear in mind that since 0.42.4-0.42.6 are effectively broken (do not display anything on my side), the slowdown may have been introduced post-0.42.6 (since there is no way to confirm that the problem is with 0.42.4, .5 or .6). 0.43.5 was the next one I was able to test successfully (although both 0.43.5 and 0.43.6 segfault on Linux when opening dev tools). 0.44.0 is the first "stable" iteration that does not have any major issues other than the slowdown.

@pd-l2ork
Copy link
Author

I was able to reproduce! There were quite a few changes made in between 0.42.3 and 0.42.4. I'll try to look into it.. Based on what I've seen, I may have a temp workaround. You could have 2 windows - one which shows a temporary splash screen / circular progress on start up until the main app is loaded.

Putting down the diff for my reference: nw diff: b8c5658...5c1c6cf v8 diff: nwjs/v8@00f437a...76eb2bb

How would a start-up screen solve the slowness? Or, would this be simply a distraction from the slowness? If so, that will not cut it, because this is a visual programming language and users can create things of varying complexity. While the aforesaid example may be seen as a more complex example, even more complex ones are certainly possible.

@ayushmanchhabra
Copy link
Contributor

How would a start-up screen solve the slowness? Or, would this be simply a distraction from the slowness? If so, that will not cut it, because this is a visual programming language and users can create things of varying complexity. While the aforesaid example may be seen as a more complex example, even more complex ones are certainly possible.

AHh gotcha. I realise I misunderstood the problem when I replied above. Just reread the issue description:

Most versions (tried everything between 0.42.4 up to 0.69.1, sometimes skipping a few versions) are significantly (10 or more times) slower than 0.42.3 and lower versions when drawing a busy SVG-based canvas.

@ayushmanchhabra
Copy link
Contributor

Could you send over a complex example of a *.pd file so I can observe the slowdown? I currently have a NW build running on my local machine which will probably take the whole of today to complete. I should be able to debug this during the coming weekend.

@pd-l2ork
Copy link
Author

If you install pd-l2ork as per my first post (it has a deb installer) and follow instructions provided there (also copied below), it will open the very thing that I reduced to that html file above. And there the difference is dramatic:

You can download it from: http://l2ork.music.vt.edu/main/make-your-own-l2ork/software/ (click on the "Complete (a.k.a. ‘Burrito Supreme’) Installer" section to expand it and download a version for the appropriate platform).

Replacing nw version: On Linux nw version is located in /usr/lib/pd-l2ork/bin/nw folder. On Windows in C:\Program Files (x86)\Pd-L2Ork\bin\nw. Mac is difficult since the files are all mixed, so I discourage testing on that one, even though results are the same. Just replace it with SDK (not runtime) of any other version by renaming that version's folder to nw and replacing this one. Then restart the app.

To open something graphically intense: start Pd-L2Ork (pd-l2ork on command line or double-click on the shortcut). Go to Help (rightmost item in the menubar)->Help Browser->scroll to the bottom and click on L2Ork Tweeter and then again on L2Ork Tweeter (right below the CHANGELOG.txt).

@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 7, 2022

@tharatau I was able to change the way my app works to be able to record the profile inside dev tools before the patch opens. Below are 2 profile files that I was able to save, one for 0.42.3 and one for 0.70.1:

https://l2ork.music.vt.edu/data/pd-l2ork/Profile-20221106T222201-0.42.3.json
https://l2ork.music.vt.edu/data/pd-l2ork/Profile-20221106T221257-0.70.1.json

Both run exactly the same code on my end. The only difference is the use of a different nw.js version (SDK in both cases).

I would really appreciate your assistance at dissecting what is causing these major slowdowns as I have zero experience with the profile option and this is currently a major issue preventing me from fixing crashes on OSX on 0.42.3 and earlier by upgrading to 0.60.1 or newer that fix this issue because anything post 0.42.3 that does run without crashing is ~10 times slower than older versions. Thank you.

P.S. Forgot to add, happy to run other profiling options, if needed. Also, as you can see just from the file size, while 0.42.3 has generated 11K lines in the profile file, the 0.70.1 has over 152K (!). No wonder the second is so much slower... Now, onto the questions why and, once the source is located, is there a way to make 0.70.1. run as efficiently as 0.42.3?

@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 7, 2022

@tharatau -- another update, having looked at the profile tab in the dev tools, it appears the major difference stems from the "scripting" aspect. This is the one that makes this huge difference in load times between the two. Whereas 0.42.3 takes barely a fraction of a second to process all the scripting, on 0.70.1 the same takes up 9 seconds (!). Yet, both versions use the same exact scripts and therefore the number of calls, at least from pd-l2ork's perspective. Not sure what else nw.js does on its end that has made such a profound difference...

@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 7, 2022

@tharatau -- yet another update. 0.43.0-beta1 is still fast. 0.43.0-0.43.4 I cannot run because those versions have a bug with running eval functions. 0.43.5 runs (although opening dev tools crash nwjs) and is slow. So, whatever slowed things down, has happened between 0.43.0-beta1 and 0.43.x (where x is 5 or less). Unfortunately both 0.43.5 and 0.43.6 segfault when opening dev tools (at least on Linux), so the earliest profile comparison may have to be between 0.43.0-beta1 and possibly 0.44.0 which is the closest next version that can be used to compare things using dev tools and does not have the eval issue that prevents the patch from loading.

@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 7, 2022

@tharatau Found it! While before doing this would generate an error, now it finally works. If I disable nw2 by passing the argument --disable-features=nw2 using 0.70.1, the slowdown is gone. So, this is clearly a part of the nw2 introduction. Now onto figuring out what is causing such colossal slowdown (and this is also where I need to step back because I have no knowledge of the nw1 vs nw2)...

@ayushmanchhabra
Copy link
Contributor

ayushmanchhabra commented Nov 7, 2022

@pd-l2ork Apologies for the late response. That's awesome. I'm not too sure how nw1 differs from nw2 also.

cc: #7230 @rogerwang

@rogerwang rogerwang self-assigned this Nov 7, 2022
@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 8, 2022

@tharatau @rogerwang May I suggest that you do not obsolete nw1 until this major slowdown is resolved because it is significant (10x slower in nw2). Thank you for your consideration.

@pd-l2ork pd-l2ork changed the title Major slowdown post 0.42.3 Critical nw2 regression: Major (10x) slowdown post-0.42.3 (where nw2 is enabled by default--disabling it restores original performance) Nov 8, 2022
@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 9, 2022

@rogerwang to simplify troubleshooting, you can also simply use nw.js 0.67.1 (newer versions have broken window close using nw1, see #7981 ) and change the flag in the package.json that enables or disables nw2 to observe the dramatic change in performance when loading the aforesaid L2Ork Tweeter patch. Please let me know what OS you run and I will make you a build of pd-l2ork for your platform to easily troubleshoot this.

Folder locations for different OS installs:
Linux: /usr/lib/pd-l2ork/bin/package.json
OSX: /Applications/Pd-L2Ork.app/Contents/Resources/app.nw/package.json
Windows: C:/Program Files (x86)/Pd-L2Ork/bin/package.json

@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 9, 2022

@tharatau please add the nw2 label to this issue. Thanks.

@rogerwang
Copy link
Member

rogerwang commented Nov 9, 2022

I tried your sample under nw1 and nw2 with the latest version, but I don't notice any difference...

@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 9, 2022

I tried your sample under nw1 and nw2 with the latest version, but I don't notice any difference...

@rogerwang that is because it is a scripting performance issue (not a drawing performance issue) as noted in my comment above. Inside the app all these are procedurally created using scripting whereas the min repro that was requested failed to do this because it is all predrawn (as in a static webpage). You should try the pd-l2ork app instead. See the two profile files I uploaded for comparison.

@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 9, 2022

@rogerwang what OS are you using? I can prepare a version of the app that is easy to test and show the slowdown, including the ability to profile...

@rogerwang
Copy link
Member

Please provide a minimal case to reproduce, not a whole application...

@rogerwang rogerwang removed the has-min-repro Has a minimum reproduction label Nov 9, 2022
@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 9, 2022

@rogerwang this is a complex application that does pretty much everything nwjs can, from networking, to scripting, drawing, creating windows, and interacting. It clearly exhibits a huge slowdown when everything remains the same except for toggling the nw2 feature. How would I go about narrowing down the source of the problem? Is there a way to use profiling tool to be able to do this?

@rogerwang
Copy link
Member

You can try using the devtools provided by Chromium, and insert code in various steps in your application to monitor the performance e.g. printing timestamps.

@pd-l2ork
Copy link
Author

pd-l2ork commented Nov 17, 2022

@rogerwang so, I added bunch of timestamps in hope of identifying one major slowdown. Instead, I got everything inside js files executing slower in nw2 mode than nw1. When there are a few calls, this is not noticeable, but as the number of calls increases, the overall execution time deviates. As an example, below are two files executing same select functions with the only difference being one disabling nw2 in the package.json file. I have not timestamped all functions (that would take quite an effort given the code is thousands of lines long), but this gives a nice cross-section. The log files insert =======\n at any point where there is a jump larger than 10ms. Every start of the function is prepended by a ">". Note that some functions are nested in others. After executing little over 450 functions, nw2 is at 3848ms whereas nw1 is at 1484ms. All times shown are in ms. Both are using 0.67.1 due to #7981 bug in nw1 mode in newer versions which makes the comparison difficult. That said, the behavior is exactly the same on any version that has nw2 enabled by default (IIRC 0.42.4 onward).

Is there a chance that there is some debugging or logging stuff running somewhere in the background in the nw2 mode that is causing this trickling slowdown across the board?

nw2-disabled-log
nw2-enabled-log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants