
Improve path cache generation performance and accuracy, and add prepend mounting #93

Open · wants to merge 9 commits into dev
Conversation

@WaywardHeart commented Oct 22, 2023

What was originally a fix for loading files with Japanese characters and an attempt to stop redundant path traversal in the path cache has ballooned quite a bit over the past few months. Here are the salient points.

  1. UTF-8 lookups on macOS should now function properly with the path cache on.
  2. Mounting multiple archives or folders that share directory paths no longer causes path cache generation time to increase exponentially. The current implementation of this change requires changes to PhysFS.
  3. As a consequence of the previous point, files can no longer be shadowed by files with different cases in lower priority mounts.
  4. System.mount now accepts a fourth parameter to prepend the mount to the search path, giving it higher priority than everything currently mounted.
  5. Calling System.mount with the reload parameter set to true or left unspecified now only updates the path cache instead of triggering a full reload, unless System.mount or System.unmount has been called with reload set to false since the last path cache generation. This also requires the new PhysFS functionality.
  6. A new method System.regenerate_scripts has been added to simplify reloading the scripts array when prepending mounts that override the scripts file. The method only functions while preload scripts are still running and is a no-op later.
  7. The global ruby method load_data now attempts to use the path cache to access files (it apparently didn't previously).
  8. Multiple performance improvements have been made to mounting and enumerating zip and 7z archives in the related PhysFS PR. See mkxp-z/physfs#1 for the relevant commits.
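As a rough illustration of points 4 and 5, here is how the extended System.mount might behave with respect to mount priority. This is a mock sketch only: the parameter names, defaults, and ordering are assumptions, and the real method is implemented natively in mkxp-z.

```ruby
# Minimal mock of the extended System.mount signature described in
# points 4 and 5 above. Illustrative only; not mkxp-z's implementation.
module MockSystem
  SEARCH_PATH = []  # highest priority first

  # filename:   archive or folder to mount
  # mountpoint: virtual mount point within the search path (assumed name)
  # reload:     refresh the path cache (true by default)
  # prepend:    the new fourth parameter; gives this mount top priority
  def self.mount(filename, mountpoint = "", reload = true, prepend = false)
    prepend ? SEARCH_PATH.unshift(filename) : SEARCH_PATH.push(filename)
    # In the real implementation, reload == true now only updates the
    # path cache incrementally unless the cache is known to be dirty.
  end
end

MockSystem.mount("Game.rgssad")
MockSystem.mount("Patch.zip", "", true, true)  # prepended: now highest priority
# MockSystem::SEARCH_PATH => ["Patch.zip", "Game.rgssad"]
```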

Fixes #92

@Splendide-Imaginarius

Some kind of customScript test for the performance improvements would be nice. Maybe just a Ruby script that exits immediately as soon as the path cache is loaded, along with a Bash script that launches mkxp-z and times it? (If that is enough to show the improvement.)

@Eblo commented Jan 20, 2024

I was reminded of this when I mentioned that I'd get some significant chugging on startup if I had several archive files (the more archive files, the longer the load times). I took a look and found that it was redundantly traversing already-processed file structures. This PR fixed that for me.

I can't speak to the macOS piece of this, however.

@WaywardHeart (Author)

I somewhat forgot that I never elaborated on that. Yes, it's as Eblo says. When we enumerate a directory, PhysFS gives us everything in each mount point, including duplicates. If that duplicate item is a directory, then we end up traversing it multiple times, which gets exponentially worse the more mounts have that directory path. If one of those directories has a lot of files in it, then the wait time gets to be very noticeable.
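The blow-up described here can be modeled in a few lines (a toy model, not mkxp-z code): with N mounts all containing the same directory chain, a naive cache builder that recurses into every duplicate directory entry does on the order of N^depth visits, while deduplicating leaves one visit per unique directory.

```ruby
# Toy model of the duplicate-directory traversal problem (illustrative only).
# Each of `mounts` mount points contains the same directory chain a/b/c/...
# Enumerating a directory yields one entry per mount that contains it,
# and a naive cache builder recurses into each duplicate copy separately.
def naive_visits(mounts, depth)
  return 1 if depth.zero?
  # The subdirectory appears once per mount, and we recurse into every copy.
  mounts * naive_visits(mounts, depth - 1)
end

def deduped_visits(mounts, depth)
  depth + 1  # visit each unique directory exactly once
end

# e.g. 4 mounts sharing a 6-deep directory chain:
# naive_visits(4, 6)   => 4096 directory visits
# deduped_visits(4, 6) => 7
```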

I saw a small error in the macOS commit while I was working on #163, and then had an idea for doing the optimization even better, so here's an update! With the path cache enabled, we now keep the mounts separate. PhysFS enumerates through the mounts one by one anyway, so there shouldn't be any performance penalty from this. It also guarantees that the highest priority file is chosen, even if the directory path has different cases in different archives. Additionally, once I get around to adding a mount_prepend method for dynamically loading patches, it'll be trivial to have it simply update the path cache instead of completely reloading it like the reload option on mount does.

I kept the previous commit so that this one could be easily reverted if desired, but I can squash them if you want.

@WaywardHeart (Author)

I gave this some more thought, and decided I really don't like the dynamic mount point hack from my last push. I experimented with unmounting and remounting everything, but decided that was unacceptably slow. So instead I looked into patching physfs, and found out that the change I wanted was pretty simple to implement.

I'll update this PR again after mkxp-z/physfs#1 has been merged.

@WaywardHeart (Author)

I finally got around to polishing my latest bit of feature creep for this PR, so I'm pushing this now for anyone interested to look at. It's currently set to pull from my PhysFS fork.

  • Path cache enumeration now scans each mount individually rather than having PhysFS lump them all together. This means files in lower priority archives should never shadow files in higher ones.
  • Calling System.mount with the reload parameter set to true will now only update the path cache instead of fully reloading it unless the path cache is known to be dirty from a previous mount or unmount.
  • A new optional parameter has been added to System.mount. When set to true, the mount is prepended to the search path, giving it higher priority than what was already mounted.
  • Going along with prepending, a new method called System.regenerate_scripts has been added that reloads the scripts array. It's meant for preload scripts that load patches, and it becomes a no-op after all of the preload scripts have executed.
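To make the lifecycle of that last point concrete, here is a toy mock of the "no-op after preload" behavior. This is illustrative only; the real System.regenerate_scripts is implemented natively and actually rebuilds the scripts array.

```ruby
# Illustrative mock of the System.regenerate_scripts lifecycle described
# above. Only models the "no-op once preload has finished" behavior;
# the counter stands in for the real work of reloading the scripts array.
class PreloadPhase
  attr_reader :scripts_reloads

  def initialize
    @preload_running = true
    @scripts_reloads = 0
  end

  def finish_preload!
    @preload_running = false
  end

  # Reloads the scripts array, but only while preload scripts are running.
  def regenerate_scripts
    return unless @preload_running  # no-op after preload has finished
    @scripts_reloads += 1
  end
end

phase = PreloadPhase.new
phase.regenerate_scripts   # takes effect during preload
phase.finish_preload!
phase.regenerate_scripts   # silently does nothing now
```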

@Splendide-Imaginarius

Are your PhysFS changes something that perhaps upstream would be willing to consider merging?

@Splendide-Imaginarius

#181 might make it easier to test this PR.

Commit: Operating systems allow file paths to be at least a kibibyte in length. Developers are unlikely to use anywhere near that much, but we may as well allocate for it just in case.
@WaywardHeart (Author)

Alright, here's a test suite for you. The Ruby script has plenty of comments explaining what's going on.

I also fixed a segfault I accidentally introduced when mounting a nonexistent path, and changed load_data (and font accesses, I think? That shouldn't really matter, though) to use the path cache. It falls back to using the provided case if the file isn't in the cache, such as when the file has just been created with save_data.
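The fallback behavior described above can be sketched roughly like this. This is a hypothetical model of the lookup order only; the real path cache lives in mkxp-z's C++ filesystem layer, and the cache contents here are made up.

```ruby
# Sketch of the lookup order described above (illustrative only).
# Maps a case-folded path to the correctly cased path found on disk.
PATH_CACHE = { "tests/read/file.txt" => "Tests/Read/file.txt" }

def desensitize(path)
  # Use the cached (correctly cased) path when the file is known;
  # fall back to the caller-provided case otherwise, e.g. for a file
  # that was just created with save_data and isn't in the cache yet.
  PATH_CACHE.fetch(path.downcase, path)
end

# desensitize("TESTS/READ/FILE.TXT") => "Tests/Read/file.txt"
# desensitize("Saves/new.dat")       => "Saves/new.dat"
```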

Commit: In games that have a lot of files and an RTP is being loaded, this can save a couple of seconds.

Commit: This also keeps the cases of archives with lower priority from overriding the higher priority archives. This commit requires mkxp-z/physfs#1, which allows us to enumerate files in specific mount points.

Commit: This allows us to guarantee that the highest priority file is chosen, even if the directory path has different cases in different archives. System.mount now accepts a fourth argument to prepend the mount to the search path instead of appending it. In addition, specifying the fourth argument when the third argument is true will update the path cache instead of triggering a complete reload.

This commit also adds a System.regenerate_scripts method, which regenerates the $RGSS_SCRIPTS array, for dynamically loading patches. This method is a no-op after the preload scripts have finished executing.

Commit: …rgument. While this could potentially cause problems if the game was hoping to catch changes to loose files, that case seems highly improbable.
@WaywardHeart (Author) commented Apr 18, 2024

I've played around with the test script myself and decided to address a few more performance issues, mostly in the PhysFS PR.

First off, I noticed that enumerating 7zip archives was taking longer than I thought it should, and that if we'd already enumerated one once, reloading the path cache would take even longer. Digging into the PhysFS code, I saw that its hash table always allocates 64 buckets, which meant the 30k-entry test archive averaged ~469 items per bucket. Now I have it generate enough buckets to hold 8-16 items each, with a cap of 64k buckets should someone decide to load an archive with over 1 million entries.
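The sizing rule described here can be expressed as a quick sketch. This is illustrative only; the actual change is in C inside the PhysFS fork, and the doubling strategy is my reading of the stated 8-16 items-per-bucket target.

```ruby
# Sketch of the bucket-count heuristic described above (illustrative only).
MAX_BUCKETS = 64 * 1024  # cap for archives with over ~1 million entries

def bucket_count(entries)
  buckets = 1
  # Double until the average load drops to 16 entries per bucket or fewer;
  # since the previous size held more than 16, the result lands in 8..16.
  buckets *= 2 while entries > buckets * 16 && buckets < MAX_BUCKETS
  buckets
end

# 30_000 entries => 2048 buckets (~14.6 entries each),
# versus the old fixed 64 buckets (~469 entries each).
```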

I also added a PHYSFS_statFromMountPoint function to mitigate that kind of slowdown even further for archives mounted later, and renamed my PHYSFS_enumerate_path function to PHYSFS_enumerateFromMountPoint to match PhysFS's naming scheme. Having PHYSFS_statFromMountPoint also means that our path cache will be able to see files or folders that are being shadowed by the other type, should that situation come up.

Next, I addressed some terrible performance issues in mounting and initially enumerating zip archives. The enumeration problem was caused by PHYSFS_stat triggering some metadata resolution that's supposed to be deferred until the file is read, which was corrected easily enough. The mounting issue, on the other hand, was primarily caused by the entry parser making several small reads per entry. Now that we allocate a buffer of up to 1 MiB for that, a zip archive with 50k files goes from taking 4.5 seconds just to mount down to just 0.45 seconds, which is very acceptable. I've also included zip and rgss3a versions of the 50k archive for you to experiment with.
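As a back-of-the-envelope model of why the buffer helps: the 46-byte figure below is the fixed size of a zip central directory file header, but the four-reads-per-entry count is an assumption for illustration, not a measurement of PhysFS.

```ruby
# Toy model of the zip central-directory parsing change (illustrative only).
# Old behavior: several small reads per entry.
# New behavior: fill a large buffer once and parse entries out of it,
# amortizing the I/O calls across many entries.
ENTRY_SIZE = 46          # fixed part of a zip central directory file header
READS_PER_ENTRY = 4      # assumed count of small reads per entry (old parser)

def io_calls_unbuffered(entries)
  entries * READS_PER_ENTRY
end

def io_calls_buffered(entries, buffer_size = 1 << 20)  # 1 MiB buffer
  bytes = entries * ENTRY_SIZE
  (bytes + buffer_size - 1) / buffer_size  # one read per buffer fill
end

# 50_000 entries: 200_000 small reads versus 3 buffered reads.
```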

I also included a test for the NFC/NFD thing. It'd be nice if someone with a more modern version of macOS could confirm the path normalization is still converting it. Here's my output from the current version of mkxp-z:

Current output
archives/EnumerationTest_30k.7z took 0.232163 seconds to mount 
Generating the path cache took 3.35724 seconds 
Mounting ReadTest_2.7z and adding it to the cache took 5.5673189999999995 seconds 
The desensitized path to "tests/read/file.txt" is "Tests/Read/File.txt". 
The contents of Tests/Read/File.txt are incorrect. 
 
Tests/NFC/ヴ.txt desensitized via NFC to NFC form 
Tests/NFC/ヴ.txt not desensitized via NFD 
 
Tests/NFD/ヴ.txt desensitized via NFC to NFC form 
Tests/NFD/ヴ.txt not desensitized via NFD 
 
Testing load_data 
Tests/NFC/ヴ.txt not found 
Tests/NFD/ヴ.txt found via NFC and NFD 
 
Testing Bitmap constructor 
Tests/NFC/ヴ.txt not found 
Tests/NFD/ヴ.txt found via NFC and NFD

And here's the output from this PR:

This PR's output
archives/EnumerationTest_30k.7z took 0.03801499999999999 seconds to mount 
Generating the path cache took 0.12170199999999999 seconds 
Mounting ReadTest_2.7z and adding it to the cache took 0.0009630000000000194 seconds 
The desensitized path to "tests/read/file.txt" is "Tests/Read/file.txt". 
The contents of Tests/Read/File.txt are correct. 
 
Tests/NFC/ヴ.txt desensitized via NFC to NFC form 
Tests/NFC/ヴ.txt desensitized via NFD to NFC form 
 
Tests/NFD/ヴ.txt desensitized via NFC to NFD form 
Tests/NFD/ヴ.txt desensitized via NFD to NFD form 
 
Testing load_data 
Tests/NFC/ヴ.txt found via NFC and NFD 
Tests/NFD/ヴ.txt found via NFC and NFD 
 
Testing Bitmap constructor 
Tests/NFC/ヴ.txt found via NFC and NFD 
Tests/NFD/ヴ.txt found via NFC and NFD

> Are your PhysFS changes something that perhaps upstream would be willing to consider merging?

Into the stable branch we're currently using? Almost definitely not. It's been feature frozen for a while now. As for the 4.0 branch... maybe. PHYSFS_enumerateFromMountPoint and PHYSFS_statFromMountPoint go a bit against the idea of treating all of the mounts as a single file system, so I doubt they'd be interested in that, but the performance stuff I should probably at least bring to their attention.

Hrm. And it looks like I force pushed this a bit too soon and it didn't catch the updated PhysFS branch. I'll try to retrigger the builds later if you don't do it first.

@Splendide-Imaginarius

Very impressive optimization wizardry!

> Hrm. And it looks like I force pushed this a bit too soon and it didn't catch the updated PhysFS branch. I'll try to retrigger the builds later if you don't do it first.

I'll close and re-open now, no worries.

@WaywardHeart (Author)

Nope, still broken. You'll probably have to clear the caches first. https://github.com/mkxp-z/mkxp-z/actions/caches

@Splendide-Imaginarius

Cleared the caches, let's try this again.

@WaywardHeart changed the title from "Fix UTF-8 lookups with the path cache on macOS, and optimize the path cache slightly" to "Improve path cache generation performance and accuracy, and add prepend mounting" on Apr 19, 2024
@WaywardHeart (Author)

> Now we're allocating an up to 1 MiB buffer for that, a zip archive with 50k files goes from taking 4.5 seconds just to mount to just .45 seconds

Small correction: introducing that buffer only dropped the mounting time from ~3 seconds. The other ~1.5 seconds saved came from the hash bucket change, although I don't know why it had such a big impact on mounting. It's still nice to see the time get that low, though.

@WaywardHeart (Author)

Just fixed a bug in the linked PR that would have prevented mounting zip64 archives. It won't affect any of the test archives, so you shouldn't need to trigger a rebuild for it. Definitely clear the caches again before merging, though.


Successfully merging this pull request may close these issues.

Case sensitivity issue with RTP.
3 participants