
Improve path cache generation performance and accuracy, and add prepend mounting #93

Open · wants to merge 9 commits into dev
Conversation

@WaywardHeart commented Oct 22, 2023

What was originally a fix for loading files with Japanese characters and an attempt to stop redundant path traversal in the path cache has ballooned quite a bit over the past few months. Here are the salient points.

  1. UTF-8 lookups on macOS should now function properly with the path cache on.
  2. Mounting multiple archives or folders that share directory paths no longer causes path cache generation time to increase exponentially. The current implementation of this change requires changes to PhysFS.
  3. As a consequence of the previous point, files can no longer be shadowed by files with different cases in lower priority mounts.
  4. System.mount now accepts a fourth parameter to prepend the mount to the search path, giving it higher priority than everything currently mounted.
  5. Calling System.mount with the reload parameter set to true or left unspecified now only updates the path cache instead of triggering a full reload, unless System.mount or System.unmount has been called with reload set to false since the last path cache generation. This also requires the new PhysFS functionality.
  6. A new method System.regenerate_scripts has been added to simplify reloading the scripts array when prepending mounts that override the scripts file. The method only functions while preload scripts are still running and is a no-op later.
  7. The global ruby method load_data now attempts to use the path cache to access files (it apparently didn't previously).
  8. Multiple performance improvements have been made to mounting and enumerating zip and 7z archives in the related PhysFS PR. See mkxp-z/physfs#1 for the relevant commits.
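As a rough illustration of points 4 and 5, here is how the extended System.mount might behave with respect to mount priority. This is a mock sketch only: the parameter names, defaults, and ordering are assumptions, and the real method is implemented natively in mkxp-z.

```ruby
# Minimal mock of the extended System.mount signature described in
# points 4 and 5 above. Illustrative only; not mkxp-z's implementation.
module MockSystem
  SEARCH_PATH = []  # highest priority first

  # filename:   archive or folder to mount
  # mountpoint: virtual mount point within the search path (assumed name)
  # reload:     refresh the path cache (true by default)
  # prepend:    the new fourth parameter; gives this mount top priority
  def self.mount(filename, mountpoint = "", reload = true, prepend = false)
    prepend ? SEARCH_PATH.unshift(filename) : SEARCH_PATH.push(filename)
    # In the real implementation, reload == true now only updates the
    # path cache incrementally unless the cache is known to be dirty.
  end
end

MockSystem.mount("Game.rgssad")
MockSystem.mount("Patch.zip", "", true, true)  # prepended: now highest priority
# MockSystem::SEARCH_PATH => ["Patch.zip", "Game.rgssad"]
```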

Fixes #92

@Splendide-Imaginarius

Some kind of customScript test for the performance improvements would be nice. Maybe just a Ruby script that exits immediately as soon as the path cache is loaded, along with a Bash script that launches mkxp-z and times it? (If that is enough to show the improvement.)

@Eblo commented Jan 20, 2024

I was reminded of this when I mentioned that I'd get some significant chugging on startup if I had several archive files (the more archive files, the longer the load times). I took a look and found that it was redundantly traversing already-processed file structures. This PR fixed that for me.

I can't speak to the macOS piece of this, however.

@WaywardHeart (Author)

I somewhat forgot that I never elaborated on that. Yes, it's as Eblo says. When we enumerate a directory, PhysFS gives us everything in each mount point, including duplicates. If that duplicate item is a directory, then we end up traversing it multiple times, which gets exponentially worse the more mounts have that directory path. If one of those directories has a lot of files in it, then the wait time gets to be very noticeable.
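The blow-up described here can be modeled in a few lines (a toy model, not mkxp-z code): with N mounts all containing the same directory chain, a naive cache builder that recurses into every duplicate directory entry does on the order of N^depth visits, while deduplicating leaves one visit per unique directory.

```ruby
# Toy model of the duplicate-directory traversal problem (illustrative only).
# Each of `mounts` mount points contains the same directory chain a/b/c/...
# Enumerating a directory yields one entry per mount that contains it,
# and a naive cache builder recurses into each duplicate copy separately.
def naive_visits(mounts, depth)
  return 1 if depth.zero?
  # The subdirectory appears once per mount, and we recurse into every copy.
  mounts * naive_visits(mounts, depth - 1)
end

def deduped_visits(mounts, depth)
  depth + 1  # visit each unique directory exactly once
end

# e.g. 4 mounts sharing a 6-deep directory chain:
# naive_visits(4, 6)   => 4096 directory visits
# deduped_visits(4, 6) => 7
```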

I saw a small error in the macOS commit while I was working on #163, and then had an idea for doing the optimization even better, so here's an update! With the path cache enabled, we now keep the mounts separate. PhysFS enumerates through the mounts one by one anyway, so there shouldn't be any performance penalty from this. It also guarantees that the highest priority file is chosen, even if the directory path has different cases in different archives. Additionally, once I get around to adding a mount_prepend method for dynamically loading patches, it'll be trivial to have it simply update the path cache instead of completely reloading it like the reload option on mount does.

I kept the previous commit so that this one could be easily reverted if desired, but I can squash them if you want.

@WaywardHeart (Author)

I gave this some more thought, and decided I really don't like the dynamic mount point hack from my last push. I experimented with unmounting and remounting everything, but decided that was unacceptably slow. So instead I looked into patching physfs, and found out that the change I wanted was pretty simple to implement.

I'll update this PR again after mkxp-z/physfs#1 has been merged.

@WaywardHeart (Author)

I finally got around to polishing my latest bit of feature creep for this PR, so I'm pushing this now for anyone interested to look at. It's currently set to pull from my PhysFS fork.

  • Path cache enumeration now scans each mount individually rather than having PhysFS lump them all together. This means files in lower priority archives should never shadow files in higher ones.
  • Calling System.mount with the reload parameter set to true will now only update the path cache instead of fully reloading it unless the path cache is known to be dirty from a previous mount or unmount.
  • A new optional parameter has been added to System.mount. When set to true, the mount is prepended to the search path, giving it higher priority than what was already mounted.
  • Going along with prepending, a new method called System.regenerate_scripts has been added that reloads the scripts array. It's meant for preload scripts that load patches, and it becomes a no-op after all of the preload scripts have executed.
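To make the lifecycle of that last point concrete, here is a toy mock of the "no-op after preload" behavior. This is illustrative only; the real System.regenerate_scripts is implemented natively and actually rebuilds the scripts array.

```ruby
# Illustrative mock of the System.regenerate_scripts lifecycle described
# above. Only models the "no-op once preload has finished" behavior;
# the counter stands in for the real work of reloading the scripts array.
class PreloadPhase
  attr_reader :scripts_reloads

  def initialize
    @preload_running = true
    @scripts_reloads = 0
  end

  def finish_preload!
    @preload_running = false
  end

  # Reloads the scripts array, but only while preload scripts are running.
  def regenerate_scripts
    return unless @preload_running  # no-op after preload has finished
    @scripts_reloads += 1
  end
end

phase = PreloadPhase.new
phase.regenerate_scripts   # takes effect during preload
phase.finish_preload!
phase.regenerate_scripts   # silently does nothing now
```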

@Splendide-Imaginarius

Are your PhysFS changes something that perhaps upstream would be willing to consider merging?

@Splendide-Imaginarius

#181 might make it easier to test this PR.

Commit: Operating systems allow file paths to be at least a kibibyte in length. Developers are unlikely to use anywhere near that much, but we may as well allocate for it just in case.
@WaywardHeart (Author)

Alright, here's a test suite for you. The Ruby script has plenty of comments explaining what's going on.

I also fixed a segfault I accidentally introduced when mounting a nonexistent path, and changed load_data (and font accesses, I think? That shouldn't really matter, though) to use the path cache. It falls back to using the provided case if the file isn't in the cache, such as when the file has just been created with save_data.
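The fallback behavior described above can be sketched roughly like this. This is a hypothetical model of the lookup order only; the real path cache lives in mkxp-z's C++ filesystem layer, and the cache contents here are made up.

```ruby
# Sketch of the lookup order described above (illustrative only).
# Maps a case-folded path to the correctly cased path found on disk.
PATH_CACHE = { "tests/read/file.txt" => "Tests/Read/file.txt" }

def desensitize(path)
  # Use the cached (correctly cased) path when the file is known;
  # fall back to the caller-provided case otherwise, e.g. for a file
  # that was just created with save_data and isn't in the cache yet.
  PATH_CACHE.fetch(path.downcase, path)
end

# desensitize("TESTS/READ/FILE.TXT") => "Tests/Read/file.txt"
# desensitize("Saves/new.dat")       => "Saves/new.dat"
```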

Commit: In games that have a lot of files and an RTP is being loaded, this can save a couple of seconds.

Commit: This also keeps the cases of archives with lower priority from overriding the higher priority archives. This commit requires mkxp-z/physfs#1, which allows us to enumerate files in specific mount points.

Commit: This allows us to guarantee that the highest priority file is chosen, even if the directory path has different cases in different archives. System.mount now accepts a fourth argument to prepend the mount to the search path instead of appending it. In addition, specifying the fourth argument when the third argument is true will update the path cache instead of triggering a complete reload.

This commit also adds a System.regenerate_scripts method, which regenerates the $RGSS_SCRIPTS array, for dynamically loading patches. This method is a no-op after the preload scripts have finished executing.

Commit: …rgument. While this could potentially cause problems if the game was hoping to catch changes to loose files, that case seems highly improbable.
@WaywardHeart (Author) commented Apr 18, 2024

I've played around with the test script myself and decided to address a few more performance issues, mostly in the PhysFS PR.

First off, I noticed that enumerating 7zip archives was taking longer than I thought it should, and that if we'd already enumerated one once, reloading the path cache would take even longer. Digging into the PhysFS code, I saw that its hash table always allocates 64 buckets, which meant the 30k-entry test archive averaged ~469 items per bucket. Now I have it generate enough buckets to hold 8-16 items each, with a cap of 64k buckets should someone decide to load an archive with over 1 million entries.
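The sizing rule described here can be expressed as a quick sketch. This is illustrative only; the actual change is in C inside the PhysFS fork, and the doubling strategy is my reading of the stated 8-16 items-per-bucket target.

```ruby
# Sketch of the bucket-count heuristic described above (illustrative only).
MAX_BUCKETS = 64 * 1024  # cap for archives with over ~1 million entries

def bucket_count(entries)
  buckets = 1
  # Double until the average load drops to 16 entries per bucket or fewer;
  # since the previous size held more than 16, the result lands in 8..16.
  buckets *= 2 while entries > buckets * 16 && buckets < MAX_BUCKETS
  buckets
end

# 30_000 entries => 2048 buckets (~14.6 entries each),
# versus the old fixed 64 buckets (~469 entries each).
```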

I also added a PHYSFS_statFromMountPoint function to mitigate that kind of slowdown even further for archives mounted later, and renamed my PHYSFS_enumerate_path function to PHYSFS_enumerateFromMountPoint to match PhysFS's naming scheme. Having PHYSFS_statFromMountPoint also means that our path cache will be able to see files or folders that are being shadowed by the other type, should that situation come up.

Next, I addressed some terrible performance issues in mounting and initially enumerating zip archives. The enumeration problem was caused by PHYSFS_stat triggering some metadata resolution that's supposed to be deferred until the file is read, which was corrected easily enough. The mounting issue, on the other hand, was primarily caused by the entry parser making several small reads per entry. Now that we allocate a buffer of up to 1 MiB for that, a zip archive with 50k files goes from taking 4.5 seconds just to mount down to just 0.45 seconds, which is very acceptable. I've also included zip and rgss3a versions of the 50k archive for you to experiment with.
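As a back-of-the-envelope model of why the buffer helps: the 46-byte figure below is the fixed size of a zip central directory file header, but the four-reads-per-entry count is an assumption for illustration, not a measurement of PhysFS.

```ruby
# Toy model of the zip central-directory parsing change (illustrative only).
# Old behavior: several small reads per entry.
# New behavior: fill a large buffer once and parse entries out of it,
# amortizing the I/O calls across many entries.
ENTRY_SIZE = 46          # fixed part of a zip central directory file header
READS_PER_ENTRY = 4      # assumed count of small reads per entry (old parser)

def io_calls_unbuffered(entries)
  entries * READS_PER_ENTRY
end

def io_calls_buffered(entries, buffer_size = 1 << 20)  # 1 MiB buffer
  bytes = entries * ENTRY_SIZE
  (bytes + buffer_size - 1) / buffer_size  # one read per buffer fill
end

# 50_000 entries: 200_000 small reads versus 3 buffered reads.
```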

I also included a test for the NFC/NFD thing. It'd be nice if someone with a more modern version of macOS could confirm the path normalization is still converting it. Here's my output from the current version of mkxp-z:

Current output
archives/EnumerationTest_30k.7z took 0.232163 seconds to mount 
Generating the path cache took 3.35724 seconds 
Mounting ReadTest_2.7z and adding it to the cache took 5.5673189999999995 seconds 
The desensitized path to "tests/read/file.txt" is "Tests/Read/File.txt". 
The contents of Tests/Read/File.txt are incorrect. 
 
Tests/NFC/ヴ.txt desensitized via NFC to NFC form 
Tests/NFC/ヴ.txt not desensitized via NFD 
 
Tests/NFD/ヴ.txt desensitized via NFC to NFC form 
Tests/NFD/ヴ.txt not desensitized via NFD 
 
Testing load_data 
Tests/NFC/ヴ.txt not found 
Tests/NFD/ヴ.txt found via NFC and NFD 
 
Testing Bitmap constructor 
Tests/NFC/ヴ.txt not found 
Tests/NFD/ヴ.txt found via NFC and NFD

And here's the output from this PR:

This PR's output
archives/EnumerationTest_30k.7z took 0.03801499999999999 seconds to mount 
Generating the path cache took 0.12170199999999999 seconds 
Mounting ReadTest_2.7z and adding it to the cache took 0.0009630000000000194 seconds 
The desensitized path to "tests/read/file.txt" is "Tests/Read/file.txt". 
The contents of Tests/Read/File.txt are correct. 
 
Tests/NFC/ヴ.txt desensitized via NFC to NFC form 
Tests/NFC/ヴ.txt desensitized via NFD to NFC form 
 
Tests/NFD/ヴ.txt desensitized via NFC to NFD form 
Tests/NFD/ヴ.txt desensitized via NFD to NFD form 
 
Testing load_data 
Tests/NFC/ヴ.txt found via NFC and NFD 
Tests/NFD/ヴ.txt found via NFC and NFD 
 
Testing Bitmap constructor 
Tests/NFC/ヴ.txt found via NFC and NFD 
Tests/NFD/ヴ.txt found via NFC and NFD

> Are your PhysFS changes something that perhaps upstream would be willing to consider merging?

Into the stable branch we're currently using? Almost definitely not. It's been feature frozen for a while now. As for the 4.0 branch... maybe. PHYSFS_enumerateFromMountPoint and PHYSFS_statFromMountPoint go a bit against the idea of treating all of the mounts as a single file system, so I doubt they'd be interested in that, but the performance stuff I should probably at least bring to their attention.

Hrm. And it looks like I force pushed this a bit too soon and it didn't catch the updated PhysFS branch. I'll try to retrigger the builds later if you don't do it first.

@Splendide-Imaginarius

Very impressive optimization wizardry!

> Hrm. And it looks like I force pushed this a bit too soon and it didn't catch the updated PhysFS branch. I'll try to retrigger the builds later if you don't do it first.

I'll close and re-open now, no worries.

@WaywardHeart (Author)

Nope, still broken. You'll probably have to clear the caches first. https://github.com/mkxp-z/mkxp-z/actions/caches

@Splendide-Imaginarius

Cleared the caches, let's try this again.

@WaywardHeart changed the title from "Fix UTF-8 lookups with the path cache on macOS, and optimize the path cache slightly" to "Improve path cache generation performance and accuracy, and add prepend mounting" on Apr 19, 2024
@WaywardHeart (Author)

> Now we're allocating an up to 1 MiB buffer for that, a zip archive with 50k files goes from taking 4.5 seconds just to mount to just .45 seconds

Small correction: introducing that buffer only dropped the mounting time from ~3 seconds. The other ~1.5 seconds saved came from the hash bucket change, although I don't know why it had such a big impact on mounting. It's still nice to see the time get that low, though.

@WaywardHeart (Author)

Just fixed a bug in the linked PR that would have prevented mounting zip64 archives. It won't affect any of the test archives, so you shouldn't need to trigger a rebuild for it. Definitely clear the caches again before merging, though.


Successfully merging this pull request may close these issues.

Case sensitivity issue with RTP.
3 participants