Performance on NFS-mounted files much helped by specifying buffering #587

Open
medoc92 opened this issue Dec 24, 2022 · 1 comment

medoc92 commented Dec 24, 2022

This is probably not a mutagen issue, but something which may be of interest anyway. I did not try to reproduce the thing in other contexts, so it may be quite specific.
While doing mass tag extraction from an NFS-mounted file system, specifying buffering=4096 in the open() call in _utils.py yields a massive performance improvement (around 5x in my configuration).

Details:

  • Client system: "Ubuntu 22.04.1 LTS" Linux 5.15.0-56-generic Python 3.10.6
  • NFS server: Odroid hc4 : ARM running "Ubuntu 22.04.1 LTS" Linux 5.19.17-meson64
  • The volume is a 4TB spinning disk on the ARM system.

Without the buffering parameter, extracting tags from 3000 FLAC and MP3 files takes around 100 ms per file. With the buffering argument, this drops to around 22 ms per file.

I also did a quick test on a local SSD, on which the buffering does not appear to make a difference one way or another.
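For anyone who wants to repeat the comparison, here is a rough sketch of how the per-file figures could be measured. The helper name `mean_ms_per_file` and the `parse` callback are made up for illustration and are not part of mutagen; results will also depend on whether the files are already in the client's cache, so treat the numbers as indicative only.

```python
import time


def mean_ms_per_file(paths, parse, buffering=-1):
    """Return the mean wall-clock time in milliseconds spent per file.

    `parse` is any callable taking an open binary file object
    (hypothetical hook; pass mutagen.File to time tag extraction).
    `buffering=-1` keeps Python's default buffering policy.
    """
    start = time.perf_counter()
    count = 0
    for path in paths:
        with open(path, "rb", buffering=buffering) as fileobj:
            parse(fileobj)
        count += 1
    elapsed = time.perf_counter() - start
    # Avoid dividing by zero when the list of paths is empty.
    return (elapsed / count) * 1000.0 if count else 0.0
```

Calling it once with the default and once with `buffering=4096` on the same list of files, with `parse=mutagen.File`, should give two numbers comparable to the ones above.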

I ran these tests while trying to determine why recoll was slow at indexing NFS-mounted audio files. The workaround for the application is to open the file with a buffering argument before building the mutagen object.
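A minimal sketch of that workaround, assuming mutagen.File() is handed an already-open file object instead of a path. The helper names `open_buffered` and `read_tags` are hypothetical, and 4096 is simply the value that helped in my setup, not a tuned constant.

```python
def open_buffered(path, buffer_size=4096):
    """Open `path` for binary reading with an explicit buffer size."""
    return open(path, "rb", buffering=buffer_size)


def read_tags(path, buffer_size=4096):
    """Build the mutagen object from a file opened with explicit buffering."""
    # mutagen is a third-party package; it is imported here so that
    # open_buffered() stays usable without it.
    import mutagen

    with open_buffered(path, buffer_size) as fileobj:
        # mutagen.File() accepts a file object as well as a filename.
        return mutagen.File(fileobj)
```

The file is closed when the `with` block exits, so this sketch only suits read-only tag extraction, which is the indexing case here.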

This actually appears to be a Python bug, since the open() documentation in the Python manual says:

> Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on [io.DEFAULT_BUFFER_SIZE](https://docs.python.org/3/library/io.html#io.DEFAULT_BUFFER_SIZE). On many systems, the buffer will typically be 4096 or 8192 bytes long.

So specifying buffering=4096 should be close to a no-op, and doing it as a precautionary default in mutagen should be innocuous enough.
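One way to check whether the heuristic is what differs on the NFS mount is to look at the block size the filesystem reports for a file and compare it with io.DEFAULT_BUFFER_SIZE. This is only a diagnostic sketch: the helper name `buffer_size_hints` is made up, and the idea that the default buffer follows `st_blksize` is my assumption about the implementation, not something I verified for this case.

```python
import io
import os


def buffer_size_hints(path):
    """Report the values the default buffering heuristic may draw on."""
    with open(path, "rb") as fileobj:
        st = os.fstat(fileobj.fileno())
        return {
            # Preferred I/O block size reported by the filesystem;
            # the field is not available on every platform.
            "st_blksize": getattr(st, "st_blksize", None),
            # Fallback documented for open().
            "default_buffer_size": io.DEFAULT_BUFFER_SIZE,
        }
```

If `st_blksize` on the NFS mount turns out to be much larger than 4096, that would be consistent with the slowdown, though I have not confirmed it.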

@martinwguy

Thanks, but it's _util.py
