Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ParseOptions: ignore Picture setting #186

Open
hinto-janai opened this issue Apr 12, 2023 · 6 comments
Open

ParseOptions: ignore Picture setting #186

hinto-janai opened this issue Apr 12, 2023 · 6 comments
Labels
enhancement New feature or request good first issue These issues are a good way to get started with Lofty help wanted Extra attention is needed

Comments

@hinto-janai
Copy link
Contributor

Consider the following situation:

You have 10,000 audio files you want to probe, totaling to around 500 albums:

for file in files {
    let probe  = lofty::Probe::new(file)
    let tagged = probe.guess_file_type()?.read()?;

    // Do stuff with TaggedFile.
}

Currently, this fully scans every file, including the Picture bytes, meaning: every single file allocates a Vec for the picture bytes, even if we have already scanned them, or don't want them in the first place (e.g, we only want the core tag metadata).

I'd like to avoid allocating bytes that I already have, so my proposal:

for file in files {
    let options = lofty::ParseOptions::new().read_picture(false); // This doesn't exist.
    let probe   = lofty::Probe::new(file).options(options);
    let tagged  = probe.guess_file_type()?.read()?;

    // Do stuff with TaggedFile (that doesn't have a Picture).
}

After testing around, this cuts probing time in half for files with non-trivial picture data (1200x1200+).

I'm willing to submit a PR for this, but I was wondering if this ParseOptions::read_picture() API was the correct way forward, or if this should be done another way. It would add another branch each file parse since every version of read_from() would have this added:

if parse_options.read_picture {
    // Push picture bytes.
}
@hinto-janai hinto-janai added the enhancement New feature or request label Apr 12, 2023
@Serial-ATA
Copy link
Owner

An addition to ParseOptions would be the way to go about this. :)

I'm curious though, how did you go about testing this? When items are encountered, they get read into memory before being interpreted, so there would be allocations regardless.

@hinto-janai
Copy link
Contributor Author

This was commented out in the tests (it was a match before though).

if block.ty == BLOCK_ID_PICTURE {
flac_file
.pictures
.push(Picture::from_flac_bytes(&block.content, false)?)
}

I'm now seeing that the allocation is not even here. In my case, I'm iterating in a hot loop over 1000s of files so I guess the encoding + push time was adding up.

If this read were prevented in Block::read, the bytes wouldn't even be allocated, right?

let ty = byte & 0x7F;
let size = data.read_u24::<BigEndian>()?;
let mut content = try_vec![0; size as usize];
data.read_exact(&mut content)?;

Something like a check before continuing after line 32 works but maybe is too invasive? Each Block::read would have to take in a ParseOptions and branch. It would have to return a signal to the calling read_from, letting it know to continue as well.

What is the right way to go about this? I'd like to make this easy to merge.

@Serial-ATA
Copy link
Owner

Yes, the allocation would be avoided completely if you just seek over the content in Block::read.

A nice way you could go about this would be to make Block::read take a closure:

let block = Block::read(reader, |block_type| {
    block_type == BLOCK_ID_VORBIS_COMMENTS
        || (block_type == BLOCK_ID_PICTURE && parse_options.read_pictures)
});

Then just change Block::read:

if predicate(block.ty) {
    // Read block content
} else {
    // Seek over content and return empty Vec
}

@hinto-janai
Copy link
Contributor Author

I will work on this and eventually submit an initial draft PR (probably in the next few weeks).

@hinto-janai
Copy link
Contributor Author

Sorry. I won't be working on this.

@Serial-ATA
Copy link
Owner

This should be a pretty easy feature to implement. I'll keep this open so I remember to eventually work on it.

@Serial-ATA Serial-ATA reopened this Jun 16, 2023
@Serial-ATA Serial-ATA added help wanted Extra attention is needed good first issue These issues are a good way to get started with Lofty labels Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue These issues are a good way to get started with Lofty help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants