ParseOptions: ignore `Picture` setting #186

hinto-janai · 2023-04-12T16:48:32Z

Consider the following situation:

You have 10,000 audio files you want to probe, totaling around 500 albums:

for file in files {
    let probe  = lofty::Probe::new(file);
    let tagged = probe.guess_file_type()?.read()?;

    // Do stuff with TaggedFile.
}

Currently, this fully scans every file, including the Picture bytes. That means every single file allocates a Vec for the picture bytes, even if we have already scanned them or don't want them in the first place (e.g., we only want the core tag metadata).

I'd like to avoid allocating bytes that I already have, so my proposal:

for file in files {
    let options = lofty::ParseOptions::new().read_picture(false); // This doesn't exist.
    let probe   = lofty::Probe::new(file).options(options);
    let tagged  = probe.guess_file_type()?.read()?;

    // Do stuff with TaggedFile (that doesn't have a Picture).
}

In my testing, this cuts probing time in half for files with non-trivial picture data (1200x1200+).

I'm willing to submit a PR for this, but I was wondering whether this ParseOptions::read_picture() API is the correct way forward, or if this should be done another way. It would add another branch to each file parse, since every version of read_from() would need this added:

if parse_options.read_picture {
    // Push picture bytes.
}
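
To make the proposal concrete, here is a minimal, self-contained sketch of what such a flag could look like. Everything in it is hypothetical: the ParseOptions struct is a stand-in rather than lofty's real type, read_picture() does not exist in lofty, and collect_pictures() is an illustrative helper that only models the extra branch.

```rust
/// FLAC metadata block type for PICTURE (value 6 in the FLAC format).
pub const BLOCK_ID_PICTURE: u8 = 6;

/// Hypothetical stand-in for `lofty::ParseOptions`.
/// Only the proposed flag is modeled here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ParseOptions {
    read_picture: bool,
}

impl ParseOptions {
    /// Pictures are read by default, matching the current behavior.
    pub fn new() -> Self {
        Self { read_picture: true }
    }

    /// Proposed builder method (an assumption, not an existing API).
    pub fn read_picture(mut self, read_picture: bool) -> Self {
        self.read_picture = read_picture;
        self
    }
}

/// Illustrative helper: keeps picture block contents only when the
/// option allows it. Each block is a `(type, content)` pair.
pub fn collect_pictures(blocks: &[(u8, Vec<u8>)], options: ParseOptions) -> Vec<Vec<u8>> {
    let mut pictures = Vec::new();
    for (ty, content) in blocks {
        // This is the extra branch every read_from() would gain.
        if *ty == BLOCK_ID_PICTURE && options.read_picture {
            pictures.push(content.clone());
        }
    }
    pictures
}
```

The builder takes self by value so it can be chained after new(), the same way the proposed call is written above.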


Serial-ATA · 2023-04-13T02:52:04Z

An addition to ParseOptions would be the way to go about this. :)

I'm curious though, how did you go about testing this? When items are encountered, they get read into memory before being interpreted, so there would be allocations regardless.

hinto-janai · 2023-04-13T12:48:25Z

This is the code I commented out for the test (it was a match before, though):

lofty-rs/src/flac/read.rs

Lines 93 to 97 in 46127a0

    
           if block.ty == BLOCK_ID_PICTURE { 
        
           	flac_file 
        
           		.pictures 
        
           		.push(Picture::from_flac_bytes(&block.content, false)?) 
        
           }

I'm now seeing that the allocation doesn't even happen here. In my case, I'm iterating in a hot loop over thousands of files, so I guess the encoding + push time was adding up.

If this read were prevented in Block::read, the bytes wouldn't even be allocated, right?

lofty-rs/src/flac/block.rs

Lines 32 to 37 in 46127a0

    
           let ty = byte & 0x7F; 
        
           let size = data.read_u24::<BigEndian>()?; 
        
           let mut content = try_vec![0; size as usize]; 
        
           data.read_exact(&mut content)?;

Something like a check right after line 32, before reading the content, works, but maybe it is too invasive? Each Block::read would have to take in a ParseOptions and branch. It would also have to return a signal to the calling read_from, letting it know to continue.

What is the right way to go about this? I'd like to make this easy to merge.

Serial-ATA · 2023-04-14T09:35:04Z

Yes, the allocation would be avoided completely if you just seek over the content in Block::read.

A nice way you could go about this would be to make Block::read take a closure:

let block = Block::read(reader, |block_type| {
    block_type == BLOCK_ID_VORBIS_COMMENTS
        || (block_type == BLOCK_ID_PICTURE && parse_options.read_pictures)
});

Then just change Block::read:

if predicate(block.ty) {
    // Read block content
} else {
    // Seek over content and return empty Vec
}
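
Putting the two snippets together, a self-contained sketch of the idea might look like the following. It is only an illustration under a few assumptions: it uses std alone (lofty itself uses byteorder and try_vec!), the Block struct and its fields are simplified stand-ins, and the error type is plain io::Error rather than lofty's own.

```rust
use std::io::{self, Read, Seek, SeekFrom};

// FLAC metadata block types (4 = VORBIS_COMMENT, 6 = PICTURE).
pub const BLOCK_ID_VORBIS_COMMENTS: u8 = 4;
pub const BLOCK_ID_PICTURE: u8 = 6;

/// Simplified stand-in for the FLAC metadata block type.
pub struct Block {
    pub last: bool,
    pub ty: u8,
    pub content: Vec<u8>,
}

impl Block {
    /// Reads one metadata block. `predicate` decides, from the block
    /// type alone, whether the content is worth reading into memory.
    pub fn read<R, P>(data: &mut R, mut predicate: P) -> io::Result<Self>
    where
        R: Read + Seek,
        P: FnMut(u8) -> bool,
    {
        // Header: 1 flag/type byte followed by a 24-bit big-endian size.
        let mut header = [0u8; 4];
        data.read_exact(&mut header)?;

        let last = header[0] & 0x80 != 0;
        let ty = header[0] & 0x7F;
        let size = u32::from_be_bytes([0, header[1], header[2], header[3]]);

        let content = if predicate(ty) {
            // Wanted: allocate and read the block content.
            let mut content = vec![0; size as usize];
            data.read_exact(&mut content)?;
            content
        } else {
            // Unwanted: seek over the content, no allocation at all.
            data.seek(SeekFrom::Current(i64::from(size)))?;
            Vec::new()
        };

        Ok(Self { last, ty, content })
    }
}
```

With this shape, the caller does not need a separate "skipped" signal: a skipped block simply comes back with an empty content Vec, and the reader is already positioned at the next block header.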

hinto-janai · 2023-04-19T19:20:43Z

I will work on this and eventually submit an initial draft PR (probably in the next few weeks).

hinto-janai · 2023-06-16T22:33:13Z

Sorry. I won't be working on this.

Serial-ATA · 2023-06-16T23:19:27Z

This should be a pretty easy feature to implement. I'll keep this open so I remember to eventually work on it.

hinto-janai added the enhancement label Apr 12, 2023

hinto-janai closed this as completed Jun 16, 2023

Serial-ATA reopened this Jun 16, 2023

Serial-ATA added the help wanted and good first issue labels Aug 2, 2023
