
Processing batches of audio files through Essentia-Tensorflow pre-trained models #1353

Open
burstMembrane opened this issue Jul 10, 2023 · 8 comments


@burstMembrane

First of all thanks to the contributors of this library!

I'm currently trying to batch-create embeddings with the AudioSet-VGGish pre-trained model.

I'm able to follow the docs to download the pre-trained model and generate embeddings:

from essentia.standard import MonoLoader, TensorflowPredictVGGish

audio = MonoLoader(filename="audio.wav", sampleRate=16000, resampleQuality=4)()
model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
embeddings = model(audio)

The problem is that the examples don't show how to batch-process multiple audio files. When I put the code above in a for loop, TensorFlow reinitializes and runs really slowly on each iteration, e.g.:

from essentia.standard import MonoLoader, TensorflowPredictVGGish
audio_paths = ["file1.wav", "file2.wav"]

for path in audio_paths:
  audio = MonoLoader(filename=path, sampleRate=16000, resampleQuality=4)()
  model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
  embeddings = model(audio)

I've tried it like this and it does the same thing. Is there a way to process audio in batches, or to stop TensorFlow from reinitializing on each run?

@palonso
Contributor

palonso commented Jul 10, 2023

Yes, you can initialize MonoLoader and TensorflowPredictVGGish outside the inference loop:

from essentia.standard import MonoLoader, TensorflowPredictVGGish
audio_paths = ["file1.wav", "file2.wav"]

loader = MonoLoader()
model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")

for path in audio_paths:
    loader.configure(filename=path, sampleRate=16000, resampleQuality=4)
    audio = loader()
    embeddings = model(audio)
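If you also want to write each file's embeddings to disk inside that loop, here is a minimal sketch of one way to do it. NumPy is assumed (Essentia's outputs convert cleanly to arrays), and `save_embeddings` is my own helper, not an Essentia API; a dummy array stands in for the real VGGish output:

```python
import os
import tempfile

import numpy as np

def save_embeddings(embeddings, audio_path, out_dir):
    """Save one file's embeddings as <out_dir>/<basename>.npy and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(audio_path))[0]
    out_path = os.path.join(out_dir, stem + ".npy")
    np.save(out_path, embeddings)
    return out_path

# Dummy (num_patches, 128) array in place of real VGGish embeddings:
dummy = np.zeros((10, 128), dtype=np.float32)
out_dir = tempfile.mkdtemp()
path = save_embeddings(dummy, "file1.wav", out_dir)
```

One `.npy` per input file keeps the loop simple; for very large batches you might prefer appending to a single HDF5 or memory-mapped file instead.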

@Galvo87

Galvo87 commented Aug 4, 2023

> Yes, you can initialize MonoLoader and TensorflowPredictVGGish outside the inference loop:
>
> from essentia.standard import MonoLoader, TensorflowPredictVGGish
> audio_paths = ["file1.wav", "file2.wav"]
>
> loader = MonoLoader()
> model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
>
> for audio in audio_paths:
>     audio = loader.configure(filename=audio, sampleRate=16000, resampleQuality=4)
>     embeddings = model(audio)
loader = MonoLoader()
print(loader)

returns TypeError: __str__ returned non-string (type NoneType).

It seems loader.configure() is not behaving well: it always returns None, also in your code above.

@palonso
Contributor

palonso commented Aug 7, 2023

that's the expected return value for configure.
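In other words, configure() mutates the algorithm in place and returns None, much like list.sort() in plain Python; the audio only comes from calling the loader afterwards. A stand-in sketch of that two-step pattern (FakeLoader is purely illustrative, not Essentia):

```python
class FakeLoader:
    """Stand-in mimicking Essentia's configure-then-call protocol."""

    def __init__(self):
        self.filename = None

    def configure(self, filename):
        self.filename = filename  # mutate the object in place ...
        return None               # ... and return nothing, like list.sort()

    def __call__(self):
        # The data only comes from *calling* the configured object.
        return f"audio from {self.filename}"

loader = FakeLoader()
result = loader.configure(filename="file1.wav")  # result is None by design
audio = loader()                                 # this is where the audio appears
```

So `audio = loader.configure(...)` always binds None, which is what produced the TypeError above.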

@Galvo87

Galvo87 commented Aug 8, 2023

Ok got it, but I still don't understand how this could work out...

@palonso
Contributor

palonso commented Aug 9, 2023

sorry @Galvo87!
It was a mistake in my example script.
I've updated the script and double-checked that it works.

The loader had to be configured first and then called.

@jbm-composer

jbm-composer commented May 9, 2024

@burstMembrane, did you find a good solution for batch processing? I have 8 GPUs and want to extract a bunch of embeddings as quickly as possible.

I noticed the "batch_size" argument, but it seems like that has to do with how many "patches" it will process from the input audio file, rather than an option to batch-process multiple audio files.

Any tips appreciated.

@palonso
Contributor

palonso commented May 9, 2024

The simplest approach would be to modify this script to receive a list of files to process with something like argparse.

import argparse
from essentia.standard import MonoLoader, TensorflowPredictVGGish

def main(audio_paths):
    loader = MonoLoader()
    model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")

    for audio in audio_paths:
        loader.configure(filename=audio, sampleRate=16000, resampleQuality=4)
        audio = loader()
        embeddings = model(audio)

        # save the embeddings ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Process audio files using VGGish model")
    parser.add_argument("audio_files", nargs="+", help="List of audio files to process")
    args = parser.parse_args()
    main(args.audio_files)

Then you can divide the filelist you want to process into 8 chunks (e.g., split -n l/8 -d filelist filelist_part).

Finally you can launch one script per GPU:

CUDA_VISIBLE_DEVICES=0 python extract_embeddings.py $(< filelist_part00)
...
CUDA_VISIBLE_DEVICES=7 python extract_embeddings.py $(< filelist_part07)
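If you prefer to stay in Python rather than shelling out to split, here is a sketch of the same near-even contiguous chunking (`chunk` is my own helper, not part of Essentia):

```python
def chunk(files, n):
    """Split `files` into `n` near-even contiguous chunks,
    similar in spirit to `split -n l/n`."""
    k, r = divmod(len(files), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `r` chunks get one extra item each.
        end = start + k + (1 if i < r else 0)
        chunks.append(files[start:end])
        start = end
    return chunks

# 20 files across 8 workers -> four chunks of 3, then four of 2
parts = chunk([f"file{i}.wav" for i in range(20)], 8)
```

Each chunk can then be handed to one worker, mirroring the one-filelist-per-GPU layout above.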

@jbm-composer

Thanks, yes, I actually realized there was something similar I could do: chunking my data into GPU-count chunks (8) and running a separate serial process for each GPU. Works well. (I also used batchSize=-1, which I think helps optimize a bit, though I'm not totally sure about that one.)
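For anyone wanting to script that launch pattern from Python instead of the shell, here is a sketch: one worker process per chunk, each pinned to its own GPU via CUDA_VISIBLE_DEVICES. The worker here is a trivial command that just echoes its GPU id; in practice you would exec the real extraction script with its chunk of files:

```python
import os
import subprocess
import sys

def launch_per_gpu(chunks):
    """Start one subprocess per chunk, pinning process i to GPU i
    via CUDA_VISIBLE_DEVICES, then collect each worker's stdout."""
    procs = []
    for gpu_id, files in enumerate(chunks):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        # Placeholder worker: prints its assigned GPU id. Replace with e.g.
        # [sys.executable, "extract_embeddings.py", *files] for the real job.
        cmd = [sys.executable, "-c",
               "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
        procs.append(subprocess.Popen(cmd, env=env,
                                      stdout=subprocess.PIPE, text=True))
    # Wait for all workers and gather their output.
    return [p.communicate()[0].strip() for p in procs]

outputs = launch_per_gpu([["a.wav"], ["b.wav"]])
```

Because each child sees only one device, TensorFlow inside the worker treats it as GPU 0, so the extraction script needs no device-selection logic of its own.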
