
Processing batches of audio files through Essentia-Tensorflow pre-trained models #1353

Open
burstMembrane opened this issue Jul 10, 2023 · 8 comments


@burstMembrane

First of all thanks to the contributors of this library!

I'm currently trying to batch-create embeddings with the AudioSet-VGGish pre-trained model.

I'm able to follow the docs to download the pre-trained model and generate embeddings:

from essentia.standard import MonoLoader, TensorflowPredictVGGish

audio = MonoLoader(filename="audio.wav", sampleRate=16000, resampleQuality=4)()
model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
embeddings = model(audio)

The problem is that the examples don't show how to batch-process multiple audio files. When I put the code above in a for loop, TensorFlow reinitializes and runs really slowly on each iteration, e.g.:

from essentia.standard import MonoLoader, TensorflowPredictVGGish
audio_paths = ["file1.wav", "file2.wav"]

for path in audio_paths:
  audio = MonoLoader(filename=path, sampleRate=16000, resampleQuality=4)()
  model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
  embeddings = model(audio)

I've tried it like this and it does the same thing. Is there a way to process audio in batches, or to stop TensorFlow from reinitializing on each run?

@palonso
Contributor

palonso commented Jul 10, 2023

Yes, you can initialize MonoLoader and TensorflowPredictVGGish outside the inference loop:

from essentia.standard import MonoLoader, TensorflowPredictVGGish
audio_paths = ["file1.wav", "file2.wav"]

loader = MonoLoader()
model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")

for path in audio_paths:
    loader.configure(filename=path, sampleRate=16000, resampleQuality=4)
    audio = loader()
    embeddings = model(audio)
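If you also want to write each file's embeddings to disk inside that loop, here is a minimal sketch of one way to do it. NumPy is assumed (Essentia's outputs convert cleanly to arrays), and `save_embeddings` is my own helper, not an Essentia API; a dummy array stands in for the real VGGish output:

```python
import os
import tempfile

import numpy as np

def save_embeddings(embeddings, audio_path, out_dir):
    """Save one file's embeddings as <out_dir>/<basename>.npy and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(audio_path))[0]
    out_path = os.path.join(out_dir, stem + ".npy")
    np.save(out_path, embeddings)
    return out_path

# Dummy (num_patches, 128) array in place of real VGGish embeddings:
dummy = np.zeros((10, 128), dtype=np.float32)
out_dir = tempfile.mkdtemp()
path = save_embeddings(dummy, "file1.wav", out_dir)
```

One `.npy` per input file keeps the loop simple; for very large batches you might prefer appending to a single HDF5 or memory-mapped file instead.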

@Galvo87

Galvo87 commented Aug 4, 2023

> Yes, you can initialize MonoLoader and TensorflowPredictVGGish outside the inference loop:
>
> from essentia.standard import MonoLoader, TensorflowPredictVGGish
> audio_paths = ["file1.wav", "file2.wav"]
>
> loader = MonoLoader()
> model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
>
> for audio in audio_paths:
>     audio = loader.configure(filename=audio, sampleRate=16000, resampleQuality=4)
>     embeddings = model(audio)
loader = MonoLoader()
print(loader)

returns TypeError: __str__ returned non-string (type NoneType).

It seems loader.configure() is not behaving well: it always returns None, also in your code above.

@palonso
Contributor

palonso commented Aug 7, 2023

that's the expected return value for configure.
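In other words, configure() mutates the algorithm in place and returns None, much like list.sort() in plain Python; the audio only comes from calling the loader afterwards. A stand-in sketch of that two-step pattern (FakeLoader is purely illustrative, not Essentia):

```python
class FakeLoader:
    """Stand-in mimicking Essentia's configure-then-call protocol."""

    def __init__(self):
        self.filename = None

    def configure(self, filename):
        self.filename = filename  # mutate the object in place ...
        return None               # ... and return nothing, like list.sort()

    def __call__(self):
        # The data only comes from *calling* the configured object.
        return f"audio from {self.filename}"

loader = FakeLoader()
result = loader.configure(filename="file1.wav")  # result is None by design
audio = loader()                                 # this is where the audio appears
```

So `audio = loader.configure(...)` always binds None, which is what produced the TypeError above.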

@Galvo87

Galvo87 commented Aug 8, 2023

Ok got it, but I still don't understand how this could work out...

@palonso
Contributor

palonso commented Aug 9, 2023

sorry @Galvo87!
It was a mistake in my example script.
I've updated the script and double-checked that it works.

The loader had to be configured first and then called.

@jbm-composer

jbm-composer commented May 9, 2024

@burstMembrane, did you find a good solution for batch processing? I have 8 GPUs and want to extract a bunch of embeddings as quickly as possible.

I noticed the "batch_size" argument, but it seems like that has to do with how many "patches" it will process from the input audio file, rather than an option to batch-process multiple audio files.

Any tips appreciated.

@palonso
Contributor

palonso commented May 9, 2024

The simplest approach would be to modify this script to receive a list of files to process with something like argparse.

import argparse
from essentia.standard import MonoLoader, TensorflowPredictVGGish

def main(audio_paths):
    loader = MonoLoader()
    model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")

    for audio in audio_paths:
        loader.configure(filename=audio, sampleRate=16000, resampleQuality=4)
        audio = loader()
        embeddings = model(audio)

        # save the embeddings ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Process audio files using VGGish model")
    parser.add_argument("audio_files", nargs="+", help="List of audio files to process")
    args = parser.parse_args()
    main(args.audio_files)

Then you can divide the filelist you want to process into 8 chunks (e.g., split -n l/8 -d filelist filelist_part).

Finally you can launch one script per GPU:

CUDA_VISIBLE_DEVICES=0 python extract_embeddings.py $(< filelist_part00)
...
CUDA_VISIBLE_DEVICES=7 python extract_embeddings.py $(< filelist_part07)
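If you prefer to stay in Python rather than shelling out to split, here is a sketch of the same near-even contiguous chunking (`chunk` is my own helper, not part of Essentia):

```python
def chunk(files, n):
    """Split `files` into `n` near-even contiguous chunks,
    similar in spirit to `split -n l/n`."""
    k, r = divmod(len(files), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `r` chunks get one extra item each.
        end = start + k + (1 if i < r else 0)
        chunks.append(files[start:end])
        start = end
    return chunks

# 20 files across 8 workers -> four chunks of 3, then four of 2
parts = chunk([f"file{i}.wav" for i in range(20)], 8)
```

Each chunk can then be handed to one worker, mirroring the one-filelist-per-GPU layout above.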

@jbm-composer

Thanks, yes, I actually realized there was something similar I could do: chunking my data into GPU-count chunks (8) and running a separate serial process for each GPU. Works well. (I also used batchSize=-1, which I think helps optimize a bit, though I'm not totally sure about that one.)
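For anyone wanting to script that launch pattern from Python instead of the shell, here is a sketch: one worker process per chunk, each pinned to its own GPU via CUDA_VISIBLE_DEVICES. The worker here is a trivial command that just echoes its GPU id; in practice you would exec the real extraction script with its chunk of files:

```python
import os
import subprocess
import sys

def launch_per_gpu(chunks):
    """Start one subprocess per chunk, pinning process i to GPU i
    via CUDA_VISIBLE_DEVICES, then collect each worker's stdout."""
    procs = []
    for gpu_id, files in enumerate(chunks):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        # Placeholder worker: prints its assigned GPU id. Replace with e.g.
        # [sys.executable, "extract_embeddings.py", *files] for the real job.
        cmd = [sys.executable, "-c",
               "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
        procs.append(subprocess.Popen(cmd, env=env,
                                      stdout=subprocess.PIPE, text=True))
    # Wait for all workers and gather their output.
    return [p.communicate()[0].strip() for p in procs]

outputs = launch_per_gpu([["a.wav"], ["b.wav"]])
```

Because each child sees only one device, TensorFlow inside the worker treats it as GPU 0, so the extraction script needs no device-selection logic of its own.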
