numpy.ndarray audio input doesn't work? #1690

Purfview · 2024-04-18T23:07:31Z

Tested versions

pyannote.audio==3.1.1

System information

Windows / CPU

Issue description

If the wrong type is passed to a pipeline, you get this error message:

ValueError: 
Audio files can be provided to the Audio class using different types:
    - a "str" or "Path" instance: "audio.wav" or Path("audio.wav")
    - a "IOBase" instance with "read" and "seek" support: open("audio.wav", "rb")
    - a "Mapping" with any of the above as "audio" key: {"audio": ...}
    - a "Mapping" with both "waveform" and "sample_rate" key:
        {"waveform": (channel, time) numpy.ndarray or torch.Tensor, "sample_rate": 44100}

The message above says that numpy.ndarray is supported, but passing one fails.

Test:

from pyannote.audio import Model
model = Model.from_pretrained(
  "pyannote/segmentation-3.0", 
  use_auth_token="removed")

# Generate dummy audio:
import numpy as np
audio_data = np.sin(2 * np.pi * 440 * np.linspace(0, 60, 60*16000)).astype(np.float32) / 32768.0
# Reshape to "(channel, time)":
audio_data = audio_data.reshape(1, -1)

audio_data = {"waveform": audio_data, "sample_rate": 16000}

from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {"min_duration_on": 0.0, "min_duration_off": 0.0}
pipeline.instantiate(HYPER_PARAMETERS)
timecodes = pipeline(audio_data)
print(timecodes)

Error:

    timecodes = pipeline(audio_data)
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\pipeline.py", line 325, in __call__
    return self.apply(file, **kwargs)
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\pipelines\voice_activity_detection.py", line 211, in apply
    segmentations: SlidingWindowFeature = self._segmentation(
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\inference.py", line 425, in __call__
    return self.slide(waveform, sample_rate, hook=hook)
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\inference.py", line 281, in slide
    waveform.unfold(1, window_size, step_size),
AttributeError: 'numpy.ndarray' object has no attribute 'unfold'

hbredin · 2024-04-19T06:26:30Z

Thanks for the bug report.

Would you mind opening a PR removing the mention of numpy arrays in the error message?
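
Until then, a possible workaround is to convert the array to a torch.Tensor before building the mapping, since the traceback shows the inference code calling `unfold`, which is a tensor method. The following is a minimal sketch, not part of pyannote.audio: the helper name `to_pyannote_input` is hypothetical, and it assumes the "waveform" entry only needs to be a float32 tensor of shape (channel, time).

```python
import numpy as np
import torch


def to_pyannote_input(samples, sample_rate):
    """Hypothetical helper (not a pyannote.audio API).

    Wraps a NumPy waveform into the {"waveform", "sample_rate"} mapping,
    converting it to a torch.Tensor of shape (channel, time).
    """
    # float32 and contiguous memory, so torch.from_numpy accepts it
    waveform = np.ascontiguousarray(samples, dtype=np.float32)
    # Promote mono (time,) input to (1, time)
    if waveform.ndim == 1:
        waveform = waveform[np.newaxis, :]
    if waveform.ndim != 2:
        raise ValueError("expected a (time,) or (channel, time) array")
    # torch.from_numpy shares memory with the array (no copy)
    return {"waveform": torch.from_numpy(waveform), "sample_rate": sample_rate}


# One second of a 440 Hz tone at 16 kHz, as dummy audio
audio = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000)).astype(np.float32)
file = to_pyannote_input(audio, 16000)
print(type(file["waveform"]), tuple(file["waveform"].shape))
```

The resulting mapping would then be passed in place of the NumPy-based one, e.g. `pipeline(file)` in the test above.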

Purfview · 2024-04-19T15:44:03Z

Off-topic question:
Is it possible to get VAD results faster? For example, the pyannote-onnx implementation is ~5 times faster for me.

> Would you mind opening a PR removing the mention of numpy arrays in the error message?

Done

hbredin · 2024-04-22T14:18:45Z

Thanks for the PR.

Please open a new issue/discussion for your other question.

hbredin added the Good first issue label Apr 19, 2024

Purfview added a commit to Purfview/pyannote-audio that referenced this issue Apr 19, 2024

Remove numpy.ndarray mentions

466ef98

Remove numpy.ndarray mentions as it's not supported. Closing pyannote#1690

Purfview mentioned this issue Apr 19, 2024

Remove numpy.ndarray mentions #1691

Open
