numpy.ndarray audio input doesn't work? #1690

Open
Purfview opened this issue Apr 18, 2024 · 3 comments

Purfview commented Apr 18, 2024

Tested versions

pyannote.audio==3.1.1

System information

Windows / CPU

Issue description

If the wrong type is passed to a pipeline you get this error message:

ValueError: 
Audio files can be provided to the Audio class using different types:
    - a "str" or "Path" instance: "audio.wav" or Path("audio.wav")
    - a "IOBase" instance with "read" and "seek" support: open("audio.wav", "rb")
    - a "Mapping" with any of the above as "audio" key: {"audio": ...}
    - a "Mapping" with both "waveform" and "sample_rate" key:
        {"waveform": (channel, time) numpy.ndarray or torch.Tensor, "sample_rate": 44100}

The message above says that numpy.ndarray is supported, but passing one fails.

Test:

import numpy as np
from pyannote.audio import Model
from pyannote.audio.pipelines import VoiceActivityDetection

model = Model.from_pretrained(
    "pyannote/segmentation-3.0",
    use_auth_token="removed")

# Generate dummy audio: 60 seconds of a 440 Hz sine at 16 kHz
audio_data = np.sin(2 * np.pi * 440 * np.linspace(0, 60, 60*16000)).astype(np.float32) / 32768.0
# Reshape to "(channel, time)":
audio_data = audio_data.reshape(1, -1)

# The waveform is a numpy.ndarray, as the error message suggests is allowed
audio_data = {"waveform": audio_data, "sample_rate": 16000}

pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {"min_duration_on": 0.0, "min_duration_off": 0.0}
pipeline.instantiate(HYPER_PARAMETERS)
timecodes = pipeline(audio_data)
print(timecodes)

Error:

    timecodes = pipeline(audio_data)
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\pipeline.py", line 325, in __call__
    return self.apply(file, **kwargs)
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\pipelines\voice_activity_detection.py", line 211, in apply
    segmentations: SlidingWindowFeature = self._segmentation(
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\inference.py", line 425, in __call__
    return self.slide(waveform, sample_rate, hook=hook)
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\inference.py", line 281, in slide
    waveform.unfold(1, window_size, step_size),
AttributeError: 'numpy.ndarray' object has no attribute 'unfold'

hbredin (Member) commented Apr 19, 2024

Thanks for the bug report.

Would you mind opening a PR removing the mention of numpy arrays in the error message?
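
For readers who hit the same traceback: the failing call is `waveform.unfold(...)`, which is a `torch.Tensor` method, so a likely workaround is to convert the array to a tensor before building the mapping. The following is a minimal sketch under that assumption. The helper name `to_pyannote_input` is hypothetical (it is not part of pyannote.audio), and it has not been verified here against pyannote.audio 3.1.1.

```python
import numpy as np
import torch


def to_pyannote_input(waveform: np.ndarray, sample_rate: int) -> dict:
    """Hypothetical helper: wrap a NumPy waveform as a
    {"waveform", "sample_rate"} mapping holding a torch.Tensor."""
    array = np.asarray(waveform, dtype=np.float32)
    if array.ndim == 1:
        # Mono signal of shape (time,): add the channel axis -> (1, time)
        array = array[np.newaxis, :]
    if array.ndim != 2:
        raise ValueError("expected a (time,) or (channel, time) array")
    # torch.from_numpy needs a supported dtype; make the memory contiguous too
    tensor = torch.from_numpy(np.ascontiguousarray(array))
    return {"waveform": tensor, "sample_rate": sample_rate}


# One second of a 440 Hz sine at 16 kHz as dummy audio
sample_rate = 16000
t = np.linspace(0, 1, sample_rate, endpoint=False)
mono = np.sin(2 * np.pi * 440 * t).astype(np.float32)

file = to_pyannote_input(mono, sample_rate)
print(type(file["waveform"]).__name__, tuple(file["waveform"].shape))
```

The resulting mapping would then be passed in place of the NumPy-based one, e.g. `pipeline(file)` in the test above.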

Purfview added a commit to Purfview/pyannote-audio that referenced this issue Apr 19, 2024
Remove numpy.ndarray mentions as it's not supported.

Closing pyannote#1690

Purfview (Author) commented

Offtopic question:
Is it possible to get VAD results faster? For example, the pyannote-onnx implementation is ~5 times faster for me.

Would you mind opening a PR removing the mention of numpy arrays in the error message?

Done

hbredin (Member) commented Apr 22, 2024

Thanks for the PR.

Please open a new issue/discussion for your other question.
