Diarized speech after audio duration #1611

Open
Valahaar opened this issue Jan 11, 2024 · 3 comments

@Valahaar

Tested versions

I tested on this environment:

pyannote.audio==3.1.0
pyannote.core==5.0.0
pyannote.database==5.0.1
pyannote.metrics==3.2.1
pyannote.pipeline==3.0.1
torch==2.1.1
torch-audiomentations==0.11.0
torch-pitch-shift==1.2.4
torchaudio==2.1.1

System information

Ubuntu 22.04 - pyannote 3.1.0

Issue description

Hi! I've run into a strange problem with an audio file (it was upsampled from 8 kHz; not sure whether that is relevant):
when I run the snippet below on it, pyannote outputs a diarization that extends past the end of the audio itself. I cannot share the audio publicly, but I would be happy to provide it privately should the need arise.

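import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token='XXX')

# debug.pt holds a dict with the 'waveform' tensor and its 'sample_rate'
audio = torch.load('debug.pt')
waveform, sr = audio['waveform'], audio['sample_rate']
# shape, sample rate, and duration in seconds
print(waveform.shape, sr, waveform.shape[1] / sr)

# run diarization on the in-memory audio dict
dz = pipeline(audio)

print(dz)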

output:

[ 00:00:00.008 -->  00:00:01.825] A SPEAKER_00
[ 00:00:01.943 -->  00:00:03.421] B SPEAKER_00
[ 00:00:03.455 -->  00:00:06.918] C SPEAKER_00
[ 00:00:07.224 -->  00:00:08.684] D SPEAKER_00
[ 00:00:08.853 -->  00:00:10.483] E SPEAKER_00
[ 00:00:10.602 -->  00:00:14.677] F SPEAKER_00
[ 00:00:14.966 -->  00:00:20.059] G SPEAKER_00
[ 00:00:20.365 -->  00:00:21.994] H SPEAKER_00
[ 00:00:22.062 -->  00:00:27.631] I SPEAKER_00
[ 00:00:27.903 -->  00:00:28.005] J SPEAKER_00

The last row is out of bounds: it starts at 27.903 s, while the audio is only 27.84 s long. I can fix it by cropping the result to (0, 27.84) with .crop, but that does not seem like the right way to handle this.

Let me know if I can help in any other way, thanks!

Minimal reproduction example (MRE)

/

@hbredin
Member

hbredin commented Jan 11, 2024

Can you please share the output of print(waveform.shape, sr, waveform.shape[1] / sr)?

@Valahaar
Author

Sorry, I was sure I had included it in the original snippet. Here it is:

torch.Size([1, 445440]) 16000 27.84
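
For reference, the mismatch can be checked with plain arithmetic. The following is only a sketch: `segments_past_end` is a hypothetical helper (not part of pyannote), and the two rows are copied by hand from the tail of the output above.

```python
def segments_past_end(segments, duration):
    """Return the (start, end) pairs whose end lies beyond the audio duration."""
    return [(start, end) for start, end in segments if end > duration]


# Values reported in this thread: 445440 samples at 16 kHz.
num_samples, sample_rate = 445440, 16000
duration = num_samples / sample_rate  # seconds

# Last two rows of the printed diarization, in seconds.
last_rows = [(22.062, 27.631), (27.903, 28.005)]

offending = segments_past_end(last_rows, duration)
print(duration, offending)
```

Only the final segment is flagged, and since its start is also later than the computed duration, it lies entirely past the end of the audio rather than merely overlapping it.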

@hbredin
Member

hbredin commented Jan 11, 2024

Would definitely be easier and faster if you shared the audio file and a Google Colab I can just run...

In the meantime, yes, diarization.crop(...) should do the trick.
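
To illustrate what cropping to the audio extent amounts to, here is a dependency-free sketch. It assumes segments are plain (start, end, label) tuples; `crop_to_duration` is a hypothetical helper that only mimics the clipping idea and is not pyannote's own implementation. On a real pyannote result the equivalent would presumably be `diarization.crop(Segment(0, duration))` with `Segment` from `pyannote.core`.

```python
def crop_to_duration(segments, duration):
    """Clip (start, end, label) tuples to [0, duration].

    Segments that fall entirely outside the range are dropped.
    """
    cropped = []
    for start, end, label in segments:
        start, end = max(start, 0.0), min(end, duration)
        if start < end:  # keep only segments with positive length
            cropped.append((start, end, label))
    return cropped


# Duration computed from the shape and sample rate reported above.
duration = 445440 / 16000

# Tail of the diarization output from this issue, in seconds.
tail = [
    (22.062, 27.631, "SPEAKER_00"),
    (27.903, 28.005, "SPEAKER_00"),
]

cropped_tail = crop_to_duration(tail, duration)
print(cropped_tail)
```

With these numbers the out-of-bounds row is removed altogether, because it begins after the audio ends; a segment that merely straddled the end would instead be shortened.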
