Diarized speech after audio duration #1611

Valahaar · 2024-01-11T05:42:02Z

Tested versions

I tested on this environment:

pyannote.audio==3.1.0
pyannote.core==5.0.0
pyannote.database==5.0.1
pyannote.metrics==3.2.1
pyannote.pipeline==3.0.1
torch==2.1.1
torch-audiomentations==0.11.0
torch-pitch-shift==1.2.4
torchaudio==2.1.1

System information

Ubuntu 22.04 - pyannote 3.1.0

Issue description

Hi! I've run into a strange problem with an audio file (it was upsampled from 8 kHz; I'm not sure whether that is relevant).
With the snippet below and my audio file, pyannote outputs a diarization that extends past the end of the audio. I cannot share the audio publicly, but I would be happy to provide it privately if needed.

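import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token='XXX')

# debug.pt holds a dict with a (channel, time) 'waveform' tensor and its 'sample_rate'
audio = torch.load('debug.pt')
waveform, sr = audio['waveform'], audio['sample_rate']
# shape, sample rate, and duration in seconds (samples / sample rate)
print(waveform.shape, sr, waveform.shape[1] / sr)

# run diarization directly on the in-memory audio dict
dz = pipeline(audio)

print(dz)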

output:

[ 00:00:00.008 -->  00:00:01.825] A SPEAKER_00
[ 00:00:01.943 -->  00:00:03.421] B SPEAKER_00
[ 00:00:03.455 -->  00:00:06.918] C SPEAKER_00
[ 00:00:07.224 -->  00:00:08.684] D SPEAKER_00
[ 00:00:08.853 -->  00:00:10.483] E SPEAKER_00
[ 00:00:10.602 -->  00:00:14.677] F SPEAKER_00
[ 00:00:14.966 -->  00:00:20.059] G SPEAKER_00
[ 00:00:20.365 -->  00:00:21.994] H SPEAKER_00
[ 00:00:22.062 -->  00:00:27.631] I SPEAKER_00
[ 00:00:27.903 -->  00:00:28.005] J SPEAKER_00

The last row is out of bounds: it starts at 27.903 s, but the audio is only 27.84 s long. I can work around it by calling .crop on the result with the segment (0, 27.84), but that does not seem like the right way to handle this.

Let me know if I can help in any other way, thanks!

Minimal reproduction example (MRE)

/


hbredin · 2024-01-11T10:13:14Z

Can you please share the output of print(waveform.shape, sr, waveform.shape[1] / sr)?

Valahaar · 2024-01-11T10:42:59Z

Sorry, I was sure to have included it in the original snippet. Here it is:

torch.Size([1, 445440]) 16000 27.84
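
For reference, the numbers above can be checked without pyannote. The following is a minimal sketch in plain Python: the sample count and sample rate come from the printed shape, the two segments are copied from the last two rows of the reported output, and the helper name `out_of_bounds` is made up for illustration.

```python
# Values reported in the thread: waveform shape [1, 445440] at 16000 Hz.
num_samples = 445440
sample_rate = 16000
duration = num_samples / sample_rate  # audio length in seconds

# Last two (start, end) rows of the reported diarization, in seconds.
segments = [(22.062, 27.631), (27.903, 28.005)]


def out_of_bounds(segments, duration):
    """Return the segments that end after the audio does (hypothetical helper)."""
    return [(start, end) for start, end in segments if end > duration]


overflow = out_of_bounds(segments, duration)
print(duration, overflow)
```

With these values, only the final row is flagged, and it begins after the end of the audio rather than merely spilling over it.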

hbredin · 2024-01-11T12:26:44Z

Would definitely be easier and faster if you shared the audio file and a Google Colab I can just run...

In the meantime, yes, diarization.crop(...) should do the trick.

hbredin added the cannot_reproduce label Jan 11, 2024
