Releases: pyannote/pyannote-audio

Version 3.1.1

01 Dec 13:26

TL;DR

Providing num_speakers to pyannote/speaker-diarization-3.1 now works as expected.
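
As an illustration of the call this fix concerns, here is a minimal sketch of a wrapper that forwards a known speaker count to a pipeline. The helper name diarize_with_known_speakers is hypothetical (it is not part of pyannote.audio); it only assumes that the pipeline object is callable and accepts a num_speakers keyword argument.

```python
def diarize_with_known_speakers(pipeline, audio_path, num_speakers):
    """Run a diarization pipeline when the speaker count is known in advance.

    Hypothetical helper: `pipeline` is any callable that accepts an audio
    path and a `num_speakers` keyword argument.
    """
    if not isinstance(num_speakers, int) or num_speakers < 1:
        raise ValueError("num_speakers must be a positive integer")
    # The pipeline is called like a function; the speaker count is a keyword.
    return pipeline(audio_path, num_speakers=num_speakers)


# With the real pipeline (requires pyannote.audio and a Hugging Face token):
#
#   from pyannote.audio import Pipeline
#   pipeline = Pipeline.from_pretrained(
#       "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_TOKEN")
#   diarization = diarize_with_known_speakers(pipeline, "audio.wav", 2)
```

Passing the pipeline in as an argument keeps the helper independent of how the pipeline was loaded or which device it runs on.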

Full changelog

Fixes

Version 3.1.0

16 Nov 12:37

TL;DR

pyannote/speaker-diarization-3.1 no longer requires the unpopular ONNX runtime.

Full changelog

New features

  • feat(model): add WeSpeaker embedding wrapper based on PyTorch
  • feat(model): add support for multi-speaker statistics pooling
  • feat(pipeline): add TimingHook for profiling processing time
  • feat(pipeline): add ArtifactHook for saving internal steps
  • feat(pipeline): add support for list of hooks with Hooks
  • feat(utils): add "soft" option to Powerset.to_multilabel
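
To make the "soft" option of Powerset.to_multilabel more concrete, here is a stand-alone sketch of the idea in plain Python. It is not the library's implementation: build_mapping and to_multilabel below are illustrative functions, and the ordering of powerset classes (by set size, then lexicographically) is an assumption. The hard variant keeps only the most likely speaker set; the soft variant weights every set by its probability.

```python
import math
from itertools import combinations


def build_mapping(num_classes, max_set_size):
    """One row per powerset class, one column per speaker.

    A cell is 1.0 when the speaker belongs to the set, 0.0 otherwise.
    """
    mapping = []
    for size in range(max_set_size + 1):
        for subset in combinations(range(num_classes), size):
            mapping.append([1.0 if c in subset else 0.0
                            for c in range(num_classes)])
    return mapping


def to_multilabel(log_probs, mapping, soft=False):
    """Convert one frame of powerset log-probabilities to speaker activations."""
    if soft:
        # Soft: weight each speaker set by its probability.
        weights = [math.exp(lp) for lp in log_probs]
    else:
        # Hard: one-hot weight on the most likely speaker set.
        best = max(range(len(log_probs)), key=lambda k: log_probs[k])
        weights = [1.0 if k == best else 0.0 for k in range(len(log_probs))]
    num_classes = len(mapping[0])
    return [sum(w * row[c] for w, row in zip(weights, mapping))
            for c in range(num_classes)]


# Example: 3 speakers, at most 2 active at once (7 powerset classes).
mapping = build_mapping(3, 2)
probs = [0.05, 0.5, 0.05, 0.05, 0.25, 0.05, 0.05]
frame = [math.log(p) for p in probs]
hard = to_multilabel(frame, mapping)
soft = to_multilabel(frame, mapping, soft=True)
```

In the soft case each speaker's activation is the total probability of all sets containing that speaker, so it can be thresholded or averaged later instead of being committed to a single set per frame.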

Fixes

  • fix(pipeline): add missing "embedding" hook call in SpeakerDiarization
  • fix(pipeline): fix AgglomerativeClustering to honor num_clusters when provided
  • fix(pipeline): fix frame-wise speaker count exceeding max_speakers or detected num_speakers in SpeakerDiarization pipeline
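
The last fix above can be pictured as a simple clipping step. The following is a minimal sketch of that idea, assuming the frame-wise count is a plain list of integers; cap_speaker_count is a hypothetical name, not a pyannote.audio function.

```python
def cap_speaker_count(frame_counts, max_speakers):
    """Clip the per-frame speaker count so it never exceeds max_speakers."""
    return [min(count, max_speakers) for count in frame_counts]


# Frames where segmentation errors suggest 3 or 4 simultaneous speakers
# are clipped to the allowed maximum of 2.
capped = cap_speaker_count([0, 1, 3, 2, 4], max_speakers=2)
```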

Improvements

  • improve(pipeline): compute fbank on GPU when requested

Breaking changes

  • BREAKING(pipeline): rename WeSpeakerPretrainedSpeakerEmbedding to ONNXWeSpeakerPretrainedSpeakerEmbedding
  • BREAKING(setup): remove onnxruntime dependency.
    You can still use ONNX hbredin/wespeaker-voxceleb-resnet34-LM but you will have to install onnxruntime yourself.
  • BREAKING(pipeline): remove logging_hook (use ArtifactHook instead)
  • BREAKING(pipeline): remove onset and offset parameters in SpeakerDiarizationMixin.speaker_count
    You should now binarize segmentations before passing them to speaker_count
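
The "binarize first, then count" order required by the speaker_count change can be sketched as follows. This is a simplified, stand-alone illustration in plain Python, not the library's code: both functions are hypothetical stand-ins, scores are assumed to be given as one list per speaker, and every speaker is assumed to start inactive.

```python
def binarize(scores, onset=0.5, offset=None):
    """Hysteresis thresholding of one speaker's frame-wise scores.

    The speaker becomes active when the score rises above `onset` and
    stays active until the score falls below `offset`.
    """
    if offset is None:
        offset = onset
    active = False
    result = []
    for score in scores:
        if active:
            active = score >= offset
        else:
            active = score > onset
        result.append(1 if active else 0)
    return result


def speaker_count(binarized):
    """Number of active speakers per frame; binarized[s][t] is 0 or 1."""
    return [sum(frame) for frame in zip(*binarized)]


scores = [
    [0.1, 0.7, 0.45, 0.2],   # speaker 0
    [0.9, 0.8, 0.1, 0.65],   # speaker 1
]
binarized = [binarize(s, onset=0.6, offset=0.4) for s in scores]
count = speaker_count(binarized)
```

Keeping the thresholds in the binarization step means the counting step only ever sees 0/1 values, which matches the intent of the breaking change.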

Version 3.0.1

28 Sep 19:47

TL;DR

pyannote/speaker-diarization-3.0 is now much faster when sent to GPU.

import torch
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0")
pipeline.to(torch.device("cuda"))

Full changelog

Fixes and improvements

  • fix: fix WeSpeakerPretrainedSpeakerEmbedding GPU support

Dependencies update

  • setup: switch from onnxruntime to onnxruntime-gpu

Version 3.0.0

26 Sep 13:00

TL;DR

Better pretrained pipeline and model

Benchmark (DER %)        v2.1   v3.0
AISHELL-4                14.1   12.3
AliMeeting (channel 1)   27.4   24.3
AMI (IHM)                18.9   19.0
AMI (SDM)                27.1   22.2
AVA-AVD                  -      49.1
DIHARD 3 (full)          26.9   21.7
MSDWild                  -      24.6
REPERE (phase2)          8.2    7.8
VoxConverse (v0.3)       11.2   11.3

Major breaking changes

  • BREAKING: pipelines now run on CPU by default
    Use pipeline.to(torch.device('cuda')) to use GPU
  • BREAKING: removed SpeakerSegmentation pipeline
    Use SpeakerDiarization pipeline instead
  • BREAKING: removed support for prodi.gy recipes

Full changelog

Features and improvements

  • feat(pipeline): send pipeline to device with pipeline.to(device)
  • feat(pipeline): add return_embeddings option to SpeakerDiarization pipeline
  • feat(pipeline): make segmentation_batch_size and embedding_batch_size mutable in SpeakerDiarization pipeline (they now default to 1)
  • feat(pipeline): add progress hook to pipelines
  • feat(task): add powerset support to SpeakerDiarization task
  • feat(task): add support for multi-task models
  • feat(task): add support for label scope in speaker diarization task
  • feat(task): add support for missing classes in multi-label segmentation task
  • feat(model): add segmentation model based on torchaudio self-supervised representation
  • feat(pipeline): check version compatibility at load time
  • improve(task): load metadata as tensors rather than pyannote.core instances
  • improve(task): improve error message on missing specifications
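
To give a feel for the progress hook mentioned above, here is a minimal sketch of a custom hook that records how long each step takes. It is an illustration under stated assumptions, not the library's hook: SimpleTimingHook is a hypothetical class, and the call signature (step_name, step_artifact, then optional file, total and completed keywords) is assumed to match what the pipeline passes to its hook.

```python
import time


class SimpleTimingHook:
    """Accumulates the time spent between successive hook calls, per step."""

    def __init__(self):
        self.timings = {}
        self._last = time.perf_counter()

    def __enter__(self):
        # Restart the clock when used as a context manager.
        self._last = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        return False  # never swallow exceptions

    def __call__(self, step_name, step_artifact,
                 file=None, total=None, completed=None):
        now = time.perf_counter()
        # Attribute the time elapsed since the previous call to this step.
        elapsed = now - self._last
        self.timings[step_name] = self.timings.get(step_name, 0.0) + elapsed
        self._last = now


# Intended usage with a loaded pipeline (not executed here):
#
#   with SimpleTimingHook() as hook:
#       diarization = pipeline("audio.wav", hook=hook)
#   print(hook.timings)
```

Because a step may call the hook several times to report progress, the sketch accumulates elapsed time per step name rather than overwriting it.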

Breaking changes

  • BREAKING(task): rename Segmentation task to SpeakerDiarization
  • BREAKING(pipeline): pipeline defaults to CPU (use pipeline.to(device))
  • BREAKING(pipeline): remove SpeakerSegmentation pipeline (use SpeakerDiarization pipeline)
  • BREAKING(pipeline): remove segmentation_duration parameter from SpeakerDiarization pipeline (defaults to duration of segmentation model)
  • BREAKING(task): remove support for variable chunk duration for segmentation tasks
  • BREAKING(pipeline): remove support for FINCHClustering and HiddenMarkovModelClustering
  • BREAKING(setup): drop support for Python 3.7
  • BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
  • BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
    You should update how pyannote.audio.core.io.Audio is instantiated:
    • replace Audio() by Audio(mono="downmix");
    • replace Audio(mono=True) by Audio(mono="downmix");
    • replace Audio(mono=False) by Audio().
  • BREAKING(model): get rid of (flaky) Model.introspection
    If, for some weird reason, you wrote some custom code based on that,
    you should instead rely on Model.example_output.
  • BREAKING(interactive): remove support for Prodigy recipes
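
The two io-related breaking changes above (0-indexed channels and no default downmix) can be summarized in code. The helpers below are hypothetical migration aids written for this illustration; they only encode the replacement rules listed in the changelog and do not import pyannote.audio.

```python
_UNSET = object()  # marks "no mono argument was passed" in pre-3.0 code


def migrate_mono_argument(old_mono=_UNSET):
    """Return the keyword arguments to pass to Audio(...) from 3.0 onwards.

    Audio()            becomes Audio(mono="downmix")
    Audio(mono=True)   becomes Audio(mono="downmix")
    Audio(mono=False)  becomes Audio()
    """
    if old_mono is _UNSET or old_mono is True:
        return {"mono": "downmix"}
    if old_mono is False:
        return {}
    raise ValueError(f"unexpected pre-3.0 mono value: {old_mono!r}")


def migrate_channel_index(old_channel):
    """Convert a 1-indexed channel (pre-3.0) to a 0-indexed one (3.0+)."""
    if old_channel < 1:
        raise ValueError("pre-3.0 channels start at 1")
    return old_channel - 1


# Example: code that used to call Audio(mono=True) on channel 2.
new_kwargs = migrate_mono_argument(True)
new_channel = migrate_channel_index(2)
```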

Fixes and improvements

  • fix(pipeline): fix reproducibility issue with Ampere CUDA devices
  • fix(pipeline): fix support for IOBase audio
  • fix(pipeline): fix corner case with no speaker
  • fix(train): prevent metadata preparation from happening twice
  • fix(task): fix support for "balance" option
  • improve(task): shorten and improve structure of Tensorboard tags

Dependencies update

  • setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
  • setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
  • setup: switch to speechbrain 0.5.14+

Version 2.1.1

31 Jan 13:50

Version 2.1.x introduces a major overhaul of pyannote.audio default speaker diarization pipeline, made of three main stages:

More details in the attached technical report.

Version 1.1.1

25 Nov 08:48
c5de4f2
chore: do not update to pyannote.pipeline >= 2.0