Merge branch 'release/2.1'
hbredin committed Oct 27, 2022
2 parents 25462d5 + 6d9d98c commit 2cf1490
Showing 23 changed files with 2,853 additions and 1,746 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
Expand Up @@ -37,4 +37,4 @@ jobs:
file: ./coverage.xml
env_vars: PYTHON
name: codecov-pyannote-audio
fail_ci_if_error: true
fail_ci_if_error: false
75 changes: 75 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,75 @@
# Changelog

## Version 2.1 (2022-11-xx)

- BREAKING(pipeline): rewrite speaker diarization pipeline
- feat(pipeline): add option to optimize for DER variant
- feat(clustering): add support for NeMo speaker embedding
- feat(clustering): add FINCH clustering
- feat(clustering): add min_cluster_size hparams to AgglomerativeClustering
- feat(hub): add support for private/gated models
- setup(hub): switch to latest huggingface_hub API
- fix(pipeline): fix support for missing reference in Resegmentation pipeline
- fix(clustering): fix corner case where HMM.fit finds too few states

## Version 2.0.1 (2022-07-20)

- BREAKING: complete rewrite
- feat: much better performance
- feat: Python-first API
- feat: pretrained pipelines (and models) on Huggingface model hub
- feat: multi-GPU training with pytorch-lightning
- feat: data augmentation with torch-audiomentations
- feat: Prodigy recipe for model-assisted audio annotation

## Version 1.1.2 (2021-01-28)

- fix: make sure master branch is used to load pretrained models (#599)

## Version 1.1 (2020-11-08)

- last release before complete rewriting

## Version 1.0.1 (2018-07-19)

- fix: fix regression in Precomputed.__call__ (#110, #105)

## Version 1.0 (2018-07-03)

- chore: switch from keras to pytorch (with tensorboard support)
- improve: faster & better training (`AutoLR`, advanced learning rate schedulers, improved batch generators)
- feat: add tunable speaker diarization pipeline (with its own tutorial)
- chore: drop support for Python 2 (use Python 3.6 or later)

## Version 0.3.1 (2017-07-06)

- feat: add python 3 support
- chore: rewrite neural speaker embedding using autograd
- feat: add new embedding architectures
- feat: add new embedding losses
- chore: switch to Keras 2
- doc: add tutorial for (MFCC) feature extraction
- doc: add tutorial for (LSTM-based) speech activity detection
- doc: add tutorial for (LSTM-based) speaker change detection
- doc: add tutorial for (TristouNet) neural speaker embedding

## Version 0.2.1 (2017-03-28)

- feat: add LSTM-based speech activity detection
- feat: add LSTM-based speaker change detection
- improve: refactor LSTM-based speaker embedding
- feat: add librosa basic support
- feat: add SMORMS3 optimizer

## Version 0.1.4 (2016-09-26)

- feat: add 'covariance_type' option to BIC segmentation

## Version 0.1.3 (2016-09-23)

- chore: rename sequence generator in preparation of the release of
TristouNet reproducible research package.

## Version 0.1.2 (2016-09-22)

- first public version
47 changes: 29 additions & 18 deletions README.md
Expand Up @@ -11,23 +11,26 @@


```python
# instantiate pretrained speaker diarization pipeline
# 1. visit hf.co/pyannote/speaker-diarization and accept user conditions (only if requested)
# 2. visit hf.co/settings/tokens to create an access token (only if you had to go through 1.)
# 3. instantiate pretrained speaker diarization pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
use_auth_token="ACCESS_TOKEN_GOES_HERE")

# apply pretrained pipeline
# 4. apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
# 5. print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_A
# start=1.8s stop=3.9s speaker_B
# start=4.2s stop=5.7s speaker_A
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
```

## What's new in `pyannote.audio` 2.0
## What's new in `pyannote.audio` 2.x?

For version 2.x of `pyannote.audio`, [I](https://herve.niderb.fr) decided to rewrite almost everything from scratch.
Highlights of this release are:
Expand All @@ -51,11 +54,12 @@ conda activate pyannote
# (see https://pytorch.org/get-started/previous-versions/#v1110)
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 -c pytorch

pip install pyannote.audio
pip install -qq https://github.com/pyannote/pyannote-audio/archive/develop.zip
```

## Documentation

- [Changelog](CHANGELOG.md)
- Models
- Available tasks explained
- [Applying a pretrained model](tutorials/applying_a_model.ipynb)
Expand All @@ -69,6 +73,9 @@ pip install pyannote.audio
- [Adding a new task](tutorials/add_your_own_task.ipynb)
- Adding a new pipeline
- Sharing pretrained models and pipelines
- Blog
- 2022-10-23 > ["One speaker segmentation model to rule them all"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
- 2021-08-05 > ["Streaming voice activity detection with pyannote.audio"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)
- Miscellaneous
- [Training with `pyannote-audio-train` command line tool](tutorials/training_with_cli.md)
- [Annotating your own data with Prodigy](tutorials/prodigy.md)
Expand All @@ -94,15 +101,19 @@ pip install pyannote.audio

## Benchmark

Out of the box, `pyannote.audio` default speaker diarization pipeline is expected to be much better (and faster) in v2.0 than in v1.1.:

| Dataset | DER% with v1.1 | DER% with v2.0 | Relative improvement |
| ----------- | -------------- | -------------- | -------------------- |
| AMI | 29.7% | 18.2% | 38% |
| DIHARD | 29.2% | 21.0% | 28% |
| VoxConverse | 21.5% | 12.8% | 40% |

A more detailed benchmark is available [here](https://hf.co/pyannote/speaker-diarization).
Out of the box, the default `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization) is expected to be much better (and faster) in v2.x than in v1.1. The numbers below are diarization error rates (in %):

| Dataset \ Version | v1.1 | v2.0 | v2.1 (finetuned) |
| ---------------------- | ---- | ---- | ---------------- |
| AISHELL-4 | - | 14.6 | 14.1 (14.5) |
| AliMeeting (channel 1) | - | - | 27.4 (23.8) |
| AMI (IHM) | 29.7 | 18.2 | 18.9 (18.5) |
| AMI (SDM) | - | 29.0 | 27.1 (22.2) |
| CALLHOME (part2) | - | 30.2 | 32.4 (29.3) |
| DIHARD 3 (full) | 29.2 | 21.0 | 26.9 (21.9) |
| VoxConverse (v0.3) | 21.5 | 12.6 | 11.2 (10.7) |
| REPERE (phase2) | - | 12.6 | 8.2 ( 8.3) |
| This American Life | - | - | 20.8 (15.2) |
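
As a worked illustration of how two columns of this table can be compared, the relative improvement between two diarization error rates is the reduction divided by the older value. The helper name `relative_improvement` is hypothetical and not part of `pyannote.audio`; the inputs are the AMI (IHM) figures from the table.

```python
def relative_improvement(old_der: float, new_der: float) -> float:
    """Return the relative DER reduction, in percent, going from old_der to new_der."""
    return 100.0 * (old_der - new_der) / old_der


# AMI (IHM): 29.7 with v1.1 versus 18.2 with v2.0 (values taken from the table)
print(f"{relative_improvement(29.7, 18.2):.1f}%")
```

A negative result would indicate a regression, which may be the case for some datasets when comparing the v2.0 and v2.1 columns.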

## Citations

Expand Down
15 changes: 14 additions & 1 deletion doc/source/changelog.rst
Expand Up @@ -2,9 +2,22 @@
Changelog
#########

Version 2.0.1 (2022-07-20)
Version 2.1 (2022-11-xx)
~~~~~~~~~~~~~~~~~~~~~~~~

- BREAKING(pipeline): rewrite speaker diarization pipeline
- feat(pipeline): add option to optimize for DER variant
- feat(clustering): add support for NeMo speaker embedding
- feat(clustering): add FINCH clustering
- feat(clustering): add min_cluster_size hparams to AgglomerativeClustering
- feat(hub): add support for private/gated models
- setup(hub): switch to latest huggingface_hub API
- fix(pipeline): fix support for missing reference in Resegmentation pipeline
- fix(clustering): fix corner case where HMM.fit finds too few states

Version 2.0.1 (2022-07-20)
~~~~~~~~~~~~~~~~~~~~~~~~~~

- BREAKING: complete rewrite
- feat: much better performance
- feat: Python-first API
Expand Down
5 changes: 5 additions & 0 deletions pyannote/audio/cli/train_config/optimizer/Adan.yaml
@@ -0,0 +1,5 @@
# @package _group_
_target_: adan_pytorch.Adan
lr: 1e-3
betas: [0.1, 0.1, 0.001]
weight_decay: 0.0
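
The `_target_` key and the `# @package _group_` header suggest a Hydra-style configuration, in which `_target_` names the class to build and the remaining keys become keyword arguments. The sketch below is a simplified stand-in for that mechanism, not Hydra itself: the function name `instantiate_from_config` is hypothetical, and a standard-library class is used as the target so that the example runs without `adan_pytorch` installed.

```python
import importlib


def instantiate_from_config(config: dict, *args):
    """Simplified illustration of `_target_` resolution (hypothetical helper)."""
    # Everything except `_target_` is forwarded as a keyword argument.
    kwargs = {key: value for key, value in config.items() if key != "_target_"}
    # Split "package.module.ClassName" into the module path and the attribute name.
    module_name, _, attribute = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attribute)
    return target(*args, **kwargs)


# Standard-library target used only for demonstration purposes.
value = instantiate_from_config(
    {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
)
print(value)
```

With the configuration above, the same idea would presumably resolve `adan_pytorch.Adan` and pass `lr`, `betas` and `weight_decay` as keyword arguments, with the model parameters supplied separately by the training script.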
67 changes: 51 additions & 16 deletions pyannote/audio/core/model.py
Expand Up @@ -33,7 +33,8 @@
import torch
import torch.nn as nn
import torch.optim
from huggingface_hub import cached_download, hf_hub_url
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import RepositoryNotFoundError
from pyannote.core import SlidingWindow
from pytorch_lightning.utilities.cloud_io import load as pl_load
from pytorch_lightning.utilities.model_summary import ModelSummary
Expand Down Expand Up @@ -415,6 +416,10 @@ def on_save_checkpoint(self, checkpoint):

@staticmethod
def check_version(library: Text, theirs: Text, mine: Text):

theirs = ".".join(theirs.split(".")[:3])
mine = ".".join(mine.split(".")[:3])

theirs = VersionInfo.parse(theirs)
mine = VersionInfo.parse(mine)
if theirs.major != mine.major:
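
The two lines added to `check_version` keep at most the first three dot-separated components of each version string before parsing. A plausible motivation, which is an assumption here and not stated in the commit, is that strings with a fourth component (such as a `.dev0` or `.post2` suffix) would otherwise be rejected by a strict `major.minor.patch` parser. The helper names below are hypothetical and use only plain string operations instead of `VersionInfo`.

```python
def truncate_version(version: str) -> str:
    """Keep at most the first three dot-separated components of a version string."""
    return ".".join(version.split(".")[:3])


def same_major(theirs: str, mine: str) -> bool:
    """Compare the leading (major) component after truncation."""
    return truncate_version(theirs).split(".")[0] == truncate_version(mine).split(".")[0]


print(truncate_version("2.1.0.dev0"))
print(same_major("1.12.1+cu113", "1.11.0"))
```

Note that a suffix attached without a dot, as in `1.12.1+cu113`, is left untouched by this truncation.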
Expand Down Expand Up @@ -777,32 +782,62 @@ def from_pretrained(
model_id = checkpoint
revision = None

url = hf_hub_url(
model_id, filename=HF_PYTORCH_WEIGHTS_NAME, revision=revision
)
path_for_pl = cached_download(
url=url,
library_name="pyannote",
library_version=__version__,
cache_dir=cache_dir,
use_auth_token=use_auth_token,
)
try:
path_for_pl = hf_hub_download(
model_id,
HF_PYTORCH_WEIGHTS_NAME,
repo_type="model",
revision=revision,
library_name="pyannote",
library_version=__version__,
cache_dir=cache_dir,
# force_download=False,
# proxies=None,
# etag_timeout=10,
# resume_download=False,
use_auth_token=use_auth_token,
# local_files_only=False,
# legacy_cache_layout=False,
)
except RepositoryNotFoundError:
print(
f"""
Could not download '{model_id}' model.
It might be because the model is private or gated so make
sure to authenticate. Visit https://hf.co/settings/tokens to
create your access token and retry with:
>>> Model.from_pretrained('{model_id}',
... use_auth_token=YOUR_AUTH_TOKEN)
If this still does not work, it might be because the model is gated:
visit https://hf.co/{model_id} to accept the user conditions."""
)
return None

# HACK Huggingface download counters rely on config.yaml
# HACK Therefore we download config.yaml even though we
# HACK do not use it. Fails silently in case model does not
# HACK have a config.yaml file.
try:
config_url = hf_hub_url(
model_id, filename=HF_LIGHTNING_CONFIG_NAME, revision=revision
)
_ = cached_download(
url=config_url,

_ = hf_hub_download(
model_id,
HF_LIGHTNING_CONFIG_NAME,
repo_type="model",
revision=revision,
library_name="pyannote",
library_version=__version__,
cache_dir=cache_dir,
# force_download=False,
# proxies=None,
# etag_timeout=10,
# resume_download=False,
use_auth_token=use_auth_token,
# local_files_only=False,
# legacy_cache_layout=False,
)

except Exception:
pass

Expand Down
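
One consequence of the hunk above is that a missing, private or gated repository no longer raises: a hint is printed and `None` is returned, so callers need to check the result. The sketch below reproduces only that control flow with hypothetical stand-ins (`RepositoryNotFound`, `load_or_none`, `fake_gated_download`); it does not call `huggingface_hub` or `pyannote.audio`.

```python
from typing import Callable, Optional


class RepositoryNotFound(Exception):
    """Hypothetical stand-in for huggingface_hub.utils.RepositoryNotFoundError."""


def load_or_none(download: Callable[[str], str], repo_id: str) -> Optional[str]:
    """Mimic the control flow of the diff: print a hint and return None on failure."""
    try:
        return download(repo_id)
    except RepositoryNotFound:
        print(f"Could not download '{repo_id}'. It may be private or gated.")
        return None


def fake_gated_download(repo_id: str) -> str:
    # Hypothetical downloader that always behaves like a gated repository.
    raise RepositoryNotFound(repo_id)


result = load_or_none(fake_gated_download, "some-user/some-model")
if result is None:
    # Callers have to handle None explicitly, since no exception propagates.
    print("Nothing was loaded; check the access token and user conditions.")
```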
56 changes: 42 additions & 14 deletions pyannote/audio/core/pipeline.py
Expand Up @@ -28,14 +28,15 @@
from typing import Callable, List, Optional, Text, Union

import yaml
from huggingface_hub import cached_download, hf_hub_url
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import RepositoryNotFoundError
from pyannote.core.utils.helper import get_class_by_name
from pyannote.database import FileFinder, ProtocolFile
from pyannote.pipeline import Pipeline as _Pipeline

from pyannote.audio import Audio, __version__
from pyannote.audio.core.io import AudioFile
from pyannote.audio.core.model import CACHE_DIR
from pyannote.core.utils.helper import get_class_by_name
from pyannote.database import FileFinder, ProtocolFile
from pyannote.pipeline import Pipeline as _Pipeline

PIPELINE_PARAMS_NAME = "config.yaml"

Expand Down Expand Up @@ -77,15 +78,40 @@ def from_pretrained(
else:
model_id = checkpoint_path
revision = None
url = hf_hub_url(model_id, filename=PIPELINE_PARAMS_NAME, revision=revision)

config_yml = cached_download(
url=url,
library_name="pyannote",
library_version=__version__,
cache_dir=cache_dir,
use_auth_token=use_auth_token,
)

try:
config_yml = hf_hub_download(
model_id,
PIPELINE_PARAMS_NAME,
repo_type="model",
revision=revision,
library_name="pyannote",
library_version=__version__,
cache_dir=cache_dir,
# force_download=False,
# proxies=None,
# etag_timeout=10,
# resume_download=False,
use_auth_token=use_auth_token,
# local_files_only=False,
# legacy_cache_layout=False,
)

except RepositoryNotFoundError:
print(
f"""
Could not download '{model_id}' pipeline.
It might be because the pipeline is private or gated so make
sure to authenticate. Visit https://hf.co/settings/tokens to
create your access token and retry with:
>>> Pipeline.from_pretrained('{model_id}',
... use_auth_token=YOUR_AUTH_TOKEN)
If this still does not work, it might be because the pipeline is gated:
visit https://hf.co/{model_id} to accept the user conditions."""
)
return None

with open(config_yml, "r") as fp:
config = yaml.load(fp, Loader=yaml.SafeLoader)
Expand All @@ -95,7 +121,9 @@ def from_pretrained(
Klass = get_class_by_name(
pipeline_name, default_module_name="pyannote.pipeline.blocks"
)
pipeline = Klass(**config["pipeline"].get("params", {}))
params = config["pipeline"].get("params", {})
params.setdefault("use_auth_token", use_auth_token)
pipeline = Klass(**params)

# freeze parameters
if "freeze" in config:
Expand Down
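
The `params.setdefault("use_auth_token", use_auth_token)` line forwards the caller's token to the pipeline class only when the configuration does not already provide one. A minimal sketch of that behaviour follows; the helper name `merge_auth_token` is hypothetical, and unlike the diff it works on a copy instead of mutating the loaded configuration.

```python
from typing import Any, Dict, Optional


def merge_auth_token(params: Dict[str, Any], use_auth_token: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of params in which use_auth_token is filled in only if absent."""
    merged = dict(params)
    # setdefault leaves an existing key untouched, even if its value is None.
    merged.setdefault("use_auth_token", use_auth_token)
    return merged


print(merge_auth_token({"segmentation": "some/model"}, "CALLER_TOKEN"))
print(merge_auth_token({"use_auth_token": "CONFIG_TOKEN"}, "CALLER_TOKEN"))
```

A token set in `config.yaml` therefore takes precedence over the one passed to `from_pretrained`.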
2 changes: 1 addition & 1 deletion pyannote/audio/interactive/pipeline/recipe.py
Expand Up @@ -175,7 +175,7 @@ def pipeline(
beep: bool = False,
) -> Dict[str, Any]:

pipeline = Pipeline.from_pretrained(pipeline)
pipeline = Pipeline.from_pretrained(pipeline, use_auth_token=True)
classes = pipeline.classes()

if isinstance(classes, Iterator):
Expand Down
