Commit: Merge branch 'develop' into develop
hbredin committed Apr 10, 2024
2 parents b284865 + 1a397b0 commit 0e510e0
Showing 3 changed files with 182 additions and 9 deletions.
CHANGELOG.md — 17 changes: 8 additions & 9 deletions
```diff
@@ -4,25 +4,24 @@
 
 ### New features
 
-- feat(task): add option to cache task training metadata to speed up training
+- feat(task): add option to cache task training metadata to speed up training (with [@clement-pages](https://github.com/clement-pages/))
+- feat(model): add `receptive_field`, `num_frames` and `dimension` to models (with [@Bilal-Rahou](https://github.com/Bilal-Rahou))
+- feat(util): add `Powerset.permutation_mapping` to help with permutation in powerset space (with [@FrenchKrab](https://github.com/FrenchKrab))
+- feat(sample): add sample file at `pyannote.audio.sample.SAMPLE_FILE`
+- feat(metric): add `reduce` option to `diarization_error_rate` metric (with [@Bilal-Rahou](https://github.com/Bilal-Rahou))
 - feat(pipeline): add `Waveform` and `SampleRate` preprocessors
-- feat(model): add `num_frames` method to every model
-- feat(model): add `receptive_field` property to every model
-- feat(model): and `dimension` property to every model
-- feat(sample): add sample file at `pyannote.audio.sample.SAMPLE_FILE`
-- feat(powerset): add `Powerset.permutation_mapping` to help with permutation in powerset space
-- feat(metric): add `reduce` option to `diarization_error_rate` metric
 
 ### Fixes
 
-- fix(task): fix random generators and their reproducibility
-- fix(task): fix estimation of training set size
+- fix(task): fix random generators and their reproducibility (with [@FrenchKrab](https://github.com/FrenchKrab))
+- fix(task): fix estimation of training set size (with [@FrenchKrab](https://github.com/FrenchKrab))
 
 ### Improvements
 
 - improve(metric): add support for number of speakers mismatch in `diarization_error_rate` metric
 - improve(pipeline): track both `Model` and `nn.Module` attributes in `Pipeline.to(device)`
 - improve(io): switch to `torchaudio >= 2.2.0`
 - improve(doc): update tutorials (with [@clement-pages](https://github.com/clement-pages/))
 
 ## Breaking changes
```
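As a rough sketch of the new model introspection helpers listed above (the model name and token handling are illustrative; exact signatures and return types may differ):

```python
from pyannote.audio import Model

# load a pretrained model (gated on Hugging Face: requires an access token)
model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token="YOUR_HF_TOKEN")

# introspection helpers added in this release
print(model.num_frames(16000))  # number of output frames for 16000 input samples
print(model.receptive_field)    # receptive field of each output frame
print(model.dimension)          # output dimension of the model
```
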
README.md — 2 changes: 2 additions & 0 deletions
```diff
@@ -71,6 +71,8 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
 - [Introduction to speaker diarization](https://umotion.univ-lemans.fr/video/9513-speech-segmentation-and-speaker-diarization/) / JSALT 2023 summer school / 90 min
 - [Speaker segmentation model](https://www.youtube.com/watch?v=wDH2rvkjymY) / Interspeech 2021 / 3 min
 - [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
+- Community contributions (not maintained by the core team)
+  - 2024-04-05 > [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by [Simon Ottenhaus](https://github.com/simonottenhauskenbun)
 
 ## Benchmark
```

tutorials/community/offline_usage_speaker_diarization.ipynb — 172 changes: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Offline Speaker Diarization (speaker-diarization-3.1)\n",
"\n",
"This notebooks gives a short introduction how to use the [speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) pipeline with local models.\n",
"\n",
"In order to use local models, you first need to download them from huggingface and place them in a local folder. \n",
"Then you need to create a local config file, similar to the one in HF, but with local model paths.\n",
"\n",
"❗ **Naming of the model files is REALLY important! See end of notebook for details.** ❗\n",
"\n",
"## Get the models\n",
"\n",
"1. Install the `pyannote-audio` package: `!pip install pyannote.audio`\n",
"2. Create a huggingface account https://huggingface.co/join\n",
"3. Accept [pyannote/segmentation-3.0](https://hf.co/pyannote/segmentation-3.0) user conditions\n",
"4. Create a local folder `models`, place all downloaded files there\n",
" 1. [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/blob/main/pytorch_model.bin), to be placed in `models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin`\n",
" 2. [segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0/blob/main/pytorch_model.bin), to be placed in `models/pyannote_model_segmentation-3.0.bin`\n",
"\n",
"Running `ls models` should show the following files:\n",
"```\n",
"pyannote_model_segmentation-3.0.bin (5.7M)\n",
"pyannote_model_wespeaker-voxceleb-resnet34-LM.bin (26MB)\n",
"```\n",
"\n",
"❗ **make sure the 'wespeaker-voxceleb-resnet34-LM' model is named 'pyannote_model_wespeaker-voxceleb-resnet34-LM.bin'** ❗"
]
},
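{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch, the two files can also be downloaded programmatically with `huggingface_hub` (assuming it is installed and `YOUR_HF_TOKEN` is replaced with a valid access token):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"from pathlib import Path\n",
"\n",
"from huggingface_hub import hf_hub_download\n",
"\n",
"Path(\"models\").mkdir(exist_ok=True)\n",
"\n",
"# repo id -> expected local file name (the 'pyannote' prefix matters, see end of notebook)\n",
"files = {\n",
"    \"pyannote/segmentation-3.0\": \"pyannote_model_segmentation-3.0.bin\",\n",
"    \"pyannote/wespeaker-voxceleb-resnet34-LM\": \"pyannote_model_wespeaker-voxceleb-resnet34-LM.bin\",\n",
"}\n",
"\n",
"for repo_id, local_name in files.items():\n",
"    cached = hf_hub_download(repo_id=repo_id, filename=\"pytorch_model.bin\", token=\"YOUR_HF_TOKEN\")\n",
"    shutil.copy(cached, Path(\"models\") / local_name)"
]
},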
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Config for local models\n",
"\n",
"Create a local config, similar to the one in HF: [speaker-diarization-3.1/blob/main/config.yaml](https://huggingface.co/pyannote/speaker-diarization-3.1/blob/main/config.yaml), but with local model paths\n",
"\n",
"Contents of `models/pyannote_diarization_config.yaml`:\n",
"\n",
"```yaml\n",
"version: 3.1.0\n",
"\n",
"pipeline:\n",
" name: pyannote.audio.pipelines.SpeakerDiarization\n",
" params:\n",
" clustering: AgglomerativeClustering\n",
" # embedding: pyannote/wespeaker-voxceleb-resnet34-LM # if you want to use the HF model\n",
" embedding: models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin # if you want to use the local model\n",
" embedding_batch_size: 32\n",
" embedding_exclude_overlap: true\n",
" # segmentation: pyannote/segmentation-3.0 # if you want to use the HF model\n",
" segmentation: models/pyannote_model_segmentation-3.0.bin # if you want to use the local model\n",
" segmentation_batch_size: 32\n",
"\n",
"params:\n",
" clustering:\n",
" method: centroid\n",
" min_cluster_size: 12\n",
" threshold: 0.7045654963945799\n",
" segmentation:\n",
" min_duration_off: 0.0\n",
"```"
]
},
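{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (optional helper, assuming the folder layout above) that all expected files are in place:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"for f in [\n",
"    \"models/pyannote_diarization_config.yaml\",\n",
"    \"models/pyannote_model_segmentation-3.0.bin\",\n",
"    \"models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin\",\n",
"]:\n",
"    print(f, \"OK\" if Path(f).exists() else \"MISSING\")"
]
},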
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading the local pipeline\n",
"\n",
"**Hint**: The paths in the config are relative to the current working directory, not relative to the config file.\n",
"If you want to start your notebook/script from a different directory, you can use `os.chdir` temporarily, to 'emulate' config-relative paths.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"from pyannote.audio import Pipeline\n",
"\n",
"def load_pipeline_from_pretrained(path_to_config: str | Path) -> Pipeline:\n",
" path_to_config = Path(path_to_config)\n",
"\n",
" print(f\"Loading pyannote pipeline from {path_to_config}...\")\n",
" # the paths in the config are relative to the current working directory\n",
" # so we need to change the working directory to the model path\n",
" # and then change it back\n",
"\n",
" cwd = Path.cwd().resolve() # store current working directory\n",
"\n",
" # first .parent is the folder of the config, second .parent is the folder containing the 'models' folder\n",
" cd_to = path_to_config.parent.parent.resolve()\n",
"\n",
" print(f\"Changing working directory to {cd_to}\")\n",
" os.chdir(cd_to)\n",
"\n",
" pipeline = Pipeline.from_pretrained(path_to_config)\n",
"\n",
" print(f\"Changing working directory back to {cwd}\")\n",
" os.chdir(cwd)\n",
"\n",
" return pipeline\n",
"\n",
"PATH_TO_CONFIG = \"path/to/your/pyannote_diarization_config.yaml\"\n",
"pipeline = load_pipeline_from_pretrained(PATH_TO_CONFIG)"
]
},
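{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once loaded, the pipeline can be applied as usual, e.g. (with `path/to/audio.wav` as a placeholder):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# apply the pipeline to an audio file (placeholder path)\n",
"diarization = pipeline(\"path/to/audio.wav\")\n",
"\n",
"# print speaker turns\n",
"for turn, _, speaker in diarization.itertracks(yield_label=True):\n",
"    print(f\"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}\")"
]
},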
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notes on file naming (pyannote-audio 3.1.1)\n",
"\n",
"Pyannote uses some internal logic to determine the model type.\n",
"\n",
"The funtion `def PretrainedSpeakerEmbedding(...` in (speaker_verification.py)[https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/speaker_verification.py#L712] uses the the file path of the model to infer the model type.\n",
"\n",
"```python\n",
"def PretrainedSpeakerEmbedding(\n",
" embedding: PipelineModel,\n",
" device: torch.device = None,\n",
" use_auth_token: Union[Text, None] = None,\n",
"):\n",
" #...\n",
" if isinstance(embedding, str) and \"pyannote\" in embedding:\n",
" return PyannoteAudioPretrainedSpeakerEmbedding(\n",
" embedding, device=device, use_auth_token=use_auth_token\n",
" )\n",
"\n",
" elif isinstance(embedding, str) and \"speechbrain\" in embedding:\n",
" return SpeechBrainPretrainedSpeakerEmbedding(\n",
" embedding, device=device, use_auth_token=use_auth_token\n",
" )\n",
"\n",
" elif isinstance(embedding, str) and \"nvidia\" in embedding:\n",
" return NeMoPretrainedSpeakerEmbedding(embedding, device=device)\n",
"\n",
" elif isinstance(embedding, str) and \"wespeaker\" in embedding:\n",
" return ONNXWeSpeakerPretrainedSpeakerEmbedding(embedding, device=device) # <-- this is called, but the wespeaker-voxceleb-resnet34-LM is not an ONNX model\n",
"\n",
" else:\n",
" # fallback to pyannote in case we are loading a local model\n",
" return PyannoteAudioPretrainedSpeakerEmbedding(\n",
" embedding, device=device, use_auth_token=use_auth_token\n",
" )\n",
"```\n",
"\n",
"The [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/blob/main/pytorch_model.bin) model is not an ONNX model, but a `PyannoteAudioPretrainedSpeakerEmbedding`. So if `wespeaker` is in the file name, the code will infer the model type incorrectly. If `pyannote` is somewhere in the file name, the model type will be inferred correctly, as the first if statement will be true..."
]
}
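{
"cell_type": "markdown",
"metadata": {},
"source": [
"A tiny illustration of that branching (hypothetical snippet, mirroring the substring checks above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# both substrings occur in the recommended file name,\n",
"# but the 'pyannote' branch is tested first\n",
"path = \"models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin\"\n",
"print(\"pyannote\" in path, \"wespeaker\" in path)  # True True"
]
}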
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
