Merge branch 'release/3.0.0'
hbredin committed Sep 26, 2023
2 parents 7ead17e + 9a5a902 commit 795b92a
Showing 110 changed files with 8,413 additions and 14,658 deletions.
20 changes: 20 additions & 0 deletions .faq/FAQ.md
@@ -0,0 +1,20 @@

# Frequently Asked Questions

{%- for question in questions %}
- [{{ question.title }}](#{{ question.slug }})
{%- endfor %}


{%- for question in questions %}

<a name="{{ question.slug }}"></a>
## {{ question.title }}

{{ question.body }}

{%- endfor %}

<hr>

Generated by [FAQtory](https://github.com/willmcgugan/faqtory)
34 changes: 34 additions & 0 deletions .faq/suggest.md
@@ -0,0 +1,34 @@
Thank you for your issue.

{%- if questions -%}
{% if questions|length == 1 %}
We found the following entry in the [FAQ]({{ faq_url }}) which you may find helpful:
{%- else %}
We found the following entries in the [FAQ]({{ faq_url }}) which you may find helpful:
{%- endif %}

{% for question in questions %}
- [{{ question.title }}]({{ faq_url }}#{{ question.slug }})
{%- endfor %}

{%- else -%}
You might want to check the [FAQ]({{ faq_url }}) if you haven't done so already.
{%- endif %}

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read [this](https://xyproblem.info/) first and update your request accordingly, if needed.

If your issue is a bug report, please provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) as a link to a self-contained [Google Colab](https://colab.research.google.com/) notebook containing everything needed to reproduce the bug:
- installation
- data preparation
- model download
- etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on `pyannote.audio` in production may contact [me](https://herve.niderb.fr) via email regarding:
* paid scientific consulting around speaker diarization and speech processing in general;
* custom models and tailored features (via the local tech transfer office).

> This is an automated reply, generated by [FAQtory](https://github.com/willmcgugan/faqtory)
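
The branching in this template may be easier to follow as plain code. The sketch below mirrors its three cases (no match, one match, several matches) in plain Python. `render_suggestion` and the `(title, slug)` pairs are hypothetical names used for illustration only; FAQtory renders the actual Jinja template, and only the opening part of the reply is reproduced here.

```python
from typing import List, Tuple


def render_suggestion(questions: List[Tuple[str, str]], faq_url: str) -> str:
    """Mimic the opening of the suggest.md template in plain Python.

    `questions` holds hypothetical (title, slug) pairs for matching FAQ entries.
    """
    lines = ["Thank you for your issue."]
    if questions:
        # Singular or plural wording, depending on the number of matches.
        noun = "entry" if len(questions) == 1 else "entries"
        lines.append(
            f"We found the following {noun} in the [FAQ]({faq_url}) "
            "which you may find helpful:"
        )
        lines.append("")
        for title, slug in questions:
            lines.append(f"- [{title}]({faq_url}#{slug})")
    else:
        lines.append(
            f"You might want to check the [FAQ]({faq_url}) "
            "if you haven't done so already."
        )
    return "\n".join(lines)


if __name__ == "__main__":
    url = "https://example.org/FAQ.md"  # placeholder URL, not the project's
    match = ("How can I improve performance?", "how-can-i-improve-performance")
    print(render_suggestion([match], url))
```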
4 changes: 2 additions & 2 deletions .github/stale.yml
@@ -1,7 +1,7 @@
 # Number of days of inactivity before an issue becomes stale
-daysUntilStale: 60
+daysUntilStale: 180
 # Number of days of inactivity before a stale issue is closed
-daysUntilClose: 7
+daysUntilClose: 30
 # Issues with these labels will never be considered stale
 exemptLabels:
   - pinned
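
To make the effect of the new thresholds concrete, here is a small sketch that computes when an untouched issue would be marked stale and then closed. It assumes the bot behaves exactly as the comments in the file describe (days of inactivity, then days after being marked stale); `stale_schedule` is a hypothetical helper, not part of any bot.

```python
from datetime import date, timedelta
from typing import Tuple

DAYS_UNTIL_STALE = 180  # value after this commit (previously 60)
DAYS_UNTIL_CLOSE = 30   # value after this commit (previously 7)


def stale_schedule(last_activity: date) -> Tuple[date, date]:
    """Return (stale_on, close_on), assuming no further activity on the issue."""
    stale_on = last_activity + timedelta(days=DAYS_UNTIL_STALE)
    close_on = stale_on + timedelta(days=DAYS_UNTIL_CLOSE)
    return stale_on, close_on


if __name__ == "__main__":
    stale_on, close_on = stale_schedule(date(2023, 9, 26))
    print("stale on:", stale_on.isoformat())
    print("closed on:", close_on.isoformat())
```

Under these assumptions, an issue now stays open for 210 days without activity, compared with 67 days before the change.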
29 changes: 29 additions & 0 deletions .github/workflows/new_issue.yml
@@ -0,0 +1,29 @@
name: issues
on:
  issues:
    types: [opened]
jobs:
  add-comment:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/checkout@v3
        with:
          ref: develop
      - name: Install FAQtory
        run: pip install FAQtory
      - name: Run Suggest
        env:
          TITLE: ${{ github.event.issue.title }}
        run: faqtory suggest "$TITLE" > suggest.md
      - name: Read suggest.md
        id: suggest
        uses: juliangruber/read-file-action@v1
        with:
          path: ./suggest.md
      - name: Suggest FAQ
        uses: peter-evans/create-or-update-comment@a35cf36e5301d70b76f316e867e7788a55a31dae
        with:
          issue-number: ${{ github.event.issue.number }}
          body: ${{ steps.suggest.outputs.content }}
43 changes: 18 additions & 25 deletions .github/workflows/test.yml
@@ -2,9 +2,9 @@ name: Tests

 on:
   push:
-    branches: [ develop ]
+    branches: [develop]
   pull_request:
-    branches: [ develop ]
+    branches: [develop]

 jobs:
   build:
@@ -13,28 +13,21 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-latest]
-        python-version: [3.7, 3.8, 3.9]
+        python-version: [3.8, 3.9, "3.10"]
     steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
-      with:
-        python-version: ${{ matrix.python-version }}
-    - name: Install libsndfile
-      if: matrix.os == 'ubuntu-latest'
-      run: |
-        sudo apt-get install libsndfile1
-    - name: Install pyannote.audio
-      run: |
+      - uses: actions/checkout@v2
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install libsndfile
+        if: matrix.os == 'ubuntu-latest'
+        run: |
+          sudo apt-get update
+          sudo apt-get install libsndfile1
+      - name: Install pyannote.audio
+        run: |
           pip install -e .[dev,testing]
-    - name: Test with pytest
-      run: |
-        export PYANNOTE_DATABASE_CONFIG=$GITHUB_WORKSPACE/tests/data/database.yml
-        pytest --cov-report=xml
-    - name: Upload coverage to Codecov
-      uses: codecov/codecov-action@v1
-      with:
-        file: ./coverage.xml
-        env_vars: PYTHON
-        name: codecov-pyannote-audio
-        fail_ci_if_error: false
+      - name: Test with pytest
+        run: |
+          pytest
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -20,7 +20,7 @@ repos:
         args: ["--profile", "black"]

   # Formatting, Whitespace, etc
-  - repo: git://github.com/pre-commit/pre-commit-hooks
+  - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v2.2.3
     hooks:
       - id: trailing-whitespace
128 changes: 128 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,128 @@
# Changelog

## Version 3.0.0 (2023-09-26)

### Features and improvements

- feat(pipeline): send pipeline to device with `pipeline.to(device)`
- feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline
- feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
- feat(pipeline): add progress hook to pipelines
- feat(task): add [powerset](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html) support to `SpeakerDiarization` task
- feat(task): add support for multi-task models
- feat(task): add support for label scope in speaker diarization task
- feat(task): add support for missing classes in multi-label segmentation task
- feat(model): add segmentation model based on torchaudio self-supervised representation
- feat(pipeline): check version compatibility at load time
- improve(task): load metadata as tensors rather than pyannote.core instances
- improve(task): improve error message on missing specifications
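
As a rough illustration of the powerset idea referenced in the list above (and not pyannote.audio's actual implementation): instead of predicting one independent on/off label per speaker, the model picks exactly one class per frame, where each class is a set of simultaneously active speakers. The helper name `powerset_classes` and its parameters are assumptions made for this sketch.

```python
from itertools import combinations
from typing import List, Tuple


def powerset_classes(num_speakers: int, max_simultaneous: int) -> List[Tuple[int, ...]]:
    """List every set of at most `max_simultaneous` active speakers.

    Each returned tuple is one class of the multi-class (powerset) problem;
    the empty tuple stands for "nobody is speaking".
    """
    classes: List[Tuple[int, ...]] = []
    for size in range(max_simultaneous + 1):
        # All ways to choose `size` active speakers among `num_speakers`.
        classes.extend(combinations(range(num_speakers), size))
    return classes


if __name__ == "__main__":
    for index, active in enumerate(powerset_classes(3, 2)):
        print(index, active)
```

With 3 speakers and at most 2 of them overlapping, this enumerates 7 classes: silence, 3 single speakers, and 3 pairs.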

### Breaking changes

- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
- BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
- BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
- BREAKING(task): remove support for variable chunk duration for segmentation tasks
- BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
- BREAKING(setup): drop support for Python 3.7
- BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
- BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
  You should update how `pyannote.audio.core.io.Audio` is instantiated:
  * replace `Audio()` by `Audio(mono="downmix")`;
  * replace `Audio(mono=True)` by `Audio(mono="downmix")`;
  * replace `Audio(mono=False)` by `Audio()`.
- BREAKING(model): get rid of (flaky) `Model.introspection`
  If, for some weird reason, you wrote some custom code based on that,
  you should instead rely on `Model.example_output`.
- BREAKING(interactive): remove support for Prodigy recipes
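
The two `io` changes above amount to a simple mapping, which the following sketch spells out. `migrate_mono_kwargs` and `migrate_channel` are hypothetical helpers written for this illustration (they are not part of `pyannote.audio`); they only encode the replacement rules listed in this changelog.

```python
from typing import Any, Dict

_UNSET = object()  # sentinel meaning "the `mono` argument was not passed"


def migrate_mono_kwargs(old_mono: Any = _UNSET) -> Dict[str, Any]:
    """Translate a pre-3.0 `mono` argument into 3.0 keyword arguments for `Audio`."""
    if old_mono is _UNSET or old_mono is True:
        # `Audio()` and `Audio(mono=True)` become `Audio(mono="downmix")`.
        return {"mono": "downmix"}
    if old_mono is False:
        # `Audio(mono=False)` becomes `Audio()`.
        return {}
    raise ValueError(f"unexpected pre-3.0 value for mono: {old_mono!r}")


def migrate_channel(old_channel: int) -> int:
    """Convert a 1-indexed channel number (pre-3.0) to a 0-indexed one (3.0)."""
    if old_channel < 1:
        raise ValueError("pre-3.0 channel numbers start at 1")
    return old_channel - 1


if __name__ == "__main__":
    print(migrate_mono_kwargs())       # kwargs replacing a bare Audio()
    print(migrate_mono_kwargs(False))  # kwargs replacing Audio(mono=False)
    print(migrate_channel(1))          # first channel under the new convention
```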


### Fixes and improvements

- fix(pipeline): fix reproducibility issue with Ampere CUDA devices
- fix(pipeline): fix support for IOBase audio
- fix(pipeline): fix corner case with no speaker
- fix(train): prevent metadata preparation from happening twice
- fix(task): fix support for "balance" option
- improve(task): shorten and improve structure of Tensorboard tags

### Dependencies update

- setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
- setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
- setup: switch to speechbrain 0.5.14+

## Version 2.1.1 (2022-10-27)

- BREAKING(pipeline): rewrite speaker diarization pipeline
- feat(pipeline): add option to optimize for DER variant
- feat(clustering): add support for NeMo speaker embedding
- feat(clustering): add FINCH clustering
- feat(clustering): add min_cluster_size hparams to AgglomerativeClustering
- feat(hub): add support for private/gated models
- setup(hub): switch to latest huggingface_hub API
- fix(pipeline): fix support for missing reference in Resegmentation pipeline
- fix(clustering): fix corner case where HMM.fit finds too few states

## Version 2.0.1 (2022-07-20)

- BREAKING: complete rewrite
- feat: much better performance
- feat: Python-first API
- feat: pretrained pipelines (and models) on Huggingface model hub
- feat: multi-GPU training with pytorch-lightning
- feat: data augmentation with torch-audiomentations
- feat: Prodigy recipe for model-assisted audio annotation

## Version 1.1.2 (2021-01-28)

- fix: make sure master branch is used to load pretrained models (#599)

## Version 1.1 (2020-11-08)

- last release before complete rewriting

## Version 1.0.1 (2018-07-19)

- fix: fix regression in Precomputed.__call__ (#110, #105)

## Version 1.0 (2018-07-03)

- chore: switch from keras to pytorch (with tensorboard support)
- improve: faster & better training (`AutoLR`, advanced learning rate schedulers, improved batch generators)
- feat: add tunable speaker diarization pipeline (with its own tutorial)
- chore: drop support for Python 2 (use Python 3.6 or later)

## Version 0.3.1 (2017-07-06)

- feat: add python 3 support
- chore: rewrite neural speaker embedding using autograd
- feat: add new embedding architectures
- feat: add new embedding losses
- chore: switch to Keras 2
- doc: add tutorial for (MFCC) feature extraction
- doc: add tutorial for (LSTM-based) speech activity detection
- doc: add tutorial for (LSTM-based) speaker change detection
- doc: add tutorial for (TristouNet) neural speaker embedding

## Version 0.2.1 (2017-03-28)

- feat: add LSTM-based speech activity detection
- feat: add LSTM-based speaker change detection
- improve: refactor LSTM-based speaker embedding
- feat: add librosa basic support
- feat: add SMORMS3 optimizer

## Version 0.1.4 (2016-09-26)

- feat: add 'covariance_type' option to BIC segmentation

## Version 0.1.3 (2016-09-23)

- chore: rename sequence generator in preparation for the release of
  TristouNet reproducible research package.

## Version 0.1.2 (2016-09-22)

- first public version
54 changes: 54 additions & 0 deletions FAQ.md
@@ -0,0 +1,54 @@

# Frequently Asked Questions
- [Can I apply pretrained pipelines on audio already loaded in memory?](#can-i-apply-pretrained-pipelines-on-audio-already-loaded-in-memory)
- [Can I use gated models (and pipelines) offline?](#can-i-use-gated-models-(and-pipelines)-offline)
- [Does pyannote support streaming speaker diarization?](#does-pyannote-support-streaming-speaker-diarization)
- [How can I improve performance?](#how-can-i-improve-performance)
- [How does one spell and pronounce pyannote.audio?](#how-does-one-spell-and-pronounce-pyannoteaudio)

<a name="can-i-apply-pretrained-pipelines-on-audio-already-loaded-in-memory"></a>
## Can I apply pretrained pipelines on audio already loaded in memory?

Yes: read [this tutorial](tutorials/applying_a_pipeline.ipynb) until the end.

<a name="can-i-use-gated-models-(and-pipelines)-offline"></a>
## Can I use gated models (and pipelines) offline?

**Short answer**: yes, see [this tutorial](tutorials/applying_a_model.ipynb) for models and [that one](tutorials/applying_a_pipeline.ipynb) for pipelines.

**Long answer**: gating models and pipelines allows [me](https://herve.niderb.fr) to know a bit more about the `pyannote.audio` user base and eventually helps me write grant proposals to make `pyannote.audio` even better. So, please fill in gating forms as precisely as possible.

For instance, before gating `pyannote/speaker-diarization`, I had no idea that so many people were relying on it in production. Hint: sponsors are more than welcome! Maintaining open source libraries is time-consuming.

That being said, this whole authentication process does not prevent you from using official `pyannote.audio` models offline (i.e. without going through the authentication process in every `docker run ...` or whatever you are using in production): see [this tutorial](tutorials/applying_a_model.ipynb) for models and [that one](tutorials/applying_a_pipeline.ipynb) for pipelines.

<a name="does-pyannote-support-streaming-speaker-diarization"></a>
## Does pyannote support streaming speaker diarization?

**Short answer:** not out of the box, no.

**Long answer:** [I](https://herve.niderb.fr) am looking for sponsors to add this feature. In the meantime, [`diart`](https://github.com/juanmc2005/StreamingSpeakerDiarization) is the closest you can get to a streaming `pyannote.audio`. You might also be interested in [this blog post](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html) about streaming voice activity detection based on `pyannote.audio`.

<a name="how-can-i-improve-performance"></a>
## How can I improve performance?

**Long answer:**

1. Manually annotate dozens of conversations as precisely as possible.
2. Separate them into train (80%), development (10%) and test (10%) subsets.
3. Set up the data for use with [`pyannote.database`](https://github.com/pyannote/pyannote-database#speaker-diarization).
4. Follow [this recipe](https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/adapting_pretrained_pipeline.ipynb).
5. Enjoy.
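
Step 2 above can be sketched in a few lines of plain Python. This is only one possible way to do the split, assuming each conversation is identified by a unique name; `split_conversations` and the subset keys are hypothetical and would still need to be written out in whatever format your `pyannote.database` setup expects.

```python
import random
from typing import Dict, List


def split_conversations(uris: List[str], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle conversations and split them 80/10/10 into train/development/test."""
    shuffled = sorted(uris)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_dev = len(shuffled) // 10
    return {
        "train": shuffled[:n_train],
        "development": shuffled[n_train:n_train + n_dev],
        "test": shuffled[n_train + n_dev:],  # whatever remains after train and dev
    }


if __name__ == "__main__":
    conversations = [f"conversation_{i:02d}" for i in range(50)]
    for subset, names in split_conversations(conversations).items():
        print(subset, len(names))
```

Splitting by whole conversation (rather than by excerpt) keeps the same recording from ending up in more than one subset.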

**Also:** [I am available](https://herve.niderb.fr) for contracting to help you with that.

<a name="how-does-one-spell-and-pronounce-pyannoteaudio"></a>
## How does one spell and pronounce pyannote.audio?

📝 Written in lower case: `pyannote.audio` (or `pyannote` if you are lazy). Not `PyAnnote` nor `PyAnnotate` (sic).
📢 Pronounced like the French verb `pianoter`. `pi` like in `pi`ano, not `py` like in `py`thon.
🎹 `pianoter` means to play the piano (hence the logo 🤯).

<hr>

Generated by [FAQtory](https://github.com/willmcgugan/faqtory)
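
The anchors in the table of contents of this FAQ follow a visible pattern: titles are lower-cased, spaces become hyphens, parentheses are kept, and other punctuation such as `?` and `.` is dropped. The sketch below reproduces that pattern; it is inferred from the five anchors in this file only and is not FAQtory's own code, so `slugify` here is a hypothetical helper.

```python
import string

# Characters kept in a slug, inferred from the anchors visible in this FAQ.
_KEPT = set(string.ascii_lowercase + string.digits + "-()")


def slugify(title: str) -> str:
    """Approximate the anchor names seen in this FAQ (an inferred rule)."""
    lowered = title.lower().replace(" ", "-")
    # Drop every character outside the inferred allow-list (e.g. "?" and ".").
    return "".join(ch for ch in lowered if ch in _KEPT)


if __name__ == "__main__":
    print(slugify("Can I use gated models (and pipelines) offline?"))
    print(slugify("How does one spell and pronounce pyannote.audio?"))
```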
