Add the PixIT task, the ToTaToNet model, and a pipeline for joint speaker diarization/speech separation inference #1676

Status: Open · wants to merge 88 commits into `develop`

Commits (88):
- `8365802` Merge pull request #5 from pyannote/develop (joonaskalda, Mar 18, 2024)
- `ff2c705` add convnet layers to PyanNet (joonaskalda, Mar 10, 2023)
- `28c9057` add stft and free encoders/decoder (joonaskalda, Mar 15, 2023)
- `eca0a31` multitask learning first attempt (joonaskalda, Mar 15, 2023)
- `1d2d1ab` properly logging mixit loss in train/valid (joonaskalda, Mar 15, 2023)
- `4e175b6` add a weight to mixit_loss (joonaskalda, Mar 16, 2023)
- `7894bf7` reformulate multitask loss (joonaskalda, Mar 27, 2023)
- `9eeb29c` add dprnn (joonaskalda, Apr 16, 2023)
- `2fb16e9` pair mixtures from same file with no overlapping speakers (joonaskalda, May 12, 2023)
- `1a83fec` fix mixit loss for odd batch size in validation (joonaskalda, May 12, 2023)
- `3ebc46a` make the MoM part of the original batch (joonaskalda, May 12, 2023)
- `055744f` clean up (joonaskalda, May 12, 2023)
- `4af6be1` check that BS is divisible by 3 (joonaskalda, May 14, 2023)
- `4aa83b3` don't use MoMs with more than 3 speakers (joonaskalda, May 14, 2023)
- `ef14f8f` include original mixtures in separation branch training (joonaskalda, May 22, 2023)
- `b218473` matching the order of dimensions of branch outputs (joonaskalda, May 24, 2023)
- `ebe3471` make n_sources an argument for model constructor (joonaskalda, May 25, 2023)
- `df6794b` changing LSTM default num_layers to 4 (joonaskalda, May 25, 2023)
- `1fe6106` create separate tasks and models (joonaskalda, Jun 9, 2023)
- `b3a7821` Changing n_sources to 3 (joonaskalda, Jun 12, 2023)
- `0c790d5` forcing alignment between separation and diarization (joonaskalda, Jun 16, 2023)
- `70adb7e` fixing edge case of 4 speakers in a second chunk (joonaskalda, Jun 17, 2023)
- `fcde9b8` adding a VAD-like forced alignment loss (joonaskalda, Jun 18, 2023)
- `699a2fb` refactor: remove vad_loss and warm_up, assume powerset everywhere (joonaskalda, Jun 20, 2023)
- `065cde0` remove double check of num_speakers (joonaskalda, Jun 20, 2023)
- `3ff0ba5` refactor: moved mom constrcution (joonaskalda, Jun 21, 2023)
- `8bb63d2` remove unused mixit wrapper (joonaskalda, Jun 21, 2023)
- `5bd913d` format with black (joonaskalda, Jun 21, 2023)
- `40c11db` fix for last batch in validation having size 1 (joonaskalda, Jun 21, 2023)
- `3ee1484` adding documentation (joonaskalda, Jun 21, 2023)
- `71541c2` diarization on sources separately and back to multilabel (joonaskalda, Jun 26, 2023)
- `8c66d74` make lstm use optional (joonaskalda, Jun 26, 2023)
- `dcab13d` make alignment forcing optional (joonaskalda, Jun 26, 2023)
- `954e0f8` bug fix (joonaskalda, Jun 27, 2023)
- `7dee18c` rename mixit_loss to separation_loss for clarity (joonaskalda, Jul 2, 2023)
- `f792135` add 2 sources for noise and alignement accuracy measure (joonaskalda, Jul 25, 2023)
- `9f2cd5b` bug regarding specifications being a tuple (joonaskalda, Jul 25, 2023)
- `ded4b5e` clean up (joonaskalda, Aug 28, 2023)
- `90c9b3a` add avg pooling to diarization branch for smaller kernel sizes (joonaskalda, Sep 7, 2023)
- `f97d440` fix validation loss (joonaskalda, Sep 9, 2023)
- `2654528` changing to pit_loss (joonaskalda, Sep 14, 2023)
- `1ed87e9` changing validation dataloader (joonaskalda, Sep 14, 2023)
- `b97dd16` make the additional 2 noise sources optional (joonaskalda, Sep 14, 2023)
- `b2baf1d` make aligned training the only supported behavior (joonaskalda, Sep 15, 2023)
- `f84c683` clean up (joonaskalda, Sep 20, 2023)
- `23f8b7b` change default model parameters (joonaskalda, Sep 20, 2023)
- `7ccbee1` 3 source mixit (joonaskalda, Sep 21, 2023)
- `6917aa0` first commit (joonaskalda, Sep 22, 2023)
- `407e1fb` diar branch from masked tf rep instead (joonaskalda, Sep 23, 2023)
- `66146fb` add lstm (joonaskalda, Sep 23, 2023)
- `adcc594` add lstm back in (joonaskalda, Sep 24, 2023)
- `053324e` fix forward (joonaskalda, Sep 25, 2023)
- `c1519f4` adding training on single speaker sources (joonaskalda, Sep 25, 2023)
- `680e181` check edge case (joonaskalda, Sep 25, 2023)
- `031cb9d` clean up and format (joonaskalda, Oct 18, 2023)
- `4d20d4a` first commit (joonaskalda, Oct 21, 2023)
- `21d7f40` enable finetuning wavlm with separate lr (pytorch-lightning 2.1) (joonaskalda, Nov 22, 2023)
- `8081bef` add gradient clipping (joonaskalda, Nov 23, 2023)
- `5104521` clean up (joonaskalda, Mar 18, 2024)
- `a8c8caa` rename SepDiarNet and include receptive field (joonaskalda, Mar 21, 2024)
- `0a9597b` fix rebase mistake in PyanNet (joonaskalda, Mar 21, 2024)
- `ec36794` fix rebase mistake in segmentation mixins (joonaskalda, Mar 21, 2024)
- `7cfdc75` fix joint task setup (joonaskalda, Mar 21, 2024)
- `4b37532` fix joint task init (joonaskalda, Mar 21, 2024)
- `c963934` fix data iteration and add docstrings (joonaskalda, Mar 21, 2024)
- `658308a` fix docstrings (joonaskalda, Mar 21, 2024)
- `b3ba2d4` remove functionality for additional noise sources (joonaskalda, Mar 21, 2024)
- `e451a2b` remove functionality for using original sources for separation (when … (joonaskalda, Mar 21, 2024)
- `386d750` fixc rebase mistake in speaker diarization task (joonaskalda, Mar 21, 2024)
- `60210d9` rename joint task to PixIT (joonaskalda, Mar 21, 2024)
- `047f741` make wavlm finetuning optional (joonaskalda, Mar 21, 2024)
- `110355e` clean up ToTaToNet (joonaskalda, Mar 21, 2024)
- `e2cdae4` add joint diarization separation pipeline (joonaskalda, Mar 25, 2024)
- `4df4950` fix docstrings and imports (joonaskalda, Mar 25, 2024)
- `c848143` fix ToTaToNet behavior when WavLM not used (joonaskalda, Mar 25, 2024)
- `ffe583a` update requirements.txt (joonaskalda, Mar 25, 2024)
- `9d79069` Merge branch 'develop' into pixit (hbredin, Apr 5, 2024)
- `75c3c8e` doc: update changelog (hbredin, Apr 5, 2024)
- `4112ef8` Merge branch 'develop' into pixit (hbredin, Apr 18, 2024)
- `776349b` Merge branch 'develop' into pixit (hbredin, May 24, 2024)
- `e133515` chore: reorganize things a bit (hbredin, May 24, 2024)
- `6e56060` fix: fix import (hbredin, May 24, 2024)
- `6adcd2d` fix: fix docstring (hbredin, May 24, 2024)
- `8be568c` fix: fix docstrings and default values (joonaskalda, May 27, 2024)
- `e8a4db0` fix: rename speaker separation to speech separation (joonaskalda, May 27, 2024)
- `8f97c6a` fix: renaming and reorganizing separation pipeline hyperparameters (joonaskalda, May 28, 2024)
- `284fb83` doc: add hyper-parameter documntation (hbredin, May 28, 2024)
- `089504f` feat(setup): make separation dependencies optional (hbredin, May 28, 2024)
Changes to `CHANGELOG.md` (10 additions, 0 deletions):

@@ -2,8 +2,18 @@

## develop

### TL;DR

`pyannote.audio` does [speech separation](https://hf.co/pyannote/speech-separation-ami-1.0): multi-speaker audio in, one audio channel per speaker out!

```bash
pip install pyannote.audio[separation]==3.3.0
```

### New features

- feat(model): add `ToTaToNet` joint speaker diarization and speech separation model (with [@joonaskalda](https://github.com/joonaskalda/))
- feat(pipeline): add `SpeechSeparation` pipeline (with [@joonaskalda](https://github.com/joonaskalda/))
- feat(io): add option to select torchaudio `backend`

### Fixes
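
To make the changelog entry above concrete, here is a minimal usage sketch of the new joint diarization/separation pipeline, following the model card linked in the TL;DR; the access-token placeholder and audio file name are illustrative.

```python
# Minimal sketch: load the joint speaker diarization / speech separation
# pipeline from the Hugging Face hub (checkpoint name taken from the model
# card linked in the changelog) and run it on a multi-speaker recording.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speech-separation-ami-1.0",
    use_auth_token="HF_TOKEN",  # placeholder: replace with a real Hugging Face token
)

# Multi-speaker audio in, one channel per speaker out: `diarization` holds the
# speaker segments, `sources` holds the separated waveforms (one per speaker).
diarization, sources = pipeline("conversation.wav")  # "conversation.wav" is illustrative
```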