
Add F1 score, precision, and recall metrics as MultilabelSegmentation default metrics #1336

Open

wants to merge 40 commits into base: develop

Conversation

FrenchKrab
Contributor

Currently the MultilabelSegmentation task has no default metric; this PR adds these three (torchmetrics) metrics as defaults.

@codecov

codecov bot commented Apr 20, 2023

Codecov Report

Patch coverage: 48.64% and project coverage change: +0.15% 🎉

Comparison is base (bbe1395) 32.98% compared to head (3107105) 33.13%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1336      +/-   ##
===========================================
+ Coverage    32.98%   33.13%   +0.15%     
===========================================
  Files           64       65       +1     
  Lines         4072     4134      +62     
===========================================
+ Hits          1343     1370      +27     
- Misses        2729     2764      +35     
Impacted Files Coverage Δ
pyannote/audio/cli/train.py 0.00% <ø> (ø)
pyannote/audio/pipelines/speaker_diarization.py 0.00% <0.00%> (ø)
pyannote/audio/pipelines/utils/oracle.py 0.00% <0.00%> (ø)
pyannote/audio/tasks/embedding/mixins.py 26.51% <0.00%> (-0.41%) ⬇️
.../tasks/segmentation/overlapped_speech_detection.py 42.50% <0.00%> (ø)
...dio/tasks/segmentation/voice_activity_detection.py 40.54% <0.00%> (ø)
pyannote/audio/utils/preview.py 0.00% <0.00%> (ø)
pyannote/audio/core/pipeline.py 21.85% <14.28%> (-0.22%) ⬇️
...te/audio/tasks/segmentation/speaker_diarization.py 47.68% <20.00%> (+0.01%) ⬆️
pyannote/audio/core/task.py 81.48% <50.00%> (-0.40%) ⬇️
... and 7 more

... and 1 file with indirect coverage changes


@FrenchKrab
Contributor Author

Before merging, I think I need to reshape what is passed to the metric during validation, so that it is compatible with more metrics (currently, I believe flat tensors are passed).

@FrenchKrab FrenchKrab marked this pull request as draft April 20, 2023 09:22
@FrenchKrab FrenchKrab marked this pull request as ready for review April 20, 2023 11:59
@@ -251,10 +254,25 @@ def validation_step(self, batch, batch_idx: int):

# mask (frame, class) index for which label is missing
mask: torch.Tensor = y_true != -1
y_pred = y_pred[mask]
y_true = y_true[mask]
y_pred = y_pred[mask].reshape(shape)
@hbredin
Member

hbredin commented Apr 20, 2023

This will break as soon as mask contains at least one -1 because the overall size of y_pred will then be smaller than shape.
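The failure mode described here can be demonstrated in a few lines of plain torch (the tensor values are made up for illustration): boolean-mask indexing flattens the tensor and drops the masked entries, so as soon as any label is `-1` the result has fewer elements than the original `shape` and the `reshape` must fail.

```python
import torch

y_true = torch.tensor([[1, 0], [-1, 1]])  # one missing label (-1)
y_pred = torch.rand(2, 2)
shape = y_pred.shape                      # torch.Size([2, 2])

mask = y_true != -1       # boolean mask with one False entry
filtered = y_pred[mask]   # boolean indexing flattens: shape (3,)

# filtered.reshape(shape) would raise here:
# 3 elements cannot fill a (2, 2) view.
```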

Still support "global" metrics, BUT they have to be of a binary type.
@FrenchKrab FrenchKrab marked this pull request as draft April 20, 2023 15:08
@FrenchKrab
Contributor Author

Writing down some things before I forget them:

  • I added "per-class" metrics, which probably need a better name: they are the same metrics, computed separately for each individual class.
  • I kept the "global" metric (computed over all outputs for all classes), BUT because we support missing targets and torchmetrics does not (to my knowledge), without further assumptions we can only treat the multilabel problem as a binary one, which limits the metrics available.
    We can still let the user set the global metric to a "multilabel" one, but they have to make sure there aren't any missing targets.

(we can discuss this tomorrow!)

@FrenchKrab FrenchKrab marked this pull request as ready for review May 2, 2023 12:02
@hbredin
Member

hbredin commented May 10, 2023

Is this ready for review?

@FrenchKrab
Contributor Author

FrenchKrab commented May 11, 2023

It should be now! (Although it changes more than anticipated.) (Sorry for the last two commits, I should have reread my whole code before committing a "fix".)

Multilabel metrics use ignore_index=-1 so that targets with missing labels can be ignored in the metrics (partially annotated data).
Classwise (binary) metrics do not need it, since we can simply filter out the missing data.

Maybe we should enforce ignore_index == -1 in the multilabel metrics and raise an exception otherwise: it is easy to miss, and the metric would either still run (but give wrong values) or crash (and the user probably won't understand why).

@hbredin
Member

hbredin commented May 15, 2023

As discussed right now, would be nice to try average="none" instead of those pesky classwise metrics :)

@hbredin
Member

hbredin commented Sep 15, 2023

Going over your PRs :) Is this mergeable?

@FrenchKrab
Contributor Author

I should test it first, but I'm done with the implementation (I don't know if it's OK for you though :) ).

In the end I didn't find a way to do away with "metric_classwise": torchmetrics has a ClasswiseWrapper, but it doesn't do what we want.
The torchmetrics wrapper is for metrics that already return a multi-dimensional (per-class) tensor; but, for example, the multilabel variant of Calibration Error doesn't exist, while the "binary" variant exists for probably every metric.


stale bot commented Mar 19, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Mar 19, 2024
@hbredin hbredin removed the wontfix label Mar 29, 2024
6 participants