How to interpret the output of the segmentation model? #1315

amitli1 · 2023-04-04T06:03:40Z


pyannote's speaker diarization is based on the following segmentation model:
End-to-end speaker segmentation for overlap-aware resegmentation

In the paper above, under "Implementation details", the authors write:

model input: sequences of 80000 samples
[i.e., 5 s audio chunks at a sampling rate of 16 kHz]
model output:
K_max-dimensional speaker activations between 0 and 1 every 16 ms.

Does it mean that the output shape is (K, 5000/16)?
The output values are between 0 and 1. How should I interpret them?
How can I tell whether a new segment starts, how many segments there are in each output, and how many speakers are in the output? (An example would be very helpful.)
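
As a rough illustration of how activations between 0 and 1 can be read: each column is one speaker, each row is one frame, and a score above some threshold is taken to mean "this speaker is active in this frame". The sketch below uses made-up scores for a single chunk laid out as (frames, speakers), like the last two axes of the (11, 293, 3) output mentioned later in this thread. The 0.5 threshold and the helper names `binarize` and `frame_runs` are illustrative assumptions, not part of pyannote's API.

```python
import numpy as np


def binarize(activations, threshold=0.5):
    """Turn per-frame speaker scores in [0, 1] into on/off decisions.

    The 0.5 threshold is an assumption chosen for illustration.
    """
    return activations > threshold


def frame_runs(active):
    """Return (start, end) frame index pairs (end exclusive) for each run of True."""
    padded = np.concatenate(([False], active, [False]))
    # Indices where the on/off state flips: even positions open a run, odd positions close it.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]


# Made-up scores for one chunk: 6 frames (rows) x 3 speakers (columns).
scores = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.7, 0.6, 0.0],  # speakers 0 and 1 overlap in this frame
    [0.2, 0.9, 0.0],
    [0.1, 0.8, 0.0],
    [0.1, 0.1, 0.0],  # nobody is speaking
])

binary = binarize(scores)

# Number of simultaneously active speakers in each frame.
speakers_per_frame = binary.sum(axis=1)

# Per-speaker segments, expressed as frame index ranges.
segments = {k: frame_runs(binary[:, k]) for k in range(binary.shape[1])}

print(speakers_per_frame.tolist())
print(segments)
```

Multiplying the frame indices by the model's frame step turns these ranges into times within the chunk. A speaker column that never crosses the threshold (speaker 2 here) simply means that slot is unused in this chunk.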

hbredin · 2023-04-04T06:52:47Z

Maintainer

Did you read this? This should answer most of your questions about this model.


amitli1 Apr 4, 2023
Author

Thanks.
I still don't understand the shape (11, 293, 3) for a 5 s sliding window.
In particular, I don't understand where the value 293 comes from:
5000 ms / 16 ms = 312.5
How did they get 293?

amitli1 Apr 7, 2023
Author

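@hbredin Can you please explain how we get the shape (11, 293, 3)
after running this code:

from pyannote.audio import Inference
# `model` and `SAMPLE_WAV` are assumed to be defined beforehand
inference = Inference(model, duration=5.0, step=2.5)
output = inference(SAMPLE_WAV)

I don't understand why it is 293.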
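
One way to account for 293 is to count frames layer by layer instead of dividing the chunk duration by 16 ms. The sketch below assumes the model's front-end is a SincNet-style stack of unpadded 1-D convolutions and max-pooling layers with the (kernel, stride) pairs listed in `ASSUMED_LAYERS`. That layer list is an assumption about the pyannote implementation and is not stated anywhere in this thread; `conv_out_len` and `num_frames` are hypothetical helper names.

```python
SAMPLE_RATE = 16000  # Hz, as stated in the paper excerpt above

# (kernel, stride) for each unpadded layer. ASSUMED, not taken from this thread:
# sinc conv, max-pool, conv, max-pool, conv, max-pool.
ASSUMED_LAYERS = [(251, 10), (3, 3), (5, 1), (3, 3), (5, 1), (3, 3)]


def conv_out_len(n, kernel, stride=1):
    """Output length of an unpadded 1-D convolution or pooling layer."""
    return (n - kernel) // stride + 1


def num_frames(num_samples, layers=ASSUMED_LAYERS):
    """Push an input length through every layer and return the frame count."""
    n = num_samples
    for kernel, stride in layers:
        n = conv_out_len(n, kernel, stride)
    return n


frames = num_frames(5 * SAMPLE_RATE)  # 5 s chunk = 80000 samples

# The overall hop between output frames is the product of all strides.
hop_samples = 1
for _, stride in ASSUMED_LAYERS:
    hop_samples *= stride
hop_ms = 1000 * hop_samples / SAMPLE_RATE

print(frames, hop_samples, hop_ms)
```

Under these assumptions the frame step is slightly longer than 16 ms, and each unpadded layer trims a few frames at the edges, so the count ends up below 312.5. The first axis (11) is the number of 5 s windows taken every 2.5 s, which depends on the duration of `SAMPLE_WAV`, and the last axis (3) is the number of speaker activations per frame.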
