Dealing with overlapping speech #1157

bfavero · 2022-11-17T02:16:54Z

Hi!

I'm trying to use pyannote with an ASR API to get diarized transcriptions, and I'm having trouble with overlapping speech.

The ASR solution I use gives word-level timestamps, so I do the STT and the diarization in parallel. Then, I group the transcribed words into segments that follow the diarization result.

The results have been reasonably good, but I can't quite solve the overlapping problem. Consider the diarization result below:

4.19 --> 45.55: SPEAKER_01
30.51 --> 30.94: SPEAKER_00
36.81 --> 36.83: SPEAKER_00
43.11 --> 48.23: SPEAKER_00

As you can see, pyannote returns overlapping segments. The best solution I have found so far is to give priority to the speaker whose segment comes first. My script reads each line in the order pyannote outputs them and assigns all the words in that interval to that line's speaker. Then it moves to the next line and does the same, starting from the first word that does not have a speaker yet.

In the case above, the final transcript would be something like this:

4.19 --> 45.55: SPEAKER_01 - words words words words words words words words words words words words words words
45.55 --> 48.23: SPEAKER_00 - words words
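The first-come assignment described above can be sketched in plain Python. The `Word` class, the `(start, end, speaker)` tuples, the helper names, and the rule that a word belongs to a segment when its midpoint falls inside it are all illustrative assumptions, not pyannote's or the ASR API's actual data structures. With pyannote, tuples like these could be built by iterating over the diarization `Annotation` with `itertracks(yield_label=True)`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical structures for illustration: a diarization segment is (start, end, speaker),
# and each ASR word carries its own start/end timestamps in seconds.
DiarSegment = Tuple[float, float, str]


@dataclass
class Word:
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


def assign_first_come(words: List[Word], segments: List[DiarSegment]) -> List[Word]:
    """Give each word to the first segment, in diarization output order, that contains it."""
    for seg_start, seg_end, speaker in segments:
        for word in words:
            if word.speaker is not None:
                continue  # already claimed by an earlier segment
            # Assumption: a word belongs to a segment if the word's midpoint falls inside it.
            midpoint = (word.start + word.end) / 2
            if seg_start <= midpoint <= seg_end:
                word.speaker = speaker
    return words  # words outside every segment keep speaker=None


def to_turns(words: List[Word]) -> List[Tuple[float, float, Optional[str], str]]:
    """Merge consecutive (time-ordered) words with the same speaker into turns."""
    turns: List[Tuple[float, float, Optional[str], str]] = []
    for word in words:
        if turns and turns[-1][2] == word.speaker:
            start, _, speaker, text = turns[-1]
            turns[-1] = (start, word.end, speaker, text + " " + word.text)
        else:
            turns.append((word.start, word.end, word.speaker, word.text))
    return turns


segments = [
    (4.19, 45.55, "SPEAKER_01"),
    (30.51, 30.94, "SPEAKER_00"),
    (36.81, 36.83, "SPEAKER_00"),
    (43.11, 48.23, "SPEAKER_00"),
]
# Made-up words; "yes" falls inside SPEAKER_00's short segment at 30.51-30.94.
words = [Word("so", 5.0, 5.3), Word("yes", 30.6, 30.8), Word("right", 44.0, 44.4), Word("okay", 46.0, 46.4)]
for turn in to_turns(assign_first_come(words, segments)):
    print(turn)
```

With these made-up words, "yes" ends up with SPEAKER_01 because SPEAKER_01's long segment is read first, even though SPEAKER_00's segment also covers it. This is the kind of misattribution the first-come rule produces.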

The problem is that this approach often misattributes words: for example, SPEAKER_01 gets words that were actually spoken by SPEAKER_00.

Does anyone have a better idea for dealing with overlapping speech?

Sorry for the long post!

Answered by cetiny

Nov 18, 2022

cetiny · 2022-11-18T11:21:02Z

I have a solution that works 95% of the time by post-processing the diarization output with pandas. My goal is to have zero overlaps in the final dataframe.

First, distinguish between full overlaps (like the first two in your example) and partial overlaps (like the last one, which overlaps the end of the first segment).
General: Delete all segments shorter than 0.5 seconds (mostly "hmm" and short "yes" while the other speaker is talking).
Full overlap: Delete all segments shorter than 1 second (mostly someone starting to speak too soon and not continuing after the first speaker finishes).
Full overlap, longer segments: I split the longer segment in two, and the shorter segment cuts into it and overwrites the middle.
Partial overlap: If the segment is shorter than 2 seconds and overlaps by more than 0.6 seconds, delete it (unnecessary interruptions at the end of a sentence).
Partial overlap, longer segments: I move the end of the first segment to the start of the second segment.
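The author's code is not included in the thread, so the following is only a sketch of how the rules above might look, written in plain Python instead of pandas so it has no dependencies. The `Segment` class and `resolve_overlaps` function are hypothetical; the thresholds are the ones from the answer. A few interpretations are assumptions: the 2-second partial-overlap rule is applied to the later (interrupting) segment, speaker labels are ignored when checking overlaps, and a segment that overlaps an already-split region is simply dropped.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Segment:
    start: float  # seconds
    end: float
    speaker: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def resolve_overlaps(
    segments: List[Segment],
    min_len: float = 0.5,              # General: drop anything shorter than this
    min_full: float = 1.0,             # Full overlap: drop the inner segment if shorter
    max_partial_len: float = 2.0,      # Partial overlap: a later segment shorter than this...
    max_partial_overlap: float = 0.6,  # ...that overlaps by more than this is dropped
) -> List[Segment]:
    # Copy the segments so the caller's list is not mutated; apply the general filter first.
    segs = sorted(
        (replace(s) for s in segments if s.duration >= min_len),
        key=lambda s: s.start,
    )
    out: List[Segment] = []
    for seg in segs:
        if not out or seg.start >= out[-1].end:
            out.append(seg)  # no overlap with the last kept segment
            continue
        prev = out[-1]
        if seg.start < prev.start:
            # Assumption (not in the original rules): a third segment overlapping
            # an already-split region is dropped instead of being resolved.
            continue
        if seg.end <= prev.end:
            # Full overlap: seg lies entirely inside prev.
            if seg.duration < min_full:
                continue  # speaking too soon, not continuing
            tail = Segment(seg.end, prev.end, prev.speaker)
            prev.end = seg.start  # head of the split segment
            if prev.duration <= 0:
                out.pop()
            out.append(seg)  # the shorter segment overwrites the middle
            if tail.duration > 0:
                out.append(tail)
        else:
            # Partial overlap: seg starts inside prev and ends after it.
            overlap = prev.end - seg.start
            if seg.duration < max_partial_len and overlap > max_partial_overlap:
                continue  # short interruption at the end of prev's sentence
            prev.end = seg.start  # trim prev so it ends where seg starts
            if prev.duration <= 0:
                out.pop()
            out.append(seg)
    return out


example = [
    Segment(4.19, 45.55, "SPEAKER_01"),
    Segment(30.51, 30.94, "SPEAKER_00"),
    Segment(36.81, 36.83, "SPEAKER_00"),
    Segment(43.11, 48.23, "SPEAKER_00"),
]
for seg in resolve_overlaps(example):
    print(seg)
```

Applied to the diarization result from the question, the two SPEAKER_00 segments at 30.51 s and 36.81 s are removed by the 0.5 s rule, and SPEAKER_01's segment is trimmed to end at 43.11 s, where SPEAKER_00's segment begins.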

tpstps Feb 16, 2023

Hi @cetiny,

Would it be possible for you to share your code here?

That would be very helpful to me.

Thanks in advance.
