Hi, thanks for this work. I am using the output of dvector_create.py as input to uis-rnn, and diarization works. But I have a small confusion about the number of d-vector embeddings created.
dvector_create.py created 24 embeddings for a 9.7-second audio file and 21 embeddings for an 8.9-second one.
In the first case, if I assume each embedding corresponds to 240 milliseconds of audio (just a guess) and add them up, the total does not match the full audio duration:
24 × 240 ms = 5760 ms (5.76 seconds), but my audio file is 9.7 seconds long.
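One possible explanation for the mismatch: if the embeddings come from a sliding window, n windows of width w at hop h span w + (n − 1) · h seconds, not n · w. A minimal sketch of that arithmetic, where the 0.4 s window width is purely an assumption (the real value would have to be read from dvector_create.py):

```python
# Sketch: infer the hop size from the embedding count and audio duration,
# assuming embeddings come from a sliding window of known width.
# window_s = 0.4 is an assumption; read the real value from dvector_create.py.

def infer_hop(duration_s: float, n_embeddings: int, window_s: float) -> float:
    """n windows of width w at hop h span w + (n - 1) * h seconds."""
    return (duration_s - window_s) / (n_embeddings - 1)

print(infer_hop(9.7, 24, 0.4))  # ~0.404 s hop for 24 embeddings over 9.7 s
print(infer_hop(8.9, 21, 0.4))  # ~0.425 s hop for 21 embeddings over 8.9 s
```

Both files giving a hop of roughly 0.4 s would be consistent with windows laid out more or less back to back, which is why 24 × 240 ms undershoots the true duration.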
I want to understand this because I need to split the audio after diarization. The idea is: if the diarization result says the first 10 embeddings belong to speaker 1, and I know each embedding covers X ms, then speaker 1 spoke for the first 10 · X ms (10X/1000 seconds), so I can split the audio there, and so on. Without knowing from which millisecond to which millisecond speaker 1 spoke, and likewise for speaker 2, I cannot split the audio. A sketch of that mapping follows below.
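Assuming the window width and hop are known, embedding i maps to the span [i · h, i · h + w], and runs of identical speaker labels can be merged into segments and cut with pydub. A sketch under those assumptions; the file name, label array, and the 0.4 s / 0.404 s values are hypothetical placeholders, not values from the repo:

```python
# Sketch: turn per-embedding speaker labels into time segments and cut the audio.
# window_s and hop_s are assumptions; read the real values from dvector_create.py.
from pydub import AudioSegment

def labels_to_segments(labels, window_s, hop_s):
    """Merge consecutive identical labels into (speaker, start_s, end_s) spans."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            start_s = start * hop_s                  # first window in the run
            end_s = (i - 1) * hop_s + window_s       # last window in the run
            segments.append((labels[start], start_s, end_s))
            start = i
    return segments

audio = AudioSegment.from_wav("input.wav")   # hypothetical file name
labels = [0] * 10 + [1] * 14                 # e.g. uis-rnn output for 24 embeddings
for k, (spk, s, e) in enumerate(labels_to_segments(labels, 0.4, 0.404)):
    # pydub slices by milliseconds, so convert seconds -> ms before cutting
    audio[int(s * 1000):int(e * 1000)].export(f"speaker{spk}_{k}.wav", format="wav")
```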
Please help me understand this. Also, is there any other way you would suggest to split the audio?