What is the duration of audio of each D vector embedding that is created? #66

Open
abhilashnayak opened this issue Dec 13, 2019 · 0 comments

Comments

@abhilashnayak

Hi,

Thanks for this work. I am using the output of dvector_create.py as input to uis-rnn, and the diarization step runs as well.

But I am a little confused about the number of d-vector embeddings that are created.
dvector_create.py created 24 embeddings for a 9.7-second audio file and 21 embeddings for an 8.9-second audio file.
In the first case, if I assume that each embedding covers 240 milliseconds of audio (just an assumption) and add them up, the total does not match the full audio duration.
24 * 240 = 5760 ms (5.76 seconds), but my audio file is 9.7 seconds long.
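
To make the mismatch concrete, here is a small sketch of the arithmetic using only the figures quoted above. The 240 ms value is my guess, and the "implied" duration is only meaningful if the embeddings cover the whole file evenly, with no overlap and no dropped silence, which I have not confirmed for dvector_create.py.

```python
# Figures quoted in this post: (audio length in seconds, number of embeddings).
files = [(9.7, 24), (8.9, 21)]

ASSUMED_MS = 240  # my guess for the span of one embedding, not a confirmed value


def covered_seconds(n_embeddings, ms_per_embedding=ASSUMED_MS):
    """Total audio covered if every embedding spans ms_per_embedding."""
    return n_embeddings * ms_per_embedding / 1000.0


def implied_ms_per_embedding(duration_s, n_embeddings):
    """Average span per embedding IF the embeddings tile the file evenly."""
    return duration_s * 1000.0 / n_embeddings


for duration_s, n in files:
    print(
        f"{duration_s} s, {n} embeddings: "
        f"covered at {ASSUMED_MS} ms each = {covered_seconds(n):.2f} s, "
        f"implied average = {implied_ms_per_embedding(duration_s, n):.1f} ms"
    )
```

The two files give different implied averages (roughly 404 ms and 424 ms), so a single fixed duration per embedding does not seem to explain the counts by simple division.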

I want to understand this because I need to split the audio after diarization is done. The idea is: if the diarization result says that the first 10 embeddings belong to speaker 1, and I also know that each embedding covers X ms, then speaker 1's segment is 10 * X = 10X ms (10X/1000 seconds) long. So I would split the audio at 10X ms, and so on. Without knowing the time range (in milliseconds) in which speaker 1 spoke, and the time ranges for speaker 2, I cannot split the audio.
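
The splitting idea described above could be sketched roughly like this. It assumes every embedding covers the same X ms and that embeddings are contiguous from the start of the file, which is exactly the part I am unsure about. The function name `labels_to_segments`, the label list, and the value 400 are hypothetical placeholders, not anything taken from this repository or from uis-rnn.

```python
from itertools import groupby


def labels_to_segments(labels, ms_per_embedding):
    """Merge runs of identical speaker labels into (speaker, start_ms, end_ms).

    Assumes one label per embedding, a fixed span per embedding, and
    embeddings that are contiguous from time 0 (all unverified assumptions).
    """
    segments = []
    index = 0
    for speaker, run in groupby(labels):
        count = len(list(run))
        start_ms = index * ms_per_embedding
        end_ms = (index + count) * ms_per_embedding
        segments.append((speaker, start_ms, end_ms))
        index += count
    return segments


# Hypothetical diarization output: first 10 embeddings speaker 0, next 14 speaker 1.
labels = [0] * 10 + [1] * 14
segments = labels_to_segments(labels, ms_per_embedding=400)  # 400 is a placeholder X
for speaker, start_ms, end_ms in segments:
    print(f"speaker {speaker}: {start_ms} ms to {end_ms} ms")
```

With the real value of X (or real per-embedding timestamps, if they are not uniform), the resulting start and end times could be used as cut points for the audio.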

Please help me understand this. Also, is there any other way you can suggest to split the audio?
