Plda Scoring #548
Well done!
@@ -397,7 +397,8 @@ bool KaldiRecognizer::GetSpkVector(Vector<BaseFloat> &out_xvector, int *num_spk_
 // xvector_result is filled with xvector for PldaScoring process
 xvector_result = xvector;
 // out_xvector will be filled by PldaScoring method from utterance
-// xvector after transformation
+// xvector before transformation so that it can be used for new
+// users enrollment
 PldaScoring(out_xvector);
Why do you return this out_xvector? Only for enrollment? It might be better to add a dedicated method for this task.
out_xvector is passed by reference to this function and is filled so that it can be sent to the user as the spk field.
Yes, it can be used for enrollment, and it should be added to spk_xvectors.ark in ark format.
Defining a new function would introduce redundant work: the xvector would be computed again, whereas PldaScoring already computes these values once.
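For illustration, an enrollment step along these lines could append the returned vector to a text-format archive. The sketch below is an assumption, not part of this PR: format_ark_entry and enroll are hypothetical helpers, and it assumes spk_xvectors.ark is a Kaldi text archive with one "key  [ v1 v2 ... ]" entry per line. A binary archive would need Kaldi's own tools instead.

```python
import os
import tempfile


def format_ark_entry(speaker_id, xvector):
    """Format one vector as a Kaldi text-archive line: 'key  [ v1 v2 ... ]'."""
    values = " ".join(f"{v:.6g}" for v in xvector)
    return f"{speaker_id}  [ {values} ]\n"


def enroll(ark_path, speaker_id, xvector):
    """Append a new speaker entry to a text archive (hypothetical helper)."""
    with open(ark_path, "a", encoding="utf-8") as ark:
        ark.write(format_ark_entry(speaker_id, xvector))


# Usage with a throwaway file and a made-up 3-dimensional vector.
demo_path = os.path.join(tempfile.mkdtemp(), "spk_xvectors.ark")
enroll(demo_path, "new_user", [0.25, -1.5, 3.0])
with open(demo_path, encoding="utf-8") as ark:
    demo_contents = ark.read()
```

Since the speaker model also ships a per-speaker utterance count (num_utts.ark), a matching entry there would presumably be needed as well.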
It may be best to review the entire speaker enrollment and scoring scenario once. Using one function for two purposes is not desirable.
I think it is better to distinguish between the speaker recognition and speech recognition tasks in the overall structure. It may be best to have two separate recognizer modules for these two tasks.
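One possible way to reconcile the concerns in this thread, sketched in Python for brevity even though the recognizer itself is C++: compute the xvector once per utterance, cache it, and expose scoring and enrollment as two separate methods that both read the cache. Every name below (SpeakerRecognizerSketch, accept_utterance, enrollment_vector, and the toy extractor and scorer) is hypothetical and not part of the vosk API.

```python
class SpeakerRecognizerSketch:
    """Hypothetical stand-in for a speaker recognizer; not the real API."""

    def __init__(self, extract_xvector, score_xvector):
        self._extract = extract_xvector  # expensive xvector extraction
        self._score = score_xvector      # scoring of an extracted xvector
        self._features = None
        self._xvector = None
        self.extract_calls = 0           # counts how often extraction ran

    def accept_utterance(self, features):
        # A new utterance invalidates the cached xvector.
        self._features = features
        self._xvector = None

    def _current_xvector(self):
        # Extract at most once per utterance, then reuse the result.
        if self._xvector is None:
            self.extract_calls += 1
            self._xvector = self._extract(self._features)
        return self._xvector

    def scores(self):
        return self._score(self._current_xvector())

    def enrollment_vector(self):
        # Return a copy so callers cannot mutate the cached vector.
        return list(self._current_xvector())


# Toy extractor and scorer, only to show that extraction runs once.
rec = SpeakerRecognizerSketch(
    extract_xvector=lambda feats: [sum(feats), float(len(feats))],
    score_xvector=lambda xv: {"spk_a": xv[0] - xv[1]},
)
rec.accept_utterance([1.0, 2.0, 3.0])
demo_scores = rec.scores()
demo_vector = rec.enrollment_vector()
```

With this shape, each method has a single purpose, and the redundancy concern is addressed by the cache rather than by sharing one function.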
Initially raised by @gooran and mainly inspired by a similar task in lid_kaldi.
Merging this pull request makes the following changes:
ADDED:
- KaldiRecognizer::PldaScoring added to kaldi_recognizer.cc
- plda added to spk_model.h
- plda_config added to spk_model.h
- plda_rxfilename added to spk_model.h
- vad_opts added to spk_model.h
- num_utts added to spk_model.h
- train_ivectors added to spk_model.h
- train_ivector_rspecifier added to spk_model.h
- num_utts_rspecifier added to spk_model.h
- sorted_scores method added to test_speaker.py
- spk_sig vector changed to match the model's dimension (dim=128)

After this PR, PLDA scoring runs automatically after xvector extraction for each utterance. It scores the likelihood between the xvector of the utterance (test) speaker and the train xvectors, and the JSON result contains a new field called scores.

There is also a small change to the spk field, which returns the xvector of the latest utterance. After this PR, this field is filled by the PldaScoring method during its PLDA computation.

I've tested this feature with the following model files, and everything seems to work normally:
- model = vosk-model-small-fa-0.5.zip
- model-spk = vosk-model-spk-0.5.zip, a new sre16-based speaker recognition model containing the following files, gathered by @gooran:
  - final.ext.raw: extracted version of the sre16 model
  - mfcc.conf: MFCC config file
  - plda_adapt.smooth0.1: smoothed version of PLDA
  - spk_xvectors.ark: archive of the trained speakers' xvectors
  - vad.conf: VAD config file
  - mean.vec: mean vector
  - num_utts.ark: number of utterances associated with each speaker
  - README.md: README file
  - transform.mat: transformation matrix

Any comments and enhancements are welcome.
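To make the scoring step described above more concrete, here is a minimal pure-Python sketch of a PLDA log-likelihood ratio followed by a ranking of the enrolled speakers. It follows the standard two-covariance PLDA formulation in the diagonalized space, which is how Kaldi's Plda class computes its ratio as far as I know; whether PldaScoring does exactly this is an assumption. The sorted_scores function here is a hypothetical reimplementation (the one in test_speaker.py is not shown in this PR text), the dictionary shape of the scores is assumed, and all numbers are made up.

```python
import math


def plda_log_likelihood_ratio(train_mean, num_utts, test, psi):
    """PLDA log-likelihood ratio in the diagonalized space.

    Assumes the vectors are already transformed so that the within-class
    covariance is the identity and the between-class covariance is diag(psi).
    train_mean is the mean of num_utts enrollment vectors for one speaker.
    """
    dim = len(test)
    log_2pi = math.log(2.0 * math.pi)

    # Log-likelihood of the test vector if it belongs to the enrolled speaker.
    logdet = 0.0
    sq_dist = 0.0
    for t, m, p in zip(test, train_mean, psi):
        mean = num_utts * p / (num_utts * p + 1.0) * m
        var = 1.0 + p / (num_utts * p + 1.0)
        logdet += math.log(var)
        sq_dist += (t - mean) ** 2 / var
    loglike_same = -0.5 * (logdet + dim * log_2pi + sq_dist)

    # Log-likelihood of the test vector under the speaker-independent prior.
    logdet = 0.0
    sq_dist = 0.0
    for t, p in zip(test, psi):
        var = 1.0 + p
        logdet += math.log(var)
        sq_dist += t * t / var
    loglike_any = -0.5 * (logdet + dim * log_2pi + sq_dist)

    return loglike_same - loglike_any


def sorted_scores(scores):
    """Return (speaker, score) pairs, best match first (hypothetical)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up 2-dimensional example: (mean xvector, number of utterances).
psi = [2.0, 1.0]
enrolled = {"spk_a": ([1.0, 0.5], 3), "spk_b": ([-1.0, -0.5], 2)}
test_xvector = [0.9, 0.4]

scores = {
    name: plda_log_likelihood_ratio(mean, n, test_xvector, psi)
    for name, (mean, n) in enrolled.items()
}
ranking = sorted_scores(scores)
```

A higher ratio means the test xvector is better explained by that enrolled speaker than by the speaker-independent prior, so the first entry of ranking is the most likely match.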