A machine learning system that recognizes the word 'Google' in human speech.
We train a classifier on a set of WAV files using Mel-Frequency Cepstral Coefficients (MFCC) as features. There are two implementations of the classifier available:
- Regularized logistic regression, trained with a conjugate gradient optimizer (`fmincg`).
- Feed-forward neural network, trained with MATLAB's scaled conjugate gradient optimizer (`trainscg`).
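
To make the first option concrete, here is a minimal sketch of the objective that a regularized logistic regression classifier minimizes. It is written in plain Python for illustration and is not the repository's MATLAB code. The function names, the L2 penalty form, and the choice to leave the bias weight unpenalized are all assumptions. A gradient-based optimizer such as `fmincg` would typically be handed a function like this, one that returns both the cost and its gradient.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cost_and_gradient(theta, X, y, lam):
    """Return (cost, gradient) for L2-regularized logistic regression.

    theta: list of weights; theta[0] is the bias weight (assumed unpenalized)
    X:     list of feature rows, each starting with a bias term of 1.0
    y:     list of 0/1 labels
    lam:   regularization strength (hypothetical parameter name)
    """
    m = len(y)
    n = len(theta)
    eps = 1e-12  # keeps log() away from zero
    cost = 0.0
    grad = [0.0] * n
    for row, target in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, row)))
        cost -= target * math.log(h + eps) + (1 - target) * math.log(1 - h + eps)
        for j in range(n):
            grad[j] += (h - target) * row[j]
    # Average over examples, then add the penalty on the non-bias weights.
    cost = cost / m + lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = [g / m for g in grad]
    for j in range(1, n):
        grad[j] += lam / m * theta[j]
    return cost, grad
```

In this project each row of `X` would hold the MFCC features of one recording, and `y` would mark whether the recording contains the target word.
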
- Import training and test data into the `data` folder. You can get some data from the Releases Page. The file names should follow the `pronunciation_en_%label%.wav` pattern.
- Run either `mainLogisticRegression.m` or `mainNeuralNetwork.m`, depending on which classifier you would like to try.
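
The file naming pattern above can be checked with a short script before running either classifier. The following is a rough sketch in Python, not part of the repository. It assumes that `%label%` is the spoken word and that a label of `google` (compared case-insensitively) marks a positive example. The helper names `label_from_filename` and `is_positive` are hypothetical.

```python
import re

# Matches names such as "pronunciation_en_google.wav" and captures the label.
FILENAME_RE = re.compile(r"^pronunciation_en_(?P<label>.+)\.wav$")


def label_from_filename(name):
    """Return the label part of a file name, or None if it does not match."""
    match = FILENAME_RE.match(name)
    return match.group("label") if match else None


def is_positive(name, target="google"):
    """True if the file's label equals the target word (assumed convention)."""
    label = label_from_filename(name)
    return label is not None and label.lower() == target
```

For example, `label_from_filename("pronunciation_en_google.wav")` returns `"google"`, while a file that does not follow the pattern yields `None` and can be skipped or renamed.
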