Interactive Speaker Recognition

This repository explores the approach to speaker recognition described in the paper "A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning".

Installation

This project is packaged for pip. To install it, clone this repository and run

pip install .

Add the -e flag if you want to modify the package files (src/isr).

You will probably want to use a GPU, so it is recommended that you install torch with CUDA support before installing this package.
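After installing, it can be useful to confirm that torch actually sees a GPU. The snippet below is a minimal sketch, not part of this package: the function name pick_device is hypothetical, and it falls back to "cpu" when torch is missing or was installed without CUDA support.

```python
def pick_device():
    """Return "cuda" if torch is installed and sees a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Training would run on: {pick_device()}")
```

If this reports "cpu" on a machine with a GPU, the installed torch build is probably CPU-only.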

Usage

To prepare data and to train and test models, you will need the scripts in the src/ directory. First, download the TIMIT dataset and place a link to it in data/:

ln -s PATH_TO_TIMIT data/TIMIT
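The same link can be created from Python, which makes it easier to check that the TIMIT directory exists first. This is a sketch under assumptions: link_timit is a hypothetical helper, not something the repository provides, and it assumes the scripts expect the dataset at data/TIMIT.

```python
import os
from pathlib import Path


def link_timit(timit_path, data_dir="data"):
    """Create data_dir/TIMIT as a symlink pointing at timit_path."""
    target = Path(timit_path).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"TIMIT directory not found: {target}")
    link = Path(data_dir) / "TIMIT"
    link.parent.mkdir(parents=True, exist_ok=True)
    # Refuse to overwrite an existing file, directory, or (broken) link.
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    os.symlink(target, link, target_is_directory=True)
    return link
```

For example, link_timit("PATH_TO_TIMIT") run from the repository root creates data/TIMIT pointing at the downloaded dataset.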

To use x-vector embeddings, you will have to download and install Kaldi. To create the files Kaldi needs, first make sure the KALDI_ROOT variable is defined in your shell session, then run

python3 src/data_processing.py kaldi-data-prep
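Before running the command above, a quick check that KALDI_ROOT is set and points at a real directory can save a confusing failure later. The helper below is an illustrative sketch; find_kaldi_root is a hypothetical name and the repository's own scripts may validate the variable differently.

```python
import os
from pathlib import Path


def find_kaldi_root(environ=os.environ):
    """Return the Kaldi root as a Path, or raise if it is unset or missing."""
    root = environ.get("KALDI_ROOT")
    if not root:
        raise RuntimeError("KALDI_ROOT is not set; export it before running kaldi-data-prep")
    path = Path(root)
    if not path.is_dir():
        raise RuntimeError(f"KALDI_ROOT points to a missing directory: {path}")
    return path


if __name__ == "__main__":
    print(f"Using Kaldi at {find_kaldi_root()}")
```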

This creates files in the data/kaldi directory. You will need them, together with extract_kaldi_xvectors.sh, to extract x-vector embeddings with Kaldi. See the comments in that script for more information.

NOTE:

X-vector embeddings are not particularly great, so consider using a different pretrained model.

To convert the extracted embeddings to numpy arrays, run

python3 src/data_processing.py kaldi-to-numpy

Now you can train and test models. Here is an example pipeline:

mkdir output models
python3 src/guesser.py train
python3 src/guesser.py test
cp output/guesser.pth models/
python3 src/enquirer.py train
python3 src/enquirer.py test
cp output/actor.pth models/enquirer.pth
python3 src/select_words.py
cp output/word_scores_val.csv models/word_scores.csv
python3 src/heuristic_agent.py -w 3
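The pipeline above can also be driven from a small Python script that stops at the first failing step. This is a sketch, not a script shipped with the repository: PIPELINE and run_pipeline are hypothetical names, and the step list simply mirrors the commands shown above.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Each step is ("run", script_args) or ("copy", source, destination).
PIPELINE = [
    ("run", ["src/guesser.py", "train"]),
    ("run", ["src/guesser.py", "test"]),
    ("copy", "output/guesser.pth", "models/guesser.pth"),
    ("run", ["src/enquirer.py", "train"]),
    ("run", ["src/enquirer.py", "test"]),
    ("copy", "output/actor.pth", "models/enquirer.pth"),
    ("run", ["src/select_words.py"]),
    ("copy", "output/word_scores_val.csv", "models/word_scores.csv"),
    ("run", ["src/heuristic_agent.py", "-w", "3"]),
]


def run_pipeline(steps=PIPELINE, dry_run=False, python=sys.executable):
    """Execute the steps in order and return a log of the commands."""
    log = []
    if not dry_run:
        for directory in ("output", "models"):
            Path(directory).mkdir(exist_ok=True)
    for step in steps:
        if step[0] == "run":
            cmd = [python, *step[1]]
            log.append(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)  # raises on a non-zero exit
        else:
            _, src, dst = step
            log.append(f"cp {src} {dst}")
            if not dry_run:
                shutil.copy(src, dst)
    return log


if __name__ == "__main__":
    # Print the plan only; pass dry_run=False to actually run it.
    for line in run_pipeline(dry_run=True):
        print(line)
```

Running it with dry_run=True only prints the planned commands, which is a cheap way to review the order before starting a long training run.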
