GitHub - felixbur/nkululeko: Machine learning speaker characteristics

Overview
Documentation
Installation
Usage
License

Overview

A project to detect speaker characteristics by machine learning experiments with a high-level interface.

The idea is to have a framework (based on e.g. sklearn and torch) that can be used to rapidly and automatically analyse audio data and explore machine learning models based on that data.

NEW: Nkululeko now automatically generates PDF reports (sample for EmoDB).
The latest features can be seen in the ini-file options that are used to control Nkululeko.
Below is a Hello World example that should get you set up quickly, also on Google Colab and on Kaggle.
Here's a blog post on how to set up nkululeko on your computer.
Here is a slack channel to discuss issues related to nkululeko. Please click the link if interested in contributing.
Here's a slide presentation about nkululeko
Here's a video presentation about nkululeko
Here's the 2022 LREC article on nkululeko

Here are some examples of typical output:

Confusion matrix

By default, Nkululeko displays results as a confusion matrix. For regression tasks, the continuous values are binned into categories first.
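To illustrate how binning turns regression output into a confusion matrix, here is a minimal sketch in plain Python. The values, the bin edges and the helper names (bin_values, confusion_matrix) are made up for illustration and are not taken from Nkululeko's code.

```python
from bisect import bisect_right


def bin_values(values, edges):
    """Map each continuous value to a bin index; edges are the inner bin boundaries."""
    return [bisect_right(edges, v) for v in values]


def confusion_matrix(truth, pred, n_classes):
    """Count samples per (true class, predicted class); rows are true classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


# Hypothetical regression targets and predictions, e.g. arousal in [0, 1].
y_true = [0.10, 0.40, 0.45, 0.80, 0.95]
y_pred = [0.20, 0.35, 0.70, 0.75, 0.50]
edges = [0.33, 0.66]  # two boundaries give three bins: low, mid, high

cm = confusion_matrix(bin_values(y_true, edges), bin_values(y_pred, edges), n_classes=3)
for row in cm:
    print(row)
```

The diagonal holds the samples whose prediction fell into the same bin as the true value.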

Epoch progression

The point when overfitting starts can sometimes be seen by looking at the results per epoch:
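One simple way to read such a curve programmatically is to look for the epoch with the lowest loss on the development set and check that later epochs did not improve on it. The sketch below assumes you have a list of per-epoch dev losses; the numbers and the function name best_epoch are hypothetical, and this is not necessarily how Nkululeko itself reports epochs.

```python
def best_epoch(dev_loss, patience=2):
    """Return the index of the epoch with the lowest dev loss, or None if fewer
    than `patience` epochs follow it (too early to call it overfitting)."""
    best = min(range(len(dev_loss)), key=dev_loss.__getitem__)
    if len(dev_loss) - 1 - best >= patience:
        return best
    return None


# Hypothetical dev losses: they fall first, then rise again.
dev_loss = [0.90, 0.70, 0.60, 0.58, 0.63, 0.70]
print(best_epoch(dev_loss))
```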

Feature importance

Using the explore interface, Nkululeko analyses the importance of acoustic features:

Feature distribution

It can also show the distribution of specific features per category:

t-SNE plots

A t-SNE plot can give you an estimate of whether your acoustic features are useful at all:

Data distribution

Sometimes you only want to take a look at your data:

Bias checking

In some cases you might wonder whether there is bias in your data. You can try to detect this with automatically estimated speech properties, by visualizing the correlation between the target labels and the predicted labels.
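As a rough sketch of the idea, the following cross-tabulates a target label against an estimated speaker property and compares how often one label occurs in each group. The data, the choice of gender as the property and the helper names are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def crosstab(labels, properties):
    """Count target labels separately for each value of the estimated property."""
    table = defaultdict(Counter)
    for label, prop in zip(labels, properties):
        table[prop][label] += 1
    return table


def label_share(table, label):
    """Fraction of samples carrying `label` within each property group."""
    return {prop: counts[label] / sum(counts.values()) for prop, counts in table.items()}


# Hypothetical target labels and automatically estimated gender per sample.
emotion = ["anger", "anger", "anger", "boredom", "boredom", "anger", "boredom", "boredom"]
gender = ["female", "female", "female", "female", "male", "male", "male", "male"]

shares = label_share(crosstab(emotion, gender), "anger")
print(shares)
```

A large gap between the groups suggests that the label is confounded with the property in this data set.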

Documentation

The documentation, along with more details on installation, usage, the INI file format, and examples, can be found at nkululeko.readthedocs.io.

Installation

Create and activate a virtual Python environment and simply run

pip install nkululeko

We excluded some packages from the automatic installation because they might depend on your computer and some of them are only needed in special cases. So if the error

module x not found

appears, please try

pip install x
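If you prefer to check for optional packages before starting an experiment, a small sketch like the following can list the ones that are not importable. The package names in the list are only examples, and note that the import name of a package can differ from its pip name.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Example list of optional top-level import names (adapt to your experiment).
optional = ["torch", "torchaudio", "sliceguard"]
for name in missing_packages(optional):
    print(f"module {name} not found, try: pip install {name}")
```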

For many packages you will need the missing torch package. If you don't have a GPU (which is probably true if you don't know what that is), please use

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Otherwise, you can use the default:

pip install torch torchvision torchaudio
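To see which variant ended up in your environment, you can ask torch directly. The sketch below only uses torch.cuda.is_available() and falls back gracefully when torch is not installed; the function name torch_status is made up for this example.

```python
def torch_status():
    """Report whether torch is missing, CPU-only, or able to use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "missing"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(torch_status())
```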

Some functionalities require extra packages to be installed, which we didn't include automatically:

the SQUIM model needs a special torch version:

pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

the spotlight adapter needs spotlight:

pip install renumics-spotlight sliceguard

Some examples for ini-files (which you use to control nkululeko) are in the tests folder.

Usage

ini-file values

You specify your experiment in an "ini" file (e.g. experiment.ini) and then call one of the Nkululeko interfaces to run the experiment like this:

python -m nkululeko.nkululeko --config experiment.ini

A basic configuration looks like this:

[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = ./emodb/
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear']
[FEATS]
type = ['praat']
[MODEL]
type = svm
[EXPL]
model = tree
plot_tree = True
[PLOT]
combine_per_speaker = mode
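The file is ordinary INI syntax, so you can inspect it with Python's standard configparser module. The sketch below reads part of the configuration above and converts the list-like values with ast.literal_eval; this is an assumption about how such values can be read, not a description of Nkululeko's internal parsing.

```python
import ast
import configparser

# Part of the configuration shown above, embedded as a string for the example.
INI_TEXT = """
[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = ./emodb/
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear']
[FEATS]
type = ['praat']
[MODEL]
type = svm
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Values that look like Python lists are stored as strings and need converting.
databases = ast.literal_eval(config["DATA"]["databases"])
labels = ast.literal_eval(config["DATA"]["labels"])
model_type = config["MODEL"]["type"]
split = config["DATA"]["emodb.split_strategy"]

print(databases, labels, model_type, split)
```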

Read the Hello World example for initial usage with the Emo-DB dataset.

Here is an overview of the interfaces/modules:

All of them take --config <my_config.ini> as an argument.

nkululeko.nkululeko: do machine learning experiments combining features and learners
nkululeko.multidb: do multiple experiments, comparing several databases both across databases and within each one
nkululeko.demo: demo the current best model on the command line
- --list (optional) list of input files
- --file (optional) name of input file
- --folder (optional) parent folder for input files
- --outfile (optional) name of CSV file for output
nkululeko.test: predict a given data set with the current best model
nkululeko.explore: perform data exploration
nkululeko.augment: augment the current training data
nkululeko.aug_train: augment the current training data and do a training including this data
nkululeko.predict: predict features like SNR, MOS, arousal/valence, age/gender, with DNN models
nkululeko.segment: segment a database based on VAD (voice activity detection)
nkululeko.resample: check all sampling rates and convert to 16 kHz
nkululeko.nkuluflag: a convenient module to specify configuration parameters on the command-line.
- usage: nkuluflag.py [-h] [--config CONFIG] [--data [DATA ...]] [--label [LABEL ...]] [--tuning_params [TUNING_PARAMS ...]] [--layers [LAYERS ...]] [--model MODEL] [--feat FEAT] [--set SET] [--with_os WITH_OS] [--target TARGET] [--epochs EPOCHS] [--runs RUNS] [--learning_rate LEARNING_RATE] [--drop DROP]
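Since all modules share the same calling convention, they are easy to script. The following sketch builds the command line for a given module from Python; the helper name module_command and the file name sample.wav are hypothetical, and the actual call is left commented out because it requires an installed Nkululeko and a valid configuration.

```python
import subprocess
import sys


def module_command(module, config, extra=()):
    """Build the argument list for `python -m nkululeko.<module> --config <config>`."""
    return [sys.executable, "-m", f"nkululeko.{module}", "--config", config, *extra]


cmd = module_command("explore", "experiment.ini")
demo_cmd = module_command("demo", "experiment.ini", ["--file", "sample.wav"])
print(" ".join(cmd))
print(" ".join(demo_cmd))

# To actually run a command (needs nkululeko installed in this environment):
# subprocess.run(cmd, check=True)
```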

There's my blog with tutorials:

Hello World example

NEW: Here's a Google colab that runs this example out-of-the-box, and here is the same with Kaggle
I made a video to show you how to do this on Windows
Set up Python on your computer, version >= 3.8
Open a terminal/commandline/console window
Test Python by typing python; it should start with a version 3 release (NOT 2!). You can leave the Python interpreter by typing exit()
Create a folder on your computer for this example, let's call it nkulu_work
Get a copy of the Berlin emodb in audformat and unpack inside the folder you just created (nkulu_work)
Make sure the folder is called "emodb" and contains the database files directly (not box-in-a-box, i.e. nested inside another folder)
Also, in the nkulu_work folder:
- Create a Python environment
  - python -m venv venv
- Then, activate it:
  - under Linux / mac
    - source venv/bin/activate
  - under Windows
    - venv\Scripts\activate.bat
  - if that worked, you should see a (venv) in front of your prompt
- Install the required packages in your environment
  - pip install nkululeko
  - Repeat until all error messages have vanished (or fix them, or try to ignore them)...
Now you should have two folders in your nkulu_work folder:
- emodb and venv
Download a copy of the file exp_emodb.ini to the current working directory (nkulu_work)
Run the demo
- python -m nkululeko.nkululeko --config exp_emodb.ini
Find the results in the newly created folder exp_emodb
- Inspect exp_emodb/images/run_0/emodb_xgb_os_0_000_cnf.png
- This is the main result of your experiment: a confusion matrix for the emodb emotional categories
Inspect and play around with the demo configuration file that defined your experiment, then re-run.
There are many ways to experiment with different classifiers and acoustic feature sets, all described here
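Before running the demo, it can help to verify that the working folder looks as described in the steps above. This is a small sketch using only pathlib; the function name check_workspace is made up, and the checks simply mirror the folder and file names from this example.

```python
from pathlib import Path


def check_workspace(root):
    """Return a list of problems found in the Hello World working folder."""
    root = Path(root)
    problems = []
    for folder in ("emodb", "venv"):
        if not (root / folder).is_dir():
            problems.append(f"missing folder: {folder}")
    if not (root / "exp_emodb.ini").is_file():
        problems.append("missing file: exp_emodb.ini")
    # The database files must sit directly in emodb, not in emodb/emodb.
    if (root / "emodb" / "emodb").is_dir():
        problems.append("emodb is nested: move the database files up one level")
    return problems


# Run from inside nkulu_work; prints nothing if everything is in place.
for problem in check_workspace("."):
    print(problem)
```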

Features

The framework is targeted at the speech domain and supports experiments where different classifiers are combined with different feature extractors.

Classifiers: Naive Bayes, KNN, Tree, XGBoost, SVM, MLP
Feature extractors: Praat, Opensmile, openXBOW BoAW, TRILL embeddings, Wav2vec2 embeddings, audModel embeddings, ...
Feature scaling
Label encoding
Binning (continuous to categorical)
Online demo interface for trained models
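To make two of these preprocessing steps concrete, here is a minimal sketch of feature scaling (standardization to zero mean and unit variance) and label encoding (mapping category names to integers) in plain Python. The feature values and function names are illustrative and not taken from Nkululeko's implementation.

```python
from statistics import mean, pstdev


def standard_scale(column):
    """Shift a feature column to zero mean and scale it to unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def encode_labels(labels):
    """Map category names to integer codes, using alphabetical class order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Hypothetical pitch values (Hz) and emotion labels.
scaled = standard_scale([100.0, 200.0, 300.0])
encoded, classes = encode_labels(["fear", "anger", "fear", "boredom"])
print(scaled)
print(encoded, classes)
```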

Here's a rough UML-like sketch of the framework (and here's the real one done with pyreverse).

Currently, the following classical classifiers and regressors are implemented (integrated from sklearn and related libraries):

SVM, SVR, XGB, XGR, Tree, Tree_regressor, KNN, KNN_regressor, NaiveBayes, GMM

and the following ANNs (artificial neural networks):

MLP (multi-layer perceptron), CNN (convolutional neural network)

Here's an animation that shows the progress of classification done with nkululeko

License

Nkululeko can be used under the MIT license. If you use it, please mention the Nkululeko paper:

Felix Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben and Björn Schuller: Nkululeko: A Tool For Rapid Speaker Characteristics Detection, Proc. LREC, 2022

@inproceedings{Burkhardt:lrec2022,
   title = {Nkululeko: A Tool For Rapid Speaker Characteristics Detection},
   author = {Felix Burkhardt and Johannes Wagner and Hagen Wierstorf and Florian Eyben and Björn Schuller},
   isbn = {9791095546726},
   booktitle = {2022 Language Resources and Evaluation Conference, LREC 2022},
   keywords = {machine learning,speaker characteristics,tools},
   pages = {1925-1932},
   publisher = {European Language Resources Association (ELRA)},
   year = {2022},
}
