Machine learning speaker characteristics
A project for detecting speaker characteristics through machine learning experiments, controlled via a high-level interface.
The idea is to have a framework (based on e.g. sklearn and torch) that can be used to rapidly and automatically analyse audio data and explore machine learning models based on that data.
Here are some examples of typical output:
By default, Nkululeko displays classification results as a confusion matrix; regression results are shown the same way by binning the continuous values.
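The binning idea can be sketched with plain numpy (an illustration, not Nkululeko's implementation; the bin edges and values below are made up):

```python
import numpy as np

# Hypothetical continuous targets and predictions (e.g. arousal scores in [0, 1]).
y_true = np.array([0.1, 0.4, 0.8, 0.3, 0.9, 0.6])
y_pred = np.array([0.2, 0.5, 0.7, 0.1, 0.8, 0.4])

# Bin the continuous values into three classes: low / mid / high.
edges = [1 / 3, 2 / 3]
true_bins = np.digitize(y_true, edges)
pred_bins = np.digitize(y_pred, edges)

# Build a 3x3 confusion matrix over the bins.
cm = np.zeros((3, 3), dtype=int)
for t, p in zip(true_bins, pred_bins):
    cm[t, p] += 1
print(cm)
```

Here all predictions land in the same bin as their targets, so only the diagonal is populated.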
The point when overfitting starts can sometimes be seen by looking at the results per epoch:
Using the explore interface, Nkululeko analyses the importance of acoustic features:
And can show the distribution of specific features per category:
If there are only two categories, a Mann-Whitney U test for significance is given:
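The kind of test involved can be sketched with scipy directly (not Nkululeko's API; the feature values are invented for illustration):

```python
from scipy.stats import mannwhitneyu

# Hypothetical pitch values (Hz) for two categories, e.g. "neutral" vs "anger".
group_a = [110, 120, 118, 125, 130, 115, 122]
group_b = [150, 160, 145, 155, 170, 148, 152]

# Two-sided Mann-Whitney U test: do the two groups differ significantly?
stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {stat}, p = {p_value:.4f}")
```

A small p-value suggests the feature separates the two categories well.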
A t-SNE plot can give you an estimate of whether your acoustic features are useful at all:
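A minimal t-SNE sketch with scikit-learn (random vectors stand in for real acoustic features; with useful features, samples of the same class would cluster in the 2-D projection):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in acoustic features: 50 samples with 20 feature dimensions each.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 20))

# Project to two dimensions for plotting.
tsne = TSNE(n_components=2, perplexity=10, init="random", random_state=0)
embedding = tsne.fit_transform(features)
print(embedding.shape)  # (50, 2)
```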
Sometimes, you only want to take a look at your data:
In some cases, you might wonder if there’s bias in your data. You can try to detect this with automatically estimated speech properties by visualizing the correlation of target labels and predicted labels.
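A bare-bones version of such a check (a sketch, not Nkululeko's API; names and values are hypothetical):

```python
import numpy as np

# Hypothetical per-sample values: the target label (as a class index)
# and an automatically estimated speech property (e.g. estimated SNR).
target = np.array([0, 0, 1, 1, 2, 2, 1, 0])
predicted_snr = np.array([5.1, 4.8, 9.9, 10.3, 15.2, 14.7, 10.0, 5.5])

# A strong correlation would hint that the property is confounded with the target.
corr = np.corrcoef(target, predicted_snr)[0, 1]
print(f"correlation: {corr:.2f}")
```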
Nkululeko estimates the uncertainty of model decisions (only for classifiers) with entropy over the class probabilities or logits per sample.
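The entropy measure can be sketched in plain Python (an illustration, not Nkululeko's implementation): a peaked class-probability distribution yields low entropy (a confident decision), a flat one yields the maximum.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A confident prediction vs. an uncertain one over four emotion classes.
confident = [0.94, 0.02, 0.02, 0.02]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(entropy(confident))  # low
print(entropy(uncertain))  # maximum for 4 classes: log2(4) = 2.0
```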
The documentation, covering installation, usage, the INI file format, and examples, can be found at nkululeko.readthedocs.io.
Create and activate a virtual Python environment and simply run
pip install nkululeko
Some packages are excluded from the automatic installation because they are platform-dependent or only needed in special cases. So if the error
module x not found
appears, please try
pip install x
In many cases, the missing package will be torch.
If you don’t have a GPU (which is probably true if you don’t know what that is), please use
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
else, you can use the default:
pip install torch torchvision torchaudio
Some functionalities require extra packages to be installed, which we didn’t include automatically:
pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install renumics-spotlight sliceguard
Some examples for ini-files (which you use to control nkululeko) are in the tests folder.
Basically, you specify your experiment in an “ini” file (e.g. experiment.ini) and then call one of the Nkululeko interfaces to run the experiment like this:
python -m nkululeko.nkululeko --config experiment.ini
A basic configuration looks like this:
[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = ./emodb/
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear']
[FEATS]
type = ['praat']
[MODEL]
type = svm
[EXPL]
model = tree
plot_tree = True
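Such a file is standard INI syntax, so a quick sanity check of your own configuration can be done with Python's configparser (a sketch, not part of Nkululeko's API):

```python
import configparser

# A trimmed-down version of the configuration shown above.
ini_text = """
[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
target = emotion
[MODEL]
type = svm
"""

config = configparser.ConfigParser()
config.read_string(ini_text)  # use config.read("experiment.ini") for a file
print(config["DATA"]["target"])  # emotion
print(config["MODEL"]["type"])   # svm
```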
Read the Hello World example for initial usage with the Emo-DB dataset.
Here is an overview of the interfaces/modules:
All of them take a --config argument pointing to the INI file.
nkululeko.nkuluflag: a convenient module to specify configuration parameters on the command line. Usage:
$ python -m nkululeko.nkuluflag [-h] [--config CONFIG] [--data [DATA ...]] [--label [LABEL ...]] [--tuning_params [TUNING_PARAMS ...]] [--layers [LAYERS ...]] [--model MODEL] [--feat FEAT] [--set SET] [--with_os WITH_OS] [--target TARGET] [--epochs EPOCHS] [--runs RUNS] [--learning_rate LEARNING_RATE] [--drop DROP]
There’s my blog with tutorials:
In short:

1. Make sure Python 3 is installed: typing python should start an interpreter with version >3 (NOT 2!). You can leave the Python interpreter by typing exit().
2. Create a working folder, e.g. nkulu_work.
3. Create and activate a virtual Python environment in that folder:
python -m venv venv
source venv/bin/activate (on Linux/macOS)
venv\Scripts\activate.bat (on Windows)
You should then see (venv) in front of your prompt.
4. Install Nkululeko: pip install nkululeko
5. From the nkulu_work folder, run the experiment: python -m nkululeko.nkululeko --config exp_emodb.ini
6. Inspect the resulting confusion matrix at exp_emodb/images/run_0/emodb_xgb_os_0_000_cnf.png
The framework is targeted at the speech domain and supports experiments where different classifiers are combined with different feature extractors.
Here’s a rough UML-like sketch of the framework (and here’s the real one done with pyreverse).
Currently, the following linear classifiers are implemented (integrated from sklearn):
Here’s an animation that shows the progress of classification done with Nkululeko.
Nkululeko can be used under the MIT license.
Contributions are welcome and encouraged. To learn more about how to contribute to nkululeko, please refer to the Contributing guidelines.
If you use it, please mention the Nkululeko paper:
F. Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben and Björn Schuller: Nkululeko: A Tool For Rapid Speaker Characteristics Detection, Proc. LREC, 2022
@inproceedings{Burkhardt:lrec2022,
title = {Nkululeko: A Tool For Rapid Speaker Characteristics Detection},
author = {Felix Burkhardt and Johannes Wagner and Hagen Wierstorf and Florian Eyben and Björn Schuller},
isbn = {9791095546726},
booktitle = {2022 Language Resources and Evaluation Conference, LREC 2022},
keywords = {machine learning,speaker characteristics,tools},
pages = {1925-1932},
publisher = {European Language Resources Association (ELRA)},
year = {2022},
}