
idiap/zff_vad


ZFF VAD

[Paper] [Poster] [Video] [Slides]


Pipeline

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

This repository contains the code developed for the paper "Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering" by E. Sarkar, R. Prasad, and M. Magimai Doss, accepted at Interspeech 2022.

Please cite the original authors in any publication that uses this work:

@inproceedings{sarkar22_interspeech,
author    = {Eklavya Sarkar and RaviShankar Prasad and Mathew Magimai Doss},
title     = {{Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering}},
year      = {2022},
booktitle = {Proc. Interspeech 2022},
pages     = {4626--4630},
doi       = {10.21437/Interspeech.2022-10535}
}

Approach

We jointly model voice source and vocal tract system information using the zero-frequency filtering (ZFF) technique for voice activity detection. The ZFF filter outputs are combined into a composite signal that carries salient source and system information, such as the fundamental frequency $f_0$ and the formants $F_1$ and $F_2$, and a dynamic threshold is then applied after spectral entropy-based weighting. Our approach operates purely in the time domain, is robust across a range of SNRs, and is much more computationally efficient than neural methods.
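
The two building blocks named above can be illustrated with the textbook formulation of zero-frequency filtering (first-order differencing, a cascade of two 0 Hz resonators, then repeated local-mean trend removal) and a normalised spectral entropy measure. This is a minimal NumPy sketch under our own assumptions, not the package's `zff.zff` implementation: the helper names `zero_frequency_filter` and `spectral_entropy`, the default trend-window length, and the number of trend-removal passes are hypothetical choices for illustration. How the package combines several filter outputs into the composite signal, and how the entropy weighting and dynamic threshold are applied, is not reproduced here.

```python
import numpy as np


def zero_frequency_filter(s, sr, trend_win_ms=10.0, n_trend_passes=3):
    """Textbook zero-frequency filtering of a 1-D signal (hypothetical helper)."""
    s = np.asarray(s, dtype=np.float64)
    # 1) First-order differencing removes any DC offset.
    x = np.diff(s, prepend=0.0)
    # 2) Two cascaded 0 Hz resonators, each 1 / (1 - z^-1)^2,
    #    are equivalent to four running sums.
    y = x
    for _ in range(4):
        y = np.cumsum(y)
    # 3) The integrators leave a growing polynomial trend; subtract a
    #    centred moving average (odd length keeps it centred), several times.
    half = int(round(trend_win_ms * 1e-3 * sr / 2))
    win = 2 * half + 1
    kernel = np.ones(win) / win
    for _ in range(n_trend_passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y


def spectral_entropy(frame):
    """Normalised spectral entropy in [0, 1] (hypothetical helper).

    Close to 0 when the energy sits in a single bin (pure tone),
    close to 1 when it is spread evenly across bins (white noise).
    """
    power = np.abs(np.fft.rfft(np.asarray(frame, dtype=np.float64))) ** 2
    total = power.sum()
    if total == 0:
        return 0.0  # silent frame: no spectral distribution to measure
    p = power / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum() / np.log(len(power)))
```

For example, feeding `zero_frequency_filter` a synthetic train of negative impulses every 80 samples at 8 kHz (with `trend_win_ms=15.0`) should give a roughly periodic output whose negative-to-positive zero crossings repeat once per impulse period away from the signal edges. The trend window is usually chosen around one to two average pitch periods.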

Installation

This package has very few requirements. To create a new environment, install conda and then mamba, and run the following steps:

mamba env create -f environment.yml   # Create environment
conda activate zff                    # Activate environment
make install clean                    # Install packages

Command-line Usage

To segment a single audio file into a .csv file:

segment -w path/to/audio.wav -o path/to/save/segments

To segment a folder of audio files:

segment -f path/to/folder/of/audio/files -o path/to/save/segments

For more options check:

segment -h

Note: depending on the conditions of the given data, it may be necessary to tune the smoothing and theta parameters.
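
Decision smoothing of this kind can be pictured as post-processing of a binary per-sample speech mask. The sketch below is a generic illustration under our own assumptions, not the behaviour of `utils.smooth_decision` or of the command-line options: the helpers `_runs` and `smooth_mask` and the `min_gap_ms` / `min_speech_ms` parameters are hypothetical. It first bridges short pauses between speech runs, then discards speech bursts that are still too short.

```python
import numpy as np


def _runs(mask):
    """Split a boolean array into (start, end, value) runs; end is exclusive."""
    mask = np.asarray(mask, dtype=bool)
    if mask.size == 0:
        return []
    change = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [mask.size]))
    return [(int(s), int(e), bool(mask[s])) for s, e in zip(starts, ends)]


def smooth_mask(mask, sr, min_gap_ms=100.0, min_speech_ms=50.0):
    """Hypothetical decision smoothing: bridge short pauses, then drop short bursts."""
    out = np.asarray(mask, dtype=bool).copy()  # never modify the caller's array
    min_gap = int(round(min_gap_ms * 1e-3 * sr))
    min_speech = int(round(min_speech_ms * 1e-3 * sr))
    # 1) Fill non-speech gaps shorter than min_gap that lie between two speech runs.
    for s, e, is_speech in _runs(out):
        if not is_speech and s > 0 and e < out.size and e - s < min_gap:
            out[s:e] = True
    # 2) Remove speech runs that are still shorter than min_speech.
    for s, e, is_speech in _runs(out):
        if is_speech and e - s < min_speech:
            out[s:e] = False
    return out
```

Larger values of the hypothetical `min_gap_ms` merge neighbouring speech regions more aggressively, while larger `min_speech_ms` suppresses more short noise bursts at the risk of dropping very short words.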

Python Usage

To compute VAD on a given audio file:

from zff import utils
from zff.zff import zff_vad

# Read audio at native sampling rate
sr, audio = utils.load_audio("audio.wav")

# Get segments
boundary = zff_vad(audio, sr)

# Smooth
boundary = utils.smooth_decision(boundary, sr)

# Convert from sample to time domain
segments = utils.sample2time(audio, sr, boundary)

# Save as .csv file
utils.save_segments("segments", "audio", segments)
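
The sample-to-time conversion and CSV export can be illustrated by turning a per-sample boolean speech mask into (start, end) times in seconds and serialising them. This is a sketch under stated assumptions: `mask_to_segments` and `segments_to_csv` are hypothetical helpers, the input is assumed to be a per-sample mask, and the `start,end` header with three-decimal seconds is an illustrative format that may differ from what `utils.sample2time` and `utils.save_segments` actually produce.

```python
import csv
import io

import numpy as np


def mask_to_segments(mask, sr):
    """Turn a per-sample boolean speech mask into [(start_s, end_s), ...]."""
    # Pad with silence on both sides so every speech run has a start and an end.
    padded = np.concatenate(([0], np.asarray(mask, dtype=np.int8), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)  # silence -> speech
    ends = np.flatnonzero(edges == -1)   # speech -> silence (end is exclusive)
    return [(float(s) / sr, float(e) / sr) for s, e in zip(starts, ends)]


def segments_to_csv(segments):
    """Serialise segments as CSV text with an assumed 'start,end' header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["start", "end"])
    for start, end in segments:
        writer.writerow([f"{start:.3f}", f"{end:.3f}"])
    return buf.getvalue()
```

The returned text can then be written to disk with an ordinary `open(path, "w")` call; keeping serialisation separate from file I/O makes the conversion easy to check in isolation.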

To extract the composite signal from a given audio file:

from zff.zff import zff_cs
from zff import utils

# Read audio at native sampling rate
sr, audio = utils.load_audio("audio.mp3")

# Get composite signal
composite = zff_cs(audio, sr)

# Get all signals
composite, y0, y1, y2, gcis = zff_cs(audio, sr, verbose=True)
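
If `gcis` holds glottal closure instants as sample indices (an assumption; the exact return format of `zff_cs` is not documented here), a per-cycle f0 estimate follows from the spacing between consecutive instants: f0 = sr / (gci[k+1] - gci[k]). In the standard ZFF formulation, epochs are taken at the negative-to-positive zero crossings of the filtered signal. The helpers below (`positive_zero_crossings`, `f0_from_gcis`, and the 50 to 500 Hz plausibility bounds) are hypothetical and illustrate the idea on a synthetic sinusoid standing in for a ZFF output.

```python
import numpy as np


def positive_zero_crossings(y):
    """Indices n where y goes from negative (y[n-1] < 0) to non-negative (y[n] >= 0)."""
    y = np.asarray(y, dtype=np.float64)
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1


def f0_from_gcis(gcis, sr, f0_min=50.0, f0_max=500.0):
    """Per-cycle f0 in Hz from consecutive epoch indices, keeping plausible values only."""
    periods = np.diff(np.asarray(gcis)) / sr  # seconds between consecutive epochs
    f0 = 1.0 / periods
    return f0[(f0 >= f0_min) & (f0 <= f0_max)]


# Synthetic stand-in for a ZFF output: a 125 Hz sinusoid sampled at 8 kHz.
sr = 8000
n = np.arange(sr // 4)
y = np.sin(2 * np.pi * 125 * n / sr + 0.3)
gcis = positive_zero_crossings(y)
f0 = f0_from_gcis(gcis, sr)
```

On real speech, the filtered signal also crosses zero in silent or unvoiced regions, which is why implausible intervals are discarded by the (hypothetical) frequency bounds.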

Repository Structure

.
├── environment.yml          # Environment
├── img                      # Images
├── LICENSE                  # License
├── Makefile                 # Setup
├── MANIFEST.in              # Setup
├── pyproject.toml           # Setup
├── README.rst               # README
├── requirements.txt         # Setup
├── setup.py                 # Setup
├── version.txt              # Version
└── zff                      # Source code folder
    ├── arguments.py            # Arguments parser
    ├── segment.py              # Main method
    ├── utils.py                # Utility methods
    └── zff.py                  # ZFF methods

Contact

For questions or to report issues with this software package, please contact the first author.