ZFF VAD

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

This repository contains the code for the paper accepted at Interspeech 2022: Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering, by E. Sarkar, R. Prasad, and M. Magimai Doss (2022).

Please cite the original authors in any publication(s) that use this work:

@inproceedings{sarkar22_interspeech,
author    = {Eklavya Sarkar and RaviShankar Prasad and Mathew Magimai Doss},
title     = {{Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering}},
year      = {2022},
booktitle = {Proc. Interspeech 2022},
pages     = {4626--4630},
doi       = {10.21437/Interspeech.2022-10535}
}

Approach

We jointly model voice source and vocal tract system information using the zero-frequency filtering (ZFF) technique for voice activity detection. The ZFF filter outputs are combined into a composite signal that carries salient source and system information, such as the fundamental frequency $f_0$ and the formants $F_1$ and $F_2$. A dynamic threshold is then applied after spectral-entropy-based weighting. Our approach operates purely in the time domain, is robust across a range of SNRs, and is much more computationally efficient than neural-network-based methods.
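For intuition about the first stage, here is a minimal NumPy sketch of textbook zero-frequency filtering: difference the signal, pass it through two cascaded zero-frequency resonators (each 1 / (1 - z^-1)^2, so four running sums in total), then repeatedly subtract a local mean to remove the resulting polynomial trend. This is not the repository's zff.py. The function names, the 10 ms half-window and the three trend-removal passes are assumptions, and the composite-signal construction, spectral-entropy weighting and dynamic threshold are not shown.

```python
import numpy as np


def remove_trend(y, win):
    """Subtract a centred moving average of odd length `win` from y."""
    kernel = np.ones(win) / win
    full = np.convolve(y, kernel, mode="full")
    half = (win - 1) // 2
    # Slice the full convolution so the average is centred on each sample.
    return y - full[half:half + len(y)]


def zero_frequency_filter(s, sr, win_ms=10.0, passes=3):
    """Sketch of zero-frequency filtering (hypothetical, not the package API).

    win_ms (roughly one to two average pitch periods) and passes are
    assumed defaults, not values taken from the paper.
    """
    s = np.asarray(s, dtype=np.float64)
    # 1) Difference the signal to remove any DC offset: x[n] = s[n] - s[n-1].
    x = np.diff(s, prepend=s[0])
    # 2) Two cascaded zero-frequency resonators, each 1 / (1 - z^-1)^2,
    #    are equivalent to four running sums with zero initial conditions.
    y = x
    for _ in range(4):
        y = np.cumsum(y)
    # 3) Remove the growing polynomial trend by repeatedly subtracting the
    #    local mean over 2N + 1 samples, N being the half-window in samples.
    n_half = max(1, int(round(win_ms * 1e-3 * sr)))
    for _ in range(passes):
        y = remove_trend(y, 2 * n_half + 1)
    return y


def positive_zero_crossings(y):
    """Indices of the first non-negative sample after a negative one."""
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1
```

In the ZFF literature, positive zero crossings of the filtered signal are commonly taken as epoch (glottal closure) estimates, although which direction marks the epoch depends on the signal polarity. Samples within a few windows of either end are unreliable because the moving average runs off the signal there, and since the running sums grow quickly, processing long recordings in shorter blocks may be preferable.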

Installation

This package has very few requirements. To create a new conda/mamba environment, install conda and then mamba, and follow these steps:

mamba env create -f environment.yml   # Create environment
conda activate zff                    # Activate environment
make install clean                    # Install packages

Command-line Usage

To segment a single audio file into a .csv file:

segment -w path/to/audio.wav -o path/to/save/segments

To segment a folder of audio files:

segment -f path/to/folder/of/audio/files -o path/to/save/segments

For more options check:

segment -h

Note: depending on the recording conditions of the data, it may be necessary to tune the smoothing and theta parameters.
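The exact behaviour of the package's smoothing and theta parameters is not documented here. As a rough intuition for what a smoothing window does, the hypothetical helper below applies majority-vote smoothing to a sample-level 0/1 speech decision: a longer window removes short spurious speech blips and fills short pauses. The function name and the 50 ms default are assumptions; this is not the package's utils.smooth_decision.

```python
import numpy as np


def smooth_binary(decision, sr, win_ms=50.0):
    """Majority-vote smoothing of a 0/1 decision (hypothetical helper)."""
    d = np.asarray(decision, dtype=np.float64)
    win = max(1, int(round(win_ms * 1e-3 * sr)))
    if win % 2 == 0:
        win += 1  # keep the window odd so it stays centred on each sample
    # Fraction of speech samples inside the window around each sample.
    full = np.convolve(d, np.ones(win) / win, mode="full")
    half = (win - 1) // 2
    frac = full[half:half + len(d)]
    # A sample is speech if at least half of its neighbourhood is speech.
    return (frac >= 0.5).astype(np.int8)
```

Blips and gaps shorter than about half the window are absorbed, while the edges of long segments stay in place, which is why a window that suits one dataset may merge or drop segments in another.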

Python Usage

To compute VAD on a given audio file:

from zff import utils
from zff.zff import zff_vad

# Read audio at native sampling rate
sr, audio = utils.load_audio("audio.wav")

# Get segments
boundary = zff_vad(audio, sr)

# Smooth
boundary = utils.smooth_decision(boundary, sr)

# Convert from sample to time domain
segments = utils.sample2time(audio, sr, boundary)

# Save as .csv file
utils.save_segments("segments", "audio", segments)
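In the package, utils.sample2time and utils.save_segments take care of this step. To illustrate the idea only, the hypothetical helpers below turn a sample-level 0/1 decision (assumed here to be what boundary holds after smoothing) into (start, end) times in seconds and write them to a CSV file. The helper names and the start/end column header are assumptions and need not match the package's output format.

```python
import csv

import numpy as np


def decision_to_segments(decision, sr):
    """Convert a 0/1 sample decision into (start_s, end_s) tuples."""
    d = np.asarray(decision, dtype=np.int8)
    # Pad with zeros so segments touching either end are still closed.
    edges = np.diff(np.concatenate(([0], d, [0])))
    starts = np.flatnonzero(edges == 1)   # first speech sample of a run
    ends = np.flatnonzero(edges == -1)    # first non-speech sample after it
    return [(float(s / sr), float(e / sr)) for s, e in zip(starts, ends)]


def write_segments_csv(path, segments):
    """Write segments as rows of start,end in seconds (hypothetical format)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["start", "end"])
        writer.writerows(segments)
```

For example, decision_to_segments([0, 1, 1, 0, 0, 1], sr=2) gives [(0.5, 1.5), (2.5, 3.0)], where each end time is exclusive.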

To extract the composite signal from a given audio file:

from zff.zff import zff_cs
from zff import utils

# Read audio at native sampling rate
sr, audio = utils.load_audio("audio.mp3")

# Get composite signal
composite = zff_cs(audio, sr)

# Get all signals
composite, y0, y1, y2, gcis = zff_cs(audio, sr, verbose=True)
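The verbose call also returns gcis. Assuming (the README does not confirm this) that gcis is an array of sample indices of glottal closure instants, an $f_0$ contour follows from the spacing between consecutive instants, $f_0 = \mathrm{sr} / (n_{k+1} - n_k)$. The helper name and the 50 to 500 Hz plausibility range below are hypothetical.

```python
import numpy as np


def f0_from_gcis(gcis, sr, fmin=50.0, fmax=500.0):
    """Estimate f0 (Hz) from GCI sample indices (hypothetical helper).

    Returns (times_s, f0_hz); each value is stamped at the later GCI of
    its pair. Intervals spanning unvoiced gaps give implausible values
    outside [fmin, fmax] and are dropped.
    """
    g = np.unique(np.asarray(gcis, dtype=np.float64))  # sorted, no duplicates
    if g.size < 2:
        return np.empty(0), np.empty(0)
    f0 = sr / np.diff(g)
    times = g[1:] / sr
    keep = (f0 >= fmin) & (f0 <= fmax)
    return times[keep], f0[keep]
```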

Repository Structure

.
├── environment.yml          # Environment
├── img                      # Images
├── LICENSE                  # License
├── Makefile                 # Setup
├── MANIFEST.in              # Setup
├── pyproject.toml           # Setup
├── README.rst               # README
├── requirements.txt         # Setup
├── setup.py                 # Setup
├── version.txt              # Version
└── zff                      # Source code folder
    ├── arguments.py            # Arguments parser
    ├── segment.py              # Main method
    ├── utils.py                # Utility methods
    └── zff.py                  # ZFF methods

Contact

For questions or to report issues with this software package, please contact the first author.
