
RicherMans/PSL


Pseudo Strong Labels

This repository contains the source code for our ICASSP 2022 paper "Pseudo Strong Labels for Large Scale Weakly Supervised Audio Tagging".

Highlights:

  • State-of-the-art results on the balanced Audioset subset.
  • Simple MobileNetV2 model; no expensive GPU is needed to run it.
  • Quick training, since only the balanced Audioset subset (about 60 h of audio) is required.
  • Achieves an mAP of 35.48 (PSL-2s), which is usable for most real-world applications.
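As a rough check of the training budget quoted above, the total duration can be estimated from the Audioset segment list `balanced_train_segments.csv`. The sketch below assumes the standard Audioset CSV layout (comment lines starting with `#`, then rows of `YTID, start_seconds, end_seconds, positive_labels`); the helper name `segment_hours` is hypothetical and not part of this repository.

```python
import csv
from typing import Iterable, Tuple


def segment_hours(lines: Iterable[str]) -> Tuple[int, float]:
    """Return (number of segments, total duration in hours) for an Audioset segment CSV.

    Hypothetical helper: assumes comment lines start with '#' and every data row is
    YTID, start_seconds, end_seconds, "label1,label2,...".
    """
    data_lines = (line for line in lines if not line.startswith("#"))
    # skipinitialspace lets the reader recognise the quoted label list after ", "
    reader = csv.reader(data_lines, skipinitialspace=True)
    count, seconds = 0, 0.0
    for row in reader:
        if not row:  # skip blank lines
            continue
        start, end = float(row[1]), float(row[2])
        count += 1
        seconds += end - start
    return count, seconds / 3600.0
```

Passing an open handle of `data/csvs/balanced_train_segments.csv` gives the figure for the full segment list; the number of clips you can actually download may be lower, since some YouTube videos are no longer available.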

[Figure: PSL architecture]

The aim of this work is to show that adding automatic supervision at a fixed temporal scale (e.g., 10 s, 5 s or 2 s chunks), provided by a machine annotator (the teacher), to the training of a student model yields performance gains on Audioset.
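To make supervision at a fixed scale concrete, the sketch below splits a clip into fixed-length chunks and pairs each chunk with a teacher prediction, which then serves as the student's training target. It is a simplified illustration, not the repository's implementation: the 16 kHz sample rate, the function names and the `teacher` callable are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Teacher = Callable[[Sequence[float]], List[float]]  # hypothetical: chunk -> class probabilities


def chunk_bounds(num_samples: int, sample_rate: int, chunk_seconds: float) -> List[Tuple[int, int]]:
    """Split a clip into consecutive, non-overlapping chunks of fixed length.

    The final chunk is kept even if it is shorter than chunk_seconds.
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least one sample")
    return [(start, min(start + chunk_len, num_samples))
            for start in range(0, num_samples, chunk_len)]


def pseudo_strong_labels(waveform: Sequence[float], sample_rate: int,
                         chunk_seconds: float, teacher: Teacher) -> List[Tuple[Sequence[float], List[float]]]:
    """Pair every chunk with the teacher's prediction for that chunk."""
    pairs = []
    for start, end in chunk_bounds(len(waveform), sample_rate, chunk_seconds):
        chunk = waveform[start:end]
        pairs.append((chunk, teacher(chunk)))  # the student is trained to match this target
    return pairs


if __name__ == "__main__":
    sample_rate = 16000                      # assumed sample rate
    clip = [0.0] * (10 * sample_rate)        # a silent 10 s clip
    dummy_teacher: Teacher = lambda chunk: [0.5, 0.1]  # stands in for a trained model
    for seconds in (10, 5, 2):
        n_chunks = len(pseudo_strong_labels(clip, sample_rate, seconds, dummy_teacher))
        print(f"{seconds} s chunks -> {n_chunks}")
```

For a 10 s clip this yields one, two and five chunks, respectively, i.e. the 10 s, 5 s and 2 s chunk settings used by the configs below.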

Specifically, our method outperforms other approaches in the literature on the balanced subset of Audioset, while using a comparatively simple MobileNetV2 architecture.

Method                            Label     mAP (%)  d'
--------------------------------  --------  -------  -----
Baseline (Weak)                   Weak      17.69    1.994
PSL-10s (Proposed)                PSL-10s   31.13    2.454
PSL-5s (Proposed)                 PSL-5s    34.11    2.549
PSL-2s (Proposed)                 PSL-2s    35.48    2.588
--------------------------------  --------  -------  -----
CNN14 [@Kong2020d]                Weak      27.80    1.850
EfficientNet-B0 [@gong2021psla]   Weak      33.50    -
EfficientNet-B2 [@gong2021psla]   Weak      34.06    -
ResNet-50 [@gong2021psla]         Weak      31.80    -
AST [@gong21b_interspeech]        Weak      34.70    -
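For reference, the two metrics in the table can be computed as sketched below in pure Python. Assumptions: per-class average precision is averaged over classes to give mAP (reported ×100 in the table), and d' is derived from the class-averaged ROC AUC as d' = √2 · Φ⁻¹(AUC), a common convention for Audioset. Whether this repository averages in exactly the same way is an assumption, and tied scores are not merged in the AP computation.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one class: mean precision at the rank of each positive (ties are not merged)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else float("nan")


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y]
    neg = [s for s, y in zip(scores, y_true) if not y]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def d_prime(auc: float, eps: float = 1e-6) -> float:
    """d' = sqrt(2) * inverse standard normal CDF of the AUC."""
    auc = min(max(auc, eps), 1.0 - eps)  # inv_cdf is undefined at exactly 0 or 1
    return math.sqrt(2.0) * NormalDist().inv_cdf(auc)


def map_and_d_prime(y_true: List[List[int]], scores: List[List[float]]) -> Tuple[float, float]:
    """Rows are clips, columns are classes; classes lacking positives or negatives are skipped."""
    aps, aucs = [], []
    for c in range(len(y_true[0])):
        t = [row[c] for row in y_true]
        s = [row[c] for row in scores]
        ap, auc = average_precision(t, s), roc_auc(t, s)
        if not math.isnan(auc):
            aps.append(ap)
            aucs.append(auc)
    return sum(aps) / len(aps), d_prime(sum(aucs) / len(aucs))
```

Averaging AUC first and converting once, rather than averaging per-class d' values, matches the convention assumed above; the two give different numbers in general.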

Requirements

Binary package requirements

GNU Parallel is required for preprocessing; it can be installed using conda:

conda install parallel

If you have root privileges, you can instead use your system package manager:

# On Arch-based distributions (e.g., Manjaro)
sudo pacman -S parallel
# On Debian-based distributions
sudo apt install parallel

Further, the download script scripts/1_download_audioset.sh uses proxychains to download the data. To disable it, remove the proxychains call from the script, or configure your own proxychains proxy.

Python requirements

This code has been tested with Python 3.8 on CentOS 5 and Manjaro. To install the Python dependencies, run:

python3 -m pip install -r requirements.txt

Training preparation

The structure of this repo is as follows:

.
├── configs
├── data
│   ├── audio
│   │   ├── balanced
│   │   └── eval
│   ├── csvs
│   └── logs
├── figures
├── scripts
│   └── utils

[Optional] Preparation without downloading the dataset

If you have already downloaded Audioset, put the raw audio of the balanced and eval subsets in data/audio/balanced and data/audio/eval, respectively. Then put balanced_train_segments.csv, eval_segments.csv and class_labels_indices.csv into data/csvs.
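If you prepare the data by hand, a small check like the one below can create the data folders listed in the directory tree and report which of the three CSV files are still missing. It is a hypothetical helper (`prepare_layout` is not part of this repository) that only uses the paths and file names given in this README.

```python
from pathlib import Path
from typing import List

# Folder and file names taken from this README; the helper itself is hypothetical.
DATA_DIRS = ["data/audio/balanced", "data/audio/eval", "data/csvs", "data/logs"]
REQUIRED_CSVS = [
    "balanced_train_segments.csv",
    "eval_segments.csv",
    "class_labels_indices.csv",
]


def prepare_layout(repo_root: str = ".") -> List[str]:
    """Create the expected data folders and return the names of any missing CSV files."""
    root = Path(repo_root)
    for rel in DATA_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)  # no-op if the folder already exists
    csv_dir = root / "data" / "csvs"
    return [name for name in REQUIRED_CSVS if not (csv_dir / name).is_file()]
```

Run it from the repository root; an empty list means all three CSVs are in place.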

1. Download Data

First, you need the balanced and evaluation subsets of Audioset. These can be downloaded using the following script:

./scripts/1_download_audioset.sh

2. Prepare HDF5

To speed up I/O, we pack the data into HDF5 files. This can be done by running:

./scripts/2_prepare_data.sh
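The benefit of packing is that reading many clips from one indexed file is cheaper than opening thousands of small audio files. A minimal sketch with `h5py` follows; the one-dataset-per-clip layout keyed by YouTube ID, the float32 dtype and the helper names are assumptions and need not match what `scripts/2_prepare_data.sh` actually writes. Audio decoding is left out, so clips are passed in as NumPy arrays.

```python
from typing import Dict

import h5py
import numpy as np


def pack_to_hdf5(clips: Dict[str, np.ndarray], out_path: str) -> None:
    """Write each waveform as its own float32 dataset, keyed by clip ID (hypothetical layout)."""
    with h5py.File(out_path, "w") as store:
        for clip_id, waveform in clips.items():
            store.create_dataset(clip_id, data=np.asarray(waveform, dtype=np.float32))


def load_clip(hdf5_path: str, clip_id: str) -> np.ndarray:
    """Read one clip back into memory without touching the others."""
    with h5py.File(hdf5_path, "r") as store:
        return store[clip_id][()]
```

In practice, a training data loader would typically keep the file open instead of reopening it for every clip.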

Usage

For the experiments in Table 2 of the paper, run:

## For the 10s PSL training
./train_psl.sh configs/psl_balanced_chunk_10sec.yaml
## For the 5s PSL training
./train_psl.sh configs/psl_balanced_chunk_5sec.yaml
## For the 2s PSL training
./train_psl.sh configs/psl_balanced_chunk_2sec.yaml

For the experiments in Table 3 of the paper, run:

## For the 10s PSL training
./train_psl.sh configs/teacher_student_chunk_10sec.yaml
## For the 5s PSL training
./train_psl.sh configs/teacher_student_chunk_5sec.yaml
## For the 2s PSL training
./train_psl.sh configs/teacher_student_chunk_2sec.yaml

Note that this repo can easily be extended to run the experiments in Table 4 of the paper, i.e., training on the full Audioset dataset.

