Knowledge Transfer for Acoustic Scene Classification

NEW: Our paper has been accepted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022). We thank the reviewers and committee members from the audio and speech processing community.

Introduction

This repo includes code for (1) our proposed Variational Bayesian Knowledge Transfer (VBKT) algorithm and (2) implementations of 13 recent, cutting-edge knowledge transfer (knowledge distillation / teacher-student learning) methods: TSL, NLE, Fitnets, AT, AB, VID, FSP, COFD, SP, CCKD, PKT, NST, and RKD. More details can be found in our paper on arXiv.
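To make one of the listed baselines concrete, here is a minimal NumPy sketch of attention transfer (AT) in one common formulation: each feature map is collapsed into a spatial attention map by summing squared activations over channels, the map is L2-normalised, and the student is penalised by its squared distance from the teacher's map. This is an illustration, not the repo's implementation; the (batch, height, width, channels) layout, the function names, and the random activations are assumptions.

```python
import numpy as np


def attention_map(features, p=2):
    """Collapse a (batch, height, width, channels) tensor into normalised spatial maps.

    Sums |A|**p over the channel axis, flattens the spatial grid, and
    L2-normalises each example's map so only its shape matters, not its scale.
    """
    amap = np.sum(np.abs(features) ** p, axis=-1)           # (batch, H, W)
    amap = amap.reshape(amap.shape[0], -1)                  # (batch, H*W)
    norm = np.linalg.norm(amap, axis=1, keepdims=True)
    return amap / np.maximum(norm, 1e-12)                   # guard against all-zero maps


def attention_transfer_loss(student_feats, teacher_feats, p=2):
    """Mean squared distance between student and teacher attention maps.

    Channel counts may differ; the spatial sizes must match.
    """
    diff = attention_map(student_feats, p) - attention_map(teacher_feats, p)
    return float(np.mean(diff ** 2))


# Hypothetical activations: the student has fewer channels than the teacher.
rng = np.random.RandomState(0)
teacher = rng.randn(4, 8, 8, 64)
student = rng.randn(4, 8, 8, 32)
print(attention_transfer_loss(student, teacher))
```

Because the maps are normalised before comparison, the loss only asks the student to attend to the same time-frequency regions as the teacher, which is why the two networks may have different channel widths.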

How to use

Environment Setup

TensorFlow 1.14 and Keras 2.1 (install via pip or conda).

Note: We use linux-ppc64le, but other platforms should work as long as you follow the suggested versions.

$ conda env create -f environment.yaml

Dataset

We use the DCASE 2020 Task 1a ASC data: the TAU Urban Acoustic Scenes 2020 Mobile, Development dataset. Audio clips are grouped by recording device. This repo focuses on the device adaptation problem: transferring knowledge from the source domain (device A) to the target domains (devices B, C, and S1-S6).
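Device membership can be read from the file names. Assuming the TAU 2020 Mobile naming pattern scene-city-location-segment-device.wav (for example airport-barcelona-0-0-a.wav), a sketch of splitting clips into the source domain and per-device target domains could look like this. The helper names and the example paths are hypothetical, and the repo may organise the split differently.

```python
import os
from collections import defaultdict


def device_of(filename):
    """Return the device id, assumed to be the last dash-separated token of the stem.

    Assumes TAU 2020 Mobile style names such as 'airport-barcelona-0-0-a.wav'.
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    return stem.rsplit("-", 1)[-1]


def split_by_domain(filenames, source_device="a"):
    """Split clips into source-domain files and a dict of per-device target files."""
    source, targets = [], defaultdict(list)
    for name in filenames:
        dev = device_of(name)
        if dev == source_device:
            source.append(name)
        else:
            targets[dev].append(name)
    return source, dict(targets)


# Hypothetical paths following the assumed naming pattern.
files = [
    "audio/airport-barcelona-0-0-a.wav",
    "audio/airport-barcelona-0-0-b.wav",
    "audio/bus-lyon-1001-40020-s1.wav",
]
src, tgt = split_by_domain(files)
print(src)          # ['audio/airport-barcelona-0-0-a.wav']
print(sorted(tgt))  # ['b', 's1']
```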

Feature Extraction

The acoustic features are extracted and saved to local disk. Run the command below to extract log-mel filter bank (LMFB) features; specify the path to the audio files first.

$ python tools/extr_feat_logmel.py
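The settings used by tools/extr_feat_logmel.py are not listed here. As a rough illustration of what LMFB features are, this NumPy-only sketch frames the signal with a Hann window, takes the power spectrum, applies HTK-style triangular mel filters, and takes the log. The 44.1 kHz sample rate, 2048-point FFT, 1024-sample hop, and 128 mel bands are assumptions, not the script's confirmed parameters; in practice a library routine such as librosa.feature.melspectrogram is a common alternative.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters (HTK mel scale) mapping n_fft // 2 + 1 FFT bins to n_mels bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)          # centre frequency of each bin
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel(signal, sr=44100, n_fft=2048, hop=1024, n_mels=128, eps=1e-10):
    """Log-mel filter bank features with shape (n_mels, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal must be at least n_fft samples long")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop: t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2   # (n_frames, n_bins)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T          # (n_mels, n_frames)
    return np.log(mel + eps)                                    # eps avoids log(0)


# One second of a 1 kHz tone as a stand-in for an audio clip.
sr = 44100
times = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * times)
feats = log_mel(tone, sr=sr)
print(feats.shape)  # (128, 42)
```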

Model Training

This repo covers two ASC models, resnet and fcnn, both based on DCASE2020_task1.

Train the source/teacher model with train_source.py. Refer to the recipe ./scripts/run_source.sh for parameter settings.
Train the target/student model with a knowledge transfer algorithm using train_target.py. Refer to the recipe ./scripts/run_target.sh for the detailed parameter settings of each method.
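For orientation, the second stage trains the student against both the ground-truth labels and the teacher's outputs. The sketch below shows one common teacher-student (TSL) formulation in NumPy: a weighted sum of hard-label cross-entropy and cross-entropy against the teacher's temperature-softened posteriors. It is not the loss in train_target.py; `alpha`, `temperature`, and the function names are hypothetical, and the actual settings live in ./scripts/run_target.sh.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def tsl_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=1.0):
    """(1 - alpha) * hard-label CE + alpha * T**2 * CE(teacher soft targets, student).

    alpha and temperature are illustrative hyperparameters, not repo flags.
    """
    eps = 1e-12
    labels = np.asarray(labels)
    p_student = softmax(student_logits)
    hard_ce = -np.mean(np.log(p_student[np.arange(len(labels)), labels] + eps))
    q_teacher = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    soft_ce = -np.mean(np.sum(q_teacher * np.log(p_student_t + eps), axis=-1))
    # Scaling by T**2 keeps the soft term's gradient magnitude comparable across
    # temperatures (Hinton et al., 2015).
    return (1.0 - alpha) * hard_ce + alpha * temperature ** 2 * soft_ce


# Hypothetical batch: 2 clips, 10 scene classes (DCASE 2020 Task 1a has 10).
rng = np.random.RandomState(0)
student = rng.randn(2, 10)
teacher = rng.randn(2, 10)
print(tsl_loss(student, teacher, labels=[3, 7], alpha=0.5, temperature=2.0))
```

With `alpha=0` this reduces to ordinary supervised training on the target device; raising `alpha` leans more on the source-domain teacher.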

Pretrained Models

We provide example pretrained models in ./pretrained_models/: two resnet models trained with the VBKT method, one for target device B and one for target device C.

Evaluation

Use ./tools/eval.py to evaluate a trained model on a target device. Example commands for the pretrained models are shown below; they should give 0.7212 and 0.7545, respectively.

$ python tools/eval.py --model_path pretrained_models/model_resnet_vbkt_device-b.hdf5 --device b
$ python tools/eval.py --model_path pretrained_models/model_resnet_vbkt_device-c.hdf5 --device c
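The README does not name the metric that tools/eval.py prints. Assuming the numbers above are top-1 classification accuracies (the usual ASC metric), the computation amounts to the sketch below, which also breaks the score down by recording device. The function names and toy posteriors are hypothetical.

```python
import numpy as np


def accuracy(posteriors, labels):
    """Fraction of clips whose highest-scoring class matches the reference label."""
    predictions = np.argmax(np.asarray(posteriors), axis=1)
    return float(np.mean(predictions == np.asarray(labels)))


def per_device_accuracy(posteriors, labels, devices):
    """Accuracy computed separately for each recording device."""
    posteriors = np.asarray(posteriors)
    labels = np.asarray(labels)
    devices = np.asarray(devices)
    return {str(dev): accuracy(posteriors[devices == dev], labels[devices == dev])
            for dev in np.unique(devices)}


# Toy posteriors for 4 clips over 2 classes, recorded on devices b and c.
posteriors = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
labels = [0, 1, 1, 1]
print(accuracy(posteriors, labels))                                   # 0.75
print(per_device_accuracy(posteriors, labels, ["b", "b", "c", "c"]))  # {'b': 1.0, 'c': 0.5}
```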

Reference

If you find this work useful, please consider citing our paper. Thank you! Feel free to contact us with any questions or collaboration ideas.

@inproceedings{hu2022variational,
  title={A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer},
  author={Hu, Hu and Siniscalchi, Sabato Marco and Yang, Chao-Han Huck and Lee, Chin-Hui},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={4041--4045},
  year={2022},
  organization={IEEE}
}

MihawkHu/ASC_Knowledge_Transfer

Folders and files

Latest commit

History

Repository files navigation

Knowledge Transfer for Acoustic Scene Classification

Introduction

How to use

Environment Setup

Dataset

Feature Extraction

Model Training

Pretrained Models

Evaluation

Reference

About

Topics

Resources

Stars

Watchers

Forks

Languages