N-HANS

N-HANS is a Python toolkit for in-the-wild speech enhancement, including speech, music, and general audio denoising, separation, and selective noise or source suppression. The functionality is realised by two neural network models that share the same architecture but are trained separately. Each model consists of stacks of residual blocks conditioned on additional speech or environmental noise recordings, so that it can adapt to unseen speakers or environments in real life.

    pip3 install N-HANS
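The conditioning mechanism described above can be pictured with a minimal, hypothetical sketch: a residual block whose activations are shifted by an embedding of the reference (speech or noise) recording. The layer sizes, the embedding dimension, and the additive conditioning scheme below are illustrative assumptions, not the shipped N-HANS model.

```python
# Illustrative sketch (TF 2.x) of a residual block conditioned on a reference
# embedding. Shapes and the additive conditioning are assumptions, not N-HANS code.
import tensorflow as tf

def conditioned_residual_block(x, ref_embedding, filters=64):
    # Project the reference embedding and broadcast it over time as a bias.
    cond = tf.keras.layers.Dense(filters)(ref_embedding)
    cond = tf.keras.layers.Reshape((1, filters))(cond)

    h = tf.keras.layers.Conv1D(filters, 3, padding='same', activation='relu')(x + cond)
    h = tf.keras.layers.Conv1D(filters, 3, padding='same')(h)
    return tf.keras.layers.Activation('relu')(x + h)   # residual connection

# Example wiring: noisy spectrogram frames plus a reference-recording embedding.
noisy = tf.keras.Input(shape=(None, 64))
ref = tf.keras.Input(shape=(128,))
out = conditioned_residual_block(noisy, ref, filters=64)
model = tf.keras.Model([noisy, ref], out)
```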

Please direct any questions or requests to Shuo Liu (shuo.liu@informatik.uni-augsburg.de).

Citation

If you use N-HANS or any of its code in your research work, you are kindly asked to acknowledge the use of N-HANS in your publications.

https://link.springer.com/article/10.1007%2Fs11042-021-11080-y

S. Liu, G. Keren, E. Parada-Cabaleiro, B. Schuller, "N-HANS: A neural network-based toolkit for in-the-wild audio enhancement," Multimedia Tools and Applications, 2021, accepted, 27 pages.

Prerequisites

  • Python 3 / Python 2.7

Python Dependencies

  • numpy >=1.14.5
  • scipy >=1.0.1
  • tensorflow/tensorflow-gpu >=1.14.0 or tensorflow >= 2.0

Usage

Loading Models

After pip3 install N-HANS, users are expected to create an N-HANS folder for conducting audio denoising or separation tasks. For Linux users, the commands load_denoiser or load_separator will download the pretrained denoising and separation models, accompanied by some audio examples. The trained models and audio examples can also be found in the above N_HANS_Selective_Noise and N_HANS_Source_Separation folders, which gives users working on other operating systems the opportunity to apply N-HANS.

Applying N-HANS

N-HANS has been developed to process standard .wav audio with a sample rate of 16 kHz, coded as 16-bit signed integer PCM. Using the embedded format converter built on the sox package, audio files in other formats are automatically converted to this standard setting.
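As an illustration of that conversion step (not the converter shipped with N-HANS), a file can be resampled and re-encoded with the external sox tool, which must be installed separately; the helper name and paths below are hypothetical.

```python
# Hypothetical helper: convert an audio file to 16 kHz, 16-bit signed-integer PCM
# .wav using the external sox command-line tool (requires sox to be installed).
import subprocess

def to_nhans_format(src_path, dst_path):
    subprocess.run(
        ['sox', src_path, '-r', '16000', '-b', '16', '-e', 'signed-integer', dst_path],
        check=True)

# e.g. to_nhans_format('recording.mp3', 'recording_16k.wav')
```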

Commands

| Task | Command | Description |
|------|---------|-------------|
| speech denoising | `nhans_denoiser --input noisy.wav --output denoised.wav --neg noise.wav` | `--neg` indicates the environmental noise |
| selective noise suppression | `nhans_denoiser --input noisy.wav --output denoised.wav --pos preserve.wav --neg suppress.wav` | `--pos` indicates the noise to be preserved<br>`--neg` hints the noise to be suppressed |
| speech separation | `nhans_separator --input mixed.wav --output separated.wav --pos target.wav --neg interference.wav` | `--pos` indicates the target speaker<br>`--neg` hints the interference speaker |

Examples

Processing a single wav sample

| Task | Example |
|------|---------|
| speech denoising | `nhans_denoiser --input audio_examples/exp2_noisy.wav --output denoised.wav --neg audio_examples/exp2_noise.wav` |
| selective noise suppression | `nhans_denoiser --input audio_examples/exp1_noisy.wav --output denoised.wav --pos audio_examples/exp1_posnoise.wav --neg audio_examples/exp2_negnoise.wav` |
| speech separation | `nhans_separator --input audio_examples/mixed.wav --output separated.wav --pos audio_examples/target_speaker.wav --neg audio_examples/noise_speaker.wav` |

Processing multiple wav samples in folders

Please create folders containing the noisy recordings and the (positive and) negative reference recordings; the recordings belonging to the same example should have identical filenames across folders (see the sketch after the table below).

| Task | Example |
|------|---------|
| speech denoising | `nhans_denoiser --input audio_examples/noisy_dir --output denoised_dir --neg audio_examples/neg_dir` |
| selective noise suppression | `nhans_denoiser --input audio_examples/noisy_dir --output denoised_dir --pos audio_examples/pos_dir --neg=audio_examples/neg_dir` |
| speech separation | `nhans_separator --input audio_examples/mixed_dir --output separated_dir --pos=audio_examples/target_dir --neg=audio_examples/interference_dir` |
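A quick way to verify the filename convention before batch processing is a small check like the one below; the folder names are simply the example directories from the table above, and the helper itself is not part of N-HANS.

```python
# Hypothetical sanity check: every noisy file should have a same-named
# counterpart in the negative (and, if used, positive) reference folder.
import os

noisy = set(os.listdir('audio_examples/noisy_dir'))
neg = set(os.listdir('audio_examples/neg_dir'))

unmatched = noisy.symmetric_difference(neg)
if unmatched:
    print('Files without a counterpart:', sorted(unmatched))
```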

Train your own N-HANS

You can train your own selective audio suppression and separation systems with the N-HANS architecture based on this repository.

  1. To train a selective audio suppression system, go into N-HANS/N_HANS___Selective_Noise/ and create lists of clean speech samples and environmental noises. Feed the paths of the folders containing speech .wav files and noise .wav files to create_seeds, which generates two pickle files (.pkl) listing the speech and noise .wav files separately (a sketch of this step appears after this list). To train a system that is maximally consistent with our trained model, we provide the seed lists for the data split of the AudioSet Corpus (https://research.google.com/audioset/) used in our publication: download AudioSet_seeds.

    To train a speech separation system, go into N-HANS/N_HANS___Speech_Separation/ and create a speech list by pointing create_seeds to your folder containing speech .wav files, which will produce a .pkl file.

  2. Run the main.py script with your specifications indicated by the FLAGS listed in the following table (the default specifications were used to obtain our trained models). The reader.py script provides the training, validation, and test data pipeline and feeds the data to the N-HANS neural networks constructed in main.py.

| FLAGS | Default | Functionality |
|-------|---------|---------------|
| --speech_wav_dir | './speech_wav_dir/' | directory containing all speech .wav files |
| --noise_wav_dir | './noise_wav_dir/' | directory containing all noise .wav files |
| --wav_dump_folder | './wav_dump/' | directory to save denoised signals |
| --eval_seeds | 'valid' | evaluation is applied to the 'valid' set; for testing, change it to 'test' |
| --window_frames | 35 | number of frames of the input noisy signal |
| --context_frames | 200 | number of frames of the reference context signal |
| --random_slices | 50 | number of random samples drawn from each pair of clean speech and noise signals |
| --model_name | 'nhans' | model name |
| --restore_path | '' | path from which to restore a trained model |
| --alg | 'sgd' | optimiser used to train N-HANS |
| --train_mb | 64 | mini-batch size for training data |
| --eval_mb | 64 | mini-batch size for validation or test data |
| --lr | 0.1 | learning rate |
| --mom | 0.0 | momentum for the optimiser |
| --bn_decay | 0.95 | batch normalisation decay |
| --eval_before_training | False | training phase: False; test phase: True |
| --eval_after_training | True | training phase: True; test phase: False |
| --train_monitor_every | 1000 | show training information every "train_monitor_every" batches |
| --eval_every | 5000 | show evaluation information every "eval_every" training batches |
| --checkpoint_dir | './checkpoints' | directory to save checkpoints |
| --summaries | './summaries' | directory for summaries |
| --dump_results | './dump' | directory for intermediate model output during training |
  3. To test your model, set --restore_path to the trained model and set --eval_seeds=test.
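The seed-list step referenced in item 1 can be pictured with the following sketch; the real create_seeds script may differ, so the function below, its output format, and the folder paths are assumptions for illustration only.

```python
# Hypothetical sketch of building a seed list: collect all .wav paths in a folder
# and store them in a pickle file, analogous to what create_seeds produces.
import glob
import os
import pickle

def make_seed_list(wav_dir, pkl_path):
    wav_files = sorted(glob.glob(os.path.join(wav_dir, '*.wav')))
    with open(pkl_path, 'wb') as f:
        pickle.dump(wav_files, f)

make_seed_list('./speech_wav_dir/', 'speech_seeds.pkl')
make_seed_list('./noise_wav_dir/', 'noise_seeds.pkl')
```

A training run using the flags from the table above might then be started along the lines of `python main.py --speech_wav_dir ./speech_wav_dir/ --noise_wav_dir ./noise_wav_dir/ --lr 0.1 --train_mb 64`, with the remaining flags left at their defaults; the exact combination depends on your data and hardware.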

