Noise Reduction for Speech Audio Files utilizing NSNet2 from Microsoft Research

Deep Noise Suppression - NSNet2

  NSNet2 is a deep recurrent neural network (RNN) used for background noise reduction in speech audio files. Microsoft originally released NSNet2 as an updated comparison baseline for their annual Deep Noise Suppression (DNS) challenge, but it is inconvenient to use and is not suited for real-time processing. Running the released NSNet2 model requires correct versions of Python and supporting libraries (including PyTorch and ONNX Runtime) to be installed and properly linked together. These are relatively big software packages to install and configure just to run this one neural network model. Running NSNet2 without any changes also uses a massive amount of unnecessary memory (RAM) that scales with the size of the audio file.

  There are currently few broadly applicable, pre-trained, and effective speech noise suppressors that are easy to use. Projects like RNNoise have several quirks, and NSNet2 can be the next step up for cleaning up captured speech. NSNet2 just needed to be converted into a form that more people looking for additional audio filters for recorded speech could utilize, which is the major focus of this project. This (currently nameless) project is a user-friendly conversion of the NSNet2 released by Microsoft Research. An example where this software was applied can be found on this project's website.

 

What Are the Project Goals? (Features)

  • Noise Reduction of Speech Audio
  • Usable, Fast, and Straightforward versions of NSNet2
  • Input wave (.wav) audio file -> Output equivalent noise suppressed audio file
  • Offline (Non-Live) and Real-time (Live) versions
  • Low RAM Usage while maintaining the same processing speed as NSNet2 (Offline version)
  • Utilize Single-Instruction-Multiple-Data (SIMD) instructions (AVX2 and FMA) of modern x64 CPUs (see the sketch after this list)
  • Thorough Explanation (with diagrams) of how NSNet2 performs effective Noise Reduction
  • Thorough Explanation (with diagrams) of what the code is doing and why it was written that way
  • Simple to Compile and Modify
  • No reliance on math or general matrix calculation libraries for any repeating calculations
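
  As a sketch of the SIMD goal above: the model's hot loops are essentially matrix-vector products, which map naturally onto AVX2 and FMA. The project implements these in hand-written FASM assembly; the following minimal C intrinsics version of a fused multiply-add dot product (assuming a length that is a multiple of 8 floats, compiled with -mavx2 -mfma) only illustrates the idea:

    #include <immintrin.h>
    #include <stddef.h>

    /* Illustrative only: dot product of two float vectors using AVX2/FMA.
       Assumes n is a multiple of 8. */
    static float dot_product_fma(const float *a, const float *b, size_t n)
    {
        __m256 acc = _mm256_setzero_ps();
        for (size_t i = 0; i < n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            acc = _mm256_fmadd_ps(va, vb, acc); /* acc += va * vb, 8 lanes at once */
        }
        /* Horizontal sum of the 8 accumulator lanes */
        __m128 sum = _mm_add_ps(_mm256_castps256_ps128(acc),
                                _mm256_extractf128_ps(acc, 1));
        sum = _mm_hadd_ps(sum, sum);
        sum = _mm_hadd_ps(sum, sum);
        return _mm_cvtss_f32(sum);
    }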

 

Slightly More Background

  This project was created as a starting point for building open audio noise reduction software. Audio noise suppression research (with and without neural networks) produces various publications and snippets throughout the web but rarely leads to open (and pre-trained) usable software. Deep learning models get compared in the Microsoft DNS challenge, and while some of the model designs are published (usually with only minimal information), the exact implementations and trained model parameter values are kept private. However, these light publications sometimes give enough information that the network model can be mostly recreated or used to synthesize hybrid designs. Models can then be trained with customizable training file sets (like the one from the DNS challenge). The final results could then be run through a comparison process against each other, and possibly against the DNS challenge results.

  Since NSNet2 was published with its exact implementation and trained values, converting the model to a more user-friendly version was straightforward. The model was published in the open ONNX format, which meant testing whether the ONNX Runtime software could be used as the main (and biggest) dependency alongside compilable code. Unfortunately, the current ONNX Runtime software cannot carry over the model's Gated Recurrent Unit (GRU) hidden states from a previous run, which is the biggest reason it is unsuitable for real-time versions of NSNet2 (see the sketch below). The model value data was extracted (and reorganized) from the ONNX file to be used with the converted version of the model.
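
  To make the hidden-state issue concrete, below is a minimal sketch of a single GRU time step in C, using the standard GRU gate equations. The sizes, names, and the gru_step function are illustrative, not this project's actual code or the real NSNet2 layer dimensions. The important part is that h is both read and written: for real-time processing, the caller must feed each frame's resulting hidden state back into the next call, which is exactly the carryover the tested ONNX runtime could not provide.

    #include <math.h>
    #include <stddef.h>

    /* Illustrative sizes only (not the real NSNet2 dimensions). */
    enum { IN_SIZE = 257, HID = 400 };

    static float sigmoidf(float v) { return 1.0f / (1.0f + expf(-v)); }

    /* One GRU time step. Wr/Wz/Wn are HID x IN_SIZE input weights,
       Ur/Uz/Un are HID x HID recurrent weights, br/bz/bn and cr/cz/cn
       are the input and recurrent biases (all row-major).
       h persists between calls, one call per audio frame. */
    static void gru_step(const float *x, float *h,
                         const float *Wr, const float *Ur,
                         const float *br, const float *cr,
                         const float *Wz, const float *Uz,
                         const float *bz, const float *cz,
                         const float *Wn, const float *Un,
                         const float *bn, const float *cn)
    {
        float hNew[HID];
        for (size_t i = 0; i < HID; i++) {
            float xr = br[i], xz = bz[i], xn = bn[i];
            float hr = cr[i], hz = cz[i], hn = cn[i];
            for (size_t j = 0; j < IN_SIZE; j++) {
                xr += Wr[i * IN_SIZE + j] * x[j];
                xz += Wz[i * IN_SIZE + j] * x[j];
                xn += Wn[i * IN_SIZE + j] * x[j];
            }
            for (size_t j = 0; j < HID; j++) {
                hr += Ur[i * HID + j] * h[j];
                hz += Uz[i * HID + j] * h[j];
                hn += Un[i * HID + j] * h[j];
            }
            float r = sigmoidf(xr + hr);    /* reset gate */
            float z = sigmoidf(xz + hz);    /* update gate */
            float n = tanhf(xn + r * hn);   /* candidate state */
            hNew[i] = (1.0f - z) * n + z * h[i];
        }
        for (size_t i = 0; i < HID; i++)
            h[i] = hNew[i];
    }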

TO ADD (software / code principles and links to detailed explanation site with examples)

 

How To Use It

Basic Conversion using a Modern Windows x64 Computer

  1. Convert a video or audio file to a WAVE (.wav) audio file using ffmpeg and the Command Prompt (or Windows PowerShell)
    ffmpeg.exe -i inputFile -vn -ac 1 -ar 48000 output.wav
    where "-vn" removes any video stream, "-ac 1" mixes the audio down to one channel (mono), and "-ar 48000" resamples to the 48kHz sample rate the converter currently expects (see Current Limitations)
  2. Run (Double-click) the latest NSNet2Offline.exe executable downloaded from the releases page of this project
  3. Navigate to and select the "output.wav" audio file created in step 1
  4. The converter will create "output-Enhanced.wav" in the same directory that "output.wav" resides in. This process should take about 2-8 seconds for every 60 seconds of audio
  5. Listen and compare the resulting noise suppressed file to the original

Comparison against RNNoise

TO ADD (use ffmpeg)

Audio Files with "Louder" but Consistent Background Noise

TO ADD (pre-process with Audacity Noise Reduction Effect)

 

Current Limitations (Version 0.1.1)

  1. Works only with 48kHz 1-Channel (Mono) Wave Audio Files
  2. Offline Version Released Only
  3. Does not convert 2-Channel (Stereo) Audio
  4. Real-time version is memory (RAM) bandwidth bound (it reads ~23.5MB of network data for every 10ms of converted audio, roughly 2.35 GB/s of sustained reads)
  5. WAVE file error checking is currently very basic and minimal
  6. Only works on Windows OS (tested with a fresh install of Windows 10)
  7. Requires a relatively recent x64 CPU with AVX2 and FMA support
  8. TO ADD

 

Planned Features

  1. Live Version
  2. 2-Channel (Stereo) Conversion: convert each channel separately, or mix the stereo down to mono and then convert
  3. Multithreading Capability for Offline Version
  4. Adjustable RAM usage (might increase offline processing speed by a tiny amount)
  5. More Code documentation
  6. Linux and FreeBSD support
  7. TO ADD

 

How to Compile It (with Windows)

Setup and Necessary Libraries

All C code is currently written to be compiled with gcc for Windows using the MinGW-w64 software. The latest versions can be found here. The C code will be modified in the future so it can be compiled with gcc no matter the operating system.

All x64 assembly code (currently containing only subroutine functions) was written to be assembled by the flat assembler (FASM). The assembly code contains the functions that do the main processing and issue the AVX2 and FMA instructions. The assembled object files get linked into the final executable by gcc / ld.

The FFTW library is used for performing the Discrete Fourier Transform (DFT) and its inverse. The single-precision floating-point static library version (for Windows) needs to be compiled and will get linked into the final executable by gcc / ld. In the current source of FFTW, CMake can be used with MinGW-w64 on Windows to create the static library after a couple of modifications. The memory allocation file needs to be modified before utilizing the CMake script, and the script needs to specify the following options: (TO ADD)
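
While the exact CMake options remain to be documented, here is a minimal sketch of how the single-precision FFTW API is typically called for a forward real-to-complex DFT. The frame size and planner flag are illustrative and not necessarily what this project uses; the program links against the static libfftw3f built above (e.g. -lfftw3f):

    #include <fftw3.h>

    enum { FRAME = 512 }; /* illustrative frame size */

    int main(void)
    {
        float *in = fftwf_alloc_real(FRAME);
        fftwf_complex *out = fftwf_alloc_complex(FRAME / 2 + 1);

        /* Plan once, then reuse the plan for every audio frame.
           FFTW_MEASURE overwrites 'in' while planning, so fill it afterwards. */
        fftwf_plan plan = fftwf_plan_dft_r2c_1d(FRAME, in, out, FFTW_MEASURE);

        /* ... fill 'in' with a windowed audio frame, then: */
        fftwf_execute(plan);
        /* 'out' now holds FRAME/2 + 1 complex frequency bins */

        fftwf_destroy_plan(plan);
        fftwf_free(in);
        fftwf_free(out);
        return 0;
    }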

Using Make

The Makefile can be used to create the executables found on the releases page. MinGW-w64 comes with a Make program that can process the Makefile and compile the source code once the MinGW and FASM binaries are added to the path. Using the Command Prompt (or Windows PowerShell), change directory into the root folder of this project and run: mingw32-make.exe

The resulting executable and the necessary networkData.bin file can be found in the bin subdirectory. The Makefile directs Make to use both gcc and FASM to create the intermediate object files from the source code, which then get linked together with the FFTW library into the executable by gcc / ld.
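
For reference, the final link step conceptually looks like the line below (the object file names here are illustrative; the Makefile rules are authoritative):

    gcc main.o nsnet2.o -o NSNet2Offline.exe -L. -lfftw3f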

 

What Can Be Modified?

TO ADD

 

Other Projects for the Future

  1. How to (re)train the neural network from the DNS Challenge data set
  2. Optimized version of the RNNoise project with principles taken from this project
  3. New noise reduction project utilizing neural networks and ideas taken from other projects and published works
  4. TO ADD

 

How to Reach The Developer

Email me: Jared.Loewenthal@proton.me