PySoundTool

This project stemmed from the Prototype Fund project NoIze. This fork broadens the application of the software from smart noise filtering to general sound analysis, filtering, visualization, preparation, etc. The name has therefore been adapted to reflect this more general sound functionality.

Note: when adjusting sound files, work only on copies of the originals. Improvements are still needed to ensure files are not overwritten unless explicitly indicated.
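
Until then, a simple safeguard is to copy the file first and work on the copy (a minimal sketch using only the standard library; the paths are hypothetical):

>>> import shutil
>>> shutil.copyfile('./audiodata/python.wav', './audiodata/python_copy.wav')  # then edit the copy, not the original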

Functionality

Jupyter notebooks

You can use some of the tools available here in my Jupyter notebooks; you can check them out on Binder. As I do include some audio data, it may take a couple of minutes to load.

Installation

This repository serves as a place to explore sound, so small sound datasets are included in this repo, approximately 30-40 MB in total. If you clone this repo, the sound data will be cloned as well.

If you are fine with this, clone the repository and set your working directory to where you cloned it.

Start a virtual environment:

$ python3 -m venv env
$ source env/bin/activate
(env)..$

Then install the required packages via pip:

(env)..$ pip install -r requirements.txt

Feel free to use this tool in your own scripts (I show examples below). You can also explore some of its functionality via jupyter notebook:

(env)..$ jupyter notebook

Once this loads, click on the folder 'jupyternotebooks', and then on one of the .ipynb files.

Examples

You can run the examples below in ipython or another Python console, or in a Python script.

Install and run ipython:

(env)..$ pip install ipython
(env)..$ ipython
>>> # import what we need for the examples:
>>> import pysoundtool.explore_sound as exsound 
>>> import pysoundtool.soundprep as soundprep
>>> import pysoundtool as pyst 
>>> from pysoundtool.templates import soundclassifier
>>> from scipy.io.wavfile import write

Visualization

"Python": Time Domain

>>> exsound.visualize_signal('./audiodata/python.wav')


"Python": Frequency Domain

The mel frequency cepstral coefficients (MFCCs) and log-mel filterbank energies (FBANK) are two very common acoustic features used in machine and deep learning.

Let's take a look at how the word "python" looks when these features are extracted, and how the window settings influence the features.
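
For reference, a window of win_size_ms milliseconds corresponds to int(sr * win_size_ms / 1000) samples. A rough equivalent of the default extraction can be sketched with librosa (an assumption; this repo does not itself require librosa):

>>> import librosa
>>> y, sr = librosa.load('./audiodata/python.wav', sr=None)  # keep the native sample rate
>>> win_length = int(sr * 20 / 1000)   # 20 ms window
>>> hop_length = int(sr * 10 / 1000)   # 10 ms shift
>>> mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                 n_fft=win_length, hop_length=hop_length)
>>> mfccs.shape   # (n_mfcc, n_frames)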

MFCC:

window 20 ms, window overlap 10 ms (default)
>>> exsound.visualize_feats('./audiodata/python.wav', features='mfcc')


window 100 ms, window overlap 50 ms
>>> exsound.visualize_feats('./audiodata/python.wav', features='mfcc',
                            win_size_ms = 100, win_shift_ms = 50)


FBANK:

window 20 ms, window overlap 10 ms (default)
>>> exsound.visualize_feats('./audiodata/python.wav', features='fbank')


window 100 ms, window overlap 50 ms
>>> exsound.visualize_feats('./audiodata/python.wav', features='fbank',
                            win_size_ms = 100, win_shift_ms = 50)


Sound Creation

>>> data, sr = exsound.create_signal(freq=500, amplitude=0.5, samplerate=8000, dur_sec=0.2)
>>> data2, sr = exsound.create_signal(freq=1200, amplitude=0.9, samplerate=8000, dur_sec=0.2)
>>> data3, sr = exsound.create_signal(freq=200, amplitude=0.3, samplerate=8000, dur_sec=0.2)
>>> data_mixed = data + data2 + data3
>>> exsound.visualize_signal(data_mixed, samplerate=sr)
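
Conceptually, create_signal and create_noise just sample a sine wave and Gaussian noise; a minimal numpy sketch of the idea (hypothetical, not the repo's exact implementation):

>>> import numpy as np
>>> sr, dur_sec = 8000, 0.2
>>> t = np.arange(int(sr * dur_sec)) / sr        # sample times in seconds
>>> sine = 0.5 * np.sin(2 * np.pi * 500 * t)     # freq=500, amplitude=0.5
>>> noise = 0.1 * np.random.normal(size=len(t))  # like create_noise below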


Mixed with noise:

>>> noise = exsound.create_noise(len(data_mixed), amplitude=0.1)
>>> data_noisy = data_mixed + noise
>>> exsound.visualize_signal(data_noisy, samplerate = sr)


In the time domain, it is difficult to see the three different signals at all...

>>> exsound.visualize_feats(data_noisy, samplerate=sr, features='fbank')


In the frequency domain, you can see roughly three distinct frequencies in the signal.

Note: I am working on improving the x and y labels. I am used to librosa.display.specshow, which made this a bit easier.
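
To read off the component frequencies numerically rather than visually, a plain FFT also works (a numpy sketch, independent of this repo):

>>> import numpy as np
>>> spectrum = np.abs(np.fft.rfft(data_noisy))
>>> freqs = np.fft.rfftfreq(len(data_noisy), d=1/sr)
>>> freqs[spectrum.argsort()[-3:]]   # the three strongest bins, near 200, 500, and 1200 Hz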

Sound File Prep

Convert to .wav file

>>> newfilename = soundprep.convert2wav('./audiodata/traffic.aiff')
>>> print(newfilename)
audiodata/traffic.wav
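
Under the hood, a conversion like this can be done with the soundfile package (a sketch assuming soundfile is installed; not necessarily what convert2wav actually does):

>>> import soundfile as sf
>>> data, sr = sf.read('./audiodata/traffic.aiff')  # soundfile reads .aiff, .flac, etc.
>>> sf.write('./audiodata/traffic.wav', data, sr)   # .wav defaults to 16-bit PCM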

Ensure sound data is mono channel

>>> from scipy.io.wavfile import read
>>> sr, data = read('./audiodata/python.wav')
>>> datamono = soundprep.stereo2mono(data) # if it is already mono, nothing will change
>>> len(data) == sum(data==datamono)
True
>>> sr, data_2channel = read('./audiodata/dogbark_2channels.wav')
>>> data_2channel.shape
(18672, 2)
>>> data_1channel = soundprep.stereo2mono(data_2channel)
>>> data_1channel.shape
(18672,)
>>> data_2channel[:5]
array([[208, 208],
       [229, 229],
       [315, 315],
       [345, 345],
       [347, 348]], dtype=int16)
>>> data_1channel[:5]
array([208, 229, 315, 345, 347], dtype=int16)
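
A stereo-to-mono conversion like the above can be sketched by averaging the channels (hypothetical, not the repo's exact code):

>>> def to_mono(data):
        # average across channels if data has shape (n_samples, n_channels)
        if data.ndim > 1:
            data = data.mean(axis=1).astype(data.dtype)  # e.g. [347, 348] -> 347 for int16
        return data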

Convert Soundfiles for use with scipy.io.wavfile

As of now, the software uses scipy.io.wavfile, a module that works well in Jupyter environments. If you have files that are not compatible with it, this function should save the sound data in a compatible format.

>>> newfilename = soundprep.prep4scipywavfile('./audiodata/traffic.aiff')
Converting file to .wav
Saved file as audiodata/traffic.wav 
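
You can verify that the converted file loads with scipy.io.wavfile and check its bit depth via the array's dtype:

>>> from scipy.io.wavfile import read
>>> sr, data = read(newfilename)
>>> data.dtype   # int16 or int32 means the file is compatible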

Adding sounds

Long background sound

Here we will add traffic background noise to the speech segment 'python'.

'python':

>>> python, sr = soundprep.loadsound('./audiodata/python.wav')
>>> exsound.visualize_signal(python, samplerate=sr)


traffic background noise:

>>> traffic, sr = soundprep.loadsound('./audiodata/traffic.aiff')
Step 1: ensure filetype is compatible with scipy library
Success!
>>> exsound.visualize_signal(traffic, samplerate=sr)


Combining them:

  • the noise/sound scaled by 0.3
  • a 1-second delay before the speech
  • total length: 5 seconds

>>> python_traffic, sr = soundprep.add_sound_to_signal(
                                signal = './audiodata/python.wav',
                                sound = './audiodata/traffic.aiff',
                                scale = 0.3,
                                delay_target_sec = 1,
                                total_len_sec = 5)
>>> exsound.visualize_signal(python_traffic, samplerate=sr)

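Those three settings map onto simple array operations; a numpy sketch of the mixing idea (hypothetical, not add_sound_to_signal's actual implementation):

>>> import numpy as np
>>> total_len = int(sr * 5)                      # total length: 5 seconds
>>> delay = int(sr * 1)                          # 1-second delay for the speech
>>> mix = np.zeros(total_len)
>>> bg = 0.3 * np.asarray(traffic, dtype=float)[:total_len]  # scale the background
>>> mix[:len(bg)] += bg
>>> mix[delay:delay + len(python)] += python     # assumes the speech fits in total_len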

Short background sound

rain background noise:

>>> rain, sr = soundprep.loadsound('./audiodata/rain.wav')
>>> exsound.visualize_signal(rain, samplerate=sr)


This sound will be repeated to match the desired background noise length. Note: artifacts can sometimes occur at the seams and may need additional processing; longer background noises work better.

>>> python_rain, sr = soundprep.add_sound_to_signal(
                                signal = './audiodata/python.wav',
                                sound = './audiodata/rain.wav',
                                scale = 0.3,
                                delay_target_sec = 1,
                                total_len_sec = 5)
>>> exsound.visualize_signal(python_rain, samplerate=sr)

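The repetition step can be sketched with numpy (hypothetical; the seams between repeats are where the artifacts mentioned above can arise):

>>> import numpy as np
>>> n_target = int(sr * 5)                            # desired background length
>>> n_repeats = int(np.ceil(n_target / len(rain)))
>>> rain_long = np.tile(rain, n_repeats)[:n_target]   # repeat, then trim to length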

Filtering

NOTE: only .wav files with a bit depth of 16 or 32 can currently be used; see the subsection 'Convert Soundfiles for use with scipy.io.wavfile'.

For the visuals, we will look at the sounds via their FBANK features.

Noisy sound file

Add the 'python' speech segment and traffic noise together to create noisy speech, and save it as a .wav file.

>>> from scipy.io.wavfile import write
>>> speech = './audiodata/python.wav'
>>> noise = './audiodata/traffic.aiff'
>>> data_noisy, samplerate = soundprep.add_sound_to_signal(speech, noise, delay_target_sec=1, scale = 0.3, total_len_sec=5)
>>> noisy_speech_filename = './audiodata/python_traffic.wav'
>>> write(noisy_speech_filename, samplerate, data_noisy)
>>> exsound.visualize_feats(noisy_speech_filename, features='fbank')


Then filter the traffic out:

>>> pyst.filtersignal(output_filename = 'python_traffic_filtered.wav',
                    wavfile = noisy_speech_filename,
                    scale = 1.5) # how strong the filter should be


If there is some distortion in the signal, try a post filter:

>>> pyst.filtersignal(output_filename = 'python_traffic_filtered_postfilter.wav',
                    wavfile = noisy_speech_filename,
                    scale = 1.5, # how strong the filter should be
                    apply_postfilter = True) 

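For intuition about what a noise filter like this does, here is a bare-bones spectral subtraction sketch with scipy (illustrative only; pyst.filtersignal's actual algorithm may differ):

>>> import numpy as np
>>> from scipy.signal import stft, istft
>>> from scipy.io.wavfile import read
>>> sr, noisy = read(noisy_speech_filename)
>>> f, t, Z = stft(noisy.astype(float), fs=sr)
>>> noise_mag = np.abs(Z[:, :10]).mean(axis=1, keepdims=True)  # assume the first frames are noise-only
>>> mag = np.maximum(np.abs(Z) - 1.5 * noise_mag, 0)           # subtract a scaled noise estimate
>>> Z_clean = mag * np.exp(1j * np.angle(Z))                   # keep the noisy phase
>>> _, cleaned = istft(Z_clean, fs=sr)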

Convolutional Neural Network: Simple sound classification

NOTE: only .wav files with a bit depth of 16 or 32 can currently be used; see the subsection 'Convert Soundfiles for use with scipy.io.wavfile'.

>>> from pysoundtool.templates import soundclassifier
>>> project_name = 'test_backgroundnoise_classifier'
>>> headpath = 'saved_features_and_models'
>>> audio_classes_dir = './audiodata/minidatasets/background_noise/'
>>> soundclassifier(project_name,
                headpath,
                audiodir = audio_classes_dir,
                feature_type = 'mfcc',
                target_wavfile = './audiodata/rain.wav')

Some model training output should print out, and at the end the label the sound was classified as:

Label classified:  cafe
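
The soundclassifier template hides the model details; a comparably small CNN over MFCC features might look like this in Keras (an illustrative assumption, not the template's actual architecture):

>>> from tensorflow.keras import layers, models
>>> model = models.Sequential([
        layers.Input(shape=(40, 100, 1)),              # (n_mfcc, n_frames, 1); shape is hypothetical
        layers.Conv2D(16, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(3, activation='softmax'),         # one unit per background-noise class
    ])
>>> model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])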

ToDo

  • Ensure files cannot be overwritten unless explicitly indicated
  • Expand sound file compatibility: the software is JupyterLab/notebook friendly but can currently handle only .wav files with 16- or 32-bit depth
  • Improve accessibility of the Jupyter notebooks. Currently available on notebooks.ai (an account is required) and on Binder (a bit slow due to the audio data)
  • Error handling (especially of incompatible sound files)
  • Adding more filters
  • Adding more machine learning architectures
  • Add more options for visualizations (e.g. stft features)
  • Implement neural network with TensorFlow Lite
  • Various platforms to store sample data (aside from Notebooks.ai and GitHub :P )
  • Increase general speed and efficiency
