PySoundTool

This project stemmed from the Prototype Fund project NoIze. This fork broadens the application of the software from smart noise filtering to general sound analysis, filtering, visualization, preparation, etc. The name has therefore been adapted to reflect this more general sound functionality.

Note: when adjusting sound files, work only on copies of the originals. Improvements are still needed to ensure files are not overwritten unless explicitly indicated.
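
Until then, a simple safeguard is to copy the file first and work on the copy (a minimal sketch using only the standard library; the paths are hypothetical):

>>> import shutil
>>> shutil.copyfile('./audiodata/python.wav', './audiodata/python_copy.wav')  # then edit the copy, not the original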

Functionality

Jupyter notebooks

You can use some of the tools available here in my Jupyter notebooks; you can check them out on Binder. As I do include some audio data, it may take a couple of minutes to load.

Installation

This repository serves as a place to explore sound, so small sound datasets are included in this repo, approximately 30-40 MB in total. If you clone this repo, the sound data will be cloned as well.

If you are fine with this, clone the repository and set your working directory to where you cloned it.

Start a virtual environment:

$ python3 -m venv env
$ source env/bin/activate
(env)..$

Then install the required packages via pip:

(env)..$ pip install -r requirements.txt

Feel free to use this tool in your own scripts (I show examples below). You can also explore some of its functionality via jupyter notebook:

(env)..$ jupyter notebook

Once this loads, click on the folder 'jupyternotebooks', and then on one of the .ipynb files.

Examples

You can run the examples below in ipython or another Python console, or in a Python script.

Install and run ipython:

(env)..$ pip install ipython
(env)..$ ipython
>>> # import what we need for the examples:
>>> import pysoundtool.explore_sound as exsound 
>>> import pysoundtool.soundprep as soundprep
>>> import pysoundtool as pyst 
>>> from pysoundtool.templates import soundclassifier
>>> from scipy.io.wavfile import write

Visualization

"Python": Time Domain

>>> exsound.visualize_signal('./audiodata/python.wav')


"Python": Frequency Domain

The mel frequency cepstral coefficients (MFCCs) and log-mel filterbank energies (FBANK) are two very common acoustic features used in machine and deep learning.

Let's take a look at how the word "python" looks when these features are extracted, and how the window settings influence the features.
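
For reference, a window of win_size_ms milliseconds corresponds to int(sr * win_size_ms / 1000) samples. A rough equivalent of the default extraction can be sketched with librosa (an assumption; this repo does not itself require librosa):

>>> import librosa
>>> y, sr = librosa.load('./audiodata/python.wav', sr=None)  # keep the native sample rate
>>> win_length = int(sr * 20 / 1000)   # 20 ms window
>>> hop_length = int(sr * 10 / 1000)   # 10 ms shift
>>> mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                 n_fft=win_length, hop_length=hop_length)
>>> mfccs.shape   # (n_mfcc, n_frames)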

MFCC:

window 20 ms, window overlap 10 ms (default)
>>> exsound.visualize_feats('./audiodata/python.wav', features='mfcc')


window 100 ms, window overlap 50 ms
>>> exsound.visualize_feats('./audiodata/python.wav', features='mfcc',
                            win_size_ms = 100, win_shift_ms = 50)


FBANK:

window 20 ms, window overlap 10 ms (default)
>>> exsound.visualize_feats('./audiodata/python.wav', features='fbank')


window 100 ms, window overlap 50 ms
>>> exsound.visualize_feats('./audiodata/python.wav', features='fbank',
                            win_size_ms = 100, win_shift_ms = 50)


Sound Creation

>>> data, sr = exsound.create_signal(freq=500, amplitude=0.5, samplerate=8000, dur_sec=0.2)
>>> data2, sr = exsound.create_signal(freq=1200, amplitude=0.9, samplerate=8000, dur_sec=0.2)
>>> data3, sr = exsound.create_signal(freq=200, amplitude=0.3, samplerate=8000, dur_sec=0.2)
>>> data_mixed = data + data2 + data3
>>> exsound.visualize_signal(data_mixed, samplerate=sr)
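
Conceptually, create_signal and create_noise just sample a sine wave and Gaussian noise; a minimal numpy sketch of the idea (hypothetical, not the repo's exact implementation):

>>> import numpy as np
>>> sr, dur_sec = 8000, 0.2
>>> t = np.arange(int(sr * dur_sec)) / sr        # sample times in seconds
>>> sine = 0.5 * np.sin(2 * np.pi * 500 * t)     # freq=500, amplitude=0.5
>>> noise = 0.1 * np.random.normal(size=len(t))  # like create_noise below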


Mixed with noise:

>>> noise = exsound.create_noise(len(data_mixed), amplitude=0.1)
>>> data_noisy = data_mixed + noise
>>> exsound.visualize_signal(data_noisy, samplerate = sr)


In the time domain, it is difficult to see the three different signals at all...

>>> exsound.visualize_feats(data_noisy, samplerate=sr, features='fbank')


In the frequency domain, you can see roughly three distinct frequencies in the signal.

Note: I am working on improving the x and y labels. I am used to librosa.display.specshow, which made this a bit easier.
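
To read off the component frequencies numerically rather than visually, a plain FFT also works (a numpy sketch, independent of this repo):

>>> import numpy as np
>>> spectrum = np.abs(np.fft.rfft(data_noisy))
>>> freqs = np.fft.rfftfreq(len(data_noisy), d=1/sr)
>>> freqs[spectrum.argsort()[-3:]]   # the three strongest bins, near 200, 500, and 1200 Hz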

Sound File Prep

Convert to .wav file

>>> newfilename = soundprep.convert2wav('./audiodata/traffic.aiff')
>>> print(newfilename)
audiodata/traffic.wav
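
Under the hood, a conversion like this can be done with the soundfile package (a sketch assuming soundfile is installed; not necessarily what convert2wav actually does):

>>> import soundfile as sf
>>> data, sr = sf.read('./audiodata/traffic.aiff')  # soundfile reads .aiff, .flac, etc.
>>> sf.write('./audiodata/traffic.wav', data, sr)   # .wav defaults to 16-bit PCM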

Ensure sound data is mono channel

>>> from scipy.io.wavfile import read
>>> sr, data = read('./audiodata/python.wav')
>>> datamono = soundprep.stereo2mono(data) # if it is already mono, nothing will change
>>> len(data) == sum(data==datamono)
True
>>> sr, data_2channel = read('./audiodata/dogbark_2channels.wav')
>>> data_2channel.shape
(18672, 2)
>>> data_1channel = soundprep.stereo2mono(data_2channel)
>>> data_1channel.shape
(18672,)
>>> data_2channel[:5]
array([[208, 208],
       [229, 229],
       [315, 315],
       [345, 345],
       [347, 348]], dtype=int16)
>>> data_1channel[:5]
array([208, 229, 315, 345, 347], dtype=int16)
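
A stereo-to-mono conversion like the above can be sketched by averaging the channels (hypothetical, not the repo's exact code):

>>> def to_mono(data):
        # average across channels if data has shape (n_samples, n_channels)
        if data.ndim > 1:
            data = data.mean(axis=1).astype(data.dtype)  # e.g. [347, 348] -> 347 for int16
        return data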

Convert Soundfiles for use with scipy.io.wavfile

As of now, the software uses scipy.io.wavfile, a module that works well in Jupyter environments. If you have files that are not compatible with it, this function should save the sound data in a compatible format.

>>> newfilename = soundprep.prep4scipywavfile('./audiodata/traffic.aiff')
Converting file to .wav
Saved file as audiodata/traffic.wav 
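
You can verify that the converted file loads with scipy.io.wavfile and check its bit depth via the array's dtype:

>>> from scipy.io.wavfile import read
>>> sr, data = read(newfilename)
>>> data.dtype   # int16 or int32 means the file is compatible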

Adding sounds

Long background sound

Here we will add traffic background noise to the speech segment 'python'.

'python':

>>> python, sr = soundprep.loadsound('./audiodata/python.wav')
>>> exsound.visualize_signal(python, samplerate=sr)


traffic background noise:

>>> traffic, sr = soundprep.loadsound('./audiodata/traffic.aiff')
Step 1: ensure filetype is compatible with scipy library
Success!
>>> exsound.visualize_signal(traffic, samplerate=sr)


Combining them:

  • the noise/sound scaled by 0.3
  • a 1-second delay before the speech
  • total length: 5 seconds

>>> python_traffic, sr = soundprep.add_sound_to_signal(
                                signal = './audiodata/python.wav',
                                sound = './audiodata/traffic.aiff',
                                scale = 0.3,
                                delay_target_sec = 1,
                                total_len_sec = 5)
>>> exsound.visualize_signal(python_traffic, samplerate=sr)

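Those three settings map onto simple array operations; a numpy sketch of the mixing idea (hypothetical, not add_sound_to_signal's actual implementation):

>>> import numpy as np
>>> total_len = int(sr * 5)                      # total length: 5 seconds
>>> delay = int(sr * 1)                          # 1-second delay for the speech
>>> mix = np.zeros(total_len)
>>> bg = 0.3 * np.asarray(traffic, dtype=float)[:total_len]  # scale the background
>>> mix[:len(bg)] += bg
>>> mix[delay:delay + len(python)] += python     # assumes the speech fits in total_len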

Short background sound

rain background noise:

>>> rain, sr = soundprep.loadsound('./audiodata/rain.wav')
>>> exsound.visualize_signal(rain, samplerate=sr)


This sound will be repeated to match the desired background noise length. Note: artifacts can sometimes occur at the seams and may need additional processing; longer background noises work better.

>>> python_rain, sr = soundprep.add_sound_to_signal(
                                signal = './audiodata/python.wav',
                                sound = './audiodata/rain.wav',
                                scale = 0.3,
                                delay_target_sec = 1,
                                total_len_sec = 5)
>>> exsound.visualize_signal(python_rain, samplerate=sr)

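The repetition step can be sketched with numpy (hypothetical; the seams between repeats are where the artifacts mentioned above can arise):

>>> import numpy as np
>>> n_target = int(sr * 5)                            # desired background length
>>> n_repeats = int(np.ceil(n_target / len(rain)))
>>> rain_long = np.tile(rain, n_repeats)[:n_target]   # repeat, then trim to length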

Filtering

NOTE: only .wav files with a bit depth of 16 or 32 can currently be used; see the subsection 'Convert Soundfiles for use with scipy.io.wavfile'.

For the visuals, we will look at the sounds via their FBANK features.

Noisy sound file

Add the 'python' speech segment and traffic noise together to create noisy speech, and save it as a .wav file.

>>> from scipy.io.wavfile import write
>>> speech = './audiodata/python.wav'
>>> noise = './audiodata/traffic.aiff'
>>> data_noisy, samplerate = soundprep.add_sound_to_signal(speech, noise, delay_target_sec=1, scale = 0.3, total_len_sec=5)
>>> noisy_speech_filename = './audiodata/python_traffic.wav'
>>> write(noisy_speech_filename, samplerate, data_noisy)
>>> exsound.visualize_feats(noisy_speech_filename, features='fbank')


Then filter the traffic out:

>>> pyst.filtersignal(output_filename = 'python_traffic_filtered.wav',
                    wavfile = noisy_speech_filename,
                    scale = 1.5) # how strong the filter should be


If there is some distortion in the signal, try a post filter:

>>> pyst.filtersignal(output_filename = 'python_traffic_filtered_postfilter.wav',
                    wavfile = noisy_speech_filename,
                    scale = 1.5, # how strong the filter should be
                    apply_postfilter = True) 

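For intuition about what a noise filter like this does, here is a bare-bones spectral subtraction sketch with scipy (illustrative only; pyst.filtersignal's actual algorithm may differ):

>>> import numpy as np
>>> from scipy.signal import stft, istft
>>> from scipy.io.wavfile import read
>>> sr, noisy = read(noisy_speech_filename)
>>> f, t, Z = stft(noisy.astype(float), fs=sr)
>>> noise_mag = np.abs(Z[:, :10]).mean(axis=1, keepdims=True)  # assume the first frames are noise-only
>>> mag = np.maximum(np.abs(Z) - 1.5 * noise_mag, 0)           # subtract a scaled noise estimate
>>> Z_clean = mag * np.exp(1j * np.angle(Z))                   # keep the noisy phase
>>> _, cleaned = istft(Z_clean, fs=sr)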

Convolutional Neural Network: Simple sound classification

NOTE: only .wav files with a bit depth of 16 or 32 can currently be used; see the subsection 'Convert Soundfiles for use with scipy.io.wavfile'.

>>> from pysoundtool.templates import soundclassifier
>>> project_name = 'test_backgroundnoise_classifier'
>>> headpath = 'saved_features_and_models'
>>> audio_classes_dir = './audiodata/minidatasets/background_noise/'
>>> soundclassifier(project_name,
                headpath,
                audiodir = audio_classes_dir,
                feature_type = 'mfcc',
                target_wavfile = './audiodata/rain.wav')

Some model training output should print out, and at the end the label the sound was classified as:

Label classified:  cafe
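
The soundclassifier template hides the model details; a comparably small CNN over MFCC features might look like this in Keras (an illustrative assumption, not the template's actual architecture):

>>> from tensorflow.keras import layers, models
>>> model = models.Sequential([
        layers.Input(shape=(40, 100, 1)),              # (n_mfcc, n_frames, 1); shape is hypothetical
        layers.Conv2D(16, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(3, activation='softmax'),         # one unit per background-noise class
    ])
>>> model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])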

ToDo

  • Ensure files cannot be overwritten unless explicitly indicated
  • Expand sound file compatibility: the software is JupyterLab/notebook friendly but can currently handle only .wav files with 16- or 32-bit depth
  • Improve accessibility of the Jupyter notebooks. Currently available on notebooks.ai (an account is required) and on Binder (a bit slow due to the audio data)
  • Error handling (especially of incompatible sound files)
  • Adding more filters
  • Adding more machine learning architectures
  • Add more options for visualizations (e.g. stft features)
  • Implement neural network with TensorFlow Lite
  • Various platforms to store sample data (aside from Notebooks.ai and GitHub :P )
  • Increase general speed and efficiency
