GitHub - mattjklassen/TorchAudioSplines: Spline modeling of audio signals

These Python scripts model audio segments with cubic splines.
A segment is called a cycle if it lies inside a larger segment with approximate
fundamental frequency f_0 and the cycle itself has approximate length 1/f_0 (in seconds).

Suggested tests to run:

> python torchSpline.py

... plots a cubic spline on the interval [0,1] interpolating n=20 points with values in [-1,1],
equal to zero at the ends and random in between. Edit the code to change n.

> python wavspline.py ../audio/input.wav 200 500 20

... computes and plots the spline for the audio segment of input.wav from sample 200 to sample 500,
with n = 20 interpolation points.

> python getf0.py 

... computes an estimate of f0 for the audio file ../audio/guitarA445.wav. It uses getArgMax() for the
first approximation (this calls torch.stft and takes the argMax bin below a set threshold),
then averages cycle lengths to get a refined f0, which may be a new method.

> python findCycles.py 

... finds cycles in the audio file ../audio/guitarA445.wav using getCycles(),
then writes a PDF report to ../doc/out.pdf.


Summary of the Python files (in alphabetical order), with a brief description of each:


1  argMaxSpec.py

# ----- Brief Description -----
# 
# getArgMax(waveform, rate, N, hop_size) computes a weak f_0 using the STFT,
# where waveform is a torch tensor, and returns argMax as the weak f_0.
# Only frequency bins below (1/scale) * Nyquist = 1000 Hz are used, so scale = Nyquist / 1000;
# this keeps the weak f_0 below 1000 Hz.
# plotSpecArgMax() does the same as getArgMax() but also plots the magnitude spectrum
# used to find argMax.
#
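A minimal NumPy sketch of the weak f_0 step. It assumes Hann-windowed frames whose magnitude spectra are averaged before taking the argmax; the repository itself uses torch.stft, and the function name weak_f0_argmax and its default values are hypothetical. The max_freq cutoff plays the role of (1/scale) * Nyquist.

```python
import numpy as np


def weak_f0_argmax(waveform, rate, N=2048, hop_size=512, max_freq=1000.0):
    """Weak f_0: frequency of the largest STFT magnitude bin below max_freq (a sketch)."""
    x = np.asarray(waveform, dtype=float)
    window = np.hanning(N)
    # Overlapping frames of N samples, hop_size apart (zero-padded if too short).
    starts = range(0, max(len(x) - N, 0) + 1, hop_size)
    frames = [np.pad(x[s:s + N], (0, N - len(x[s:s + N]))) for s in starts]
    # Average magnitude spectrum over all frames.
    mag = np.mean([np.abs(np.fft.rfft(f * window)) for f in frames], axis=0)
    freqs = np.fft.rfftfreq(N, d=1.0 / rate)
    # Only bins below the cutoff are candidates, so the weak f_0 stays below max_freq.
    keep = freqs < max_freq
    return freqs[keep][np.argmax(mag[keep])]
```

The result is only accurate to one bin, rate / N Hz (about 7.8 Hz at 16 kHz with N = 2048), which is why it is called weak and refined afterwards.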


2  computeBsplineVal.py

# ----- Brief Description -----
# 
# inputs: t is a float in [0,1], c (or bcoeffs) is an array (or tensor) of n B-spline coefficients,
# k is the number of subintervals, d is the degree (default 3), and knotVals is the knot sequence, with the usual default:
# 0,0,0,0,1/k,2/k,...,(k-1)/k,1,1,1,1  (so the knots are t_i for i=0,...,N with N=n+d=n+3 if d=3, and n=k+3)
# computes the value of the spline f(t) for input t in the interval [0,1],
# where f(t) = sum of c_i B^d_i(t) for i = 0,...,N-d-1 = n-1
#
# computeSplineVal(d, k, c, t) computes f(t)
# computeSplineVal2(d, bcoeffs, knotVals, t) computes f(t)
# newBsplineVal(d, k, j, t) computes one B-spline B^d_j(t)
#
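One standard way to evaluate B^d_j(t) and f(t) on the default knot sequence is the Cox-de Boor recursion. The sketch below is pure Python; default_knots, bspline_basis and spline_value are hypothetical stand-ins for the repository's functions, not copies of them. At t = 1 the last non-empty knot interval is treated as closed, so that f(1) = c_{n-1}.

```python
def default_knots(k, d=3):
    """Clamped uniform knots: d+1 zeros, 1/k, ..., (k-1)/k, then d+1 ones."""
    return [0.0] * (d + 1) + [j / k for j in range(1, k)] + [1.0] * (d + 1)


def bspline_basis(j, d, knots, t):
    """B^d_j(t) by the Cox-de Boor recursion."""
    if d == 0:
        left, right = knots[j], knots[j + 1]
        if left <= t < right:
            return 1.0
        # Treat the last non-empty interval as closed so that t = 1 is covered.
        return 1.0 if (t == knots[-1] and left < right == knots[-1]) else 0.0
    value = 0.0
    span1 = knots[j + d] - knots[j]
    if span1 > 0:
        value += (t - knots[j]) / span1 * bspline_basis(j, d - 1, knots, t)
    span2 = knots[j + d + 1] - knots[j + 1]
    if span2 > 0:
        value += (knots[j + d + 1] - t) / span2 * bspline_basis(j + 1, d - 1, knots, t)
    return value


def spline_value(coeffs, t, d=3, knots=None):
    """f(t) = sum of c_i B^d_i(t); default knots use k = n - d subintervals."""
    if knots is None:
        knots = default_knots(len(coeffs) - d, d)
    return sum(c * bspline_basis(i, d, knots, t) for i, c in enumerate(coeffs))
```

With k = 1 the knots are 0,0,0,0,1,1,1,1 and the four basis functions reduce to the cubic Bernstein polynomials, which is a convenient sanity check.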


3  cycleSpline.py

# ----- Brief Description -----
# 
# Here we convert the script wavspline.py into a function with parameters:
# waveform, end_points a and b, and n (spline dim)
# 
# The function reads in a chunk of audio samples (say 2048 samples) and float endpoints
# a and b, with 0 <= a < b <= 2047, and graphs a cubic spline that approximates the audio
# graph over the interval [a,b]. The graph is returned as a plt.figure.
# The cubic spline interpolates n target values, which are points on the piecewise linear
# graph connecting the sample points. (The function is too long and should be broken up.)
#


4  findCycles.py

# ----- Brief Description -----
# 
# In this script we do three main things:
# 1. find weak f_0 with getArgMax() (for waveform about 2048 samples)
# 2. find cycles (in waveform) with getCycles()
# 3. produce output summary and graphs of cycles to pdf
# The same steps are broken up into separate functions in getCycleInfo.py
#


5  genCycle.py

# ----- Brief Description -----
# 
# Generate one cycle as a waveform and return it.
# inputs: cycle = [a,b], B-spline coefficient vector (tensor) = bcoeffs.
# Assume a and b are time values in samples lying between integer points, so that the
# spline is computed on the interval [a,b] and evaluated at the M = floor(b)-floor(a) integer
# points (samples) inside it to produce waveform sample values. The waveform has
# values indexed 0 to M-1 at the given sample_rate.
#
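The sample bookkeeping for one cycle can be sketched as follows, assuming a and b are not integers (as stated above) and that [a,b] is rescaled linearly onto the spline's parameter interval [0,1]. The helper names are hypothetical; spline is any callable f(t) on [0,1], for example one built from bcoeffs.

```python
import math


def cycle_sample_times(a, b):
    """Integer sample times in [a, b] and their parameters in [0, 1].

    Assumes a and b are not integers, so the samples are floor(a)+1, ..., floor(b),
    i.e. M = floor(b) - floor(a) of them, indexed 0 to M-1 in the output waveform.
    """
    times = list(range(math.floor(a) + 1, math.floor(b) + 1))
    params = [(j - a) / (b - a) for j in times]   # rescale [a, b] onto [0, 1]
    return times, params


def gen_cycle(spline, a, b):
    """Waveform of one cycle: the spline evaluated at each sample's parameter."""
    _, params = cycle_sample_times(a, b)
    return [spline(t) for t in params]
```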


6  genWavTone.py

# ----- Brief Description -----
# 
# Generate a waveform given the fundamental frequency f0 and key cycles, using cycle interpolation.
# genWavTone() returns the waveform as a tensor; insertWavTone() writes it into a larger waveform tensor.
# inputs:  f0 = fundamental frequency, sample_rate = sample rate,
# time = waveform duration in seconds,
# key_bcoeffs = B-spline coefficient vectors of each key cycle,
# keys = indices of key cycles,
# gains = scalar multipliers for each key cycle (for the envelope)
#
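A rough sketch of cycle interpolation, assuming that coefficient vectors and gains are interpolated linearly between key cycles and that the tone contains round(f0 * time) cycles. Because a spline is linear in its coefficients, scaling bcoeffs by a gain scales that cycle's waveform by the same amount. interpolate_key_cycles is a hypothetical name, and rendering each cycle to about sample_rate / f0 samples is left out.

```python
import numpy as np


def interpolate_key_cycles(f0, time, key_bcoeffs, keys, gains):
    """Per-cycle bcoeffs (already multiplied by gains) interpolated between key cycles.

    Assumes keys is strictly increasing; cycles before the first key or after the
    last key hold that key's values (np.interp's default behavior at the ends).
    """
    num_cycles = int(round(f0 * time))
    key_bcoeffs = np.asarray(key_bcoeffs, dtype=float)   # shape (num_keys, n)
    idx = np.arange(num_cycles)
    # np.interp is 1-D, so interpolate each coefficient position separately.
    bcoeffs = np.stack([np.interp(idx, keys, key_bcoeffs[:, j])
                        for j in range(key_bcoeffs.shape[1])], axis=1)
    cycle_gains = np.interp(idx, keys, gains)
    return bcoeffs * cycle_gains[:, None]
```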


7  getBcoeffs.py

# ----- Brief Description -----
# 
# get bcoeffs (B-spline coefficients) from a cycle (or segment) [a,b] in a waveform
# inputs: waveform (tensor of audio data), cycle = [a,b], n = dimension of splines
# return: bcoeffs vector 
#
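A sketch of the linear algebra behind getting bcoeffs, assuming the n target values are taken at equally spaced points of [a,b] on the piecewise linear graph and that the default clamped uniform knots are used. The coefficients then solve the square collocation system B c = targets with B[i, j] = B^3_j(x_i); for equally spaced points and these knots the Schoenberg-Whitney condition holds, so the solution is unique. basis_matrix and get_bcoeffs are hypothetical names.

```python
import numpy as np


def basis_matrix(x, n, d=3):
    """Collocation matrix B[i, j] = B^d_j(x_i) for the default clamped uniform knots."""
    k = n - d                                    # number of subintervals
    knots = np.concatenate([np.zeros(d), np.linspace(0.0, 1.0, k + 1), np.ones(d)])
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator of [t_j, t_{j+1}); the last real interval also takes x = 1.
    B = np.zeros((len(x), len(knots) - 1))
    for j in range(len(knots) - 1):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    B[x == 1.0, d + k - 1] = 1.0
    # Raise the degree one step at a time with the Cox-de Boor recursion.
    for p in range(1, d + 1):
        new = np.zeros((len(x), len(knots) - 1 - p))
        for j in range(len(knots) - 1 - p):
            if knots[j + p] > knots[j]:
                new[:, j] += (x - knots[j]) / (knots[j + p] - knots[j]) * B[:, j]
            if knots[j + p + 1] > knots[j + 1]:
                new[:, j] += ((knots[j + p + 1] - x)
                              / (knots[j + p + 1] - knots[j + 1]) * B[:, j + 1])
        B = new
    return B


def get_bcoeffs(waveform, cycle, n):
    """Coefficients of the cubic spline interpolating the audio graph on cycle = (a, b)."""
    a, b = cycle
    samples = np.asarray(waveform, dtype=float)
    x = np.linspace(0.0, 1.0, n)                 # spline parameters of the targets
    times = a + x * (b - a)                      # float sample times in [a, b]
    # Target values lie on the piecewise linear graph joining the samples.
    targets = np.interp(times, np.arange(len(samples)), samples)
    return np.linalg.solve(basis_matrix(x, n), targets)
```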


8  getCycleInfo.py

# ----- Brief Description -----
# 
# findCycles.py broken up into separate functions that can be called from material.py:
# get_segments(waveform, sample_rate, segment_size)
# process_segment(segment, index, segment_size, sample_rate, n, N, hop_size, txt1, txt2)
# 
#


9  getCycles.py

# ----- Brief Description -----
# 
# functions:  getCycles() and getf0withCycles()
# In the function getCycles() we find "cycles" in an audio segment given a weak f_0 (fundamental frequency).
# The weak f_0 is found by the function getArgMax(), and the cycles are then found with getCycles().
# By "cycle" we mean a time interval [a,b] (with a and b time values in float samples),
# where time is measured from 0 in the audio segment, and where the length b-a in samples is
# predicted by f_0, so b-a is approximately sample_rate * 1/f_0 (samples/cycle).
# The function getf0withCycles() uses the above and then simply averages cycle lengths to get a refined f0.
#
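The cycle idea can be sketched with upward zero crossings, located to sub-sample accuracy by linear interpolation. The expected cycle length in samples is sample_rate / f_0 (for example 16000 / 440 ≈ 36.36 samples), and here each cycle ends at the later crossing whose distance is closest to that length. The matching rule, the 50% tolerance and the function names are assumptions, not the repository's exact logic; the refined f0 is sample_rate divided by the average cycle length, as described above.

```python
import numpy as np


def upward_zero_crossings(x):
    """Float sample times where x crosses zero going upward (linear interpolation)."""
    x = np.asarray(x, dtype=float)
    i = np.nonzero((x[:-1] < 0) & (x[1:] >= 0))[0]
    return i + x[i] / (x[i] - x[i + 1])


def get_cycles(x, sample_rate, weak_f0):
    """Consecutive cycles (a, b) whose length is closest to sample_rate / weak_f0."""
    expected = sample_rate / weak_f0          # samples per cycle predicted by f_0
    zc = upward_zero_crossings(x)
    cycles, j = [], 0
    while j < len(zc) - 1:
        later = zc[j + 1:]
        m = int(np.argmin(np.abs(later - zc[j] - expected)))
        if abs(later[m] - zc[j] - expected) > 0.5 * expected:
            j += 1                             # no plausible cycle starts here
            continue
        cycles.append((zc[j], later[m]))
        j = j + 1 + m                          # next cycle starts where this one ends
    return cycles


def f0_with_cycles(x, sample_rate, weak_f0):
    """Refined f0: sample_rate divided by the average cycle length."""
    cycles = get_cycles(x, sample_rate, weak_f0)
    return sample_rate / np.mean([b - a for a, b in cycles])
```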


10  getf0.py

# ----- Brief Description -----
# 
# In this script we use getf0withCycles() applied to an audio file input.
# (change audio file with path variable below ...)
# Briefly, this constructs an estimate of f0 by first taking the STFT and its argMax,
# then refining this estimate using zero crossings to form cycles with getCycles().
#


11  getKeyCycles.py

# ----- Brief Description -----
# 
# In this script we have modified findCycles.py to do four main things:
# 1. find weak f_0 with getArgMax() in segments of 2048 samples
# 2. find the average of the weak f_0 values to use when finding cycles
# 3. find cycles (as endpoints only) in the entire audio file input with getCycles()
# 4. write selected bcoeffs of key cycles to files
# (it does not produce an output summary and graphs of cycles to pdf)
#


12  getKnots.py

# ----- Brief Description -----
# 
# import and export knot sequences to text files knots-[description].txt,
# and generate knotVals as the standard sequence given n = dim of cubic splines
#
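A minimal sketch of the knot-file round trip, assuming one knot value per line (the actual layout used by getKnots.py may differ). For the standard sequence n = k + 3, so a spline space of dimension n has n + 4 knots. The function names are hypothetical.

```python
from pathlib import Path


def standard_knots(n, d=3):
    """Default knots 0,0,0,0,1/k,...,(k-1)/k,1,1,1,1 for spline dimension n = k + d."""
    k = n - d
    return [0.0] * (d + 1) + [i / k for i in range(1, k)] + [1.0] * (d + 1)


def export_knots(knots, description, folder="."):
    """Write knots to knots-<description>.txt, one value per line."""
    path = Path(folder) / f"knots-{description}.txt"
    # repr() of a float round-trips exactly when read back with float().
    path.write_text("\n".join(repr(float(t)) for t in knots) + "\n")
    return path


def import_knots(path):
    """Read a knot sequence written by export_knots."""
    return [float(v) for v in Path(path).read_text().split()]
```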


13  getStatVals.py

# ----- Brief Description -----
# 
# get stationary points (where f'(t) = 0) of the spline given by bcoeffs on the interval [0,1]
#
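Stationary points are the t in (0,1) where f'(t) = 0. The sketch below is a generic numerical approach, not necessarily the one in getStatVals.py: it estimates f' by central differences, scans a grid for sign changes and refines each one by bisection. It accepts any callable f, such as a spline evaluator built from bcoeffs; for a spline one could instead differentiate the B-spline representation exactly. stationary_points is a hypothetical name.

```python
def stationary_points(f, num=997, tol=1e-10, h=1e-6):
    """Approximate t in (0, 1) with f'(t) = 0 for any callable f."""
    def fprime(t):
        return (f(t + h) - f(t - h)) / (2 * h)   # central difference

    grid = [i / num for i in range(1, num)]
    points = []
    for lo, hi in zip(grid, grid[1:]):
        if fprime(lo) == 0.0:
            points.append(lo)
        elif fprime(lo) * fprime(hi) < 0:
            while hi - lo > tol:                  # bisection on the sign change
                mid = 0.5 * (lo + hi)
                if fprime(lo) * fprime(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            points.append(0.5 * (lo + hi))
    return points
```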


14  material.py

# ----- Brief Description -----
# 
# create directory material-<name>-dim<n>, where name = audio file name without .wav
# and n = dimension of splines.  Then put melodic segments, their transformed versions,
# and a report into the directory.  The report should also include a plot of the splines used.
# command line args: 
# 1: audio file prefix (name)
# 2: dimension of splines (n)
#


15  melody.py

# ----- Brief Description -----
# 
# create melody based on spline curve
#


16  melody2.py

# ----- Brief Description -----
# 
# create melody based on spline curve, also with varying note durations
#


17  melody3.py

# ----- Brief Description -----
# 
# create melody based on spline curve, also with varying note durations
# but now use the stationary points to determine pitch and note duration.
#


18  melody4.py

# ----- Brief Description -----
# 
# Create a melody based on a spline curve, using spline values y to determine pitch
# and x values to determine time durations.  If notes=0 we use stationary points,
# and if notes>0 we use that many equal divisions of the interval [0,1].
# Durations are scaled so that the first note lasts for time0 seconds.
# Possible command line: (see below for details)
# python melody4.py bcoeffs0.txt f0=234 scale=3 notes=4 shift=5 time0=0.123 r i
# or to use stationary points and do retrograde inversion (and other defaults):
# python melody4.py bcoeffs0.txt r i
#
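The duration rule can be sketched directly: the gaps between consecutive x values (stationary points, or the points i/notes for equal divisions) become note durations, rescaled so that the first one is time0 seconds. Reading "r" as reversing the note order and "i" as negating the y values is an assumption about melody4.py, and all names here are hypothetical.

```python
def note_durations(xs, time0):
    """Durations from gaps between consecutive x values, scaled so the first is time0."""
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return [time0 * g / gaps[0] for g in gaps]


def equal_division_points(notes):
    """notes > 0: the points i/notes, giving `notes` equal divisions of [0, 1]."""
    return [i / notes for i in range(notes + 1)]


def retrograde(seq):
    """Reverse the order of the notes."""
    return list(reversed(seq))


def inversion(ys):
    # Assumption: inversion reflects the contour by negating the spline values y.
    return [-y for y in ys]
```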


19  melody5.py

# ----- Brief Description -----
# 
# melody5.py is derived from melody4.py but now expanded in various ways:
#
# 1. we now use config file mel5config.txt which contains the parameters used
#    to construct the melody or melodic fragment from bcoeffs files etc.
# 2. now use a sequence of key bcoeffs for note timbre, contained in files listed in config
# 3. use one designated cycle for the melodic contour
# 4. allow for regularly spaced notes in melodic contour sampling with "notes=12" etc.
# 5. allow for stationary point melodic contour sampling with "stat=1"
#


20  melody6.py

# ----- Brief Description -----
# 
# melody6.py is derived from melody5.py but now adding polyphony or voicing.
#
# 1. use config file mel6config.txt which contains the parameters used
#    to construct the melody or melodic fragment from bcoeffs files etc.
#
# 2. add polyphony, or voicing:
#    in addition to those configs in mel5config.txt we now control the duration
#    of notes in various ways.  For instance, note durations can all be set to
#    last for twice as long as the designated duration in the melody, so they overlap.
#    The waveform for each note will then be stored in a buffer chosen from a list of 
#    buffers, each representing a voice, or channel, and then these buffers will all 
#    be mixed before writing the final output.  
#
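A sketch of the voice-buffer mixing described above, assuming notes are assigned to voices round-robin and the mix is rescaled only if its peak exceeds 1. mix_voices is a hypothetical name; each note is given as a start sample and a waveform array.

```python
import numpy as np


def mix_voices(notes, num_voices, total_len):
    """Write each (start_sample, waveform) note into a voice buffer, then mix.

    Round-robin voice assignment and peak normalization are assumptions,
    not taken from melody6.py.
    """
    buffers = np.zeros((num_voices, total_len))
    for i, (start, wave) in enumerate(notes):
        wave = np.asarray(wave, dtype=float)
        end = min(start + len(wave), total_len)          # clip at the output length
        buffers[i % num_voices, start:end] += wave[:end - start]
    mix = buffers.sum(axis=0)
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```

With note durations doubled, consecutive notes overlap, and round-robin assignment keeps overlapping notes in different buffers.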


21  melodySplinusoid.py

# ----- Brief Description -----
# 
# create a melody based on a spline curve approximating sin(2*pi*x), the splinusoid.
#


22  plotBcoeffs.py

# ----- Brief Description -----
#
# Plot cubic spline f(t) with bcoeffs coming from file as arg1 on command line, and
# optional knot sequence from file as arg2, and optional inputs from file as arg3.
# Default knot sequence is 0,0,0,0,1/k,2/k,...,(k-1)/k,1,1,1,1 and if inputs are
# given then they are also plotted as points in red. (Inputs are not needed for plot)
#


23  plotSegmentSpline.py

# ----- Brief Description -----
# 
# Here we convert the script wavspline.py into a function with parameters:
# waveform, end_points a and b (float time values measured in samples), and n (spline dim).
# Output to console gives audio file and segment info,
# output with matplotlib has the audio graph as piecewise linear in grey,
# overlaid with the spline curve in green and n interpolation points highlighted in red.
#


24  plotSpec.py

# ----- Brief Description -----
# 
# A first draft of a script to read in an audio file and plot its spectrum:
# reads in the audio file left.wav, computes the magnitude spectrum with stft, and plots it.
#


25  rec2Spec.py

# ----- Brief Description -----
# 
# Record two seconds of audio and save as output.wav, then reopen this file,
# compute several spectrograms, and predict a sequence of f_0 values, one for each segment.
# Output this info to the console, and optionally also as spectrogram graphs with matplotlib.
#


26  record.py

# ----- Brief Description -----
# 
# Record mono from mic at given sample RATE in chunks of size CHUNK samples.
# Duration is in SECONDS, output is output.wav
#


27  recSpec.py

# ----- Brief Description -----
# 
# This program is superseded by rec2Spec.py, which has an option to print the spectrogram to pdf or not.
# It records one second of audio, saves it as output.wav, and then reopens
# this file, computes a spectrogram, and graphs it.
# The sample rate is 16 kHz.
#


28  scale.py

# ----- Brief Description -----
# 
# write chromatic scale to wav file
#


29  scale2.py

# ----- Brief Description -----
# 
# write two octave chromatic scale to wav file
#
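An equal-tempered chromatic scale multiplies the frequency by 2^(1/12) per semitone, so 12 steps give one octave and 24 give two. The sketch below writes such a scale as 16-bit mono WAV using only the standard library; the starting frequency, note length, amplitude and sample rate are assumptions, not the values used in scale.py or scale2.py, and the function names are hypothetical.

```python
import math
import struct
import wave


def chromatic_freqs(f0, semitones):
    """Equal-tempered chromatic scale: f0 * 2**(i/12) for i = 0..semitones."""
    return [f0 * 2 ** (i / 12) for i in range(semitones + 1)]


def write_scale(path, f0=220.0, semitones=12, note_sec=0.25, rate=16000):
    """Write one sine tone per scale step as 16-bit mono PCM."""
    frames = bytearray()
    for f in chromatic_freqs(f0, semitones):
        for j in range(int(round(note_sec * rate))):
            sample = 0.5 * math.sin(2 * math.pi * f * j / rate)
            frames += struct.pack("<h", int(32767 * sample))  # little-endian int16
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes = 16-bit samples
        w.setframerate(rate)
        w.writeframes(bytes(frames))
```

Use semitones=12 for one octave (13 notes) or semitones=24 for two octaves (25 notes).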


30  t2Spline.py

# ----- Brief Description -----
#
# Plot cubic spline f(t) through n points (x,y) with x in [0,1], y in [-1,1]
# with f(0)=0=f(1), and f'(0)=0=f'(1) and other y-values randomly generated.
#
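For comparison, the same kind of curve can be computed with the classical clamped cubic spline, solving a tridiagonal system for the second derivatives M_i at n equally spaced points of [0,1]. This assumes the spline's knots are the interpolation points, in which case the interpolant with prescribed end slopes is unique; it is a swapped-in formulation, not necessarily how t2Spline.py builds its B-spline. clamped_spline is a hypothetical name.

```python
import numpy as np


def clamped_spline(y, s0=0.0, s1=0.0):
    """f(t) interpolating y at n equally spaced points of [0, 1], with f'(0)=s0, f'(1)=s1."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    h = 1.0 / (n - 1)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    # End-slope conditions.
    A[0, 0], A[0, 1] = 2.0, 1.0
    rhs[0] = 6.0 / h * ((y[1] - y[0]) / h - s0)
    A[-1, -2], A[-1, -1] = 1.0, 2.0
    rhs[-1] = 6.0 / h * (s1 - (y[-1] - y[-2]) / h)
    # Continuity of f' at interior nodes: M_{i-1} + 4 M_i + M_{i+1} = 6/h^2 * second difference.
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = [1.0, 4.0, 1.0]
        rhs[i] = 6.0 / h ** 2 * (y[i - 1] - 2 * y[i] + y[i + 1])
    M = np.linalg.solve(A, rhs)            # second derivatives at the nodes

    def f(t):
        i = min(int(t / h), n - 2)         # subinterval [x_i, x_{i+1}] containing t
        a, b = t - i * h, (i + 1) * h - t
        return ((M[i] * b ** 3 + M[i + 1] * a ** 3) / (6 * h)
                + (y[i] / h - M[i] * h / 6) * b
                + (y[i + 1] / h - M[i + 1] * h / 6) * a)
    return f
```

With the defaults s0 = s1 = 0 and y values drawn from random.uniform(-1, 1) except y[0] = y[-1] = 0, this matches the setup described for t2Spline.py.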


31  t3Spline.py

# ----- Brief Description -----
#
# Plot cubic spline f(t) through n points (x,y) with x in [0,1], y in [-1,1]
# with f(0)=0=f(1), and f'(0)=Pi, f'(1)=-Pi, f(1/2)=1, f(1/4)=2^(-1/2)=f(3/4).
# These seven conditions match the function y = sin(pi*x) on [0,1].
# So n=7, k=4, d=3.
#


32  testBcoeffs.py

# ----- Brief Description -----
# 
# get bcoeffs (B-spline coefficients) from a cycle (or segment) [a,b] in a waveform
# inputs: waveform (tensor of audio data), cycle = [a,b], n = dimension of splines
# return: bcoeffs vector 
#


33  testGenWav.py

# ----- Brief Description -----
# 
# construct one tone, 1 sec long, with genWavTone()
#


34  testing.py



35  testSegmentSpline.py

# ----- Brief Description -----
# 
# Here we convert the script wavspline.py into a function with parameters:
# waveform, end_points a and b (float time values measured in samples), and n (spline dim).
# Output to console gives audio file and segment info,
# output with matplotlib has the audio graph as piecewise linear in grey,
# overlaid with the spline curve in green and n interpolation points highlighted in red.
#


36  testSplinusoid.py

# ----- Brief Description -----
# 
# construct one tone, 1 sec long, with genWavTone2()
# uses bcoeffs and knotVals to allow for new knot sequence like splinusoid
#


37  tone5.py

# ----- Brief Description -----
# 
# (based on testGenWav.py which was to construct one tone of 1 sec long with genWavTone() )
# This one is for testing a dulcimer tone using 32 key cycles, with various cases.
# The key cycles are chosen from a 3 second long sample called dulcimerA3-f.wav with
# fundamental frequency f_0 = 220 Hz, so approximately 3*220 = 660 cycles, from which we chose 32.
# The bcoeffs files are given below.
#


38  tone6.py

# ----- Brief Description -----
# 
# Continuing with testing a dulcimer tone using 32 key cycles, now we break those up into
# 17 subsequences of 16 consecutive key cycles, and construct a waveform in each case.
# 
# The key cycles are chosen from a 3 second long sample called dulcimerA3-f.wav with
# fundamental frequency f_0 = 220 Hz, so approximately 3*220 = 660 cycles, from which we chose 32.
# The bcoeffs files are given below.
#
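The count of 17 follows from 32 - 16 + 1 runs of 16 consecutive key cycles, assuming each run starts one key cycle after the previous one. A short sketch with a hypothetical helper:

```python
def consecutive_windows(items, size):
    """Every run of `size` consecutive items; there are len(items) - size + 1 of them."""
    return [items[i:i + size] for i in range(len(items) - size + 1)]


key_cycles = list(range(32))           # stand-ins for the 32 key cycle indices
subsequences = consecutive_windows(key_cycles, 16)
print(len(subsequences))               # 17
```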


39  tone7.py

# ----- Brief Description -----
# 
# Continuing with testing a dulcimer tone using 32 key cycles, now we break those up into
# 17 subsequences of 16 consecutive key cycles, and construct a waveform in each case.
# 
# The key cycles are chosen from a 3 second long sample called dulcimerA3-f.wav with
# fundamental frequency f_0 = 220 Hz, so approximately 3*220 = 660 cycles, from which we chose 32.
# The bcoeffs files are given below.
#


40  torchSpline.py

# ----- Brief Description -----
#
# Plot cubic spline f(t) through n points (x,y) with x in [0,1], y in [-1,1]
# with f(0)=0=f(1) and other y-values randomly generated.
#


41  wavplot.py

# ----- Brief Description -----
# 
# This program takes a wav file and begin and end sample numbers as command line input
# and draws a plot of the wav file sample values with matplotlib as a piecewise linear graph.
# The sample values are also printed out as text on the command line.
#
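Reading the requested sample values needs only the standard library. This sketch assumes 16-bit mono input and treats end as exclusive; read_wav_segment is a hypothetical name, and plotting the returned list with matplotlib is omitted.

```python
import array
import sys
import wave


def read_wav_segment(path, begin, end):
    """Return 16-bit mono sample values with indices begin..end-1."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch expects 16-bit mono audio")
        w.setpos(begin)                 # jump to the first requested frame
        raw = w.readframes(end - begin)
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()              # WAV sample data is little-endian
    return samples.tolist()
```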


42  wavspline.py

# ----- Brief Description -----
# 
# This program takes the following input on the command line:
# [1] wav file 
# [2] start sample number
# [3] end sample number
# [4] n (number of spline interpolation points = dimension of cubic spline vector space)
# The output graph, drawn with matplotlib, shows the original audio graph as a piecewise linear function
# overlaid with the spline curve in green and interpolation points highlighted in red.
#


43  writewav.py

# ----- Brief Description -----
# 
# read a wav file at sample_rate (like 16 kHz), do linear interpolation between samples,
# and write the output at 3 * sample_rate (like 48 kHz) to a wav file.
# Assume both files' data is 16-bit (short ints).
#
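The upsampling step can be sketched with numpy.interp: new samples sit at positions j/3 of the old sample grid, and values between two original samples lie on the straight line joining them. Producing 3(N-1)+1 output samples (rather than 3N) and leaving out file reading and writing are assumptions; upsample_linear is a hypothetical name.

```python
import numpy as np


def upsample_linear(samples, factor=3):
    """Raise the sample rate by `factor` using linear interpolation (16-bit output)."""
    x = np.asarray(samples, dtype=float)
    # Output positions measured in units of the original sample spacing.
    new_pos = np.arange((len(x) - 1) * factor + 1) / factor
    y = np.interp(new_pos, np.arange(len(x)), x)
    return np.clip(np.round(y), -32768, 32767).astype(np.int16)
```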


44  yinapp.py

# ----- Brief Description -----
# 
# import an audio file and apply YIN to chunks of samples
#


45  yinPyTorch.py

# ----- Brief Description -----
# 
# This script tests torchyin as in the example code, by creating a tensor of the
# piano fundamental frequencies f_0 and corresponding sinusoids with those f_0
# and then running YIN f_0 prediction on those audio samples.  YIN predicts a
# period length tau by comparing a difference function (closely related to autocorrelation)
# across values of tau and picking a minimum, which corresponds to a correlation maximum
# (see https://brentspell.com/2022/pytorch-yin/).
# We add to this example some output which shows that the predictions are less
# accurate as frequency increases and especially when tau is small and is at
# the midpoint between two integers. (see comments at the bottom of output)
#
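A minimal NumPy version of the YIN steps (difference function, cumulative mean normalized difference, absolute threshold, parabolic interpolation) is sketched here. It is not the torchyin code referenced above; the threshold 0.1, the window length and the name yin_f0 are assumptions.

```python
import numpy as np


def yin_f0(x, sample_rate, tau_max=None, threshold=0.1):
    """Estimate f0 of a mono signal with a basic YIN procedure (a sketch)."""
    x = np.asarray(x, dtype=float)
    if tau_max is None:
        tau_max = len(x) // 2
    W = len(x) - tau_max                                   # integration window length
    # Step 1: difference function d(tau) = sum_j (x_j - x_{j+tau})^2.
    d = np.array([np.sum((x[:W] - x[tau:tau + W]) ** 2) for tau in range(tau_max)])
    # Step 2: cumulative mean normalized difference, d'(0) = 1.
    cmnd = np.ones(tau_max)
    running = np.cumsum(d[1:])                             # sum of d(1..tau)
    cmnd[1:] = d[1:] * np.arange(1, tau_max) / np.maximum(running, 1e-12)
    # Step 3: first tau below the threshold, then walk down to the local minimum.
    below = np.nonzero(cmnd[1:] < threshold)[0]
    tau = int(below[0]) + 1 if below.size else int(np.argmin(cmnd[1:])) + 1
    while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
        tau += 1
    # Step 4: parabolic interpolation gives a non-integer period estimate.
    period = float(tau)
    if 1 <= tau < tau_max - 1:
        y0, y1, y2 = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
        denom = y0 - 2 * y1 + y2
        if denom != 0:
            period = tau + 0.5 * (y0 - y2) / denom
    return sample_rate / period
```

Step 4 matters for the accuracy issue noted above: at 16 kHz the integer lags 36 and 37 around a 440 Hz period correspond to about 444.4 Hz and 432.4 Hz, so an integer-only tau cannot land close to 440 Hz.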