#!/usr/bin/env python
# CREATED: 6/13/14 10:57 AM by Justin Salamon <justin.salamon@nyu.edu>
"""
@file melosynth.py
@author Justin Salamon <www.justinsalamon.com>
@version 0.1.1
@section DESCRIPTION
MeloSynth: synthesize a melody
MeloSynth is a python script to synthesize melodies represented as a sequence of
pitch (frequency) values. It was written to synthesize the output of the
MELODIA Melody Extraction Vamp Plugin (http://mtg.upf.edu/technologies/melodia),
but can be used to synthesize any pitch sequence represented as a two-column txt
or csv file where the first column contains timestamps and the second contains
the corresponding frequency values in Hertz.
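
As an illustration of the expected input, the sketch below writes a tiny
two-column csv file and reads it back with numpy.loadtxt, the same numpy call
that loadmel() relies on. The file name and the pitch values are made up for
the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical pitch sequence: (timestamp in seconds, frequency in Hz).
# A negative frequency marks an unvoiced frame.
rows = [(0.00, 220.0), (0.01, 220.5), (0.02, -221.0)]

# Write the rows as comma-separated text into a temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'example_pitch.csv')
np.savetxt(path, rows, fmt='%.6f', delimiter=',')

# Read the file back and split it into its two columns.
data = np.loadtxt(path, delimiter=',')
times, freqs = data.T
print(times, freqs)
```

A whitespace-separated .txt file works the same way, except that no delimiter
needs to be given.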
@section USAGE
usage: melosynth.py [-h] [--output OUTPUT] [--fs FS] [--nHarmonics NHARMONICS]
                    [--square] [--useneg] [--batch]
                    inputfile

positional arguments:
  inputfile             Path to input file containing the pitch sequence

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT       Path to output wav file. If not specified a file will
                        be created with the same path/name as inputfile but
                        ending with "_melosynth.wav".
  --fs FS               Sampling frequency for the synthesized file. If not
                        specified the default value of 16000 Hz is used.
  --nHarmonics NHARMONICS
                        Number of harmonics (including the fundamental) to use
                        in the synthesis (default is 1). As the number is
                        increased the wave will become more sawtooth-like.
  --square              Converge to square wave instead of sawtooth as the
                        number of harmonics is increased.
  --useneg              By default, negative frequency values (unvoiced
                        frames) are synthesized as silence. Setting the
                        --useneg option will synthesize these frames using
                        their absolute values (i.e. as voiced frames).
  --batch               Treat inputfile as a folder and batch process every
                        file within this folder that ends with .csv or .txt.
                        If --output is specified it is expected to be a folder
                        too. If --output is not specified, all synthesized
                        files will be saved into the input folder.
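
To make the --useneg option concrete, the sketch below applies both behaviours
to a short, made-up frequency array. It mirrors the preprocessing step in
melosynth(); the array values are illustrative only.

```python
import numpy as np

# Made-up frequency track in Hz; negative values mark unvoiced frames.
freqs = np.array([220.0, -220.0, 0.0, 440.0])

# Default behaviour: unvoiced (negative) frames become silence (0 Hz).
silenced = freqs.copy()
silenced[silenced < 0] = 0

# With --useneg: unvoiced frames are synthesized at their absolute value.
voiced = np.abs(freqs)

print(silenced)
print(voiced)
```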
@section EXAMPLES
Basic usage, without any options:
>python melosynth.py ~/Documents/daisy3_melodia.csv
This will create a file called daisy3_melodia_melosynth.wav in the same folder
as the input file (~/Documents/) and use all the default parameter values for
the synthesis.
Advanced usage, including options:
>python melosynth.py ~/Documents/daisy3_melodia.csv --output ~/Music/mynewfile.wav --fs 44100 --nHarmonics 10 --square --useneg
Here we are providing a specified path for the output instead of the default
location. Next we specify the sample rate for the output (44.1 kHz) instead of
the default value of 16000 Hz. Next, we specify the number of harmonics to use
(10) instead of the default value of 1. Normally, as the number of harmonics is
increased the waveform will converge to a sawtooth wave. However, since we
specify the --square option, it will converge to a square wave instead. Finally,
by specifying the --useneg (use negative) option we make the script use the
absolute value of the frequencies so that negative frequencies are not
synthesized as silence (which is the default behaviour).
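
The effect of --nHarmonics and --square can be sketched as plain additive
synthesis of a single constant pitch. This is a simplified illustration, not
the script's full synthesis loop: the helper name additive_wave and its
arguments are hypothetical, and the fades and frequency interpolation that the
script applies between frames are left out.

```python
import numpy as np


def additive_wave(f0, duration, fs=16000, n_harmonics=1, square=False):
    # Hypothetical helper: sum n_harmonics sine partials of a constant f0.
    t = np.arange(int(round(duration * fs))) / float(fs)
    wave = np.zeros(len(t))
    for h in range(n_harmonics):
        # Harmonics 1, 2, 3, ... tend towards a sawtooth;
        # odd harmonics only (1, 3, 5, ...) tend towards a square wave.
        hnum = 2 * h + 1 if square else h + 1
        # Each partial is weighted by 1/hnum, as in the script.
        wave += np.sin(2 * np.pi * hnum * f0 * t) / hnum
    return wave


saw = additive_wave(220.0, 0.1, n_harmonics=10)
sq = additive_wave(220.0, 0.1, n_harmonics=10, square=True)
print(len(saw), len(sq))
```

With n_harmonics=1 both variants reduce to a single sine wave at f0.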
Batch processing:
>python melosynth.py ~/Documents/melodia_pitch/ --output ~/Documents/melodia_synth/ --batch
This will batch process all files ending with .txt or .csv in the melodia_pitch
folder, and save the synthesized melodies into the melodia_synth folder. Every
synthesized file will have the same name as its corresponding input file but
with the ending _melosynth.wav.
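
The naming rule for batch output can be sketched as follows. The helper name
batch_output_path is hypothetical; the script itself strips the last four
characters of the file name, which gives the same result for .txt and .csv
inputs.

```python
import os


def batch_output_path(inputfile, outputfolder=None):
    # Hypothetical helper: fall back to the input file's own folder.
    if outputfolder is not None:
        folder = outputfolder
    else:
        folder = os.path.dirname(inputfile)
    # Drop the extension and append the fixed suffix.
    stem = os.path.splitext(os.path.basename(inputfile))[0]
    return os.path.join(folder, stem + '_melosynth.wav')


print(batch_output_path(os.path.join('melodia_pitch', 'daisy3_melodia.csv'),
                        'melodia_synth'))
```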
@section INSTALLATION
Simply download the script and run it from your terminal as instructed above.
Dependencies: python (tested on 2.7) and numpy (http://www.numpy.org/)
@section LICENSE
MeloSynth: synthesize a melody
Copyright (C) 2014 Justin Salamon.
MeloSynth is free software: you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
MeloSynth is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.
"""

import argparse
import glob
import logging
import os
import struct
import wave

import numpy as np

logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.INFO)

short_version = '0.1'
version = '0.1.1'


def wavwrite(x, filename, fs=44100, N=16):
    '''
    Synthesize signal x into a wavefile on disk. The values of x must be in the
    range [-1,1].

    :parameters:
        - x : numpy.array
          Signal to synthesize.
        - filename: string
          Path of output wavfile.
        - fs : int
          Sampling frequency, by default 44100.
        - N : int
          Bit depth, by default 16. Samples are always packed as 16-bit
          integers, so other values are not supported.
    '''

    maxVol = 2**15-1.0  # maximum amplitude
    x = x * maxVol  # scale x
    # convert x to the byte string expected by wave (16-bit little-endian PCM)
    signal = b"".join(struct.pack('<h', int(item)) for item in x)
    wv = wave.open(filename, 'w')
    nchannels = 1
    sampwidth = int(N / 8)  # in bytes
    framerate = fs
    nframe = 0  # no limit
    comptype = 'NONE'
    compname = 'not compressed'
    wv.setparams((nchannels, sampwidth, framerate, nframe, comptype, compname))
    wv.writeframes(signal)
    wv.close()


def loadmel(inputfile, delimiter=None):
    '''
    Load a pitch (frequency) time series from a file.

    The pitch file must be in the following format:
    Double-column - each line contains two values, separated by ``delimiter``:
    the first contains the timestamp, and the second contains its corresponding
    frequency value in Hz.

    :parameters:
        - inputfile : str
          Path to pitch file
        - delimiter : str
          Column separator. By default, lines will be split by any amount of
          whitespace, unless the file ending is .csv, in which case a comma ','
          is used as the delimiter.

    :returns:
        - times : np.ndarray
          array of timestamps (float)
        - freqs : np.ndarray
          array of corresponding frequency values (float)
    '''
    if os.path.splitext(inputfile)[1] == '.csv':
        delimiter = ','
    try:
        # ndmin=2 keeps a single-line file two-dimensional
        data = np.loadtxt(inputfile, dtype='float', comments='#',
                          delimiter=delimiter, ndmin=2)
    except ValueError:
        raise ValueError('Error: could not load %s, please check if it is in '
                         'the correct 2 column format'
                         % os.path.basename(inputfile))

    # Make sure the data is in the right format
    data = data.T
    if data.shape[0] != 2:
        raise ValueError('Error: %s should be of dimension (2,x), but is of '
                         'dimension %s'
                         % (os.path.basename(inputfile), data.shape))
    times = data[0]
    freqs = data[1]
    return times, freqs


def melosynth_batch(inputfolder, outputfolder, fs, nHarmonics, square, useneg):
    '''
    Run melosynth on every .txt and .csv file in inputfolder, and save the
    synthesized files to outputfolder. If outputfolder is None, the files are
    saved to inputfolder instead.

    :parameters:
        - inputfolder : str
          Path to input folder containing all the files with pitch sequences.
        - outputfolder: str
          Path to output folder. If outputfolder is None all files will be
          saved to inputfolder. In either case, each output file will be
          created with the same name as its corresponding inputfile but ending
          with "_melosynth.wav"
        - fs : int
          Sampling frequency for the synthesized file.
        - nHarmonics : int
          Number of harmonics (including the fundamental) to use in the
          synthesis (default is 1). As the number is increased the wave will
          become more sawtooth-like.
        - square : bool
          When set to true, the waveform will converge to a square wave instead
          of a sawtooth as the number of harmonics is increased.
        - useneg : bool
          By default, negative frequency values (unvoiced frames) are
          synthesized as silence. If useneg is set to True, these frames will
          be synthesized using their absolute values (i.e. as voiced frames).
    '''
    # Load all files in input folder that end with .txt or .csv
    inputfiles = glob.glob(os.path.join(inputfolder, "*.txt"))
    inputfiles.extend(glob.glob(os.path.join(inputfolder, "*.csv")))

    for inputfile in inputfiles:
        if outputfolder is not None:
            outfolder = outputfolder
            if not os.path.isdir(outfolder):
                os.mkdir(outfolder)
        else:
            outfolder = inputfolder

        # Strip the 4-character extension (.txt or .csv) and add the suffix
        outputfilename = os.path.basename(inputfile)[:-4] + "_melosynth.wav"
        outputfile = os.path.join(outfolder, outputfilename)
        logging.info("Processing: " + inputfile)
        logging.info("Target    : " + outputfile)
        melosynth(inputfile, outputfile, fs, nHarmonics, square, useneg)


def melosynth(inputfile, outputfile, fs, nHarmonics, square, useneg):
    '''
    Load pitch sequence from a txt/csv file and synthesize it into a .wav

    :parameters:
        - inputfile : str
          Path to input file containing the pitch sequence.
        - outputfile: str
          Path to output wav file. If outputfile is None a file will be
          created with the same path/name as inputfile but ending with
          "_melosynth.wav"
        - fs : int
          Sampling frequency for the synthesized file.
        - nHarmonics : int
          Number of harmonics (including the fundamental) to use in the
          synthesis (default is 1). As the number is increased the wave will
          become more sawtooth-like.
        - square : bool
          When set to true, the waveform will converge to a square wave instead
          of a sawtooth as the number of harmonics is increased.
        - useneg : bool
          By default, negative frequency values (unvoiced frames) are
          synthesized as silence. If useneg is set to True, these frames will
          be synthesized using their absolute values (i.e. as voiced frames).
    '''
    # Preprocess input parameters
    fs = int(float(fs))
    nHarmonics = int(nHarmonics)
    if outputfile is None:
        # Replace the extension, whatever its length, with the suffix
        outputfile = os.path.splitext(inputfile)[0] + "_melosynth.wav"

    # Load pitch sequence
    logging.info('Loading data...')
    times, freqs = loadmel(inputfile)

    # Preprocess pitch sequence
    if useneg:
        freqs = np.abs(freqs)
    else:
        freqs[freqs < 0] = 0

    # Impute silence if start time > 0
    if times[0] > 0:
        estimated_hop = np.median(np.diff(times))
        prev_time = max(times[0] - estimated_hop, 0)
        times = np.insert(times, 0, prev_time)
        freqs = np.insert(freqs, 0, 0)

    logging.info('Generating wave...')
    signal = []
    translen = 0.010  # duration (in seconds) for fade in/out and freq interp
    phase = np.zeros(nHarmonics)  # start phase for all harmonics
    f_prev = 0  # previous frequency
    t_prev = 0  # previous timestamp
    for t, f in zip(times, freqs):

        # Compute number of samples to synthesize
        nsamples = int(np.round((t - t_prev) * fs))

        if nsamples > 0:
            # calculate transition length (in samples)
            translen_sm = float(min(np.round(translen*fs), nsamples))

            # Generate frequency series
            freq_series = np.ones(nsamples) * f_prev

            # Interpolate between non-zero frequencies
            if f_prev > 0 and f > 0:
                freq_series += (np.minimum(np.arange(nsamples)/translen_sm, 1)
                                * (f - f_prev))
            elif f > 0:
                freq_series = np.ones(nsamples) * f

            # Repeat for each harmonic
            samples = np.zeros(nsamples)
            for h in range(nHarmonics):
                # Determine harmonic num (h+1 for sawtooth, 2h+1 for square)
                hnum = 2*h+1 if square else h+1
                # Compute the phase of each sample
                phasors = 2 * np.pi * hnum * freq_series / float(fs)
                phases = phase[h] + np.cumsum(phasors)
                # Compute sample values and add
                samples += np.sin(phases) / hnum
                # Update phase
                phase[h] = phases[-1]

            # Fade in/out and silence
            if f_prev == 0 and f > 0:
                samples *= np.minimum(np.arange(nsamples)/translen_sm, 1)
            if f_prev > 0 and f == 0:
                samples *= np.maximum(1 - (np.arange(nsamples)/translen_sm), 0)
            if f_prev == 0 and f == 0:
                samples *= 0

            # Append samples
            signal.extend(samples)

        t_prev = t
        f_prev = f

    # Normalize signal to a peak of 0.8, using the absolute peak so that
    # negative excursions cannot clip, and skipping all-silent signals
    signal = np.asarray(signal)
    peak = np.max(np.abs(signal)) if signal.size > 0 else 0
    if peak > 0:
        signal *= 0.8 / float(peak)

    logging.info('Saving wav file...')
    wavwrite(signal, outputfile, fs)


if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="Melosynth {}. Synthesize "
                                     "a pitch sequence.".format(version))
    parser.add_argument("inputfile", help="Path to input file containing the "
                        "pitch sequence")
    parser.add_argument("--output", help="Path to output wav file. If "
                        "not specified a file will be created with the same "
                        "path/name as inputfile but ending with "
                        "\"_melosynth.wav\".")
    parser.add_argument("--fs", default=16000, help="Sampling frequency for "
                        "the synthesized file. If not specified the default "
                        "value of 16000 Hz is used.")
    parser.add_argument("--nHarmonics", default=1, help="Number of harmonics "
                        "(including the fundamental) to use in the synthesis "
                        "(default is 1). As the number is increased the wave "
                        "will become more sawtooth-like.")
    parser.add_argument("--square", dest='square', action='store_true',
                        help="Converge to square wave instead of sawtooth as "
                        "the number of harmonics is increased.")
    parser.add_argument("--useneg", dest='useneg', action='store_true',
                        help="By default, negative frequency values (unvoiced "
                        "frames) are synthesized as silence. Setting the "
                        "--useneg option will synthesize these frames using "
                        "their absolute values (i.e. as voiced frames).")
    parser.add_argument("--batch", dest='batch', action='store_true',
                        help="Treat inputfile as a folder and batch process "
                        "every file within this folder that ends with .csv or "
                        ".txt. If --output is specified it is expected to be "
                        "a folder too. If --output is not specified, all "
                        "synthesized files will be saved into the input "
                        "folder.")

    args = parser.parse_args()
    if args.inputfile is not None:
        if args.batch:
            melosynth_batch(args.inputfile, args.output, args.fs,
                            args.nHarmonics, args.square, args.useneg)
        else:
            melosynth(args.inputfile, args.output, args.fs, args.nHarmonics,
                      args.square, args.useneg)