System Requirements:

Python 3
- Google Cloud Text-to-Speech
- music21
- textgrid
MATLAB
- digital signal processing toolbox
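
The Python requirements can be checked before running generate.py. This is a minimal sketch that is not part of the project. It assumes the packages are importable as `google.cloud.texttospeech`, `music21`, and `textgrid`, which are the usual module names for these PyPI packages, but verify them against your install. It cannot check the MATLAB toolbox.

```python
import importlib.util

# Assumed import names for the Python requirements listed above.
REQUIRED_MODULES = ["google.cloud.texttospeech", "music21", "textgrid"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec raises when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All Python requirements found.")
```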

How to use:

$ python3 generate.py [optional flags]

Select a valid MusicXML file from the dialog; the song performances will then be written to the /output folder.

Several command-line flags can be used to affect how the program runs:

--validate - Run validation on the speech alignment (without this option, the program typically crashes the first time a new song is run)
--no-tts - Skip downloading the speech audio for the song's lyrics. Speech files are cached, so this flag typically isn't necessary
--no-align - Skip aligning the speech audio to phoneme information
--reset-cache - Delete all previously downloaded speech audio files from the cache
--no-text - Replace all lyrics with the vowel 'Ah'
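
How generate.py actually parses these flags is not shown here. As a hypothetical sketch, the same flags could be declared with Python's standard `argparse` module. `build_parser` is an illustrative name, not a function from the project.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(
        description="Generate sung performances from a MusicXML file.")
    # Each flag is a simple on/off switch that defaults to False.
    parser.add_argument("--validate", action="store_true",
                        help="run validation on the speech alignment")
    parser.add_argument("--no-tts", action="store_true",
                        help="skip downloading speech audio for the lyrics")
    parser.add_argument("--no-align", action="store_true",
                        help="skip aligning speech audio to phoneme information")
    parser.add_argument("--reset-cache", action="store_true",
                        help="delete all cached speech audio files")
    parser.add_argument("--no-text", action="store_true",
                        help="replace all lyrics with the vowel 'Ah'")
    return parser
```

argparse turns the dashes inside a flag name into underscores. For example, `build_parser().parse_args(["--no-tts", "--no-text"])` sets `args.no_tts` and `args.no_text` to True and leaves the other flags False.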

Examples:

The sheet music (MusicXML) for several pieces is available in the /sheet_music folder. Example performances (WAV) of these pieces are available in the /output/demos folder.

To-Do:

fix the extend method so that it always transitions smoothly into the stretched center of the vowel. It works most of the time, but in many cases it still does not get a proper pitch for the vowel. Perhaps measure the mean pitch across the vowel and match that, rather than using whichever period we happened to land on.
look into dynamics control based on the intensity of the waveform
make it so that parts can have multiple notes at the same time (i.e. chords)
make it so that multiple voices can be on the same line at the same time
when stitching syllables, make each boundary a zero crossing (if the next word's slope at the zero crossing has the wrong sign, invert the sound signal)
in Python, detect when the same syllable is repeated over multiple notes (i.e. redo the function that extracts the words and determines, according to the music, which syllable of each word is being sung)
figure out why the forced aligner fails in a lot of cases
make each voice part use multiple singers
add vibrato and tremolo to voices, especially for sustained notes
for time-stretching samples, find a way to evaluate the quality of the selected period. Sometimes an artifact seems to be selected as part of the period, so one approach could be to sweep over the vowel and pull out the period closest to the average
look into integrating Audiveris for optical music recognition, so that the software can produce a full end-to-end performance, starting with a PDF of sheet music and ending with the audio recording
move all signal processing from MATLAB to Python (i.e. replace pitch detection, pitch shifting, and time stretching with NumPy and probably C++ libraries)
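
One way to approach the extend-method item is to estimate the pitch across the whole vowel and derive the target period from the mean. This NumPy sketch rests on several assumptions. It uses plain autocorrelation, not the MATLAB pitch detection the project currently uses. The names `frame_pitch` and `mean_vowel_pitch`, the frame sizes, and the 80 to 1000 Hz search range are all hypothetical choices.

```python
import numpy as np


def frame_pitch(frame, sr, fmin=80.0, fmax=1000.0):
    """Estimate one frame's pitch in Hz from its autocorrelation peak."""
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Only search lags that correspond to pitches between fmin and fmax.
    lo = int(sr / fmax)
    hi = min(int(sr / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag


def mean_vowel_pitch(vowel, sr, frame_len=2048, hop=512):
    """Average the frame-wise pitch estimates over a whole vowel."""
    if len(vowel) < frame_len:
        raise ValueError("vowel is shorter than one analysis frame")
    pitches = [frame_pitch(vowel[i:i + frame_len], sr)
               for i in range(0, len(vowel) - frame_len + 1, hop)]
    return float(np.mean(pitches))
```

The period length to match, in samples, would then be `round(sr / mean_vowel_pitch(vowel, sr))`.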
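
For the dynamics item, a frame-wise RMS envelope is a common starting point. This NumPy sketch measures the level in dB relative to full scale (1.0). It also computes the linear gain that would move a clip's median level to a target. The function names, the frame sizes, and the idea of driving dynamics from a target level are assumptions, not part of the project.

```python
import numpy as np


def rms_envelope_db(signal, frame_len=1024, hop=256, floor_db=-120.0):
    """Frame-wise RMS level in dB relative to full scale (1.0)."""
    levels = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        # Silent frames get a fixed floor instead of -inf.
        levels.append(20 * np.log10(rms) if rms > 0 else floor_db)
    return np.array(levels)


def gain_to_target(signal, target_db):
    """Linear gain that moves the signal's median frame level to target_db."""
    current_db = np.median(rms_envelope_db(signal))
    return 10 ** ((target_db - current_db) / 20)
```

A full-scale sine has an RMS of about 0.707, so its frames measure close to -3 dB. Halving the amplitude lowers the level by about 6 dB.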
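
The zero-crossing item could look roughly like this NumPy sketch, which assumes mono floating-point sample arrays. `stitch_at_zero_crossings` is a hypothetical helper that works in three steps:

- it trims the first clip just after its last sign change;
- it trims the second clip just before its first sign change;
- it inverts the second clip when the two crossings run in opposite directions (rising vs. falling).

```python
import numpy as np


def sign_changes(x):
    """Indices i where x[i] and x[i + 1] have opposite signs."""
    neg = np.signbit(x)
    return np.nonzero(neg[:-1] != neg[1:])[0]


def stitch_at_zero_crossings(a, b):
    """Join clip a to clip b so that the boundary is a zero crossing."""
    ia = sign_changes(a)
    ib = sign_changes(b)
    if len(ia) == 0 or len(ib) == 0:
        return np.concatenate([a, b])  # no crossing to align on: plain join
    i, j = ia[-1], ib[0]
    # A crossing is rising when the sample before it is negative.
    a_rising = np.signbit(a[i])
    b_rising = np.signbit(b[j])
    if a_rising != b_rising:
        b = -b  # invert so both crossings run in the same direction
    # Keep a up to the sample before its crossing, b from the sample after.
    return np.concatenate([a[: i + 1], b[j + 1:]])
```

Cutting at the last crossing of the first clip can drop some samples. Voiced audio crosses zero often, though, so the loss is normally a fraction of one period.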
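
For detecting a syllable held over several notes, MusicXML itself carries the needed information. Each `<note>` may contain a `<lyric>` with a `<text>` and a `<syllabic>` value (`single`, `begin`, `middle`, or `end`). The project reads scores through music21, but this sketch uses the standard library's `xml.etree.ElementTree` directly on an uncompressed score-partwise file. As a simplification, it assumes that a pitched note with no `<lyric>` continues the previous syllable; MusicXML can also mark this explicitly with `<extend/>`. `syllables_per_note` is a hypothetical name, and only the first verse is read.

```python
import xml.etree.ElementTree as ET


def syllables_per_note(part):
    """Group the sung notes of a MusicXML <part> element by syllable.

    Returns (text, syllabic, note_count) tuples. A pitched note without a
    <lyric> counts as a continuation (melisma) of the previous syllable,
    a rest ends the current syllable, and extra chord tones are skipped.
    """
    groups = []
    current = None
    for note in part.iter("note"):
        if note.find("rest") is not None:
            current = None
            continue
        if note.find("chord") is not None:
            continue  # extra chord tone: same onset, not a new syllable
        lyric = note.find("lyric")  # first verse only
        if lyric is None:
            if current is not None:
                current[2] += 1
            continue
        current = [lyric.findtext("text", ""),
                   lyric.findtext("syllabic", "single"), 1]
        groups.append(current)
    return [tuple(g) for g in groups]
```

For a file on disk, the first part would be `ET.parse(path).getroot().find("part")`, where `path` points to an uncompressed .musicxml file.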
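
Vibrato (a periodic pitch wobble) and tremolo (a periodic loudness wobble) could both be applied after a note is synthesized. This NumPy sketch uses a sinusoidally modulated delay line for vibrato and a sinusoidal gain for tremolo. The function name and the default rates and depths are illustrative assumptions, not values from the project.

```python
import numpy as np


def add_vibrato_tremolo(x, sr, vib_rate=5.5, vib_depth_cents=30.0,
                        trem_rate=5.5, trem_depth=0.1):
    """Apply vibrato via a modulated delay and tremolo via a modulated gain."""
    n = np.arange(len(x))
    t = n / sr
    # Reading x at t - d(t) scales pitch by 1 - d'(t). With
    # d(t) = D * (1 + sin(2*pi*f*t)), the peak deviation is D * 2*pi*f,
    # so D is chosen to reach the requested depth in cents (approximately).
    ratio = 2 ** (vib_depth_cents / 1200)
    max_delay = (ratio - 1) / (2 * np.pi * vib_rate)  # seconds
    delay = max_delay * (1 + np.sin(2 * np.pi * vib_rate * t))  # always >= 0
    # Fractional delay by linear interpolation (np.interp clamps at the edges).
    vibrato = np.interp(n - delay * sr, n, x)
    # Gain swings between 1 - trem_depth and 1.
    gain = 1 - trem_depth * (1 + np.sin(2 * np.pi * trem_rate * t)) / 2
    return vibrato * gain
```

Applying the effect only to the sustained middle of a note, and fading it in, would match the To-Do item's focus on sustained notes.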
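
The period-quality item suggests sweeping over the vowel and keeping the most average period. Here is a minimal NumPy sketch of that idea. It assumes a steady pitch and an integer period length in samples, for example one derived from a pitch estimate. It compares every cycle with the element-wise mean cycle and keeps the closest one, which tends to skip cycles containing an artifact. `most_typical_period` is a hypothetical name.

```python
import numpy as np


def most_typical_period(vowel, period):
    """Return (cycle, start_index) for the cycle closest to the mean cycle."""
    vowel = np.asarray(vowel, dtype=float)
    count = len(vowel) // period
    if count == 0:
        raise ValueError("vowel is shorter than one period")
    # One row per consecutive period-length slice of the vowel.
    cycles = vowel[: count * period].reshape(count, period)
    mean_cycle = cycles.mean(axis=0)
    # Squared distance of each cycle from the average cycle.
    distances = np.sum((cycles - mean_cycle) ** 2, axis=1)
    best = int(np.argmin(distances))
    return cycles[best], best * period
```

Because the slices are cut at multiples of a rounded period, a non-integer true period makes later cycles drift in phase. A fuller version would align each cycle, for example at zero crossings, before averaging.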

david-andrew/Ensemble

Folders and files

Latest commit

History

Repository files navigation

System Requirments:

How to use:

Examples:

To-Do:

About

Topics

Resources

Stars

Watchers

Forks

Languages