# Reflections

I've made a number of reflections about the processes and methods that I used. First, I considered the specific type of model that I used: the standard 2-bar melody model, "cat-mel_2bar_big", as defined in configs.py in the "own_model" folder. I chose this configuration because I wanted to restrict my initial samples to shorter 2-bar pieces, which are significantly more manageable than the 16-bar alternatives. In addition, I chose to restrict the model to a single track (i.e., not layering multiple types of instruments) in order to simplify the patterns that the model needs to learn.
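As a rough illustration of how this configuration is used, here is a minimal sketch of sampling 2-bar melodies from a trained checkpoint with Magenta's MusicVAE API. The checkpoint path and output filenames are placeholders, and the exact package layout may differ slightly between Magenta versions.

```python
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel
import note_seq

# Load the 2-bar melody configuration defined in configs.py.
config = configs.CONFIG_MAP['cat-mel_2bar_big']

# Placeholder path to a checkpoint produced by training in the "own_model" folder.
model = TrainedModel(
    config,
    batch_size=4,
    checkpoint_dir_or_path='own_model/train/model.ckpt-10000')

# Sample a few 2-bar melodies (2 bars x 16 steps per bar = 32 steps).
samples = model.sample(n=4, length=32, temperature=0.5)

# Write each sampled NoteSequence out as a MIDI file for listening.
for i, ns in enumerate(samples):
    note_seq.sequence_proto_to_midi_file(ns, f'sample_{i}.mid')
```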

A limitation in sampling music from this model stemmed from that design choice. Because the generated pieces are only 2 bars long, it was hard to get a sense of the broader melodic flow of a potential piece generated by MusicVAE. In addition, we do not know whether sounds that appear discordant within a 2-bar frame would turn out to be more thematic over a longer generated piece. Indeed, other results posted online suggest that longer generated pieces have more melodic flow and harmony. A future extension of this project would be to build more sophisticated models, such as the 16-bar alternatives mentioned above, and analyze the results of sampling from them.
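As a rough sketch of that extension, sampling from one of the longer configurations would look much like the 2-bar sketch above, only with a different config name and sample length. "hierdec-mel_16bar" is one of the 16-bar melody configurations in Magenta's configs.py; the checkpoint path here is a placeholder.

```python
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel

# Hierarchical 16-bar melody configuration from configs.py.
config = configs.CONFIG_MAP['hierdec-mel_16bar']

# Placeholder path; a pretrained or newly trained 16-bar checkpoint goes here.
model = TrainedModel(
    config,
    batch_size=4,
    checkpoint_dir_or_path='checkpoints/hierdec-mel_16bar.ckpt')

# 16 bars x 16 steps per bar = 256 steps per sampled melody.
long_samples = model.sample(n=2, length=256, temperature=0.5)
```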

Another limitation I faced was computing power: I only had a CPU available. While I initially tried to use the GPU options available in Google Colaboratory, there were numerous conflicting package requirements that were not easy to resolve from within an .ipynb file. Due to the off-campus arrangements this semester, I also could not get physical access to a GPU, so I had to train the model on my local CPU. This meant limiting the sample size so that training ran at a reasonable speed. This definitely hurt the quality of the model, and may be part of the reason it produced musical sequences that did not fully align with the "calm" feeling given off by the source music. A logical extension would be to access a GPU and use a larger, more comprehensive training dataset.
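To make "limiting the sample size" concrete, here is a minimal sketch of capping the number of MIDI files that get converted into the NoteSequence TFRecord used for training. The directory, output path, and subset size are hypothetical, and Magenta's own conversion tooling could be used instead; the point is only that a smaller TFRecord keeps CPU training tractable.

```python
import glob
import random

import note_seq
import tensorflow as tf

# Hypothetical paths: a folder of source MIDI files and the TFRecord of
# serialized NoteSequences that MusicVAE training reads.
MIDI_DIR = 'data/midi'
OUTPUT_TFRECORD = 'data/notesequences_small.tfrecord'
SUBSET_SIZE = 200  # cap the number of pieces so CPU training stays tractable

midi_paths = sorted(glob.glob(f'{MIDI_DIR}/*.mid'))
random.seed(0)
subset = random.sample(midi_paths, min(SUBSET_SIZE, len(midi_paths)))

with tf.io.TFRecordWriter(OUTPUT_TFRECORD) as writer:
    for path in subset:
        # Convert each MIDI file to a NoteSequence proto and serialize it.
        ns = note_seq.midi_file_to_note_sequence(path)
        writer.write(ns.SerializeToString())
```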