Dynamic Time Warping

Dynamic Time Warping (DTW) is a technique for measuring the similarity between two temporal sequences, even when they unfold at different speeds. In this project, I use DTW to assess the similarity between three audio tracks: "My Flame" by Bobby Caldwell, "Sky's the Limit" by The Notorious B.I.G. (which samples "My Flame"), and "Take Five" by Dave Brubeck.

Song 1 - My Flame by Bobby Caldwell (https://www.youtube.com/watch?v=3hK6IgvZ0CY)
Song 2 - Sky's the Limit by The Notorious B.I.G. (https://www.youtube.com/watch?v=d3vOeCkeCNA)
Song 3 - Take Five by Dave Brubeck (https://www.youtube.com/watch?v=vmDDOFXSgAs)

Song 2 ("Sky's the Limit") samples Song 1 ("My Flame"), whereas Song 3 ("Take Five") is musically unrelated to the first two songs.

In this project, DTW is applied only to the instrumental versions of the songs. To keep the computation manageable, the analysis covers the first 30 seconds of each track. Running DTW on these segments produces similarity scores that quantify how closely the songs resemble one another.

1) Converting the .mp4 files into .wav files

In this section, I extract the audio from the downloaded .mp4 video files and convert it to the widely used .wav format. Each .mp4 file produces a corresponding .wav file containing only the audio, which is what the subsequent Dynamic Time Warping (DTW) analysis operates on.
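The conversion step could be sketched as follows. This is only an illustration, assuming the ffmpeg command-line tool is installed; the notebook may use a different library. The folder names `videos` and `audio_output`, the helper names, and the 22050 Hz mono output are assumptions chosen for the example.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=22050):
    """Build the ffmpeg argument list that extracts audio as a mono .wav file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream, keep audio only
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the requested rate
        str(wav_path),
    ]


def convert_to_wav(video_path, output_dir="audio_output"):
    """Convert one .mp4 file to .wav and return the path of the new file."""
    video_path = Path(video_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    wav_path = output_dir / (video_path.stem + ".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)
    return wav_path


if __name__ == "__main__":
    # Hypothetical layout: every .mp4 in ./videos becomes a .wav in ./audio_output.
    for mp4 in sorted(Path("videos").glob("*.mp4")):
        print(convert_to_wav(mp4))
```

Keeping the command construction separate from the call to ffmpeg makes it easy to inspect or adjust the flags before running the conversion.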

2) Loading the .wav files and visualizing their audio frequencies

A Mel spectrogram (Mel-frequency spectrogram) represents an audio signal's frequency content on the Mel scale, which spaces frequencies according to how humans perceive pitch. It is widely used in speech and audio processing. I use the librosa package to visualize the three songs as log Mel spectrograms. The logarithm is applied to the spectrogram's magnitudes because human loudness perception is roughly logarithmic, so the log scale keeps quieter components visible and brings the plot closer to what we actually hear.
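A minimal sketch of computing a log Mel spectrogram is shown below. It is not necessarily the notebook's code: the function names, `n_mels=128`, and the `amin` floor are assumptions, and the decibel conversion is written out in NumPy to make the log step explicit.

```python
import numpy as np


def power_to_decibels(power, amin=1e-10):
    """Convert a power spectrogram to decibels relative to its own maximum."""
    power = np.maximum(np.asarray(power, dtype=float), amin)  # avoid log(0)
    return 10.0 * np.log10(power) - 10.0 * np.log10(power.max())


def log_mel_spectrogram(wav_path, duration=30.0, n_mels=128):
    """Load the first `duration` seconds of a .wav file and return (log-Mel, sample rate)."""
    import librosa  # imported here so the helper above also works without librosa

    y, sr = librosa.load(wav_path, duration=duration)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return power_to_decibels(mel), sr
```

The result can be plotted with `librosa.display.specshow(log_mel, sr=sr, x_axis="time", y_axis="mel")`. librosa's own `librosa.power_to_db(mel, ref=np.max)` performs a comparable conversion and additionally clips values far below the peak.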

3) Performing DTW

Dynamic Time Warping (DTW) yields an alignment cost between two songs, which indicates how similar they are. Because DTW tolerates differences in speed between sequences, it is well suited to comparing audio tracks.
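To make the idea of an alignment cost concrete, here is a small NumPy sketch of the classic DTW recurrence. It assumes each song is represented as a sequence of feature vectors (one per frame) and uses Euclidean distance between frames; both choices, and the function name `dtw_cost`, are assumptions for illustration rather than the notebook's exact setup.

```python
import numpy as np


def dtw_cost(a, b):
    """Return the total DTW alignment cost between two feature sequences.

    a has shape (n, d) and b has shape (m, d): one d-dimensional vector per frame.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # Euclidean distance between every frame of a and every frame of b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

    # acc[i, j] holds the cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # a advances while b repeats a frame
                acc[i, j - 1],      # b advances while a repeats a frame
                acc[i - 1, j - 1],  # both advance together
            )
    return acc[n, m]


# The same contour at half speed aligns at zero cost.
slow = [[0], [0], [1], [1], [2], [2]]
fast = [[0], [1], [2]]
print(dtw_cost(slow, fast))
```

For full-length feature matrices, an optimized routine such as `librosa.sequence.dtw` is likely a better fit than this pure-Python loop.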

To make the comparison meaningful, we focus on the normalized alignment cost: the DTW alignment cost normalized by the lengths of the sequences being compared. This gives a standardized measure of similarity that allows fair comparisons regardless of sequence length.
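The normalization described above can be sketched as below. The text does not state the exact formula, so this example assumes one common convention, dividing the total cost by the sum of the two sequence lengths; the function name is hypothetical.

```python
def normalized_alignment_cost(total_cost, len_a, len_b):
    """Divide a DTW alignment cost by the combined length of both sequences.

    Assumption: normalization by (len_a + len_b). Dividing by the length of
    the warping path is another convention in use.
    """
    if len_a <= 0 or len_b <= 0:
        raise ValueError("sequence lengths must be positive")
    return total_cost / (len_a + len_b)


# Example: a total cost of 12.0 between two 3-frame sequences.
print(normalized_alignment_cost(12.0, 3, 3))  # 2.0
```

Whichever convention is chosen, it should be applied identically to all three song pairs so that the resulting values remain comparable.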

In interpreting the results, a lower normalized alignment cost indicates a higher degree of similarity between the songs. Conversely, a higher normalized alignment cost signifies greater dissimilarity. This metric simplifies the assessment of similarity levels, aiding in the determination of how closely related the songs are in terms of their audio characteristics.

Alignment between Song 1 and 2

Alignment between Song 1 and 3

Alignment between Song 2 and 3

In the results, the alignment cost between Songs 1 and 2 is notably lower than the cost between Songs 1 and 3 or between Songs 2 and 3. The lower cost indicates that Songs 1 and 2 are the most similar pair, which matches the expectation, since Song 2 samples Song 1. The DTW results are therefore consistent with the musical resemblance between these two tracks.
