jakemull13/spotify_musical_tastes_analysis


Creating a Spotify API Web Application to Analyze Taste in Popular Music by Country

Table of Contents

- Goal

- Questions

- Methods
             
- Hypothesis
           
- Similarity in Track Sets

- Similarity in Genres

- Similarity in Features

- Distribution of Features

- Null Hypothesis Test

- Stones Left Unturned:
    
- Path Forward:

Goal:

Explore Spotify's datasets to understand the features its apps use to classify audio tracks and tailor music recommendations to users.

Question:

How similar or different is the popular music in different countries/regions?

Methods:

Analyze the current "Top 50" Tracks of the United States, Canada, Mexico, the United Kingdom, and the Globe. Calculate the similarities using the following metrics:

    - Similarity in Popular Tracks
    
    - Similarity in Popular Genres
    
    - Similarity in the Features of Popular Music (aka the essential musical/audio
      characteristics of Popular Tracks)

Hypothesis: The USA is the country whose "Top 50" tracks are the most similar to those of the Global "Top 50".

Similarity in Track Sets
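
One simple way to score how much two "Top 50" playlists overlap is the Jaccard index: the number of shared tracks divided by the number of distinct tracks across both playlists. The sketch below is only an illustration of that idea, not necessarily the metric used in this project; the track IDs and the `jaccard_similarity` helper are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(tracks_a, tracks_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of track IDs."""
    a, b = set(tracks_a), set(tracks_b)
    if not a and not b:
        return 1.0  # two empty playlists are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical track IDs standing in for real playlist contents.
playlists = {
    "global": ["t1", "t2", "t3", "t4"],
    "usa": ["t1", "t2", "t5", "t6"],
    "mex": ["t7", "t8", "t3", "t9"],
}

# Score every pair of playlists once.
overlap = {
    (x, y): jaccard_similarity(playlists[x], playlists[y])
    for x, y in combinations(playlists, 2)
}

for pair, score in overlap.items():
    print(pair, round(score, 3))
```

A score of 1 means the two playlists contain exactly the same tracks, and 0 means they share none.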

Similarity in Genres

Use scikit-learn's text vectorization module to turn each playlist's list of genres into a vector of genre frequencies. Then calculate the cosine similarity between every pair of playlist genre vectors to build a similarity matrix. Finally, plot the matrix as a heatmap to visualize which playlists are most similar in their genres.
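
The steps above can be sketched roughly as follows. This is a minimal illustration, assuming `CountVectorizer` is the vectorizer in question and that each genre label (for example "dance pop") should be counted as a single token; the genre lists are made up for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical genre lists, one per playlist (one entry per track/artist genre).
playlist_genres = {
    "global": ["pop", "dance pop", "latin", "pop"],
    "usa": ["pop", "rap", "rap", "dance pop"],
    "mex": ["latin", "reggaeton", "latin", "pop"],
}
names = list(playlist_genres)

# A callable analyzer keeps multi-word genres intact instead of splitting on spaces.
vectorizer = CountVectorizer(analyzer=lambda genres: genres)
genre_counts = vectorizer.fit_transform(list(playlist_genres.values()))

# Pairwise cosine similarity between the genre-count vectors.
similarity = pd.DataFrame(
    cosine_similarity(genre_counts), index=names, columns=names
)
print(similarity.round(3))
```

Because genre counts are never negative, similarities computed this way fall between 0 and 1. The resulting frame can be passed to a plotting call such as `seaborn.heatmap(similarity, annot=True)` to draw the heatmap.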

Similarity Matrix:
|        | global    | usa       | uk        | mex       | can       |
|--------|-----------|-----------|-----------|-----------|-----------|
| global | 1.000000  | 0.208407  | -0.863271 | -0.220740 | 0.488657  |
| usa    | 0.208407  | 1.000000  | -0.363109 | -0.707902 | 0.220003  |
| uk     | -0.863271 | -0.363109 | 1.000000  | 0.192726  | -0.562035 |
| mex    | -0.220740 | -0.707902 | 0.192726  | 1.000000  | -0.660480 |
| can    | 0.488657  | 0.220003  | -0.562035 | -0.660480 | 1.000000  |

The heatmap shows that the two most similar playlists (whose intersection is the darkest shade of blue) are the USA and Canada. However, contrary to my prediction, the playlist most similar to the Global playlist is Canada's.

Similarity in Features

Description of Features & Correlation between Features

Calculate Similarity

Take the mean values for every feature in a playlist. Then, use these vectors to once again calculate the cosine similarity between each playlist.
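
A minimal sketch of this calculation, assuming the tracks sit in one pandas DataFrame with a `playlist` column and one column per audio feature. The column names and values here are hypothetical placeholders, not the project's data.

```python
import numpy as np
import pandas as pd

features = ["acousticness", "danceability", "energy", "speechiness"]

# Hypothetical per-track feature values for three playlists.
tracks = pd.DataFrame({
    "playlist": ["global", "global", "usa", "usa", "uk", "uk"],
    "acousticness": [0.20, 0.30, 0.10, 0.25, 0.40, 0.35],
    "danceability": [0.70, 0.65, 0.80, 0.75, 0.55, 0.60],
    "energy": [0.60, 0.70, 0.65, 0.60, 0.50, 0.55],
    "speechiness": [0.05, 0.10, 0.20, 0.15, 0.04, 0.06],
})

# One mean-feature vector per playlist.
means = tracks.groupby("playlist")[features].mean()

# Scale each row to unit length; the dot products are then cosine similarities.
unit = means.div(np.linalg.norm(means, axis=1), axis=0)
similarity = unit @ unit.T
print(similarity.round(4))
```

Raw feature means are dominated by whichever features have the largest magnitude, so one option is to standardize each feature column across playlists before computing the similarity. With centered values, cosine similarity can be negative as well as positive.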

Cosine Similarity Matrix
|        | global    | usa       | uk        | mex       | can       |
|--------|-----------|-----------|-----------|-----------|-----------|
| global | 1.000000  | 0.208407  | -0.863271 | -0.220740 | 0.488657  |
| usa    | 0.208407  | 1.000000  | -0.363109 | -0.707902 | 0.220003  |
| uk     | -0.863271 | -0.363109 | 1.000000  | 0.192726  | -0.562035 |
| mex    | -0.220740 | -0.707902 | 0.192726  | 1.000000  | -0.660480 |
| can    | 0.488657  | 0.220003  | -0.562035 | -0.660480 | 1.000000  |

Distributions of Track Features

Null Hypothesis: There is no difference in the means of features in the USA and Global Playlists

two_tailed_test(global_df, usa_df, label1='Global', label2='USA', feature='acousticness')
pval = 0.612864976409074
fail to reject null hypothesis

two_tailed_test(global_df, usa_df, label1='Global', label2='USA', feature='danceability')
pval = 0.9898624889912536
fail to reject null hypothesis

two_tailed_test(global_df, usa_df, label1='Global', label2='USA', feature='energy')
pval = 0.9966979993191145
fail to reject null hypothesis

two_tailed_test(global_df, usa_df, label1='Global', label2='USA', feature='loudness')
pval = 0.6585050655009175
fail to reject null hypothesis

two_tailed_test(global_df, usa_df, label1='Global', label2='USA', feature='speechiness')
pval = 0.7483005130667619
fail to reject null hypothesis
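
The `two_tailed_test` helper called above is not shown in this README. As a rough idea of what such a comparison might look like, the sketch below assumes Welch's two-sample t-test from SciPy and a significance level of 0.05; both are assumptions, and `two_tailed_test_sketch` and the demo data are hypothetical stand-ins rather than the project's code.

```python
import numpy as np
import pandas as pd
from scipy import stats


def two_tailed_test_sketch(df1, df2, feature, alpha=0.05):
    """Two-sided Welch t-test on the means of one feature in two playlists."""
    # ttest_ind is two-sided by default; equal_var=False selects Welch's test.
    _, pval = stats.ttest_ind(df1[feature], df2[feature], equal_var=False)
    if pval < alpha:
        decision = "reject null hypothesis"
    else:
        decision = "fail to reject null hypothesis"
    return pval, decision


# Hypothetical demo data: two playlists of 50 tracks each.
rng = np.random.default_rng(0)
demo_a = pd.DataFrame({"energy": rng.normal(0.6, 0.1, 50)})
demo_c = pd.DataFrame({"energy": rng.normal(0.9, 0.05, 50)})

pval, decision = two_tailed_test_sketch(demo_a, demo_c, feature="energy")
print(f"pval = {pval}")
print(decision)
```

Failing to reject the null hypothesis, as in every result above, means the data show no statistically significant difference in means at the chosen level. It does not prove that the two playlists have identical means.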

Stones Left Unturned:

    - Which country most INFLUENCES the top 50?
    
    - Which Features most INFLUENCE the top 50?

Path Forward:

    - Expand the Datasets and use Machine Learning to Predict
      the popularity/ranking of a track.
