song-popularity-prediction

This is a machine learning project to predict the popularity of a track on Spotify.

About the dataset

The dataset was sourced from Kaggle. It contains 170,653 tracks released between 1921 and 2020, each described by 16 features. Visualizing the feature importances shows that 'year' is the most important predictor of a track's popularity. This is consistent with the Spotify documentation, which states that popularity is calculated with respect to year.
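As a rough illustration of how feature importances can be inspected, here is a minimal sketch using scikit-learn. It runs on synthetic stand-in data, not the Kaggle dataset: the columns other than 'year' (`danceability`, `energy`) and the way the target is generated are assumptions made only so the example is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the Kaggle dataset). The target is built to
# depend mostly on 'year' so that the resulting ranking is easy to read.
rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "year": rng.integers(1921, 2021, size=n),
    "danceability": rng.random(n),  # assumed column name
    "energy": rng.random(n),        # assumed column name
})
y = 0.6 * (X["year"] - 1921) + 5 * X["energy"] + rng.normal(0, 2, size=n)

# Fit a shallow tree and rank the features by impurity-based importance.
model = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

With matplotlib installed, `importances.plot.barh()` would draw the same ranking as a bar chart.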

Predicting popularity

The approach to predicting popularity is simple. It is broken into four steps:

1. Create a simple decision tree model
2. Optimize the decision tree model with GridSearchCV
3. Tweak only the max_leaf_nodes parameter of the decision tree model
4. Create a random forest model
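The four steps above can be sketched with scikit-learn roughly as follows. This is not the notebook's code: the data is synthetic, and the parameter grid, the candidate `max_leaf_nodes` values, the train/validation split and the helper `val_mae` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the track features and popularity scores.
rng = np.random.default_rng(0)
X = rng.random((1500, 5))
y = 50 * X[:, 0] + 20 * np.sin(6 * X[:, 1]) + rng.normal(0, 3, size=1500)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)


def val_mae(model):
    """Fit on the training split and return the MAE on the validation split."""
    model.fit(X_train, y_train)
    return mean_absolute_error(y_val, model.predict(X_val))


results = {}

# Step 1: decision tree with default settings.
results["Simple decision tree"] = val_mae(DecisionTreeRegressor(random_state=0))

# Step 2: grid search over a small, illustrative parameter grid.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [4, 8, 12], "min_samples_leaf": [1, 5, 20]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
results["Decision tree (GridSearchCV)"] = val_mae(grid)

# Step 3: vary only max_leaf_nodes and keep the best validation score.
leaf_scores = {
    k: val_mae(DecisionTreeRegressor(max_leaf_nodes=k, random_state=0))
    for k in (10, 50, 100, 500)
}
best_leaves = min(leaf_scores, key=leaf_scores.get)
results["Decision tree (max_leaf_nodes)"] = leaf_scores[best_leaves]

# Step 4: random forest.
results["Random Forest"] = val_mae(
    RandomForestRegressor(n_estimators=100, random_state=0)
)

for name, mae in results.items():
    print(f"{name}: {mae:.3f}")
```

The scores this prints describe the synthetic data only and are not comparable to the results reported for the Spotify dataset.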

The random forest model performs best, with the lowest mean absolute error (MAE) of 6.75.

| Model | MAE |
| --- | --- |
| Simple decision tree | 9.196 |
| Decision tree (GridSearchCV) | 6.915 |
| Decision tree (max_leaf_nodes) | 6.829 |
| Random Forest | 6.750 |
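For reference, MAE is the average absolute difference between predicted and actual popularity, so lower is better. Spotify popularity is scored from 0 to 100, so an MAE of 6.75 means predictions miss by about 6.75 points on average. The small sketch below computes it by hand; the scores in it are hypothetical and are not taken from the dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Return the average of |actual - predicted| over all tracks."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical popularity scores, for illustration only.
actual = [50, 20, 80, 35]
predicted = [45, 30, 78, 35]

# Absolute errors are 5, 10, 2 and 0, so the mean is 17 / 4.
mae = mean_absolute_error(actual, predicted)
print(mae)  # 4.25
```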

akin-oladejo/song-popularity-prediction
