top_music_analysis

The pipeline for getting tracks features from the most popular Spotify playlists, cluster analysis, and visualization.

Data description

Dataframe index - playlist name.

Columns:

name - name of the song
artist - name of the artist
popularity - the popularity of the track. The value will be between 0 and 100, with 100 being the most popular
danceability - this value describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity
energy - energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity
key - the key the track is in. Integers map to pitches using standard Pitch Class notation.
loudness - the overall loudness of a track in decibels (dB)
mode - mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived
speechiness - this value detects the presence of spoken words in a track
acousticness - a confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic
instrumentalness - predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context
liveness - detects the presence of an audience in the recording
valence - a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track
tempo - the overall estimated tempo of a track in beats per minute (BPM)
duration_ms - the duration of the track in milliseconds

Repository structure

|    ├── data
|       ├── clustered_data.csv -- Source data with labels after the clustering.
|       └── data.csv -- Source data obtained from the Spotify API.
|    ├── notebooks
|       ├── eda.ipynb -- Jupyter notebook with an exploration data analysis.
|       └── eda.pdf -- PDF-version of EDA notebook.
|    ├── top_music_analysis
|       ├── clustering.py -- Script with clustering functions.
|       ├── config.py -- Settings and path configurations.
|       ├── pipeline.py -- Luigi pipeline to run tasks (getting the data, clustering).
│       └── spotify.py -- Script to get the songs data from the Spotify API.

How to run

First of all, you need to install Poetry - a tool for dependency management and packaging in Python. Follow poetry installation guide

Install dependencies

To install dependencies for the project, just run

poetry install

To install packages without dev run

poetry install --no-dev

After packages installation you can run scripts in top_music_analysis folder.

Pipeline

The project uses Luigi package for the workflow management.

Project pipeline is a DAG (directed acyclic graph) consisted of two tasks:

GetSpotifyDataTask - task for getting the input data from Spotify API
ClusteringTask - task to perform clustering with kmeans algorithm

To run pipeline launch luigi local server as a daemon with

luigid --background --pidfile <PATH_TO_PIDFILE> --logdir <PATH_TO_LOGDIR> --state-path <PATH_TO_STATEFILE>

After that you can run tasks with python top_music_analysis/pipeline.py

CLI

Also, CLI created with Click is available. To get the data from Spotify run

python top_music_analysis/spotify.py --out spotify_data.csv

To perform clustering with collected data run

python top_music_analysis/clustering.py --data spotify_data.csv --out cls_result.csv

Publishing

To build and publish project to pypi-test run

poetry config repositories.test-pypi https://test.pypi.org/legacy/
poetry config pypi-token.test-pypi <your-token>
poetry publish --build -r test-pypi

Note: you can obtain your token in test-pypi profile settings.

Installation from test-pypi

Current project is published to test-pypi - top-music-analysis 0.1.0, so you can install it directly with pip installation:

pip install -i https://test.pypi.org/simple/ top-music-analysis

Code style

The project supports code style checking and formatting.

Formatters

isort and black are used to auto format the code.

Linters

For the code linting flake8 is used. These flake8 plugins are installed:

Also, custom plugin flake8-global-variables is used.

Pre-commit

To run hooks automatically on every commit use pre-commit.

To install pre-commit run poetry run pre-commit install

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.dvc		.dvc
models		models
notebooks		notebooks
references		references
tests		tests
top_music_analysis		top_music_analysis
.dvcignore		.dvcignore
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
data.dvc		data.dvc
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

License

pacifikus/top_music_analysis

Folders and files

Latest commit

History

Repository files navigation

top_music_analysis

Data description

Repository structure

How to run

Install dependencies

Pipeline

CLI

Publishing

Installation from test-pypi

Code style

Formatters

Linters

Pre-commit

About

Topics

Resources

License

Stars

Watchers

Forks

Languages