TuneInsights

TuneInsights is a streaming data pipeline that uses Apache Kafka to collect and process data from the Spotify API. It provides real-time insights and analytics on Spotify playback data through OpenSearch Dashboards, helping users analyze trends, patterns, and user behavior on the Spotify platform. The project is built with Python, kafka-python, and opensearch-py, and consists of three main components: a Kafka producer that collects data from the Spotify API and sends it to a Kafka topic, a Kafka consumer that reads and processes data from that topic, and an OpenSearch cluster that serves as the sink for the processed data.

Appendix

The project includes instructions for setting up and configuring the pipeline, along with sample code and configuration files that can serve as a starting point. It also includes a sample dashboard built in OpenSearch Dashboards that demonstrates how the pipeline's data can be analyzed and visualized.

Overall, the project provides a tool for collecting and analyzing real-time data from the Spotify API that researchers, analysts, and developers can use to gain insights and build new applications and services.

Prerequisites

Before running the project, make sure you have the following prerequisites installed on your system:

Python 3.6 or later
kafka-python library (install using pip install kafka-python)
requests library (install using pip install requests)
opensearch-py library (install using pip install opensearch-py)
An Apache Kafka cluster, either local (using Windows Subsystem for Linux) or remotely accessible (Conduktor or Confluent)
An OpenSearch cluster, either local (using Docker) or remotely accessible (using Bonsai.io)
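A quick way to confirm the Python-side prerequisites is a small check script. This is a sketch, not part of the repository: it assumes `kafka`, `requests`, and `opensearchpy` are the import names of the three pip packages, and it only checks that they are importable, not their versions. The Kafka and OpenSearch clusters still need to be checked separately.

```python
import importlib.util
import sys

# pip package name -> module name it installs (assumed mapping).
REQUIRED_MODULES = {
    "kafka-python": "kafka",
    "requests": "requests",
    "opensearch-py": "opensearchpy",
}


def missing_prerequisites(modules=REQUIRED_MODULES, min_python=(3, 6)):
    """Return a list of problems found; an empty list means everything is present."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or later is required" % min_python)
    for package, module in modules.items():
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing {package} (pip install {package})")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```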

Run Locally

Clone the project

  git clone https://github.com/Sparab16/TuneInsights.git

Go to the project directory

  cd TuneInsights

Install dependencies

  pip install -r requirements.txt

Update the necessary configuration files inside the configs/authorization folder.

Kafka Connection Credentials

    {
      "sasl_plain_username": "__SASL_PLAIN_USERNAME__",
      "sasl_plain_password": "__SASL_PLAIN_PASSWORD__",
      "bootstrap_servers": "__BOOTSTRAP_SERVERS__"
    }
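The Kafka credentials above can be turned into kafka-python client arguments roughly as follows. This is a minimal sketch, assuming the broker uses SASL/PLAIN over TLS (`SASL_SSL`), which is typical for hosted clusters such as Confluent; `kafka_client_kwargs` and `make_producer` are hypothetical helper names, and the JSON value serializer is an assumption about how events are encoded.

```python
import json


def kafka_client_kwargs(config: dict) -> dict:
    """Map the Kafka credentials JSON onto kafka-python client arguments.

    Assumption: SASL/PLAIN authentication over TLS. A local, unauthenticated
    broker would need a different security_protocol (e.g. "PLAINTEXT").
    """
    return {
        "bootstrap_servers": config["bootstrap_servers"],
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": config["sasl_plain_username"],
        "sasl_plain_password": config["sasl_plain_password"],
    }


def make_producer(config: dict):
    """Build a KafkaProducer that JSON-encodes message values."""
    # Imported lazily so the mapping above has no third-party dependency.
    from kafka import KafkaProducer

    return KafkaProducer(
        value_serializer=lambda value: json.dumps(value).encode("utf-8"),
        **kafka_client_kwargs(config),
    )
```

The same keyword arguments can also be passed to kafka-python's `KafkaConsumer` on the reading side.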

OpenSearch Connection Credentials

    {
      "host": "__HOST__",
      "port": "__PORT__",
      "auth": ["__ACCESS_KEY__", "__ACCESS_SECRET__"]
    }
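A sketch of how these OpenSearch credentials might map onto an opensearch-py client. It assumes the cluster is reached over HTTPS with basic auth (as on Bonsai.io), that `port` holds a numeric string, and that `auth` is a two-element `[username, password]` list; `opensearch_client_kwargs` and `make_client` are hypothetical helper names.

```python
def opensearch_client_kwargs(config: dict) -> dict:
    """Map the OpenSearch credentials JSON onto opensearch-py client arguments."""
    return {
        # The JSON stores the port as a string; the client expects an int.
        "hosts": [{"host": config["host"], "port": int(config["port"])}],
        # "auth" is a JSON list; HTTP basic auth takes a (user, password) tuple.
        "http_auth": tuple(config["auth"]),
        # Assumption: the cluster is served over HTTPS with valid certificates.
        "use_ssl": True,
        "verify_certs": True,
    }


def make_client(config: dict):
    """Build an OpenSearch client from the credentials JSON."""
    # Imported lazily so the mapping above has no third-party dependency.
    from opensearchpy import OpenSearch

    return OpenSearch(**opensearch_client_kwargs(config))
```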

Spotify Connection Credentials

    {
      "client_id": "__CLIENT_ID__",
      "client_secret": "__CLIENT_SECRET__",
      "scopes": "__SCOPES__",
      "redirect_uri": "__REDIRECT_URI__"
    }
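With these values, the first step of Spotify's Authorization Code flow is to send the user to Spotify's authorize page. A minimal sketch, assuming `scopes` is a space-separated string (for example `user-read-playback-state user-read-currently-playing`); `authorize_url` is a hypothetical helper, and `state` is a random value you generate and verify on the redirect. The `client_secret` is not used here; it is only needed when exchanging the returned code for tokens.

```python
from urllib.parse import urlencode

SPOTIFY_AUTHORIZE_URL = "https://accounts.spotify.com/authorize"


def authorize_url(config: dict, state: str) -> str:
    """Build the URL a user opens to grant this app access to their account."""
    query = urlencode({
        "client_id": config["client_id"],
        "response_type": "code",
        "redirect_uri": config["redirect_uri"],
        # Assumed to be a space-separated list of Spotify scopes.
        "scope": config["scopes"],
        "state": state,
    })
    return f"{SPOTIFY_AUTHORIZE_URL}?{query}"
```

After the user approves, Spotify redirects to `redirect_uri` with a `code` query parameter, which is then exchanged for access and refresh tokens (presumably what the `auth_token` field below holds).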

Spotify API Tokens

    {
      "auth_token": "",
      "refresh_token": "",
      "access_token": ""
    }
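Spotify access tokens expire after about an hour, so a long-running producer needs to refresh them. A stdlib-only sketch of the refresh call against Spotify's token endpoint, assuming the `client_id` and `client_secret` from the Spotify credentials and the `refresh_token` from this file; how the repository itself stores or rotates tokens is not specified here, and the helper names are hypothetical.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"


def refresh_request(client_id: str, client_secret: str, refresh_token: str) -> Request:
    """Build the POST that trades a refresh_token for a new access_token."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({"grant_type": "refresh_token", "refresh_token": refresh_token})
    return Request(
        SPOTIFY_TOKEN_URL,
        data=body.encode(),
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def refresh_access_token(client_id: str, client_secret: str, tokens: dict) -> dict:
    """Send the refresh request and return a tokens dict with the new access_token."""
    request = refresh_request(client_id, client_secret, tokens["refresh_token"])
    with urlopen(request) as response:
        payload = json.load(response)
    updated = dict(tokens, access_token=payload["access_token"])
    # Spotify may return a new refresh_token; otherwise keep the existing one.
    if "refresh_token" in payload:
        updated["refresh_token"] = payload["refresh_token"]
    return updated
```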

Run produce_topic.py and consume_topic.py. The producer reads streaming data from the Spotify playback API and sends it to the configured Kafka topic; the consumer reads that topic, processes the data, and writes it to OpenSearch.

Note: By default, the scripts run indefinitely until you stop them manually by pressing Ctrl+C.
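The producer's polling loop can be sketched as follows. This is an illustration under stated assumptions, not the code in produce_topic.py: it assumes data comes from Spotify's Get Currently Playing Track endpoint (`CURRENTLY_PLAYING_URL`), keeps only a few hypothetical event fields, and takes the HTTP call (`fetch`) and the Kafka write (`send`) as injected functions so the loop can be tested without a network. Like the scripts, it runs until interrupted unless `max_polls` is set.

```python
import time
from typing import Callable, Optional

# Endpoint that a real `fetch` function would call (not used directly below).
CURRENTLY_PLAYING_URL = "https://api.spotify.com/v1/me/player/currently-playing"


def to_event(playback: dict) -> Optional[dict]:
    """Flatten a currently-playing response into a small event; None if no track."""
    item = playback.get("item")
    if not item:
        return None  # the response carries no track information
    return {
        "track_id": item["id"],
        "track_name": item["name"],
        "artists": [artist["name"] for artist in item.get("artists", [])],
        "is_playing": playback.get("is_playing", False),
        "progress_ms": playback.get("progress_ms"),
        "timestamp": playback.get("timestamp"),
    }


def run_producer(fetch: Callable[[], Optional[dict]],
                 send: Callable[[dict], None],
                 interval_s: float = 5.0,
                 max_polls: Optional[int] = None) -> int:
    """Poll `fetch` and forward each event to `send`; return the number sent."""
    polls = sent = 0
    while max_polls is None or polls < max_polls:
        playback = fetch()
        polls += 1
        event = to_event(playback) if playback else None
        if event is not None:
            send(event)
            sent += 1
        # Skip the final sleep when a poll limit has been reached.
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
    return sent
```

In practice, `fetch` might call `requests.get(CURRENTLY_PLAYING_URL, headers={"Authorization": f"Bearer {access_token}"})` and return `None` when Spotify answers HTTP 204 (nothing playing), and `send` might be `lambda event: producer.send(topic, event)` with a kafka-python producer.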

OpenSearch API Reference

Bonsai Console

1. Get Index

  GET /<index-name>

Parameter	Type	Description
`host`	`string`	Required: your OpenSearch host
`port`	`string`	Required: your OpenSearch port
`auth`	`tuple`	Required: your username and password
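Through opensearch-py, the same request is `client.indices.get`, where `client` is an `OpenSearch` instance built from the host, port, and auth parameters in the table. A small sketch that lists an index's top-level mapped fields; `index_fields` is a hypothetical helper, and it assumes `index_name` is a concrete index name rather than an alias or wildcard, since the response is keyed by the real index name.

```python
def index_fields(client, index_name: str) -> list:
    """GET /<index-name> via opensearch-py; return the top-level mapped field names."""
    # Response shape: {index_name: {"aliases": {...}, "mappings": {...}, "settings": {...}}}
    response = client.indices.get(index=index_name)
    properties = response[index_name]["mappings"].get("properties", {})
    return sorted(properties)
```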

2. Get Document

  GET /<index-name>/_doc/<id>

Parameter	Type	Description
`host`	`string`	Required: your OpenSearch host
`port`	`string`	Required: your OpenSearch port
`auth`	`tuple`	Required: your username and password
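The opensearch-py equivalent is `client.get`, with `client` built from the same connection parameters. A minimal sketch that returns just the stored document body (`_source`); `get_document_source` is a hypothetical helper name.

```python
def get_document_source(client, index_name: str, doc_id: str) -> dict:
    """GET /<index-name>/_doc/<id> via opensearch-py; return the stored document."""
    # opensearch-py raises NotFoundError (HTTP 404) if the document does not exist.
    response = client.get(index=index_name, id=doc_id)
    return response["_source"]
```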

3. Delete Index

  DELETE /<index-name>

Parameter	Type	Description
`host`	`string`	Required: your OpenSearch host
`port`	`string`	Required: your OpenSearch port
`auth`	`tuple`	Required: your username and password
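In opensearch-py the delete is `client.indices.delete`. The sketch below checks `client.indices.exists` first so a missing index returns `False` instead of raising; `delete_index_if_exists` is a hypothetical helper, and the check-then-delete is not atomic if another process removes the index in between. Deleting an index removes all of its documents.

```python
def delete_index_if_exists(client, index_name: str) -> bool:
    """DELETE /<index-name> via opensearch-py; return True if an index was deleted."""
    # Checking first avoids a NotFoundError (HTTP 404) for a missing index.
    if not client.indices.exists(index=index_name):
        return False
    response = client.indices.delete(index=index_name)
    # A successful delete responds with {"acknowledged": true}.
    return bool(response.get("acknowledged"))
```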

For more information about the APIs, click here.

Conduktor Platform

Conduktor is a platform that provides an intuitive GUI for managing and monitoring Apache Kafka clusters. It is designed to simplify and streamline the management of Kafka clusters and make it easier for developers, DevOps teams, and data engineers to work with Kafka.

Topic UI

Consumer Group UI

OpenSearch

OpenSearch is a distributed, open-source search and analytics engine designed for large-scale data processing and analysis. It is a fork of Elasticsearch 7.10.2 and is largely compatible with that version's APIs, which makes it a popular choice for organizations building scalable search and analytics solutions.

Integrating OpenSearch with Apache Kafka provides a platform for real-time data processing and analysis. Kafka acts as the data pipeline that collects data from many sources and delivers it to OpenSearch, where it can be stored, analyzed, and visualized.
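The consumer side of that integration, reading JSON messages from Kafka and writing them to OpenSearch, can be sketched as below. This is an illustration under assumptions rather than the contents of consume_topic.py: it assumes message values are UTF-8 JSON (as a JSON `value_serializer` on the producer would produce) and takes the indexing call as an injected function so it can run without a live cluster.

```python
import json
from typing import Callable, Iterable


def to_document(raw_value: bytes) -> dict:
    """Decode one Kafka message value (UTF-8 JSON bytes) into a document."""
    return json.loads(raw_value.decode("utf-8"))


def sink_messages(messages: Iterable, index_doc: Callable[[dict], None]) -> int:
    """Index every message value; return how many documents were written."""
    count = 0
    for message in messages:
        # kafka-python ConsumerRecord objects expose the payload as `.value`.
        index_doc(to_document(message.value))
        count += 1
    return count
```

For example, `messages` could be `KafkaConsumer(topic, group_id="tuneinsights", bootstrap_servers=...)` plus any SASL settings, which yields records indefinitely, and `index_doc` could be `lambda doc: client.index(index="spotify-playback", body=doc)`; the group id and index name here are hypothetical.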

Console UI

Dashboard UI

For a PDF version, click here.

Authors

@Sparab16


LICENSE

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
