TuneInsights is a streaming data pipeline that uses Apache Kafka to collect and process data from the Spotify API. The project aims to provide real-time insights and analytics on Spotify playback data using OpenSearch Dashboards, helping users analyze and understand trends, patterns, and user behavior on the Spotify platform.

Sparab16/TuneInsights

TuneInsights

TuneInsights is a streaming data pipeline that uses Apache Kafka to collect and process data from the Spotify API. It aims to provide real-time insights and analytics on Spotify playback data through OpenSearch Dashboards, helping users analyze and understand trends, patterns, and user behavior on the Spotify platform. The project is built with Python, kafka-python, and the opensearch-py library, and consists of three components: a Kafka producer that collects data and sends it to a Kafka topic, a Kafka consumer that reads data from the topic and processes it, and an OpenSearch cluster that serves as the sink for the processed data.
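
The processing step between the Kafka topic and OpenSearch might look roughly like the sketch below. It is only an illustration: the function name `flatten_playback` and the output field names are hypothetical, and the input shape assumes a payload like the one returned by Spotify's currently-playing endpoint (`item`, `progress_ms`, `is_playing`, `timestamp`). The repository's own consumer may process the data differently.

```python
from typing import Any, Dict, Optional


def flatten_playback(payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Reduce a playback payload to a flat document, or None if no track is present.

    The field names in the returned document are hypothetical and chosen
    only to make the result easy to index and chart.
    """
    item = payload.get("item")
    if not item:
        # Nothing is playing (or the payload is empty), so there is nothing to index.
        return None
    return {
        "track_id": item.get("id"),
        "track_name": item.get("name"),
        "artists": [artist.get("name") for artist in item.get("artists", [])],
        "album": (item.get("album") or {}).get("name"),
        "progress_ms": payload.get("progress_ms"),
        "is_playing": payload.get("is_playing", False),
        "timestamp": payload.get("timestamp"),
    }


# Made-up sample payload, for illustration only.
sample = {
    "timestamp": 1700000000000,
    "progress_ms": 42000,
    "is_playing": True,
    "item": {
        "id": "track-1",
        "name": "Example Track",
        "artists": [{"name": "Artist A"}, {"name": "Artist B"}],
        "album": {"name": "Example Album"},
    },
}

doc = flatten_playback(sample)
print(doc)
```

A flat document like this is easier to aggregate in a dashboard than the nested API response, which is the main reason to transform it before indexing.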

Appendix

The project includes instructions on how to set up and configure the pipeline, as well as sample code and configuration files that can be used as a starting point. It also includes a sample dashboard built in OpenSearch Dashboards that showcases the potential of the pipeline for data analysis and visualization.

Overall, the project provides a tool for collecting and analyzing real-time data from the Spotify API, and can be used by researchers, analysts, and developers alike to gain insights and build applications and services on top of it.

Prerequisites

Before running the project, make sure you have the following prerequisites installed on your system:

  • Python 3.6 or later
  • kafka-python library (install using pip install kafka-python)
  • requests library (install using pip install requests)
  • opensearch-py library (install using pip install opensearch-py)
  • Apache Kafka cluster, either local (using Windows Subsystem for Linux) or remotely accessible (Conduktor or Confluent)
  • OpenSearch cluster, either local (using Docker) or remotely accessible (using Bonsai.io)

Run Locally

  • Clone the project
  git clone https://github.com/Sparab16/TuneInsights.git
  • Go to the project directory
  cd TuneInsights
  • Install dependencies
  pip install -r requirements.txt
  • Update necessary configuration inside config folder

    • Kafka Connect Credentials

      {
        "sasl_plain_username": "__SASL_PLAIN_USERNAME__",
        "sasl_plain_password": "__SASL_PLAIN_PASSWORD__",
        "bootstrap_servers": "__BOOTSTRAP_SERVERS__"
      }
    • OpenSearch Connect Credentials

      {
        "host": "__HOST__",
        "port": "__PORT__",
        "auth": ["__ACCESS_KEY__", "__ACCESS_SECRET__"]
      }
    • Spotify Connect Credentials

      {
        "client_id": "__CLIENT_ID__",
        "client_secret": "__CLIENT_SECRET__",
        "scopes": "__SCOPES__",
        "redirect_uri": "__REDIRECT_URI__"
      }
    • Spotify API Tokens

      {
          "auth_token": "",
          "refresh_token": "",
          "access_token": ""
      }
  • Run produce_topic.py and consume_topic.py. The producer starts reading streaming data from the Spotify Playback API and sends it to the specified Kafka topic; the consumer reads from that topic and processes the data.

Note: By default, the scripts run indefinitely until you stop them manually by pressing Ctrl+C.
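
As a rough sketch of how the Kafka credentials above could be turned into kafka-python clients: the helper names (`kafka_client_kwargs`, `make_producer`, `make_consumer`) are hypothetical, and the choice of `SASL_SSL` with the `PLAIN` mechanism is an assumption based on the `sasl_plain_*` keys in the config, not something the project states. The kafka-python import is deferred so the config handling can be tried without a broker.

```python
import json
from typing import Any, Dict


def kafka_client_kwargs(cfg: Dict[str, str]) -> Dict[str, Any]:
    """Map the Kafka config keys onto kafka-python connection arguments."""
    return {
        "bootstrap_servers": cfg["bootstrap_servers"],
        "security_protocol": "SASL_SSL",  # assumption: hosted cluster using TLS
        "sasl_mechanism": "PLAIN",        # assumption: implied by the sasl_plain_* keys
        "sasl_plain_username": cfg["sasl_plain_username"],
        "sasl_plain_password": cfg["sasl_plain_password"],
    }


def make_producer(cfg: Dict[str, str]):
    """Create a producer that serializes dict values as UTF-8 JSON."""
    from kafka import KafkaProducer  # imported lazily; requires kafka-python

    return KafkaProducer(
        value_serializer=lambda value: json.dumps(value).encode("utf-8"),
        **kafka_client_kwargs(cfg),
    )


def make_consumer(cfg: Dict[str, str], topic: str, group_id: str):
    """Create a consumer that decodes UTF-8 JSON values back into dicts."""
    from kafka import KafkaConsumer  # imported lazily; requires kafka-python

    return KafkaConsumer(
        topic,
        group_id=group_id,
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        **kafka_client_kwargs(cfg),
    )


# Inline stand-in for the Kafka credentials file; the values are placeholders.
example_cfg = json.loads(
    '{"sasl_plain_username": "user", '
    '"sasl_plain_password": "secret", '
    '"bootstrap_servers": "broker.example.com:9092"}'
)
kwargs = kafka_client_kwargs(example_cfg)
print(sorted(kwargs))
```

For a local, unauthenticated broker the `security_protocol` and `sasl_*` arguments would normally be dropped.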

OpenSearch API Reference

Bonsai Console

bonsai_console.png

1. Get Index

  GET /<index-name>
  Parameter   Type     Description
  host        string   Required. Your host
  port        string   Required. Your port
  auth        tuple    Required. Your username and password

2. Get Document

  GET /<index-name>/_doc/<id>
  Parameter   Type     Description
  host        string   Required. Your host
  port        string   Required. Your port
  auth        tuple    Required. Your username and password

3. Delete Index

  DELETE /<index-name>
  Parameter   Type     Description
  host        string   Required. Your host
  port        string   Required. Your port
  auth        tuple    Required. Your username and password
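
The three requests above can also be issued from Python with opensearch-py. The following is a minimal sketch, not the project's own code: `opensearch_client_kwargs` and `run_reference_calls` are hypothetical helper names, and enabling TLS (`use_ssl`, `verify_certs`) is an assumption that suits a hosted cluster but may not match a local Docker setup.

```python
from typing import Any, Dict, Tuple


def opensearch_client_kwargs(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Map the host, port, and auth parameters onto opensearch-py client arguments."""
    return {
        "hosts": [{"host": cfg["host"], "port": int(cfg["port"])}],
        "http_auth": tuple(cfg["auth"]),  # (username, password)
        "use_ssl": True,                  # assumption: cluster is reached over HTTPS
        "verify_certs": True,
    }


def run_reference_calls(cfg: Dict[str, Any], index_name: str, doc_id: str) -> Tuple[Any, Any]:
    """Issue the three calls listed above. Note that the last one deletes the index."""
    from opensearchpy import OpenSearch  # imported lazily; requires opensearch-py

    client = OpenSearch(**opensearch_client_kwargs(cfg))
    index_info = client.indices.get(index=index_name)   # GET /<index-name>
    document = client.get(index=index_name, id=doc_id)  # GET /<index-name>/_doc/<id>
    client.indices.delete(index=index_name)             # DELETE /<index-name>
    return index_info, document


# Placeholder values standing in for the OpenSearch credentials file.
client_kwargs = opensearch_client_kwargs(
    {"host": "localhost", "port": "9200", "auth": ["admin", "admin-password"]}
)
print(client_kwargs["hosts"])
```

The port is stored as a string in the config, so it is converted to an integer before being handed to the client.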

For more information about these APIs, click here.

Conduktor Platform

Conduktor is a platform that provides an intuitive GUI for managing and monitoring Apache Kafka clusters. It is designed to simplify and streamline the management of Kafka clusters and make it easier for developers, DevOps teams, and data engineers to work with Kafka.

Topic UI

conduktor_topic.png

Consumer Group UI

conduktor_consumer.png

OpenSearch

OpenSearch is a distributed, open source search and analytics engine designed to handle large-scale data processing and analysis. It was forked from Elasticsearch 7.10.2 and remains largely compatible with the Elasticsearch 7.10 APIs, making it a popular choice for organizations looking to build robust and scalable search and analytics solutions.

Integrating OpenSearch with Apache Kafka provides a powerful platform for real-time data processing and analysis. By using Kafka as a data pipeline, users can collect data from a wide range of sources and send it to OpenSearch, where it can be stored, analyzed, and visualized.
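
A minimal sketch of the Kafka-to-OpenSearch hand-off described above, assuming messages expose a `.value` attribute (as kafka-python's consumer records do) and the client exposes `index(index=..., body=...)` (as opensearch-py's client does). The function `sink_messages`, the index name `spotify-playback`, and the two stand-in classes are hypothetical and exist only so the loop can be tried without a running cluster.

```python
from collections import namedtuple
from typing import Any, Iterable, List, Tuple


def sink_messages(messages: Iterable[Any], client: Any, index_name: str) -> int:
    """Index the value of every consumed message and return how many were written."""
    written = 0
    for message in messages:
        # With kafka-python, message.value holds the deserialized payload.
        client.index(index=index_name, body=message.value)
        written += 1
    return written


# Stand-ins for a Kafka consumer record and an OpenSearch client (illustration only).
FakeMessage = namedtuple("FakeMessage", ["value"])


class FakeClient:
    """Records index() calls instead of sending them to a cluster."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def index(self, index: str, body: Any) -> None:
        self.calls.append((index, body))


fake_client = FakeClient()
batch = [
    FakeMessage({"track_name": "Example Track", "is_playing": True}),
    FakeMessage({"track_name": "Another Track", "is_playing": False}),
]
indexed = sink_messages(batch, fake_client, "spotify-playback")
print(indexed, fake_client.calls)
```

In a real run, `messages` would be a `KafkaConsumer` instance and `client` an `OpenSearch` instance, so the loop would block and keep indexing until the process is stopped.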

Console UI

opensearch_console.png

Dashboard UI

opensearch_dashboard.png

For a PDF version, click here.

Authors


🔗 Links

linkedin

LICENSE

MIT License

Copyright (c) [2023] [Shreyas Parab]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
