mukmookk/streamDAQ

StreamDAQ

In this toy project, I am rebuilding the existing architecture to address the limitations and challenges encountered in its initial implementation. Rebuilding from the ground up lets me reevaluate design decisions, apply current industry best practices, and introduce new technologies that fit my vision for the project.

Project Description

As an aspiring data engineer, I am always looking for opportunities to sharpen my skills and broaden my knowledge of the field. This repository is where I explore data engineering concepts, experiment with different technologies, and build small projects that tackle interesting challenges.

Goals

  • Enhancing performance to handle large-scale data processing more efficiently.
  • Improving scalability to accommodate growing data volumes and user base.
  • Enhancing fault tolerance and resilience to ensure high availability.
  • Simplifying the codebase and improving maintainability for easier development and troubleshooting.
  • Incorporating modern architectural patterns and design principles.
  • Adopting the latest technologies and frameworks that better suit the project's requirements.

I'm excited about this rebuild because it gives me a chance to learn and apply advanced data engineering concepts. I look forward to gaining a deeper understanding of data engineering principles such as data integration, data pipelines, and data quality.

Technologies Used

  • Python
  • bs4 (Beautiful Soup 4)
  • Selenium
  • Apache Kafka
  • Apache NiFi
  • Apache Spark
  • Cassandra
  • PostgreSQL
  • Presto
  • Redis
  • Power BI

Features

  • Data is sourced by web crawling (Yahoo Finance) and from a REST API (Alpha Vantage).
  • A big data platform for Nasdaq data, using Apache Kafka and Apache NiFi for data extraction.
  • Spark and Apache NiFi for data transformation.
  • Cassandra and PostgreSQL for data loading.
  • Presto for data virtualization.
  • Redis for in-memory analytics.
  • Power BI as the BI tool for interactive visualizations.
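
To make the extraction step above more concrete, here is a minimal sketch of how a quote from the Alpha Vantage `GLOBAL_QUOTE` endpoint could be flattened into a record and serialized as a Kafka message. It is not the code used in this repository: the helper names (`build_quote_url`, `normalize_quote`, `to_kafka_message`), the choice of fields, and the sample payload are all assumptions made for illustration, and the sample values are made up rather than real market data. The sketch uses only the standard library and makes no network calls.

```python
import json
from typing import Tuple
from urllib.parse import urlencode

ALPHA_VANTAGE_URL = "https://www.alphavantage.co/query"


def build_quote_url(symbol: str, api_key: str) -> str:
    """Build a GLOBAL_QUOTE request URL for one ticker (hypothetical helper)."""
    params = {"function": "GLOBAL_QUOTE", "symbol": symbol, "apikey": api_key}
    return f"{ALPHA_VANTAGE_URL}?{urlencode(params)}"


def normalize_quote(payload: dict) -> dict:
    """Flatten a GLOBAL_QUOTE payload into a small typed record.

    Alpha Vantage returns every value as a string, so numeric fields
    are converted here before the record moves downstream.
    """
    quote = payload["Global Quote"]
    return {
        "symbol": quote["01. symbol"],
        "price": float(quote["05. price"]),
        "volume": int(quote["06. volume"]),
        "trading_day": quote["07. latest trading day"],
    }


def to_kafka_message(record: dict) -> Tuple[bytes, bytes]:
    """Return (key, value) bytes for a Kafka producer.

    Keying by symbol is an assumption: with Kafka's default partitioner,
    messages that share a key go to the same partition, which keeps the
    updates for one ticker in order.
    """
    key = record["symbol"].encode("utf-8")
    value = json.dumps(record, sort_keys=True).encode("utf-8")
    return key, value


# Made-up sample payload for illustration only, not real market data.
sample_payload = {
    "Global Quote": {
        "01. symbol": "AAPL",
        "05. price": "100.50",
        "06. volume": "12345",
        "07. latest trading day": "2024-01-02",
    }
}

record = normalize_quote(sample_payload)
key, value = to_kafka_message(record)
print(build_quote_url("AAPL", "YOUR_API_KEY"))
print(key, value)
```

The resulting `key` and `value` could then be handed to a Kafka producer client, for example the `send` method of kafka-python's `KafkaProducer`, with a topic name of your choosing. Which client and topic this project actually uses is not stated here, so treat that part as an assumption as well.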

Installation

  1. Clone the repository:
     git clone https://github.com/mukmookk/streamDAQ.git
  2. Install the required dependencies:
     pip install -r requirements.txt