StreamDAQ

In this toy project, I am revamping the existing architecture to address the limitations and challenges I ran into during its initial implementation. Rebuilding from the ground up gives me the chance to reevaluate design decisions, apply current industry best practices, and introduce new technologies that fit my vision for the project.

Project Description

As an aspiring data engineer, I constantly look for opportunities to sharpen my skills and broaden my knowledge of the field. This repository is where I explore data engineering concepts, experiment with different technologies, and build small projects that tackle interesting challenges.

Goals

Enhancing performance to handle large-scale data processing more efficiently.
Improving scalability to accommodate growing data volumes and a growing user base.
Enhancing fault tolerance and resilience to ensure high availability.
Simplifying the codebase and improving maintainability for easier development and troubleshooting.
Incorporating modern architectural patterns and design principles.
Adopting the latest technologies and frameworks that better suit the project's requirements.

I'm excited about this rebuild because it gives me a unique opportunity to learn and apply advanced data engineering concepts. I look forward to gaining a deeper understanding of data engineering principles such as data integration, data pipelines, and data quality.

Technologies Used

Python
Beautiful Soup (bs4)
Selenium
Apache Kafka
Apache NiFi
Apache Spark
Cassandra
PostgreSQL
Presto
Redis
Power BI

Features

Sources data through web crawling (Yahoo Finance) and a REST API (Alpha Vantage).
Provides a big data platform for Nasdaq data, using Apache Kafka and Apache NiFi for data extraction.
Uses Spark and Apache NiFi for data transformation.
Uses Cassandra and PostgreSQL for data loading.
Uses Presto for data virtualization.
Uses Redis for in-memory analytics.
Integrates Power BI as the BI tool for interactive visualizations.
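
As a rough illustration of the REST extraction feature listed above, the sketch below builds an Alpha Vantage request URL, flattens a daily time-series payload into one record per trading day, and serializes each record as key/value bytes of the kind a Kafka producer could send. The function names (build_daily_url, to_records, to_message), the record layout, and the sample values are assumptions made for illustration and are not taken from this repository. The payload field names follow Alpha Vantage's TIME_SERIES_DAILY JSON format as I understand it. No network call or Kafka client is involved, so it runs on the standard library alone.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"

# Made-up sample shaped like a TIME_SERIES_DAILY response (values are illustrative).
SAMPLE_PAYLOAD = {
    "Meta Data": {"2. Symbol": "AAPL"},
    "Time Series (Daily)": {
        "2023-06-02": {"1. open": "181.03", "2. high": "181.78", "3. low": "179.26",
                       "4. close": "180.95", "5. volume": "61945900"},
        "2023-06-01": {"1. open": "177.70", "2. high": "180.12", "3. low": "176.93",
                       "4. close": "180.09", "5. volume": "68901800"},
    },
}


def build_daily_url(symbol, api_key):
    """Build the query URL for one symbol's daily time series."""
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


def to_records(payload):
    """Flatten the nested payload into one dict per trading day, oldest first."""
    symbol = payload["Meta Data"]["2. Symbol"]
    series = payload["Time Series (Daily)"]
    records = []
    for day in sorted(series):  # ISO dates sort chronologically as strings
        bar = series[day]
        records.append({
            "symbol": symbol,
            "date": day,
            "open": float(bar["1. open"]),
            "high": float(bar["2. high"]),
            "low": float(bar["3. low"]),
            "close": float(bar["4. close"]),
            "volume": int(bar["5. volume"]),
        })
    return records


def to_message(record):
    """Serialize a record as (key, value) bytes; keyed by symbol (an assumption)."""
    key = record["symbol"].encode("utf-8")
    value = json.dumps(record, sort_keys=True).encode("utf-8")
    return key, value


if __name__ == "__main__":
    print(build_daily_url("AAPL", "demo"))
    for rec in to_records(SAMPLE_PAYLOAD):
        print(to_message(rec))
```

Keying messages by symbol is one possible design choice: in Kafka, records with the same key land in the same partition, which keeps each ticker's bars in order. Whether this project partitions its topics that way is not stated here.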

Installation

Clone the repository:

git clone https://github.com/mukmookk/streamDAQ.git

Install the required dependencies:

pip install -r requirements.txt
