In this toy project, I am rebuilding the existing architecture from the ground up to address the limitations and challenges I ran into during the initial implementation. Starting over lets me reevaluate earlier design decisions, apply current industry best practices, and introduce technologies that better fit my vision for the project.
As an aspiring data engineer, I am always looking for ways to sharpen my skills and broaden my knowledge of the field. This repository is where I explore data engineering concepts, experiment with different technologies, and build small projects around interesting challenges.

The main goals of the rebuild are:

- Enhancing performance to handle large-scale data processing more efficiently.
- Improving scalability to accommodate growing data volumes and user base.
- Enhancing fault tolerance and resilience to ensure high availability.
- Simplifying the codebase and improving maintainability for easier development and troubleshooting.
- Incorporating modern architectural patterns and design principles.
- Adopting current technologies and frameworks that better fit the project's requirements.

I'm excited about this rebuild because it gives me a chance to learn and apply advanced data engineering concepts. I look forward to gaining a deeper understanding of principles such as data integration, data pipelines, and data quality.

The project uses the following technologies:

- Python
- Beautiful Soup (bs4)
- Selenium
- Apache Kafka
- Apache NiFi
- Apache Spark
- Cassandra
- PostgreSQL
- Presto
- Redis
- Power BI

Key features:

- Sourced data through web crawling (Yahoo Finance) and a REST API (Alpha Vantage).
- Developed a big data platform for Nasdaq data, using Apache Kafka and Apache NiFi for data extraction.
- Used Apache Spark and Apache NiFi for data transformation.
- Used Cassandra and PostgreSQL for data loading.
- Implemented Presto for data virtualization.
- Used Redis for in-memory analytics.
- Integrated Power BI as the BI tool for interactive visualizations.
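
The Alpha Vantage data source in the feature list can be illustrated with a minimal sketch. It assumes the project calls the `TIME_SERIES_DAILY` function (the README does not say which endpoint is used) and flattens the JSON response into one record per trading day. `build_daily_url`, `parse_daily_series`, and `FIELD_MAP` are hypothetical names, `"demo"` is only a placeholder key, and the HTTP request itself is left out so the parsing logic runs offline.

```python
from urllib.parse import urlencode

# Base endpoint of the Alpha Vantage REST API.
ALPHA_VANTAGE_URL = "https://www.alphavantage.co/query"

# Keys inside each daily bar of a TIME_SERIES_DAILY response, mapped to
# hypothetical downstream column names.
FIELD_MAP = {
    "1. open": "open",
    "2. high": "high",
    "3. low": "low",
    "4. close": "close",
}


def build_daily_url(symbol: str, api_key: str) -> str:
    """Return the request URL for daily prices of one ticker (assumes TIME_SERIES_DAILY)."""
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    return f"{ALPHA_VANTAGE_URL}?{urlencode(params)}"


def parse_daily_series(symbol: str, payload: dict) -> list[dict]:
    """Flatten a TIME_SERIES_DAILY payload into one record per trading day."""
    records = []
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    for trade_date, bar in sorted(payload["Time Series (Daily)"].items()):
        record = {"symbol": symbol, "trade_date": trade_date}
        for source_key, column in FIELD_MAP.items():
            record[column] = float(bar[source_key])  # the API returns numbers as strings
        record["volume"] = int(bar["5. volume"])
        records.append(record)
    return records


# The URL can be fetched with any HTTP client (e.g. urllib.request.urlopen),
# and the decoded JSON passed to parse_daily_series.
print(build_daily_url("IBM", "demo"))
```

Keeping the parser free of network calls makes it easy to unit-test against saved responses and to reuse it in whichever stage (a NiFi processor script or a Kafka producer) ends up doing the fetching.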
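
A common way to make the PostgreSQL loading step safe to re-run (for example when Kafka messages are replayed) is an idempotent upsert. The sketch below is an assumption about how such a step could look, not the project's actual code. It uses Python's built-in `sqlite3` module as a stand-in so it runs without a database server, and a hypothetical `daily_prices` table keyed on `(symbol, trade_date)` that keeps only the close and volume for brevity. The `INSERT ... ON CONFLICT ... DO UPDATE` statement is valid in both PostgreSQL (9.5+) and SQLite (3.24+); with a PostgreSQL driver such as psycopg2, the named placeholders would be written `%(name)s` instead of `:name`.

```python
import sqlite3

# Hypothetical schema; the composite primary key is what makes re-loads idempotent.
SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_prices (
    symbol     TEXT    NOT NULL,
    trade_date TEXT    NOT NULL,
    close      REAL    NOT NULL,
    volume     INTEGER NOT NULL,
    PRIMARY KEY (symbol, trade_date)
)
"""

# Insert new rows; on a key collision, overwrite with the incoming values.
UPSERT = """
INSERT INTO daily_prices (symbol, trade_date, close, volume)
VALUES (:symbol, :trade_date, :close, :volume)
ON CONFLICT (symbol, trade_date) DO UPDATE SET
    close = excluded.close,
    volume = excluded.volume
"""


def load_prices(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Upsert price records in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, records)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Illustrative values only.
load_prices(conn, [{"symbol": "IBM", "trade_date": "2024-01-02", "close": 9.5, "volume": 800}])
# Loading the same key again updates the existing row instead of duplicating it.
load_prices(conn, [{"symbol": "IBM", "trade_date": "2024-01-02", "close": 9.7, "volume": 900}])
```

Because the conflict target matches the primary key, replaying a batch leaves the table in the same state as loading it once, which keeps downstream Presto queries and Power BI reports free of double-counted rows.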

To run the project locally:

- Clone the repository: `git clone https://github.com/mukmookk/streamDAQ.git`
- Move into the project directory: `cd streamDAQ`
- Install the required dependencies: `pip install -r requirements.txt`