stejul/smikic-dwh

Smikic DWH

A personal data warehouse for trying out different APIs and tools, such as:

  • Kafka
  • Luigi
  • Docker
  • Mongo and PostgreSQL

What the Project Does

  • Performs ETL tasks with Luigi
  • Streams data from the Twitter Streaming API through Kafka
  • Uses MongoDB as an archive
  • Stores the transformed data in PostgreSQL
  • Generates the Docker environment dynamically
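
To illustrate the step between the archive and the transformed data, here is a minimal sketch of a transform that flattens one archived tweet document into a flat row. It is not taken from this repository: the function name `flatten_tweet` and the output column names are hypothetical, and the input fields (`id`, `text`, `created_at`, `user.screen_name`) assume the archive holds documents in the Twitter API v1.1 tweet format.

```python
from datetime import datetime


def flatten_tweet(raw: dict) -> dict:
    """Turn one archived tweet document into a flat row (hypothetical schema)."""
    user = raw.get("user", {})
    return {
        "tweet_id": raw["id"],
        "text": raw["text"],
        # Twitter API v1.1 timestamps look like "Wed Oct 10 20:19:24 +0000 2018".
        "created_at": datetime.strptime(raw["created_at"], "%a %b %d %H:%M:%S %z %Y"),
        # Nested user object reduced to a single column.
        "user_name": user.get("screen_name"),
    }


# Made-up sample document for illustration only.
raw_tweet = {
    "id": 1050118621198921728,
    "text": "Hello, data warehouse!",
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "user": {"screen_name": "example_user"},
}

row = flatten_tweet(raw_tweet)
print(row["tweet_id"], row["created_at"].isoformat(), row["user_name"])
```

In a pipeline like the one described above, a function of this kind would typically sit inside a Luigi task that reads from MongoDB and writes to PostgreSQL; the exact task layout is up to the project.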

Tech Stack

  • Kafka
  • MongoDB
  • PostgreSQL
  • Python
    • Jinja2
    • Kafka wrapper
    • Luigi
    • PyMongo
    • SQLAlchemy
    • Tweepy

Environment Variables

To run this project, add the following environment variables to your .env file:

POSTGRES_USER

POSTGRES_PASSWORD

POSTGRES_HOST

POSTGRES_DB

TWITTER_CONSUMER_KEY

TWITTER_CONSUMER_KEY_SECRET

TWITTER_ACCESS_TOKEN

TWITTER_ACCESS_TOKEN_SECRET

MONGO_USER

MONGO_PASSWORD

MONGO_DB
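
Before starting anything, it can help to check that all of these variables are actually set. The following is a minimal sketch, not part of the project: the helper names `missing_vars` and `postgres_url` are hypothetical, and it assumes the values from .env have already been loaded into the process environment (for example by Docker Compose or a tool such as python-dotenv). The URL layout follows the common `postgresql://user:password@host/dbname` form.

```python
import os
from urllib.parse import quote

# The variable names listed in this README.
REQUIRED_VARS = [
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_HOST",
    "POSTGRES_DB",
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_KEY_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "MONGO_USER",
    "MONGO_PASSWORD",
    "MONGO_DB",
]


def missing_vars(env) -> list:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def postgres_url(env) -> str:
    """Build a PostgreSQL URL, percent-encoding the user name and password."""
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{env['POSTGRES_HOST']}/{env['POSTGRES_DB']}"


missing = missing_vars(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print(postgres_url(os.environ))
```

Percent-encoding matters when a password contains characters such as `@` or `/`, which would otherwise be read as part of the URL structure.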

Installation

To install the project, use either pip or Poetry:

    pip install -r requirements.txt

or

    poetry install

Run Locally

Clone the project

  git clone https://github.com/stejul/smikic-dwh

Generate docker environment

  python dwh/utils/create_docker_environment.py

Change the MongoDB credentials for the MongoDB-Kafka connector in the following file:

kafka_docker/connector/MongoSinkConnector.properties
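
The contents of that file are not shown here. Assuming the credentials are passed through the `connection.uri` property of the MongoDB Kafka sink connector, the sketch below builds such a line with the user name and password percent-encoded. The helper name `mongo_connection_uri`, the host `mongo:27017`, and the sample credentials are all placeholders, not values from this project.

```python
from urllib.parse import quote


def mongo_connection_uri(user: str, password: str, host: str = "mongo:27017") -> str:
    """Build a MongoDB connection URI with percent-encoded credentials.

    The default host is an assumption; use the service name and port
    from your own Docker environment.
    """
    return f"mongodb://{quote(user, safe='')}:{quote(password, safe='')}@{host}"


# Placeholder credentials for illustration only.
print("connection.uri=" + mongo_connection_uri("dwh_user", "s3cret/pass"))
```

The printed line can then be compared against, or pasted into, the properties file. Special characters in the password (here `/`) must be encoded, otherwise the URI is parsed incorrectly.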

Start the docker environment

  docker-compose up
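
For readers curious how a Docker environment can be generated from settings, as in the "Generate docker environment" step above, here is a rough sketch of the general idea. It does not reproduce `create_docker_environment.py`: it uses `string.Template` from the Python standard library as a stand-in for Jinja2 (which the tech stack lists), and the template contents, the `render_compose` helper, and the setting names are hypothetical.

```python
from string import Template

# Hypothetical, heavily reduced compose template; the real project may
# define more services and use a Jinja2 template instead.
COMPOSE_TEMPLATE = Template("""\
services:
  postgres:
    image: postgres
    environment:
      POSTGRES_USER: $postgres_user
      POSTGRES_PASSWORD: $postgres_password
      POSTGRES_DB: $postgres_db
""")


def render_compose(settings: dict) -> str:
    """Fill the template; raises KeyError if a placeholder has no value."""
    return COMPOSE_TEMPLATE.substitute(settings)


# Placeholder settings for illustration only.
rendered = render_compose(
    {
        "postgres_user": "dwh",
        "postgres_password": "change-me",
        "postgres_db": "dwh",
    }
)
print(rendered)
```

Using `substitute` rather than `safe_substitute` makes a missing setting fail loudly instead of leaving a stray placeholder in the generated file.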

License

MIT

Authors

Acknowledgements