HiringChallenge

Install and set up Kafka locally (https://kafka.apache.org/quickstart).

On macOS, go to the Kafka installation folder and follow these steps:

Open a terminal and start ZooKeeper
- bin/zookeeper-server-start.sh config/zookeeper.properties
Open a second terminal and start the Kafka server
- bin/kafka-server-start.sh config/server.properties
Open a third terminal and create the topic that the Python application consumes from
- bin/kafka-topics.sh --create --topic consumerTopic --bootstrap-server localhost:9092
Create the topic that the Python application produces to
- bin/kafka-topics.sh --create --topic producerTopic --bootstrap-server localhost:9092

Download the dummy data from this link:

http://tx.tamedia.ch.s3.amazonaws.com/challenge/data/stream.jsonl.gz

Now run the code and start consuming data:

Open a terminal in the root folder of the project and create a virtual environment
- python3 -m venv venv
Activate the virtual environment
- source venv/bin/activate
Install the dependencies
- pip install -r requirements.txt
Open the project and run main.py
In a terminal in the Kafka installation folder, push the dummy data to the topic so the Python application can consume it (replace path/to/data with the location of the downloaded file)
- gzcat path/to/data | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic consumerTopic

Report:

Pandas is used because its DataFrames make it easy to group records, deduplicate them, and count the unique ids.
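As an illustration of the grouping and unique-count step, here is a minimal sketch. It is not the code from main.py: the function name `unique_users_per_minute` is hypothetical, and it assumes that 'ts' is a Unix timestamp in seconds and that ids are counted per minute.

```python
import json

import pandas as pd


def unique_users_per_minute(lines):
    """Count distinct 'uid' values per minute from JSON-encoded records."""
    records = [json.loads(line) for line in lines]
    # Keep only the two fields the metric needs; other keys are ignored.
    df = pd.DataFrame(records, columns=["ts", "uid"])
    # Assumption: 'ts' holds Unix seconds. Floor each timestamp to its minute.
    df["minute"] = pd.to_datetime(df["ts"], unit="s").dt.floor("min")
    # nunique() counts each uid once per minute, however often it appears.
    return df.groupby("minute")["uid"].nunique()
```

Calling the function on a list of JSON lines returns a Series indexed by minute, with one unique-id count per minute.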

Approach:

Each JSON string received from the Kafka stream is parsed into a JSON object, and the required 'ts' and 'uid' fields are extracted and loaded into a DataFrame to generate the metrics. Results are produced every 5 seconds, because by that point 99.99% of the data is correct.
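The buffer-and-flush pattern described above can be sketched as follows. This is an illustration only, not the project's implementation: the class name `PeriodicFlusher`, the `emit` callback, and the injectable `clock` are hypothetical, and the Kafka consumer and producer are deliberately left out. The `emit` callback marks the place where the rows would be converted to a DataFrame and the result sent to producerTopic.

```python
import json
import time


class PeriodicFlusher:
    """Buffer (ts, uid) pairs and pass them to `emit` every `interval` seconds."""

    def __init__(self, emit, interval=5.0, clock=time.monotonic):
        self.emit = emit          # called with the list of buffered rows
        self.interval = interval  # seconds between flushes
        self.clock = clock        # injectable so the timing can be tested
        self.rows = []
        self.last_flush = clock()

    def add(self, raw_message):
        """Parse one JSON message and flush if the interval has elapsed."""
        record = json.loads(raw_message)
        self.rows.append((record["ts"], record["uid"]))
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        """Emit the buffered rows (if any) and start a new interval."""
        if self.rows:
            self.emit(self.rows)
        self.rows = []
        self.last_flush = self.clock()
```

In this sketch a flush is triggered only when a message arrives, so a quiet stream delays the output until the next message; a timer-based flush would avoid that at the cost of extra complexity.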
