ArmanShakeri/PAIR-Finance-Case-Study


Running with Docker

  • To get started, run docker-compose up in the root directory.
  • It will create the PostgreSQL database and start generating the data.
  • It will create an empty MySQL database.
  • It will launch the analytics.py script.

Here are some points to consider:

  • The insert statement in the faker code (main.py) was commented out, so records were not being saved to the table. This has been fixed; a minimal sketch of the kind of insert involved is shown after this list.
  • I have tried to transfer as little data as possible to the processing node (analytics.py) and to push as much aggregation and processing as possible down to the database, in order to preserve data locality.
  • As the input table grows, queries over the full history become time-consuming. A simple solution is to run the queries over a sliding window, for example aggregating only the last three hours of data every five minutes (see the second sketch below). Since the data arrives as a stream and duplicates are never written to the output table thanks to the composite key (device_id, event_time), the metrics are simply updated whenever new data arrives. The right interval depends on the business logic and requirements; as this task is only for evaluation, no specific restrictions are placed on reading from the database.
  • The requested reports are saved to a table named "report" in the "analytics" database in MySQL.
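
Below is a minimal sketch of the kind of insert the faker script performs once the fix is in place. It assumes a psycopg2 connection and a devices table with device_id, temperature, location, and event_time columns; the actual schema, credentials, and column names in main.py may differ.

```python
# Hypothetical sketch of the insert in the faker code (main.py).
# Table name, column names, and connection settings are assumptions.
import psycopg2

conn = psycopg2.connect(
    host="postgres", dbname="main", user="postgres", password="postgres"
)

def save_record(device_id, temperature, location, event_time):
    # "with conn" commits the transaction when the block exits successfully.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO devices (device_id, temperature, location, event_time) "
            "VALUES (%s, %s, %s, %s)",
            (device_id, temperature, location, event_time),
        )
```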

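As a rough illustration of the windowed approach described above, the sketch below aggregates the last three hours of PostgreSQL data every five minutes and upserts the result into the MySQL report table. It is not the exact analytics.py code: the metric columns, table names, and connection strings are assumptions.

```python
# Hypothetical sketch of windowed aggregation for analytics.py.
# Table names, metric columns, and credentials are assumptions.
import time
from sqlalchemy import create_engine, text

pg = create_engine("postgresql+psycopg2://postgres:postgres@postgres/main")
mysql = create_engine("mysql+pymysql://root:root@mysql/analytics")

AGGREGATE = text("""
    SELECT device_id,
           date_trunc('hour', event_time) AS event_time,
           max(temperature)               AS max_temperature,
           count(*)                       AS data_points
    FROM devices
    WHERE event_time >= now() - interval '3 hours'
    GROUP BY device_id, date_trunc('hour', event_time)
""")

UPSERT = text("""
    INSERT INTO report (device_id, event_time, max_temperature, data_points)
    VALUES (:device_id, :event_time, :max_temperature, :data_points)
    ON DUPLICATE KEY UPDATE
        max_temperature = VALUES(max_temperature),
        data_points     = VALUES(data_points)
""")

while True:
    # Aggregation runs on the PostgreSQL side; only the summarised rows travel.
    with pg.connect() as src:
        rows = src.execute(AGGREGATE).mappings().all()
    # The composite key (device_id, event_time) keeps re-processed windows idempotent.
    with mysql.begin() as dst:
        for row in rows:
            dst.execute(UPSERT, dict(row))
    time.sleep(300)  # refresh every five minutes
```

Re-reading a three-hour window every five minutes means late-arriving rows are still picked up, while the upsert keeps metrics that were already written consistent.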
About

This is a data engineering challenge.
