CloudComputingProject-2022/Data_visualization_and_analysis_tool_for_telemetry_data

Cloud-based data visualization and analysis tool for telemetry data

A naive data visualization and analysis tool for F1 on-board telemetry data.



View Demo

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contacts
  8. Acknowledgments

About The Project

In both minor motorsport categories and racing e-sports, there seems to be no easily accessible tool to collect, visualize, and analyze live telemetry data. Users often have to perform complex installation tasks to run these tools on their own machines, which might not be powerful enough to handle real-time data stream analysis.

This work proposes a possible baseline architecture to implement a data visualization and analysis tool for on-board telemetry data, completely based on cloud technologies and distributed systems. The proposed system falls under the Software-as-a-Service (SaaS) paradigm and relies on Infrastructure-as-a-Service (IaaS) cloud solutions to provide hardware support to its software components.

For more info, please refer to the Project report.

Built With

This section lists all major frameworks/libraries used in this project.

Data source and front-end:

Back-end Apache services:

(back to top)

Getting Started

To get your system up and running, follow these simple steps.

Prerequisites

First, you need to have an account on any cloud platform from which you can access cluster services. We used Google Cloud Dataproc clusters, but any other cloud provider should do.

By following the steps in the next section, you will end up with this architecture.

Installation

Make sure to have two clusters on which you can deploy the following technologies:

  1. Apache ZooKeeper (v. 3.7.1) and Apache Kafka (v. 3.1.0) on one cluster.
  2. Apache Spark (v. 3.1.2) on the other cluster.
  • ZooKeeper is required to run Kafka. The following example shows how to set up the zoo.cfg file, located in the conf directory under the ZooKeeper home, on each cluster node in order to run a ZooKeeper ensemble over a three-node cluster:

    tickTime=2000
    dataDir=/var/lib/zookeeper
    clientPort=2181
    initLimit=20
    syncLimit=5
    server.1=hostnameA:2888:3888
    server.2=hostnameB:2888:3888
    server.3=hostnameC:2888:3888
    
  • On each cluster node, the following key properties must be specified in the server.properties file, located in the config directory under the Kafka home.

    • broker.id=UID (where UID is a unique ID for this broker).
    • listeners=PLAINTEXT://internalIP:9092
    • advertised.listeners=PLAINTEXT://externalIP:9092
    • zookeeper.connect=hostnameA:2181,hostnameB:2181,hostnameC:2181/kafka_root_znode
  • If you're using Google Cloud Dataproc clusters, you don't need to manually install and configure Spark as it is already included in the cluster's VM image.
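Since the ZooKeeper and Kafka settings listed above must be repeated on every node with only a few values changing, it may help to generate them from a script. The following is a minimal sketch, not part of this repo: the function names `render_zoo_cfg` and `render_kafka_properties` are hypothetical, and the IP addresses in the usage section are placeholders. Note that each ZooKeeper node also needs a `myid` file in its `dataDir` containing the number used in its `server.N` entry.

```python
from typing import List


def render_zoo_cfg(hosts: List[str], data_dir: str = "/var/lib/zookeeper") -> str:
    """Return the zoo.cfg text for an ensemble made of the given hostnames."""
    lines = [
        "tickTime=2000",
        f"dataDir={data_dir}",
        "clientPort=2181",
        "initLimit=20",
        "syncLimit=5",
    ]
    # One server.N entry per node: 2888 is the quorum port, 3888 the
    # leader-election port. N must match the myid file stored on that node.
    lines += [f"server.{i}={host}:2888:3888" for i, host in enumerate(hosts, start=1)]
    return "\n".join(lines) + "\n"


def render_kafka_properties(
    broker_id: int,
    internal_ip: str,
    external_ip: str,
    zk_hosts: List[str],
    chroot: str = "/kafka_root_znode",
) -> str:
    """Return the key server.properties entries for a single Kafka broker."""
    # All ZooKeeper nodes are listed, followed by the znode path used as root.
    zk_connect = ",".join(f"{host}:2181" for host in zk_hosts) + chroot
    lines = [
        f"broker.id={broker_id}",
        f"listeners=PLAINTEXT://{internal_ip}:9092",
        f"advertised.listeners=PLAINTEXT://{external_ip}:9092",
        f"zookeeper.connect={zk_connect}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    hosts = ["hostnameA", "hostnameB", "hostnameC"]
    print(render_zoo_cfg(hosts))
    # Placeholder IPs: replace them with the addresses of your own node.
    print(render_kafka_properties(1, "10.0.0.2", "203.0.113.10", hosts))
```

The output of `render_zoo_cfg` is identical on every node, while `render_kafka_properties` has to be called once per broker with a different `broker_id` and the IPs of that node.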

(back to top)

Usage

Before launching the Streamlit client, make sure that:

  • Both the Kafka and Spark clusters are up and running.
  • The correct broker IPs and topic names are specified in configuration.ini.
  • The data source is active and publishing on the correct Kafka topic. For test purposes, you could run the data stream producer process provided in this repo:
    python ./datastream_producer.py
  • The Spark streaming analysis script has been started on the Spark cluster:
    spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 ./structured_stream_process.py --broker <IP:port> --intopic <topicName> --outtopic <topicName>
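To keep the broker address and topic names consistent between the client and the Spark job, the spark-submit command above could be assembled from the same configuration file. The sketch below only illustrates the idea: the `[kafka]` section and the `broker`, `intopic`, and `outtopic` keys are assumptions, so check the actual layout of configuration.ini in this repo and adapt the names accordingly. The sample values are placeholders.

```python
import configparser
import shlex
from typing import Dict, List

# Hypothetical file layout: the real configuration.ini may use other names.
SAMPLE_INI = """
[kafka]
broker = 10.0.0.2:9092
intopic = telemetry_raw
outtopic = telemetry_analyzed
"""

# Kafka connector package, as used in the spark-submit command of this README.
KAFKA_PACKAGE = "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2"


def load_settings(ini_text: str) -> Dict[str, str]:
    """Parse the INI text and return the three values needed by the Spark job."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["kafka"]  # raises KeyError if the section is missing
    return {key: section[key] for key in ("broker", "intopic", "outtopic")}


def build_spark_submit(settings: Dict[str, str]) -> List[str]:
    """Build the spark-submit argument list for structured_stream_process.py."""
    return [
        "spark-submit",
        "--packages", KAFKA_PACKAGE,
        "./structured_stream_process.py",
        "--broker", settings["broker"],
        "--intopic", settings["intopic"],
        "--outtopic", settings["outtopic"],
    ]


if __name__ == "__main__":
    command = build_spark_submit(load_settings(SAMPLE_INI))
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(command))
```

Returning the command as a list rather than a single string means it can also be passed directly to `subprocess.run` without shell quoting issues.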

Finally, you are ready to run the client:

streamlit run ./main.py

(back to top)

Roadmap

These are some of the features we would like to add to this project.

  • Allow choosing the anomaly threshold in real time
  • Multi-driver support (this requires reorganizing the Kafka topics)
  • Add statefulness to Streamlit
    • Counter variables
    • Data dict
  • Integrate MLlib into the Spark Structured Streaming data analysis module

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

(back to top)

Contacts

(back to top)

Acknowledgments

Thanks to O'Reilly books about:

Infrastructure-as-a-Service used for this project:

(back to top)