kafka-flight-demo

Dashboard screenshot

Author: Alec Powell (apowell@confluent.io)

This repo uses a Python producer script to call the OpenSky REST API (https://opensky-network.org/apidoc/rest.html) to fetch live flight locations for thousands of planes around the globe and feed them into an Apache Kafka cluster.
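
For reference, a single fetch of live state vectors from the OpenSky REST API looks roughly like the sketch below (a minimal illustration using the public /states/all endpoint; the actual producer script in this repo adds Kafka delivery on top of it):

import requests

# Fetch current state vectors for all tracked aircraft (anonymous access is rate-limited).
resp = requests.get("https://opensky-network.org/api/states/all", timeout=30)
resp.raise_for_status()
payload = resp.json()

# Each state vector is a list: index 0 is icao24, 1 is callsign, 5/6 are longitude/latitude.
for state in payload["states"][:5]:
    icao24, callsign, lon, lat = state[0], state[1], state[5], state[6]
    print(icao24, (callsign or "").strip(), lat, lon)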

Last updated: 01-07-2022 for Confluent Cloud & BigQuery ("SaaS-ified" version)

"SaaS-ified" version (Confluent Cloud + Google BigQuery)

STEPS (as of 01-07-2022)

  1. Install pre-reqs (note: v2 of the Confluent CLI was released in Nov 2021)

Make Confluent Cloud and GCP accounts

Install the Confluent CLI

curl -sL --http1.1 https://cnfl.io/cli | sh -s -- latest
export PATH=$(pwd)/bin:$PATH
pip3 install -r requirements.txt
  2. Spin up Kafka cluster, topic(s)
confluent login
confluent kafka cluster create ...
confluent kafka cluster use <cluster-id>
confluent kafka topic create flights
confluent kafka topic list
  3. Set up configs

Create your API keys for Kafka & Schema Registry first:

confluent api-key create --resource <cluster-id>
confluent api-key create --resource <schema-registry-id>

Now, create your librdkafka.config file and paste in the API keys/Secrets. It should look like this:

# Kafka
bootstrap.servers=<cluster-id>.<cloud>.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<API-key>
sasl.password=<API-secret>

# Confluent Cloud Schema Registry
schema.registry.url=https://<schema-registry-id>.<region>.<cloud>.confluent.cloud
basic.auth.credentials.source=USER_INFO
basic.auth.user.info=<API-key>:<API-secret>
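
The producer needs these settings as a plain Python dict. The Confluent example code ships a small parsing helper along the lines of the sketch below (read_ccloud_config here is illustrative, not necessarily the exact helper this repo uses):

def read_ccloud_config(path):
    # Parse simple key=value lines, skipping blanks and comments.
    conf = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                key, _, value = line.partition("=")
                conf[key.strip()] = value.strip()
    return conf

# e.g. Producer(read_ccloud_config("librdkafka.config")) -- minus the schema.registry.*
# keys, which belong to the Schema Registry client rather than the Kafka producer.
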
  4. Produce data!

Our producer is based on Confluent's example script, which uses the confluent-kafka Python client. Call the event producer script from the kafka directory: -f is the path to your librdkafka.config file, -t is the topic name, and -n is the number of consecutive API calls to make. There is a 10-second sleep between API calls, so -n 3 will take roughly 30 seconds to run. A stripped-down sketch of the produce loop follows the command.

./event_producer.py -f librdkafka.config -t flights -n 20
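
Under the hood, the produce loop boils down to something like this sketch (the record fields and the JSON serialization here are illustrative; see event_producer.py for how the real script builds and serializes records):

import json
from confluent_kafka import Producer

# Same settings as librdkafka.config; values are placeholders.
producer = Producer({
    "bootstrap.servers": "<cluster-id>.<cloud>.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API-key>",
    "sasl.password": "<API-secret>",
})

def delivery_report(err, msg):
    # Called for each message to confirm delivery or surface an error.
    if err is not None:
        print("Delivery failed:", err)

# One illustrative flight record; the real script builds these from OpenSky state vectors.
record = {"icao24": "abc123", "callsign": "DLH9U", "lat": 48.35, "lon": 11.78}
producer.produce("flights", key=record["icao24"], value=json.dumps(record),
                 callback=delivery_report)
producer.flush()
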
  5. ksqlDB for fun & profit

Create a ksqlDB app using the Confluent CLI or in the Confluent Cloud UI. Browse ksqldb-streams.sql for query ideas.
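
If you would rather script this step, statements can also be submitted over ksqlDB's REST API. The sketch below is only illustrative: the endpoint placeholder and the column names in the stream definition are assumptions, so take the real statements from ksqldb-streams.sql.

import requests

# Placeholders: find the endpoint and create ksqlDB API keys in Confluent Cloud.
KSQLDB_ENDPOINT = "https://<ksqldb-app-id>.<region>.<cloud>.confluent.cloud:443"
KSQLDB_API_KEY = "<ksqlDB-API-key>"
KSQLDB_API_SECRET = "<ksqlDB-API-secret>"

# Hypothetical stream over the 'flights' topic; column names are illustrative.
statement = """
CREATE STREAM flights_stream (icao24 VARCHAR, callsign VARCHAR, lat DOUBLE, lon DOUBLE)
  WITH (KAFKA_TOPIC='flights', VALUE_FORMAT='JSON');
"""

resp = requests.post(
    f"{KSQLDB_ENDPOINT}/ksql",
    auth=(KSQLDB_API_KEY, KSQLDB_API_SECRET),
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    json={"ksql": statement, "streamsProperties": {}},
)
print(resp.status_code, resp.json())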

  6. Spin up BigQuery Sink connector

Download a JSON key file for your BigQuery service account, then use the UI -> "Connectors" page to create a BigQuery Sink connector (1 task), which will sink data from your Kafka topic to your BigQuery project.

  7. Query your data in BigQuery!

Browse bigquery.sql for query ideas.
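
Queries can also be run programmatically with the google-cloud-bigquery client. Here is a minimal sketch; the `flight_data.flights` dataset/table name and the origin_country column are assumptions, so use whatever names your connector actually created:

from google.cloud import bigquery

# Uses Application Default Credentials (e.g. GOOGLE_APPLICATION_CREDENTIALS
# pointing at your service-account JSON key file).
client = bigquery.Client()

# Placeholder dataset.table name and column; adjust to your BigQuery project.
query = """
    SELECT origin_country, COUNT(*) AS num_flights
    FROM `flight_data.flights`
    GROUP BY origin_country
    ORDER BY num_flights DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.origin_country, row.num_flights)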

Dockerized/local version (Confluent Platform + MemSQL/SingleStore)

STEPS (Last tested on Ubuntu Bionic-18.04 in 04-2020):

  1. Install pre-reqs
sudo apt update
sudo apt install git -y
#install mysql-client
sudo apt install mysql-client -y
#install docker
sudo apt install python3-pip libffi-dev -y
curl -fsSL https://get.docker.com/ | sh
#you may have to logout and log back in for this usermod to register
sudo usermod -aG docker $(whoami) 
sudo systemctl start docker
sudo systemctl enable docker
sudo apt install docker-compose -y

[1b]. Get a MemSQL license key (https://portal.memsql.com/) and set it as the environment variable $MEMSQL_LICENSE_KEY

  2. Clone this repo and spin up the containers using Docker Compose
git clone https://github.com/alecpowell18/kafka-flight-demo.git
cd kafka-flight-demo/
docker-compose up -d
  3. Produce data
#install dependencies for producer script
sudo apt install librdkafka-dev -y
pip3 install -r requirements.txt
#create topic
./kafka/create_topic.py
#produce records - default Kafka topic = 'locs', default time between API calls = 10s
#run ./kafka/make_events.py --help for script options
nohup ./kafka/make_events.py --time 500 > producer.log &
  4. Prepare the db and pipelines
#Check connectivity to MemSQL cluster
mysql -uroot -h 127.0.0.1 -P 3306
exit
#Create the schemas, enable load data local to load from source file
cd memsql/
mysql -uroot -h0 --local-infile < 01-tables-setup.sql
#create pipeline(s)
mysql -uroot -h0 < 02-pipelines-setup.sql 
#create UDFs and SP
mysql -uroot -h0 < 03-udfs-sps.sql
#create pipeline which will use SP
mysql -uroot -h0 < 04-pipeline-into-sp.sql
  5. Start pipelines! Browse to <your-instance-ip-addr>:8080 for MemSQL Studio, or use the command line to do so:
mysql -uroot -h0 demo -e 'START ALL PIPELINES;'
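
To check from a script that the pipelines are actually running and ingesting, something like the sketch below works against the same port (it assumes the pymysql package is installed; the equivalent SHOW PIPELINES statement works from the mysql client too):

import pymysql

# Connect to the MemSQL/SingleStore container exposed on localhost:3306 (root, no password).
conn = pymysql.connect(host="127.0.0.1", port=3306, user="root", database="demo")
try:
    with conn.cursor() as cur:
        cur.execute("SHOW PIPELINES")
        for row in cur.fetchall():
            # First two columns are the pipeline name and its state (e.g. Running/Stopped).
            print(row[0], row[1])
finally:
    conn.close()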