Skip to content

MGloder/stream-processing-workshop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

57 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Stream-processing-workshop

Current Status

Trello Board

Data Source- GDELT

The GDELT Project

Supported by Google Jigsaw, the GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world. (Copied from official website)

Workflow

Requirements

Install and run Pravega

[Option 1] from installation package

[Option 2] from docker

docker run -it -e HOST_IP=<ip> -p 9090:9090 -p 12345:12345 pravega/pravega:latest standalone

Install and run Flink

Run on docker

  • docker pull flink:scala_2.11
  • cd references/flink-docker
  • docker-compose up

Install and run Kafka

Quick start

Create a new topic

  • Example
    • kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 1 --topic my-replicated-topic

Start a producer

  • Example
    • kafka-console-producer.sh --broker-list localhost:9092 --topic test

Start a consumer

  • Example
    • kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

Install and run Druid

Download Druid

Run Druid

bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf

modify tutorial-cluster.conf as desired

Run Jobs

Option 4

Export Data Producer

--class com.machinedoll.projectdemo.jobs.option4.ExportDataProducer

Export Data Consumer

--class com.machinedoll.projectdemo.jobs.option4.ExportDataConsumer