aliavni/docker

Docker data stack

Run

  1. Install Docker Desktop.
  2. Create a .env file in the repo root by copying .env.template.
  3. Set POSTGRES_PASSWORD in the .env file to the password you want.
  4. Build and start the containers:
docker compose up -d --build
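
Steps 2 and 3 can also be scripted. The following is a minimal sketch, assuming .env.template contains a POSTGRES_PASSWORD= line (the key is appended if it is missing). create_env_file is a hypothetical helper and is not part of this repo.

```python
from pathlib import Path
import shutil


def create_env_file(repo_root: Path, password: str) -> Path:
    """Copy .env.template to .env (if needed) and set POSTGRES_PASSWORD."""
    template = repo_root / ".env.template"
    env_file = repo_root / ".env"

    # Keep an existing .env so other values already filled in are not lost.
    if not env_file.exists():
        shutil.copyfile(template, env_file)

    updated = []
    found = False
    for line in env_file.read_text().splitlines():
        if line.startswith("POSTGRES_PASSWORD="):
            updated.append(f"POSTGRES_PASSWORD={password}")
            found = True
        else:
            updated.append(line)

    # Assumption: if the template has no such key, appending it is acceptable.
    if not found:
        updated.append(f"POSTGRES_PASSWORD={password}")

    env_file.write_text("\n".join(updated) + "\n")
    return env_file
```

Call it from the repo root, for example create_env_file(Path("."), "my-password"), before running docker compose.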

Jupyter

Open the jupyterlab container logs and follow the link that looks like http://127.0.0.1:8089/lab?token=...
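
If you would rather not scan the logs by eye, the link can be pulled out with a regular expression. This is a rough sketch: it assumes the container is named jupyterlab (as the wording above suggests), that the token is a hexadecimal string, and both function names are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Matches links such as http://127.0.0.1:8089/lab?token=<hex token>
TOKEN_URL = re.compile(r"http://127\.0\.0\.1:\d+/lab\?token=[0-9a-f]+")


def find_lab_url(log_text: str) -> Optional[str]:
    """Return the last JupyterLab token link found in the log text, if any."""
    matches = TOKEN_URL.findall(log_text)
    return matches[-1] if matches else None


def read_container_logs(container: str = "jupyterlab") -> str:
    """Fetch the container logs via the docker CLI (container name is an assumption)."""
    result = subprocess.run(
        ["docker", "logs", container],
        capture_output=True,
        text=True,
        check=True,
    )
    # Jupyter usually logs to stderr, so search both streams.
    return result.stdout + result.stderr
```

With the stack running, print(find_lab_url(read_container_logs())) should print the link to open in a browser.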

Trino

docker exec -it trino trino
SHOW SCHEMAS FROM db;
USE db.public;
SHOW TABLES FROM public;

Spark

docker exec -it spark-master /bin/bash
cd /opt/spark/bin
./spark-submit --master spark://0.0.0.0:7077 \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.1.jar 100
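
For orientation, SparkPi estimates pi by sampling random points in a square and counting how many land inside the unit circle; the trailing 100 is the number of slices the work is split into. The sketch below shows the same idea in plain, single-process Python. It is not the Spark implementation, and estimate_pi with its parameters is a hypothetical name.

```python
import random
from typing import Optional


def estimate_pi(slices: int, samples_per_slice: int = 100_000,
                seed: Optional[int] = None) -> float:
    """Monte Carlo estimate of pi, run sequentially instead of on a cluster."""
    rng = random.Random(seed)
    total = slices * samples_per_slice
    inside = 0
    for _ in range(total):
        # Sample a point uniformly from the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # The circle covers pi/4 of the square, so scale the hit ratio by 4.
    return 4.0 * inside / total
```

More slices mean more samples and a tighter estimate, which is why the Spark job output gets closer to 3.14159 as the argument grows.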

Thrift

docker exec -it spark-master /bin/bash
/opt/spark/bin/beeline
!connect jdbc:hive2://localhost:10000 scott tiger
show databases;
create table hive_example(a string, b int) partitioned by(c int);
alter table hive_example add partition(c=1);
insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3);
select count(distinct a) from hive_example;
select sum(b) from hive_example;
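
To check what the last two queries should return, the same rows can be loaded into an in-memory SQLite database. This is only a sanity check under assumptions: SQLite stands in for the Thrift server, the partition column c becomes an ordinary column, and run_example_queries is a hypothetical name.

```python
import sqlite3


def run_example_queries():
    """Mirror the hive_example data in SQLite and run the two aggregate queries."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no partitions, so c is stored as a regular column.
    conn.execute("create table hive_example(a text, b integer, c integer)")
    conn.executemany(
        "insert into hive_example(a, b, c) values (?, ?, 1)",
        [("a", 1), ("a", 2), ("b", 3)],
    )
    distinct_a = conn.execute(
        "select count(distinct a) from hive_example"
    ).fetchone()[0]
    total_b = conn.execute("select sum(b) from hive_example").fetchone()[0]
    conn.close()
    return distinct_a, total_b
```

run_example_queries() returns (2, 6): two distinct values of a, and 1 + 2 + 3 for the sum of b. The beeline session should report the same numbers.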

ScyllaDB

Connect to cqlsh

docker exec -it scylla-1 cqlsh

Create keyspace

CREATE KEYSPACE data
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};

Use keyspace and create table

USE data;

CREATE TABLE data.users (
    user_id uuid PRIMARY KEY,
    first_name text,
    last_name text,
    age int
);

Insert data

INSERT INTO data.users (user_id, first_name, last_name, age)
  VALUES (123e4567-e89b-12d3-a456-426655440000, 'Polly', 'Partition', 77);
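
To insert more rows, each one needs its own user_id. Below is a small sketch that generates a random UUID and builds the matching INSERT statement as text, which can then be pasted into cqlsh. build_user_insert and cql_quote are hypothetical helpers; an application would normally use a driver with bound parameters instead of building strings.

```python
import uuid
from typing import Optional


def cql_quote(text: str) -> str:
    """Quote a CQL string literal; single quotes are escaped by doubling them."""
    return "'" + text.replace("'", "''") + "'"


def build_user_insert(first_name: str, last_name: str, age: int,
                      user_id: Optional[uuid.UUID] = None) -> str:
    """Build an INSERT for data.users, generating a random user_id if none is given."""
    if user_id is None:
        user_id = uuid.uuid4()
    return (
        "INSERT INTO data.users (user_id, first_name, last_name, age)\n"
        f"  VALUES ({user_id}, {cql_quote(first_name)}, "
        f"{cql_quote(last_name)}, {int(age)});"
    )
```

For example, print(build_user_insert("Polly", "Partition", 77)) prints a statement like the one above with a fresh user_id.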

Kafka

Create topic

docker exec -it kafka kafka-topics.sh --create --topic test --bootstrap-server 127.0.0.1:9092

Kafka producer

See kafka_producer.ipynb

Kafka consumer

See kafka_consumer.ipynb
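
For a quick test outside the notebooks, a producer can be sketched in a few lines. This is not a copy of the notebooks. It assumes the kafka-python package is installed, that the broker is reachable at 127.0.0.1:9092 from wherever the code runs, and that messages are JSON encoded; encode_message and send_messages are hypothetical names.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Serialize a message as UTF-8 encoded JSON (an assumed wire format)."""
    return json.dumps(payload).encode("utf-8")


def send_messages(messages, topic: str = "test",
                  bootstrap_servers: str = "127.0.0.1:9092") -> None:
    """Send each message to the topic created above."""
    # Imported here so encode_message works without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=encode_message,
    )
    for message in messages:
        producer.send(topic, message)
    # Block until all buffered messages are delivered, then release resources.
    producer.flush()
    producer.close()
```

For example, send_messages([{"id": 1}, {"id": 2}]) publishes two messages to the test topic, which the consumer notebook should then be able to read.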