
Hoptimator

Multi-hop declarative data pipelines

What is Hoptimator?

Hoptimator is an SQL-based control plane for complex data pipelines.

Hoptimator turns high-level SQL subscriptions into multi-hop data pipelines. Pipelines may involve an auto-generated Flink job (or similar) and whatever additional resources the job needs to run.

How does it work?

Hoptimator has a pluggable adapter framework, which lets you wire up arbitrary data sources. Adapters loosely correspond to connectors in the underlying compute engine (e.g. Flink connectors), but they may include custom control plane logic. For example, an adapter may create a cache or a CDC stream as part of a pipeline. This enables a single pipeline to span multiple "hops" across different systems (as opposed to, say, a single Flink job).

Hoptimator's pipelines tend to have the following general shape:

                                _________
topic1 ----------------------> |         |
table2 --> CDC ---> topic2 --> | SQL job | --> topic4
table3 --> rETL --> topic3 --> |_________|

The three data sources on the left correspond to three different adapters:

  1. topic1 can be read directly by a Flink job, so the first adapter simply configures a Flink connector.
  2. table2 is inefficient for bulk access, so the second adapter creates a CDC stream (topic2) and configures a Flink connector to read from that.
  3. table3 is in cold storage, so the third adapter creates a reverse-ETL job to re-ingest the data into Kafka.

To deploy such a pipeline, you only need to write a single SQL query, called a subscription. Pipelines are constructed automatically from subscriptions.
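
For example, the pipeline in the diagram above might be driven by a single query along these lines (purely illustrative; the schema, table, and column names are made up for this sketch):

  -- illustrative names only; not from any real sample
  SELECT t1."id", t2."details", t3."history"
  FROM KAFKA."topic1" AS t1
  JOIN DB."table2" AS t2 ON t1."id" = t2."id"
  JOIN ARCHIVE."table3" AS t3 ON t1."id" = t3."id"

From this one statement, Hoptimator plans the CDC stream, the reverse-ETL job, and the Flink job; you never describe the individual hops yourself.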

Quick Start

Prerequisites

  1. docker is installed and the docker daemon is running
  2. kubectl is installed and a cluster is running
    1. minikube can be used for a local cluster
  3. helm for Kubernetes is installed

Run

  $ make quickstart
  ... wait a while ...
  $ ./bin/hoptimator
  > !intro
  > !q

Subscriptions

Subscriptions are SQL views that are automatically materialized by a pipeline.

  $ kubectl apply -f deploy/samples/subscriptions.yaml
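
Each Subscription in that file is a small YAML resource wrapping a SQL query. As a rough sketch (illustrative only; consult deploy/samples/subscriptions.yaml for the actual schema and field names):

  # illustrative sketch -- see deploy/samples/subscriptions.yaml for the real schema
  apiVersion: hoptimator.linkedin.com/v1alpha1
  kind: Subscription
  metadata:
    name: products
  spec:
    database: RAWKAFKA
    sql: SELECT * FROM RAWKAFKA."products"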

In response, the operator will deploy a Flink job and other resources:

  $ kubectl get subscriptions
  $ kubectl get flinkdeployments
  $ kubectl get kafkatopics

You can verify the job is running by inspecting the output:

  $ ./bin/hoptimator
  > !tables
  > SELECT * FROM RAWKAFKA."products" LIMIT 5;
  > !q

The Operator

Hoptimator-operator is a Kubernetes operator that orchestrates multi-hop data pipelines based on Subscriptions (a custom resource). When a Subscription is deployed, the operator:

  1. creates a plan based on the Subscription SQL. The plan includes a set of resources that make up a pipeline.
  2. deploys each resource in the pipeline. This may involve creating Kafka topics, Flink jobs, etc.
  3. reports Subscription status, which depends on the status of each resource in the pipeline.
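
You can watch these steps play out by describing the resource (the subscription name here is illustrative):

  $ kubectl describe subscription products   # "products" depends on your Subscription metadata

The Status section reflects the state of each resource in the pipeline.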

The operator is extensible via adapters. Among other responsibilities, adapters can implement custom control plane logic (see ControllerProvider), or they can depend on external operators. For example, the Kafka adapter actively manages Kafka topics using a custom controller. The Flink adapter defers to flink-kubernetes-operator to manage Flink jobs.

The CLI

Hoptimator includes a SQL CLI based on sqlline. This is primarily for testing and debugging purposes, but it can also be useful for running ad-hoc queries. The CLI leverages the same adapters as the operator, but it doesn't deploy anything. Instead, queries run as local, in-process Flink jobs.