kotlin-beam-starter

If you clone this repository to start your own project, choose whichever license you prefer and delete everything related to the license you are dropping.

Built With

Kotlin ApacheBeam ApacheSpark ApacheFlink ApacheKafka Gradle

Getting Started

Requirements

Build

./gradlew clean build

Source file structure

This project aims to support multiple modules/pipelines. To achieve that, every pipeline should be a module structured like example-spark-runner. Each pipeline module has a few files:

  • The main sources /src/main/kotlin
    • App.kt defines the main() entrypoint (sketched below)
    • /schema contains all schema definitions
    • /pipeline contains the pipeline definition
    • /ptranform contains all the pipeline transformations (PTransforms)
  • The test sources src/test/kotlin define the unit tests
  • The integration test sources src/integrationTest/kotlin define the integration tests
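
For orientation, here is a minimal sketch of what App.kt's main() typically looks like in a Beam project (a sketch only, not the exact contents of this repository):

```kotlin
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.options.PipelineOptionsFactory

fun main(args: Array<String>) {
    // Parse --runner and other pipeline options from the command line
    val options = PipelineOptionsFactory.fromArgs(*args).withValidation().create()
    val pipeline = Pipeline.create(options)

    // Transforms from /pipeline and /ptranform would be applied here before running
    pipeline.run().waitUntilFinish()
}
```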

FAQ

What examples does this starter include?

✅ = Done ⚠️ = TODO

  • Apache Spark Runner ✅
  • Read/Write .CSV files ✅ (see the TextIO sketch after this list)
  • AWS S3 integration ✅
  • Kafka integration ⚠️
  • Oracle JDBC Integration ✅
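
To give a taste of the CSV example, a minimal read/write round trip with Beam's TextIO looks roughly like this (the file paths are placeholders; with the AWS IO module on the classpath, s3:// paths work the same way):

```kotlin
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.io.TextIO
import org.apache.beam.sdk.options.PipelineOptionsFactory

fun main(args: Array<String>) {
    val options = PipelineOptionsFactory.fromArgs(*args).withValidation().create()
    val pipeline = Pipeline.create(options)

    pipeline
        .apply("ReadCsv", TextIO.read().from("input.csv"))  // one String element per line
        .apply("WriteCsv", TextIO.write().to("output").withSuffix(".csv").withNumShards(1))

    pipeline.run().waitUntilFinish()
}
```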

How can I use another runner?

To keep this template small, it only includes the Direct Runner for runtime tests and the Spark Runner as an integration test example. For a comparison of what each runner currently supports, see the Beam Capability Matrix. To add a new runner, visit that runner's page for instructions on how to include it.
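
For example, switching to the Flink runner usually amounts to one extra dependency plus a --runner flag. A minimal sketch, assuming the Gradle Kotlin DSL; the artifact and version numbers below are placeholders, so check the Flink runner page for the coordinates matching your Beam and Flink versions:

```kotlin
// build.gradle.kts of the pipeline module (hypothetical versions)
dependencies {
    runtimeOnly("org.apache.beam:beam-runners-flink-1.16:2.48.0")
}
```

The pipeline code itself stays unchanged; the runner is selected at launch time via --runner=FlinkRunner.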

What does X mean? (Glossary)

Pipeline: A Pipeline encapsulates your entire data processing task, from start to finish. This includes reading input data, transforming that data, and writing output data. All Beam driver programs must create a Pipeline. When you create the Pipeline, you must also specify the execution options that tell the Pipeline where and how to run.
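
A minimal sketch of creating a Pipeline with explicit execution options (the DirectRunner choice is just an example):

```kotlin
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.options.PipelineOptionsFactory

fun main() {
    // The options tell the pipeline where and how to run
    val options = PipelineOptionsFactory.fromArgs("--runner=DirectRunner").create()
    val pipeline = Pipeline.create(options)
    pipeline.run().waitUntilFinish()
}
```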

PCollection: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.
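
For instance, a bounded PCollection can be created from in-memory data in the driver program (a sketch; the element values are arbitrary):

```kotlin
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.Create
import org.apache.beam.sdk.values.PCollection

fun main() {
    val pipeline = Pipeline.create(PipelineOptionsFactory.create())
    // A bounded PCollection built from in-memory data
    val words: PCollection<String> = pipeline.apply(Create.of("kotlin", "beam", "starter"))
    pipeline.run().waitUntilFinish()
}
```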

PTransform: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as input, performs a processing function that you provide on the elements of that PCollection, and produces zero or more output PCollection objects.
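
A tiny PTransform in action, using Beam's built-in MapElements to turn a PCollection<String> into a PCollection<Int> (illustrative only):

```kotlin
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.Create
import org.apache.beam.sdk.transforms.MapElements
import org.apache.beam.sdk.transforms.ProcessFunction
import org.apache.beam.sdk.values.TypeDescriptors

fun main() {
    val pipeline = Pipeline.create(PipelineOptionsFactory.create())
    pipeline
        .apply(Create.of("kotlin", "beam"))
        // A PTransform: one input PCollection<String>, one output PCollection<Integer>
        .apply(MapElements.into(TypeDescriptors.integers())
            .via(ProcessFunction { word: String -> word.length }))
    pipeline.run().waitUntilFinish()
}
```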

Contributing

Thank you for your interest in contributing! All contributions are welcome! 🎉🎊

License

This software is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE for details.