Data Orchestration with Kestra

This repository will:

Help you get started with Kestra
Provide examples of how to use Kestra
Provide examples of how to integrate Kestra with Infrastructure as Code tools (Terraform, GitHub Actions), Modern Data Stack products, and public cloud provider services
Share best practices for managing data workflows across environments so that moving from development to production is as easy as possible without sacrificing security or reliability

Video tutorials

Getting started video explaining key concepts: https://youtu.be/yuV_rgnpXU8
Managing development and production environments in Kestra: https://youtu.be/tiHa3zucS_Q

How to install Kestra

Download the Docker Compose file:

curl -o docker-compose.yml https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml

Start Kestra:

docker-compose up

Hello-World example

Here is a simple example that logs a hello-world message to the terminal:

id: hello
namespace: prod
tasks:
  - id: hello-world
    type: io.kestra.core.tasks.log.Log
    message: Hello world!

Adding a schedule

Here is how you can add a schedule trigger to run the flow every minute: helloParametrizedScheduled.yml

To add multiple schedules, each running with different input parameter values, use helloParametrizedMultipleSchedules.yml
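As a rough sketch of what such a flow can look like (the files named above remain the reference), the flow below declares one input and two schedule triggers that pass different values for it. The flow id, trigger ids, input name, cron expressions, and values are illustrative assumptions. The trigger type, the name/defaults input syntax, and the map form of the trigger inputs are assumed to match the Kestra version that provides the io.kestra.core.tasks.log.Log task used above; check them against the version you run.

```yaml
# Sketch only: ids, the input name, and all values are illustrative assumptions
id: hello-scheduled
namespace: prod
inputs:
  - name: user
    type: STRING
    defaults: world
tasks:
  - id: hello
    type: io.kestra.core.tasks.log.Log
    message: Hello {{ inputs.user }}!
triggers:
  # Runs every minute and uses the default input value
  - id: every-minute
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "*/1 * * * *"
  # Runs every 15 minutes and overrides the input (assumed map syntax)
  - id: quarter-hourly
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "*/15 * * * *"
    inputs:
      user: scheduler
```

Giving each trigger its own id keeps the schedules distinguishable in the UI when several of them start the same flow.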

How to integrate Kestra with other tools in the Modern Data Stack

Airbyte

Here is an example of using Kestra with Airbyte running in another Docker container: airbyteSync.yml

And here is an example that runs multiple Airbyte syncs in parallel: airbyteSyncParallel.yml
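For orientation, a parallel sync flow might look roughly like the sketch below. The Airbyte URL and the connection IDs are placeholders, the flow and task ids are made up for illustration, and the task types are assumed to be the ones shipped with the Airbyte plugin and the core Parallel task for this Kestra version. Your Airbyte instance may also require credentials that are not shown here.

```yaml
# Sketch only: the URL and connection IDs are placeholders, not real values
id: airbyte-sync-parallel
namespace: prod
tasks:
  - id: parallel-syncs
    type: io.kestra.core.tasks.flows.Parallel
    tasks:
      # Each child task triggers one Airbyte connection sync
      - id: sync-a
        type: io.kestra.plugin.airbyte.connections.Sync
        url: http://host.docker.internal:8000/
        connectionId: "your-connection-id-a"
      - id: sync-b
        type: io.kestra.plugin.airbyte.connections.Sync
        url: http://host.docker.internal:8000/
        connectionId: "your-connection-id-b"
```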

Fivetran

Here is an example of a flow with a single task triggering a Fivetran sync: fivetranSync.yml
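A minimal sketch of such a flow is shown below. The flow id, the connector ID, and the environment variable names are hypothetical, and the task type and its apiKey, apiSecret, and connectorId properties are assumed to match the Fivetran plugin available in your Kestra installation.

```yaml
# Sketch only: the connector ID and variable names are placeholders
id: fivetran-sync
namespace: prod
tasks:
  - id: sync
    type: io.kestra.plugin.fivetran.connectors.Sync
    # Credentials are read from environment variables rather than hard-coded
    apiKey: "{{envs.fivetran_api_key}}"
    apiSecret: "{{envs.fivetran_api_secret}}"
    connectorId: "your-connector-id"
```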

Custom Scripts

How to run Bash and Python tasks

Here is an example of a Bash task: csvKit
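If you only need the general shape of a Bash task, the sketch below may help. The flow id, task id, and command are illustrative, and the task type is assumed by analogy with the io.kestra.core.tasks.scripts.Python type used elsewhere in this README.

```yaml
# Sketch only: ids and the command are illustrative
id: hello-bash
namespace: prod
tasks:
  - id: bash
    type: io.kestra.core.tasks.scripts.Bash
    commands:
      # Each list item is one shell command
      - echo "Hello from Bash"
```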

Custom Docker image per task

If you prefer to run the Python or Bash task in a (potentially custom) Docker container, see this example: pythonScriptContainer

Using dockerOptions with the dockerConfig attribute, you can also configure credentials for private Docker registries:

auths: { "my.registry.com" : { auth: "token" } }
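To show where that setting might sit in a flow, here is a hedged sketch. The registry host, image name, and token are placeholders, the flow and task ids are made up, and dockerConfig is assumed to accept the Docker config content as a JSON string; verify the exact format against the plugin documentation for your Kestra version.

```yaml
# Sketch only: registry, image, and token are placeholders
id: private-registry-example
namespace: prod
tasks:
  - id: python-private-image
    type: io.kestra.core.tasks.scripts.Python
    inputFiles:
      main.py: |
        print("Hello from a private image")
    runner: DOCKER
    dockerOptions:
      image: my.registry.com/my-python:latest
      # Assumed format: Docker config JSON passed as a string
      dockerConfig: |
        {
          "auths": { "my.registry.com": { "auth": "token" } }
        }
```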

Docker image with `requirements.txt` built at runtime

id: hello-python-docker
namespace: prod
tasks:
  - id: python-container
    type: io.kestra.core.tasks.scripts.Python
    inputFiles:
      main.py: |
        import pandas as pd
        import requests
        
        print(pd.__version__)
        print(requests.__version__)
    requirements:
      - requests
      - pandas
    runner: DOCKER
    dockerOptions:
      image: python:3.11-slim

Keyboard shortcuts for the built-in editor in the UI

On Mac:

fn + control + Space = Autocomplete
command + K + C = Comment
command + K + U = Uncomment
opt + up/down = Move line up/down

On Windows:

ctrl + space = Autocomplete
ctrl + K + C = Comment
ctrl + K + U = Uncomment
alt + up/down = Move line up/down

Credentials management

Open-source Kestra

In the open-source version, you can leverage environment variables. Create an .env file and add any environment variables there, as shown in the .env_example file.

Then, you can reference that environment variable in your flow using {{envs.aws_access_key_id}}.

For security reasons, environment variables are read once at application (JVM) startup, so changes to the .env file take effect only after Kestra is restarted.

Note that the reference must be lowercase:

{{envs.aws_access_key_id}} is correct ✅
{{envs.AWS_ACCESS_KEY_ID}} is NOT correct ❌: the reference must be lowercase even though the .env file defines the variable in uppercase (AWS_ACCESS_KEY_ID=xxx)
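As a small illustration of the lowercase rule, the sketch below logs a non-secret variable. MY_ENVIRONMENT is a hypothetical variable, the flow and task ids are made up, and the example assumes the env-vars-prefix configuration described in this section.

```yaml
# Sketch only: assumes the .env file contains MY_ENVIRONMENT=dev (hypothetical)
id: env-var-example
namespace: prod
tasks:
  - id: log-environment
    type: io.kestra.core.tasks.log.Log
    # Uppercase in the .env file, lowercase in the reference
    message: "Running in {{envs.my_environment}}"
```

Logging is used here only because the variable is not sensitive; avoid printing real credentials to the logs.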

Also, make sure that your Kestra container contains this configuration in your docker-compose.yml file:

  kestra:
    image: kestra/kestra:develop-full
    ...
    env_file:
      - .env
    environment:
      KESTRA_CONFIGURATION: |
        kestra:
          ...
          variables:
            env-vars-prefix: ""

Setting env-vars-prefix to an empty string will allow you to reference environment variables without a prefix.

Without this setting, your AWS_ACCESS_KEY_ID environment variable would need to be prefixed with KESTRA_ in the .env file: KESTRA_AWS_ACCESS_KEY_ID.

Cloud & Enterprise Edition

The Cloud & Enterprise Editions have a dedicated credentials manager with extra encryption, a namespace-bound credential inheritance hierarchy, and RBAC settings behind it.

You can add a Secret in the relevant namespace from the namespace tab in the UI. To reference that secret in your flow, use {{secret('AWS_ACCESS_KEY_ID')}} instead of {{envs.aws_access_key_id}}.
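A flow that consumes such a secret could look roughly like this. The flow and task ids are illustrative, and the env property on the Bash task is an assumption about the script tasks in this Kestra version. The command only checks that the value is present and does not print it.

```yaml
# Sketch only: ids are illustrative; the env property is assumed to exist
id: secret-example
namespace: prod
tasks:
  - id: use-secret
    type: io.kestra.core.tasks.scripts.Bash
    env:
      # Resolved from the namespace secret at runtime
      AWS_ACCESS_KEY_ID: "{{secret('AWS_ACCESS_KEY_ID')}}"
    commands:
      - test -n "$AWS_ACCESS_KEY_ID" && echo "credential is set"
```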
