Support running on Azure #157

Open
rsignell opened this issue Dec 4, 2023 · 3 comments

rsignell commented Dec 4, 2023

What would be the best way to run Beam pipelines on Azure?
I think @yuvipanda would say "Flink!".
If so, does this look like the best resource? https://github.com/microsoft/flink-on-azure

rsignell commented Dec 4, 2023

@TomAugspurger, I thought you said something about running Beam on Azure somewhere, but can't find it (and neither can Bing!)

rsignell commented Dec 4, 2023

Actually, I decided to just ask Bing if I could run Apache Beam pipelines on Azure using Apache Flink. Here's the response:

Yes, you can run Apache Beam pipelines on Azure using Apache Flink. There are different ways to do this, depending on your needs and preferences. Here are some possible options:

  • You can use Azure Data Factory Managed Airflow to orchestrate your Apache Flink jobs. This allows you to create and manage workflows that run on a fully managed Apache Airflow service in Azure. You can use the Flink operator to submit your Beam pipelines to a Flink cluster running on Azure HDInsight or Azure Kubernetes Service¹.
  • You can use the Flink Kubernetes Operator to run your Hop pipelines on a Flink cluster that is automatically started on Kubernetes. Hop is a data orchestration tool that supports Apache Beam as a pipeline engine. You can use Hop to design and execute your Beam pipelines using a graphical user interface or a code editor².
  • You can use Azure Event Hubs for Apache Kafka to stream data from and to your Apache Flink jobs. Event Hubs is a fully managed, real-time data ingestion service that is compatible with the Kafka protocol. You can use the Flink Kafka connector to read and write data from Event Hubs as if it were a Kafka topic³.

I hope this helps you to run your Apache Beam pipelines on Azure using Apache Flink. If you have any more questions, feel free to ask me. 😊.

Source: Conversation with Bing, 12/4/2023
(1) Apache Flink® job orchestration using Azure Data Factory Managed Airflow. https://learn.microsoft.com/en-us/azure/hdinsight-aks/flink/flink-job-orchestration.
(2) Running a Hop pipeline using the Flink Kubernetes Operator. https://hop.apache.org/manual/latest/pipeline/beam/flink-k8s-operator-running-hop-pipeline.html.
(3) Use Apache Flink with Azure Event Hubs for Apache Kafka. https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-kafka-flink-tutorial.
(4) Running the Apache Beam samples With Apache Flink. https://hop.apache.org/manual/latest/pipeline/beam/beam-samples-flink.html.
(5) Using the Apache Flink Runner - The Apache Software Foundation. https://beam.apache.org/documentation/runners/flink/.
(6) en.wikipedia.org. https://en.wikipedia.org/wiki/Apache_Flink.
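
For reference, here is a minimal sketch of what pointing a Beam Python pipeline at a Flink cluster might look like, going by the Flink Runner page listed as source (5). The helper names `flink_pipeline_args` and `run_demo`, the `localhost:8081` address (for example, a port-forward to a Flink JobManager running on AKS), and the default values are assumptions for illustration and do not come from this thread. Nothing here has been tested against an Azure-hosted cluster.

```python
from typing import List


def flink_pipeline_args(
    flink_master: str = "localhost:8081",
    environment_type: str = "LOOPBACK",
    parallelism: int = 1,
) -> List[str]:
    """Build Beam pipeline flags that select the Flink runner.

    Hypothetical helper. The flags themselves (--runner, --flink_master,
    --environment_type, --parallelism) are the ones described in the
    Beam Flink Runner documentation.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        f"--environment_type={environment_type}",
        f"--parallelism={parallelism}",
    ]


def run_demo(args: List[str]) -> None:
    """Run a trivial pipeline with the given flags (requires apache-beam)."""
    # Imported lazily so the flag helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(args)) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["beam", "on", "azure"])
            | "Upper" >> beam.Map(str.upper)
            | "Print" >> beam.Map(print)
        )
```

Calling `run_demo(flink_pipeline_args("localhost:8081"))` would submit the toy pipeline to whatever Flink REST endpoint is reachable at that address. `LOOPBACK` is intended for local testing; a remote cluster would likely need a different `environment_type` (such as `DOCKER` or `EXTERNAL`), which is the part most likely to need Azure-specific work.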

TomAugspurger commented Dec 4, 2023 via email
