
Google CloudRun job operator #24730

Closed · 2 tasks done
thinhnd2104 opened this issue Jun 29, 2022 · 21 comments · Fixed by #33067

Labels: good first issue, kind:feature (Feature Requests), provider:google (Google (including GCP) related issues)

Comments

@thinhnd2104
Contributor

Description

Like AWS ECS, Google Cloud has Cloud Run Jobs (currently in beta). Does anyone else need to use this feature from GCP?

Use case/motivation

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@thinhnd2104 thinhnd2104 added the kind:feature Feature Requests label Jun 29, 2022
@uranusjr uranusjr added the provider:google Google (including GCP) related issues label Jun 30, 2022
@eladkal
Contributor

eladkal commented Jul 3, 2022

Feel free to submit a PR.

@jmantegazza

Oh yeah, I would like to have this operator. I am using Cloud Run Jobs to execute light data processing and scripting. It would be really nice to have a dedicated operator to trigger this flow in Cloud Composer (apart from the bash operator).

@potiuk
Member

potiuk commented Sep 11, 2022

> Oh yeah, I would like to have this operator. I am using Cloud Run Jobs to execute light data processing and scripting. It would be really nice to have a dedicated operator to trigger this flow in Cloud Composer (apart from the bash operator).

Why not contribute it then?

@jmantegazza

jmantegazza commented Sep 12, 2022

> Oh yeah, I would like to have this operator. I am using Cloud Run Jobs to execute light data processing and scripting. It would be really nice to have a dedicated operator to trigger this flow in Cloud Composer (apart from the bash operator).
>
> Why not contribute it then?

I would love to, but I do not have a single idea of how to do it. Is there an established process to work on it?

@potiuk
Member

potiuk commented Sep 12, 2022

Includes:

  • intro
  • step-by-step-guide how to approach contribution
  • links to <10 minutes setup of development environment
  • quick contribution guides which provide a quick-and-dirty way to start, with screenshots depending on which IDE you use (PyCharm/VSCode, or even Gitpod or Codespaces if you feel like developing in a remote environment), if you just want to "do" without reading too much of the "why and how".

I think that is a good starting point, and you can choose the learning path that is best for you.

@v-hunt

v-hunt commented Oct 24, 2022

I think I can also contribute to this, if required.

@o-nikolas
Contributor

> I think, I also can contribute on this, if required

We haven't heard back from Juan, so assigning to @v-hunt, thanks for taking this one!

@o-nikolas o-nikolas assigned v-hunt and unassigned thinhnd2104 Oct 26, 2022
@corridordigital

Possible solution with this PR

@mharrisb1

mharrisb1 commented Dec 4, 2022

Brief Thoughts/Notes

I've done a little bit of work on this and here are some notes.

A PR for this feature should include operators for both:

  • Cloud Run services: Used to run code that responds to web requests, or events.
  • Cloud Run jobs: Used to run code that performs work (a job) and quits when the work is done.

Source.

It looks like #27638 only includes the services operator. That would be a good start, but it also looks like it uses the transport directly instead of the official client. Other GCP operators (e.g. Tasks and others) use the official clients, so it would be best to go the same route with this one.

In my mind the biggest benefit comes from the jobs operators since that would allow users who do not want to deal with/manage K8s to use Cloud Run Jobs with arbitrary containers.

The Google team is great and recently released support for jobs in the official Cloud Run Python client (see googleapis/python-run#65) but it won't be available until v0.5.0 with no ETA. It also currently won't build with apache-airflow-providers-google because of incompatible protobuf support (see googleapis/python-run#70).

I created my own plugin for this if anyone is interested in using Cloud Run Jobs (currently does not support services) before these issues are resolved and I plan to use the official client once that is ready.

https://github.com/mharrisb1/airflow-google-cloud-run-plugin

Please note that this plugin will only be supported until this is available in Airflow.

Requirements Proposal

Cloud Run Services and Jobs would be great additions to GCP resources in Airflow. I think a PR to add these features should cover the following:

Some additional thoughts:

  • There seems to be a pretty low quota for sequential requests, so any ping mechanism should try to respect this; otherwise tasks will fail often.
  • For the Cloud Run jobs operators specifically, it would be nice if, instead of only having CRUD-based operators, the main job run operator could also have "create if not exists" and "delete on exit" options to avoid extra tasks. This is simply a personal preference (I added it to my plugin; see example).
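A minimal sketch of a quota-respecting wait loop, assuming a hypothetical `get_execution_state` callable and made-up terminal state names (this is not any real client or Airflow API):

```python
import time

# Hypothetical terminal states; real Cloud Run execution states may differ.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}

def wait_for_execution(get_execution_state, poll_interval=30.0, timeout=3600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll an execution's state no faster than poll_interval until it is terminal.

    Raises TimeoutError if the execution is still running after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        state = get_execution_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"execution still {state} after {timeout}s")
        sleep(poll_interval)
```

Keeping `poll_interval` well above the API's rate limit keeps a long-running task from burning through the request quota.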

Would love to contribute and collaborate with anyone on this. I do think we're blocked on progress until v0.5.0 of the official client is released with support for a compatible protobuf lib, but we can definitely go ahead and start progressing on this in preparation for that release in the near future (also, go comment/like googleapis/python-run#70 to raise awareness with the Google team).

@v-hunt

v-hunt commented Dec 4, 2022

Hi guys,
I'm sorry for not responding for a while (I'm in Ukraine, so I think you understand why).
The question is: is this still relevant? If yes, I can contribute, but I can't promise it will be fast.

@v-hunt

v-hunt commented Dec 4, 2022

What I've found: this guy created a custom Airflow plugin for Cloud Run Jobs: link
Possibly he has solved this problem. I'm going to look deeper into it.

@VinceLegendre

VinceLegendre commented Dec 22, 2022

Hey guys, just found out about this issue!

@mharrisb1 happy to collaborate with you on this one.
I think we should be able to merge the CRUD operations you introduced in your plugin with the changes I proposed in this PR: #28525.

What I had in mind was to design the execution of an existing job in the same way DataflowStartFlexTemplateOperator does with flex templates.

Would you have any thoughts on this?

@VinceLegendre

After a closer look at this plugin, here are some thoughts:

  • CloudRunHook should extend GoogleBaseHook to ease authentication and GCP configuration.
  • As the gcloud CLI offers an --execute-now flag when creating jobs, I think the following logic/naming convention may be more straightforward:
    • a CloudRunCreateJobOperator with execute_now, update_if_exists and delete_on_exit capabilities, to allow job definition, run and deletion from Airflow;
    • a separate CloudRunExecuteJobOperator, allowing one to execute a pre-created job in a GCP project.
  • CloudRunListJobs and CloudRunDeleteJob operators to complete CRUD capabilities for jobs, as introduced in the plugin documentation.
  • Regarding executions, is the DELETE operation mandatory to support? I may be missing some use cases here.
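A plain-Python sketch of how those flags could compose inside the proposed CloudRunCreateJobOperator. The client and its methods (`job_exists`, `create_job`, `update_job`, `run_job`, `delete_job`) are hypothetical stand-ins, not a real hook API:

```python
def create_and_maybe_run_job(client, job_name, image,
                             execute_now=False, update_if_exists=False,
                             delete_on_exit=False):
    """Create (or reuse) a job definition, optionally run it, optionally clean up."""
    if client.job_exists(job_name):
        if update_if_exists:
            client.update_job(job_name, image)
        # otherwise reuse the existing definition as-is
    else:
        client.create_job(job_name, image)
    result = None
    try:
        if execute_now:
            result = client.run_job(job_name)
    finally:
        # delete_on_exit removes the job even if the run raised, so no
        # separate cleanup task is needed in the DAG
        if delete_on_exit:
            client.delete_job(job_name)
    return result
```

Folding the lifecycle into one operator this way avoids a three-task create/run/delete chain for the common "just run this container once" use case.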

Happy to have this issue assigned if necessary, @v-hunt, as I think this would be a game changer for many GCP users!
I guess the required development would be mostly based on @mharrisb1's really good plugin (congrats btw 👏).

@mharrisb1

@VinceLegendre all great thoughts.

I think most of my plugin is obsolete once someone can get https://github.com/googleapis/python-run to build correctly with the rest of the Google Cloud providers code (https://pypi.org/project/apache-airflow-providers-google/). The only issue is resolving protobuf versions between the two (see googleapis/python-run#70). The Google team will not solve this on their side, so someone will need to solve it in the Google providers code.

The official python-run library is definitely preferred over my custom client. Then yes, taking the same approach as other Google Cloud providers would be the goal. And exactly as you pointed out: extend GoogleBaseHook for auth, etc.

Once the protobuf issue is resolved, it should be an easy path to just implement operators for all CRUD and execution options. I think sensors, custom links, etc. are nice to have but could potentially be introduced in subsequent versions if someone doesn't want to implement it all at once. I would, though, consider all the CRUD operators part of the completion requirements, since that allows full control over the resource lifecycle.
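For reference, executing an existing job with the official client would look roughly like this once the protobuf issue is resolved; the project, region, and job names are placeholders, and this assumes a google-cloud-run release with jobs support (>= 0.5.0) plus valid GCP credentials:

```python
from google.cloud import run_v2

# Execute an existing Cloud Run job and wait for the execution to finish.
client = run_v2.JobsClient()
request = run_v2.RunJobRequest(
    name="projects/my-project/locations/us-central1/jobs/my-job",
)
operation = client.run_job(request=request)  # long-running operation
execution = operation.result()               # blocks until the execution completes
print(execution.name)
```

An operator wrapping this would mainly add Airflow connection handling (via GoogleBaseHook) and polling/deferrable behavior around `operation.result()`.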

@VinceLegendre

@mharrisb1 Is the build issue you mention specific to the Cloud Run v2 API?
As v2 does not seem to support jobs & executions CRUD for the moment, maybe we can stick to v1 for the time being?

If so, the v1 API seems to build correctly with the rest of the Google Cloud providers, at least locally. CloudRunJobHook.get_conn worked well in Breeze with this piece of code: https://github.com/VinceLegendre/airflow/blob/add_google_cloud_run_execute_job_operator/airflow/providers/google/cloud/hooks/cloud_run.py#L166

@mohithg

mohithg commented Feb 3, 2023

When will the official CloudRun Job operator be ready to use in production? Is there an alternative for this?

@jmantegazza

jmantegazza commented Feb 3, 2023 via email

@r-richmond
Contributor

r-richmond commented Feb 22, 2023

> It also currently won't build with apache-airflow-providers-google because of incompatible protobuf support (see googleapis/python-run#70).

> The only issue is resolving protobuf versions between the 2 (see googleapis/python-run#70). The Google team will not solve this on their side so someone will need to solve it in the Google providers code.

Edit: #29644 has been merged, which should solve the protobuf==3.2.0 issue.

@yan-hic

yan-hic commented Apr 10, 2023

Glad this is sparking a lot of interest.

One thought, once the operators have migrated to the official SDK, is to consider a new CloudRunExecutor as an alternative to K8s (in a different GitHub thread).

It could combine with parallelism: inject arbitrary Python code plus the number of tasks to run concurrently, with a default of one (= current executor behavior).

I have a few use cases where 100+ similar tasks run in parallel, and I don't need/want each to be defined as an Airflow task (that would kill the UI, among other things).
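That fan-out could be sketched with plain Python; `launch` here is a hypothetical stand-in for whatever would start one Cloud Run execution:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel_executions(launch, task_count=1, max_workers=None):
    """Launch task_count executions of the same job concurrently.

    task_count defaults to 1, mirroring the current one-execution-per-task
    behavior; results come back in submission order.
    """
    with ThreadPoolExecutor(max_workers=max_workers or task_count) as pool:
        return list(pool.map(launch, range(task_count)))
```

A single Airflow task could then fan out 100+ executions without putting 100+ tasks in the UI.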

@eladkal eladkal mentioned this issue May 21, 2023
@EamonKeane

Cloud run jobs can now last up to 24 hours, making this viable for the vast majority of tasks.

https://cloud.google.com/run/docs/create-jobs
