apache-airflow-providers-google incompatible with latest google-cloud-dataproc (2.6.0) #18485

Closed
2 tasks done
SamWheating opened this issue Sep 23, 2021 · 9 comments
Labels
area:core kind:bug This is a clearly a bug provider:google Google (including GCP) related issues

Comments

@SamWheating
Contributor

SamWheating commented Sep 23, 2021

Apache Airflow version

2.1.4 (latest released)

Operating System

MacOS

Versions of Apache Airflow Providers

apache-airflow-providers-google==5.1.0

Deployment

Other

Deployment details

No response

What happened

Installing the google provider and importing the DataprocSubmitJobOperator fails.

I believe this is due to the removal of the Dataproc V1Beta2 Client in googleapis/python-dataproc#253.

How to reproduce

From a fresh virtualenv:

> pip install apache-airflow-providers-google
> python
Python 3.7.3 (default, Jan 24 2020, 16:24:47)
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/samwheating/Desktop/tmpairflow/lib/python3.7/site-packages/airflow/providers/google/cloud/operators/dataproc.py", line 33, in <module>
    from google.cloud.dataproc_v1beta2 import Cluster
ModuleNotFoundError: No module named 'google.cloud.dataproc_v1beta2'
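
To check whether an environment is affected without importing the provider itself, one option is to probe for the two client modules directly. This is only a minimal sketch: `has_module` is a hypothetical helper name, not part of Airflow or the Google library.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the module `name` can be located in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `google.cloud`) is itself missing.
        return False


if __name__ == "__main__":
    # Module names taken from the traceback above and the stable v1 client.
    for mod in ("google.cloud.dataproc_v1", "google.cloud.dataproc_v1beta2"):
        status = "present" if has_module(mod) else "missing"
        print(f"{mod}: {status}")
```

If `google.cloud.dataproc_v1beta2` is reported as missing while `google.cloud.dataproc_v1` is present, the installed `google-cloud-dataproc` is most likely a release without the beta client.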

Anything else

It's easy enough to pin the google-cloud-dataproc requirement to an older version, but should we also refactor the dataproc operators to stop using the deprecated client?

Either way I can help out with a PR.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@SamWheating SamWheating added area:core kind:bug This is a clearly a bug labels Sep 23, 2021
@kaxil
Member

kaxil commented Sep 23, 2021

assigned the issue to you

@kaxil kaxil added the provider:google Google (including GCP) related issues label Sep 23, 2021
@SamWheating
Contributor Author

SamWheating commented Sep 23, 2021

Thanks -

Would it be preferable to simply pin the version to >=2.2.0,<2.6.0 as a quick fix, or should I update all of the dataproc hooks and operators?
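
To make the proposed range concrete, here is a rough sketch of which releases a `>=2.2.0,<2.6.0` pin would admit. It assumes plain `X.Y.Z` version strings and uses naive tuple comparison rather than full PEP 440 parsing; `parse_version` and `in_pinned_range` are hypothetical names used only for illustration.

```python
LOWER = (2, 2, 0)  # inclusive lower bound of the proposed pin
UPPER = (2, 6, 0)  # exclusive upper bound: first release without v1beta2


def parse_version(text: str) -> tuple:
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no PEP 440 extras)."""
    return tuple(int(part) for part in text.split("."))


def in_pinned_range(version: str) -> bool:
    """Return True if `version` satisfies >=2.2.0,<2.6.0."""
    return LOWER <= parse_version(version) < UPPER


if __name__ == "__main__":
    for candidate in ("2.1.9", "2.2.0", "2.5.0", "2.6.0"):
        print(candidate, in_pinned_range(candidate))
```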

@uranusjr
Member

uranusjr commented Sep 23, 2021

How long do you estimate the hook/operator fix would take? If it needs a while, it may be a good idea to merge the quick fix first and then work on upgrading the hooks in a subsequent PR.

@SamWheating
Contributor Author

SamWheating commented Sep 23, 2021

Yeah, agreed. It's hard to estimate how long the operator fix will take, but it will be very easy to pin the version to a non-breaking one in the meantime.

I'll open a PR shortly and we can open a separate issue to track the deprecation of the v1beta2 client.

@mik-laj
Member

mik-laj commented Sep 24, 2021

I created an issue in the Google repository: googleapis/python-dataproc#271

kaxil pushed a commit that referenced this issue Sep 24, 2021
…a2`` client (#18486)

Re: #18485

The removal of the v1beta2 client from the google-cloud-dataproc library in release 2.6.0 makes dataproc operators unusable.

We can get around this temporarily by pinning the installation of the library to the previous version.

In a follow-up PR I can update all of the dataproc integrations to use the stable dataproc_v1 client.
@kaxil
Member

kaxil commented Sep 24, 2021

If you install this provider with a constraints file, this will work for now.

@kaxil
Member

kaxil commented Sep 24, 2021

e.g.:

pip install apache-airflow-providers-google==5.1.0 -c https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.6.txt

as we have the following

https://github.com/apache/airflow/blob/constraints-2.1.4/constraints-3.6.txt#L251
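
The constraints URL in the command above appears to follow a fixed pattern built from the Airflow version and the Python minor version. As a small sketch under that assumption, the URL and the matching `pip` arguments could be assembled like this; `constraints_url` and `pip_install_args` are hypothetical helper names, not Airflow APIs.

```python
def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the constraints file URL, following the pattern of the URL above."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_args(package: str, airflow_version: str, python_version: str) -> list:
    """Return the argument list for a constrained `pip install` of `package`."""
    return [
        "pip",
        "install",
        package,
        "-c",
        constraints_url(airflow_version, python_version),
    ]


if __name__ == "__main__":
    # Reproduces the command from the comment above.
    args = pip_install_args("apache-airflow-providers-google==5.1.0", "2.1.4", "3.6")
    print(" ".join(args))
```

Passing the argument list to `subprocess.run` would perform the install; printing it first makes it easy to confirm that the Python version in the URL matches the interpreter in use.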

@mik-laj
Member

mik-laj commented Sep 25, 2021

The latest release has been yanked, so it is no longer a problem.
https://pypi.org/project/google-cloud-dataproc/#history

@lwyszomi
Contributor

@SamWheating @kaxil @mik-laj @uranusjr

FYI, the migration to v1 and the upgrade to 3.0.0 have already been sent for review.
