Remove [gcp] from requirements for apache-beam[gcp] #5782

Open
ammarck opened this issue Mar 10, 2023 · 4 comments
@ammarck

ammarck commented Mar 10, 2023

System information

  • TFX Version (you are using): 1.12
  • Environment in which you plan to use the feature (e.g., Local
    (Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc.): Azure
  • Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.
Remove the [gcp] extra from the apache-beam[gcp] requirement in setup.py and in all downstream projects.
We use TFX heavily at our company, but we do not run on GCP. Because the [gcp] extra is always installed, we run into many Python package conflicts. The [gcp] extra should be optional, not a requirement.

One option is to make tfx[gcp] pull in apache-beam[gcp]. Right now, however, tfx pulls in apache-beam[gcp] by default.
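As a rough illustration, the split could look something like the following in a setup.py-style layout. This is only a sketch: the helper names and the `SETUP_KWARGS` dictionary are hypothetical, version pins are left out, and it is not TFX's actual packaging code.

```python
# Hypothetical sketch of the proposed dependency split.
# The function names and SETUP_KWARGS are illustrative assumptions,
# not TFX's real setup.py or dependency helpers.

def make_required_install_packages():
    # Base install: Beam without any cloud-specific extra.
    return ["apache-beam"]


def make_extra_packages_gcp():
    # Opt-in extra: only `pip install tfx[gcp]` would add Beam's GCP dependencies.
    return ["apache-beam[gcp]"]


# These keyword arguments would be passed to setuptools.setup(...).
SETUP_KWARGS = {
    "install_requires": make_required_install_packages(),
    "extras_require": {"gcp": make_extra_packages_gcp()},
}
```

With a layout like this, `pip install tfx` would skip the GCP dependencies and `pip install tfx[gcp]` would add them.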

Will this change the current API? How?
No
Who will benefit with this feature?
Everyone who uses TFX but does not use GCP.
Do you have a workaround or are you completely blocked by this?
Right now we are completely blocked. I tried to patch each TFX library to remove apache-beam[gcp], but when I patched data-validation, the build failed with errors.
Name of your Organization (Optional)

Any Other info.

@singhniraj08
Contributor

@ammarck, Thank you for filling out this feature request.

@briron, can you please take a look at this feature request to make the [gcp] extra optional in apache-beam[gcp], which would fix Python package conflicts for TFX users who do not use GCP? Thanks.

@rcrowe-google
Contributor

Hi @ammarck,

We are discussing your proposal and have some questions:

  1. Rather than making tfx[gcp] add the import for apache-beam[gcp], would it be just as good to create tfx[no_gcp] in order to avoid the import of apache-beam[gcp]? The reasoning is to leave the default as is and avoid breaking existing users, but add the option to avoid the GCP import.
  2. What is your use-case? Are you using TFX now, and if so how?

@ammarck
Author

ammarck commented Mar 23, 2023

@rcrowe-google hi!

  1. Yes, your suggestion works for us. Our main problem is that we don't want to be limited by the transitive dependencies that GCP brings.
  2. We are using it right now to streamline ML pipelines for an internal team. It is currently in production use.

By the way, the [gcp] problem also exists in other TFX-related packages such as data-validation and model-analysis. Will this work change those as well?
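To see which installed packages are affected, a small check along these lines might help. It is a minimal sketch using only the standard library: both function names are hypothetical, and the distribution names in the usage note are an assumption about how data-validation and model-analysis are published.

```python
import re
from importlib import metadata

# Matches a requirement string that starts with apache-beam plus an extras list,
# e.g. "apache-beam[gcp]<3,>=2.40". Name separators (-, _, .) are treated alike.
_BEAM_WITH_EXTRAS = re.compile(r"^\s*apache[-_.]+beam\s*\[([^\]]*)\]", re.IGNORECASE)


def requires_beam_gcp(requirement):
    """Return True if a requirement string asks for apache-beam with the gcp extra."""
    match = _BEAM_WITH_EXTRAS.match(requirement)
    if match is None:
        return False
    extras = {extra.strip().lower() for extra in match.group(1).split(",")}
    return "gcp" in extras


def find_packages_requiring_beam_gcp(distribution_names):
    """Map each installed distribution to its requirements that pull in apache-beam[gcp]."""
    found = {}
    for name in distribution_names:
        try:
            requirements = metadata.requires(name) or []
        except metadata.PackageNotFoundError:
            continue  # Not installed in this environment; skip it.
        hits = [req for req in requirements if requires_beam_gcp(req)]
        if hits:
            found[name] = hits
    return found
```

For example, `find_packages_requiring_beam_gcp(["tfx", "tensorflow-data-validation", "tensorflow-model-analysis"])` would report which of those packages, if installed, declare the [gcp] extra.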

@rcrowe-google
Contributor

Hi @ammarck,

Good point about the other packages! You're correct that in order to really eliminate the dependency on GCP those other packages will also need to be updated to include that option. However, while we may be able to update TFX to make these changes (and this is far from certain), we are very unlikely to be able to make the required changes in the other packages due to resource issues on those teams.

In the original comment you answered "Are you willing to contribute it (Yes/No): Yes". Does that mean that you are willing to submit a PR to make these changes? If so, please don't start writing code right away; we need to go through a process first.
