Necessary imports not included in setup.py #242

Open
carlosfrutos opened this issue Oct 12, 2021 · 0 comments

Hi,

I'm developing a neural network with PyTorch on a non-Databricks cluster to make sure it works before migrating to a Databricks cluster.

Since I'm using PyTorch, I don't need Keras or TensorFlow. I successfully installed Horovod and sparkdl; however, when I try to run the Spark process I hit (so far) three consecutive exceptions related to missing dependencies:

    from sparkdl import HorovodRunner
  File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/__init__.py", line 17, in <module>
    from sparkdl.transformers.keras_image import KerasImageFileTransformer
  File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/keras_image.py", line 16, in <module>
    import keras.backend as K
  File "/opt/conda/default/lib/python3.8/site-packages/keras/__init__.py", line 21, in <module>
    from tensorflow.python import tf2
ModuleNotFoundError: No module named 'tensorflow'

    from sparkdl import HorovodRunner
  File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/__init__.py", line 17, in <module>
    from sparkdl.transformers.keras_image import KerasImageFileTransformer
  File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/keras_image.py", line 16, in <module>
    import keras.backend as K
ModuleNotFoundError: No module named 'keras'

This one is for tensorframes, which is deprecated:

    from sparkdl import HorovodRunner
  File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/__init__.py", line 17, in <module>
    from sparkdl.transformers.keras_image import KerasImageFileTransformer
  File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/keras_image.py", line 27, in <module>
    from sparkdl.transformers.tf_image import TFImageTransformer
  File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/tf_image.py", line 18, in <module>
    import tensorframes as tfs
ModuleNotFoundError: No module named 'tensorframes'

On the one hand, I don't understand why I should need these dependencies if I'm not going to use them. Shouldn't their presence be checked, and the features that need them disabled, instead of forcing me to install them?
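To illustrate the kind of check meant here, below is a minimal sketch of guarding optional imports. It is not how sparkdl is actually structured: the `OPTIONAL_EXPORTS` mapping and both helper names are hypothetical, and the module names are simply the ones from the tracebacks above. It only uses `importlib.util.find_spec` from the standard library to probe for a module without importing it.

```python
import importlib.util


def is_available(module_name):
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken or half-initialised modules.
        return False


# Hypothetical mapping: public name -> modules it needs at import time.
# The dependency names come from the tracebacks in this issue.
OPTIONAL_EXPORTS = {
    "HorovodRunner": [],
    "KerasImageFileTransformer": ["keras", "tensorflow", "tensorframes"],
}


def usable_exports(requirements, probe=is_available):
    """List the names whose required modules are all present."""
    return [
        name
        for name, modules in requirements.items()
        if all(probe(module) for module in modules)
    ]
```

With a guard like this, a package `__init__.py` could import only the names returned by `usable_exports(OPTIONAL_EXPORTS)`, so `from sparkdl import HorovodRunner` would not fail just because Keras is absent.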

On the other hand, if those dependencies are unavoidable, they should be declared in the setup.py script so that these errors don't show up only at run time. Installing the Horovod packages on an ephemeral cluster takes a long time, and it is frustrating to find out only afterwards that the program cannot run.
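As a rough sketch of what declaring them could look like, setuptools accepts an `extras_require` mapping in `setup()`. The grouping below is hypothetical, and the package names are just the module names from the tracebacks; whether each one is installable from PyPI under that name is an assumption.

```python
# Hypothetical grouping of the optional dependencies seen in the tracebacks.
EXTRAS_REQUIRE = {
    "keras": ["keras", "tensorflow"],
    "tf": ["tensorflow", "tensorframes"],
}


def with_all_extra(extras):
    """Return a copy of the extras with an added "all" group (deduplicated, sorted)."""
    merged = dict(extras)
    merged["all"] = sorted({req for reqs in extras.values() for req in reqs})
    return merged


# In setup.py this would be passed to setuptools as:
# setup(..., extras_require=with_all_extra(EXTRAS_REQUIRE))
```

If the dependencies really are mandatory, the same names would go in `install_requires` instead, so that installation fails early rather than the import failing later.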

I'm sure I won't have this problem on a Databricks cluster, but I can't use one yet, and that shouldn't prevent me from testing HorovodRunner functionality, as the warning message shown when running a program on a non-Databricks cluster states.

Kind regards
