Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datasets: improve lazy import error msg for missing deps #2054

Merged
merged 12 commits into from May 15, 2024

Conversation

adamjstewart
Copy link
Collaborator

@adamjstewart adamjstewart commented May 8, 2024

TorchGeo has a number of optional dependencies only needed by certain datasets. We use lazy imports to allow these dependencies to be optional. This PR improves these error messages to make it more clear how to solve the issue.

Before

ImportError: scipy is not installed and is required to use this dataset

After

torchgeo.datasets.errors.DependencyNotFoundError: scipy is not installed and is required to use this dataset. Either run:

$ pip install scipy

to install just this dependency, or:

$ pip install torchgeo[datasets]

to install all optional dataset dependencies.

This has a number of other benefits:

  • Reduces code duplication (making it easier to update the message for all imports if we want to, e.g., add conda installation instructions)
  • Allows us to easily fail fast during dataset instantiation
  • Makes it easier to get 100% coverage on all datasets in CI: test optional datasets on every commit #2045

Closes #2045
Prerequisite for #2043

@adamjstewart adamjstewart added this to the 0.6.0 milestone May 8, 2024
@github-actions github-actions bot added documentation Improvements or additions to documentation datasets Geospatial or benchmark datasets testing Continuous integration testing labels May 8, 2024
torchgeo/datasets/errors.py Outdated Show resolved Hide resolved
module_to_pypi: dict[str, str] = collections.defaultdict(lambda: name)
module_to_pypi |= {'cv2': 'opencv-python', 'skimage': 'scikit-image'}
name = module_to_pypi[name]
raise MissingDependencyError(name) from None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about raise ... from ...

Without

> python3 -c 'from torchgeo.datasets.utils import lazy_import; lazy_import("foo")'
Traceback (most recent call last):
  File "/Users/Adam/torchgeo/torchgeo/datasets/utils.py", line 781, in lazy_import
    return importlib.import_module(name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Adam/spack/var/spack/environments/default/.spack-env/view/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1140, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'foo'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/Adam/torchgeo/torchgeo/datasets/utils.py", line 788, in lazy_import
    raise MissingDependencyError(name)
torchgeo.datasets.errors.MissingDependencyError: foo is not installed and is required to use this dataset. Either run:

$ pip install foo

to install just this dependency, or:

$ pip install torchgeo[datasets]

to install all optional dataset dependencies.

With

> python3 -c 'from torchgeo.datasets.utils import lazy_import; lazy_import("foo")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/Adam/torchgeo/torchgeo/datasets/utils.py", line 788, in lazy_import
    raise MissingDependencyError(name) from None
torchgeo.datasets.errors.MissingDependencyError: foo is not installed and is required to use this dataset. Either run:

$ pip install foo

to install just this dependency, or:

$ pip install torchgeo[datasets]

to install all optional dataset dependencies.

@@ -465,7 +459,7 @@ def plot(
# Add masks
if show_feats in {'masks', 'both'} and 'masks' in sample:
mask = masks[i]
contours = find_contours(mask, 0.5) # type: ignore[no-untyped-call]
contours = skimage.measure.find_contours(mask, 0.5)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only downside of the lazy import function is that we lose type hints.

"""
try:
return importlib.import_module(name)
except ModuleNotFoundError:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently ImportError and ModuleNotFoundError are different:

>>> import foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'foo'
>>> import numpy.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'numpy.foo'
>>> from numpy import foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'foo' from 'numpy' (/Users/Adam/spack/var/spack/environments/default/.spack-env/view/lib/python3.11/site-packages/numpy/__init__.py)

"""Lazy import of *name*.

Args:
name: Name of module to import.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also check the version, but that would require an extra dep on packaging.

@calebrob6
Copy link
Member

This is just plain nice

@adamjstewart adamjstewart marked this pull request as ready for review May 9, 2024 10:42
@adamjstewart adamjstewart merged commit 189dabd into microsoft:main May 15, 2024
19 checks passed
@adamjstewart adamjstewart deleted the datasets/lazy-import branch May 15, 2024 16:03
killian31 pushed a commit to killian31/torchgeo that referenced this pull request May 17, 2024
)

* Datasets: improve lazy import error msg for missing deps

* Add type annotation

* Use lazy imports throughout datasets

* Fix support for older scipy

* Fix support for older scipy

* CI: test optional datasets on every commit

* Update minversion and fix tests

* Double quotes preferred over single quotes

* Undo for now

* Fast-fail during dataset initialization

* Remove extraneous space

* MissingDependencyError -> DependencyNotFoundError
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants