ENH: Suppressing GDAL errors #289

Open
cheginit opened this issue Sep 14, 2023 · 8 comments

@cheginit

GDAL's Python bindings have an option to suppress all warnings temporarily:

from osgeo import gdal

with gdal.ExceptionMgr(useExceptions=False), gdal.quiet_errors():
    ...  # some operation

This is gdal's quiet_errors function:

@contextlib.contextmanager
def quiet_errors():
    """Temporarily install an error handler that silents all warnings and errors.

    Returns
    -------
        A context manager

    Example
    -------

        with gdal.ExceptionMgr(useExceptions=False), gdal.quiet_errors():
            gdal.Error(gdal.CE_Failure, gdal.CPLE_AppDefined, "you will never see me")
    """
    PushErrorHandler("CPLQuietErrorHandler")
    try:
        yield
    finally:
        PopErrorHandler()

I haven't been able to find similar functionality in pyogrio. It would be very helpful, since the warning messages can be very extensive!

cheginit changed the title from "Feature Request: Suppressing GDAL errors" to "ENH: Suppressing GDAL errors" on Sep 14, 2023
@brendan-ward
Member

Are you wanting to hide only the warnings emitted by GDAL, or also hide warnings emitted by pyogrio when using the GDAL API? We also emit our own warnings that may indicate certain issues.

Fatal errors using GDAL are converted to Python exceptions instead of warnings; you can sidestep some of those with try / except blocks but generally they indicate something failed badly.

@cheginit
Author

I want to hide the warnings and errors when using pyogrio as the engine for reading files with geopandas. Seeing them is useful only once, so that I can investigate and make the necessary changes; I would rather not see them when I rerun the code. When I use gdal's quiet_errors with gpd.read_file, it works well and hides the warnings, but I would prefer not to have osgeo as a hard dependency.
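
To illustrate the "no hard dependency" idea, here is a minimal sketch that uses gdal.quiet_errors only when osgeo happens to be installed and otherwise does nothing. The name quiet_gdal is hypothetical and not part of pyogrio or GDAL. It also assumes that osgeo and pyogrio are linked against the same GDAL library, so that the handler installed through osgeo affects pyogrio's reads.

```python
import contextlib

try:
    from osgeo import gdal  # optional dependency
except ImportError:
    gdal = None


@contextlib.contextmanager
def quiet_gdal():
    """Silence GDAL messages if osgeo is available; otherwise do nothing.

    Yields True when the quiet handler was installed, False when it was not.
    """
    if gdal is None or not hasattr(gdal, "quiet_errors"):
        # osgeo is missing (or too old to provide quiet_errors): no-op
        yield False
        return
    with gdal.ExceptionMgr(useExceptions=False), gdal.quiet_errors():
        yield True
```

Usage would look like `with quiet_gdal(): gdf = gpd.read_file(path, engine="pyogrio")`; without osgeo the read simply runs with messages visible.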

@brendan-ward
Member

Can you give us some examples of warnings / errors emitted by GDAL / pyogrio that you'd like to suppress?

You should be able to use warnings.filterwarnings to suppress warnings and try / except to suppress errors, but it is possible some of the GDAL errors / warnings are still making their way through.
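
As a rough sketch of that suggestion, the filters can be scoped with warnings.catch_warnings so they do not leak into the rest of the program. The helper read_quietly and its parameters are hypothetical names for illustration; `reader` stands in for something like geopandas.read_file.

```python
import warnings


def read_quietly(reader, *args, patterns=(), errors=(), default=None, **kwargs):
    """Call reader(*args, **kwargs) while ignoring selected warnings.

    patterns: regular expressions matched against the start of each warning message.
    errors:   exception types to swallow; `default` is returned if one is raised.
    """
    with warnings.catch_warnings():
        # Filters added here are discarded when the block exits
        for pattern in patterns:
            warnings.filterwarnings("ignore", message=pattern)
        try:
            return reader(*args, **kwargs)
        except errors:
            return default
```

This only affects messages that go through Python's warnings module; anything GDAL prints directly is not touched by it.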

@cheginit
Author

Sure. This is what I have:

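# Assumed imports for this fragment (it sits inside a method, hence `self` and `return`)
import warnings

import geopandas as gpd
from shapely.errors import GEOSException

if self._engine == "pyogrio":
    import pyogrio

    try:
        # Make GDAL tolerate unclosed rings and skip polygon organization
        pyogrio.set_gdal_config_options(
            {"OGR_GEOMETRY_ACCEPT_UNCLOSED_RING": "YES", "OGR_ORGANIZE_POLYGONS": "SKIP"}
        )
        # Try to silence the two messages at the Python level
        warnings.filterwarnings("ignore", message=".*Non closed ring detected.*")
        warnings.filterwarnings("ignore", message=".*translated to Simple Geometry.*")
        return gpd.read_file(gdb, engine="pyogrio", use_arrow=True)
    except GEOSException:
        # Fall back to the default engine
        return gpd.read_file(gdb)
else:
    return gpd.read_file(gdb)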

But I still get "Warning 1: Non closed ring detected." and "Warning 1: Geometry of polygon cannot be translated to Simple Geometry. All polygons will be contained in a multipolygon."
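
The "Warning 1: ..." prefix matches the format GDAL's default error handler uses when it prints to the C-level stderr. If that is what is happening here (an assumption; nothing in this thread confirms it), the messages never pass through Python's warnings module, which would explain why filterwarnings has no effect. A blunt workaround sketch, using only the standard library, is to redirect file descriptor 2 for the duration of the read. The name suppress_c_stderr is hypothetical.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_c_stderr():
    """Temporarily redirect the process-level stderr (fd 2) to os.devnull."""
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a copy of the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)  # fd 2 now points at the null device
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)  # restore the real stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```

Note that this hides everything written to stderr inside the block, including unrelated output from other libraries, so it is best wrapped tightly around the single read call.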

@brendan-ward
Member

Thanks for the extra info, that should help us consider how best to suppress warnings like these from GDAL.

Are you able to share a small subset of your dataset with a record that triggers that error? I'm thinking it might be hard for us to fabricate a test dataset with such issues.

@cheginit
Author

That's strange. While I was preparing a reproducible example and creating an environment that includes only the necessary packages, the warnings didn't show. But when I include other packages that I need for my project, the warnings appear again. Is it possible that some other packages can cause warnings to not get caught?

The source code and the dataset are public. Here's the code for retrieving the data:

from pygeohydro import EHydro
import numpy as np
import shapely
from pynhd import NLDI

nldi = NLDI()
flw = nldi.navigate_byid("nwissite", "USGS-14246900", "upstreamMain", "flowlines", 400)

ehydro = EHydro()
idx = ehydro.survey_grid.sindex.query(shapely.box(*flw.total_bounds))
grid = ehydro.survey_grid.iloc[idx].reset_index(drop=True)
_, idx = grid.sindex.query(grid.geometry, predicate="intersects")
_, freq = np.unique(idx, return_counts=True)
grid = grid.iloc[np.where(freq > 1)[0]]
geom = grid.unary_union
bathy = ehydro.bygeom(geom, grid.crs)

The snippet downloads the datasets, which are saved as (many) zip files under the ./cache directory. So, if you want the files, you can find them there.

The relevant part of the code that uses pyogrio is here.

If you create a simple environment like this, the warnings are not shown. (I just pushed the latest commit to pygeohydro, so you need to install it from git.)

mamba create -y -n ogr pygeohydro pyogrio ipykernel
mamba activate ogr
pip install --no-deps git+https://github.com/hyriver/pygeohydro

But if you create the environment from this file, the warnings are shown:

name: ogr
channels:
- conda-forge
- nodefaults
dependencies:
- python>=3.10

# async-retriever deps
- aiodns
- aiosqlite
- aiohttp >=3.8.3
- brotli
- cytoolz
- nest-asyncio
- aiohttp-client-cache >=0.8.1
- ujson

# pygeoogc deps
# - async-retriever>=0.15,<0.16
- cytoolz
- defusedxml
- joblib
- multidict
- owslib>=0.27.2
- pyproj>=3.0.1
- requests
- requests-cache>=0.9.6
- shapely>=1.8.5
- ujson
- url-normalize>=1.4
- urllib3
- yarl

# pygeoutils deps
- cytoolz
- geopandas >=0.7
- netcdf4
- numpy >=1.21
- pyproj >=2.2
- rasterio >=1.2
- rioxarray >=0.11
- scipy
- shapely >=2.0
- ujson
- xarray >=2023.01.0

# hydrosignatures deps
- numpy
- pandas
- scipy
- xarray
# optional deps
- numba

# py3dep
# - async-retriever >=0.3.6
- click >=0.7
- cytoolz
- numpy >=1.21
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.7
- rasterio >=1.2
- rioxarray >=0.11
- scipy
- shapely >=2.0
- xarray >=2023.01.0
# optional dep
- pyflwdir >=0.5.6

# pynhd deps
# - async-retriever >=0.3.6
- cytoolz
- geopandas >=0.9
- networkx
- numpy >=1.21
- pandas >=1.0
- pyarrow >=1.0.1
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.7
- shapely >=2.0
# optional deps
- pyogrio
- py7zr

# pydaymet deps
# - async-retriever >=0.3.6
- click >=0.7
- lxml
- numpy >=1.21
- pandas >=1.0
# - py3dep >=0.13.7
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.9
- rasterio >=1.2
- scipy
- shapely >=2.0
- xarray >=2023.01.0
# optional deps
- numba

# pygeohydro deps
- cytoolz
- defusedxml
- folium
- geopandas >=0.7
- h5netcdf
# - hydrosignatures >=0.1.1
- lxml
- matplotlib-base >=3.5
- numpy >=1.21
- pandas >=1.0
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.9
# - pynhd >=0.13.7
- rasterio >=1.2
- rioxarray >=0.11.0
- scipy
- shapely >=2.0
- xarray >=2023.01.0
# optional deps
- planetary-computer
- pystac-client

# pynldas2
# - async-retriever >=0.3.6
- h5netcdf
- numpy >=1.21
- pandas >=1.0
# - pygeoutils >=0.13.10
- pyproj >=2.2
- rioxarray >=0.11
- xarray >=2023.01.0

# optional deps for speeding up some operations
- bottleneck

# bathy deps
- numba
- pandamesh
- shapelysmooth

# plotting deps
- mapclassify
- contextily
- hvplot
- tqdm
- xarray-spatial
- datashader

# dev deps
- ipywidgets
- ipykernel
- pre-commit

- pip
- pip:
  - git+https://github.com/hyriver/async-retriever.git
  - git+https://github.com/hyriver/hydrosignatures.git
  - git+https://github.com/hyriver/pygeoogc.git
  - git+https://github.com/hyriver/pygeoutils.git
  - git+https://github.com/hyriver/pynhd.git
  - git+https://github.com/hyriver/py3dep.git
  - git+https://github.com/hyriver/pydaymet.git
  - git+https://github.com/hyriver/pynldas2.git
  - git+https://github.com/hyriver/pygeohydro.git

@brendan-ward
Member

I wonder if one of the other packages in the larger environment is changing the state of warning filtering. Like you say, in the minimal environment, I do not get these warnings if I filter them with warnings.filterwarnings("ignore", message=".*Non closed ring detected.*"), but I do get them if I don't filter them when I read one of the problematic MultiPolygon layers (e.g., read_dataframe(".../cache/CL_27_MGN_20150423.ZIP!CL_27_MGN_20150423.gdb", layer="Bathymetry_Vector")).

Within the same script, if I set warnings to show all warnings via warnings.simplefilter("always"), even after first setting the filter on warnings, then I see all instances of the GDAL warnings raised. I'm not sure how the state of warnings filtering gets updated across the packages you import. However, the only import I'm seeing within geopandas.read_file when using pyogrio imports pyogrio, which you already have in scope. So I'm not seeing a place where warnings would be filtered differently after you set them.
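
The behavior described above is consistent with how the standard library orders filters: both warnings.filterwarnings and warnings.simplefilter insert at the front of the filter list by default, so the most recent call wins. A small self-contained sketch of this (helper names are hypothetical, and a plain RuntimeWarning stands in for the GDAL message):

```python
import warnings

PATTERN = ".*Non closed ring detected.*"


def count_shown(setup):
    """Return how many warnings get through after `setup` installs its filters."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()  # start from an empty filter list
        setup()
        warnings.warn("Non closed ring detected.", RuntimeWarning)
    return len(caught)


def ignore_only():
    warnings.filterwarnings("ignore", message=PATTERN)


def ignore_then_always():
    warnings.filterwarnings("ignore", message=PATTERN)
    # A later, broader filter is checked first and overrides the "ignore"
    warnings.simplefilter("always")


print(count_shown(ignore_only))         # 0
print(count_shown(ignore_then_always))  # 1
```

If any package in the larger environment installs a broad filter after the "ignore" filters are set, it would re-enable the messages in the same way.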

I don't use conda / mamba enough to guess at how that might cause one environment to raise warnings and the other not to.

This doesn't negate wanting to add a global way of disabling warnings / errors from GDAL, just that warning suppression seems to be dependent on environment.

@cheginit
Author

cheginit commented Oct 4, 2023

I also think that the versions of gdal and other packages differ between the large and the minimal environment. Maybe this was an issue in earlier versions that has since been fixed.
