Refactor Raster.dtypes tuple to a Raster.dtype string (#528)
rhugonnet committed Mar 28, 2024
1 parent 914ddf3 commit ae96994
Showing 10 changed files with 57 additions and 68 deletions.
2 changes: 1 addition & 1 deletion doc/source/about_geoutils.md
@@ -38,7 +38,7 @@ In particular, GeoUtils:
 - Strives to rely on **lazy operations** under-the-hood to avoid unnecessary data loading,
 - Allows for **match-reference operations** to facilitate geospatial handling,
 - Re-implements **several of [GDAL](https://gdal.org/)'s features** missing in other packages (e.g., proximity, gdalDEM),
-- Naturally handles **different `dtypes` and `nodata`** values through its NumPy masked-array interface.
+- Naturally handles **different `dtype` and `nodata`** values through its NumPy masked-array interface.


 ```{note}
2 changes: 1 addition & 1 deletion doc/source/api.md
@@ -64,7 +64,7 @@ documentation.
 Raster.bands_on_disk
 Raster.res
 Raster.bounds
-Raster.dtypes
+Raster.dtype
 Raster.is_loaded
 Raster.is_modified
 Raster.name
2 changes: 1 addition & 1 deletion doc/source/core_array_funcs.md
@@ -30,7 +30,7 @@ Universal functions can take one or two inputs, and return one or two outputs. T
 the output will be a {class}`~geoutils.Raster`. If there is a second input, it can be a {class}`~geoutils.Raster` or {class}`~numpy.ndarray` with
 matching georeferencing or shape, respectively.

-These functions inherently support the casting of different {attr}`~geoutils.Raster.dtypes` and values masked by {attr}`~geoutils.Raster.nodata` in the
+These functions inherently support the casting of different {attr}`~geoutils.Raster.dtype` and values masked by {attr}`~geoutils.Raster.nodata` in the
 {class}`~numpy.ma.MaskedArray`.

 Below, we re-use the same example created in {ref}`core-py-ops`.
2 changes: 1 addition & 1 deletion doc/source/core_py_ops.md
@@ -66,7 +66,7 @@ rast - (rast**0.5)
 ```

 If an unmasked {class}`~numpy.ndarray` is passed, it will internally be cast into a {class}`~numpy.ma.MaskedArray` to respect the propagation of
-{class}`~geoutils.Raster.nodata` values. Additionally, the {attr}`~geoutils.Raster.dtypes` are also reconciled as they would for {class}`~numpy.ndarray`,
+{class}`~geoutils.Raster.nodata` values. Additionally, the {attr}`~geoutils.Raster.dtype` are also reconciled as they would for {class}`~numpy.ndarray`,
 following [standard NumPy coercion rules](https://numpy.org/doc/stable/reference/generated/numpy.find_common_type.html).

 ## Logical comparisons cast to {class}`~geoutils.Mask`
4 changes: 2 additions & 2 deletions doc/source/feature_overview.md
@@ -179,7 +179,7 @@ rast += 1
 ```

 Additionally, the {class}`~geoutils.Raster` object possesses a NumPy masked-array interface that allows to apply to it any [NumPy universal function](https://numpy.org/doc/stable/reference/ufuncs.html) and
-most other NumPy array functions, while logically casting {class}`dtypes<numpy.dtype>` and respecting {attr}`~geoutils.Raster.nodata` values.
+most other NumPy array functions, while logically casting {class}`dtype<numpy.dtype>` and respecting {attr}`~geoutils.Raster.nodata` values.

 ```{code-cell} ipython3
 # Apply a normalization to the raster
@@ -204,7 +204,7 @@ Masks can then be used for indexing a {class}`~geoutils.Raster`, which returns a
 values_aoi = rast[mask_aoi]
 ```

-Masks also have simplified, overloaded {class}`~geoutils.Raster` methods due to their boolean {class}`dtypes<numpy.dtype>`. Using {func}`~geoutils.Raster.polygonize` with a
+Masks also have simplified, overloaded {class}`~geoutils.Raster` methods due to their boolean {class}`dtype<numpy.dtype>`. Using {func}`~geoutils.Raster.polygonize` with a
 {class}`~geoutils.Mask` is straightforward, for instance, to retrieve a {class}`~geoutils.Vector` of the area-of-interest:

 ```{code-cell} ipython3
4 changes: 2 additions & 2 deletions doc/source/mask_class.md
@@ -37,8 +37,8 @@ There is no {class}`~geoutils.Raster.nodata` value defined in a {class}`~geoutil
 method from {class}`~geoutils.Raster`.

 ```{important}
-Most raster file formats such a [GeoTIFFs](https://gdal.org/drivers/raster/gtiff.html) **do not support {class}`bool` array {class}`dtypes<numpy.dtype>`
-on-disk**, and **most of Rasterio functionalities also do not support {class}`bool` {class}`dtypes<numpy.dtype>`**.
+Most raster file formats such a [GeoTIFFs](https://gdal.org/drivers/raster/gtiff.html) **do not support {class}`bool` array {class}`dtype<numpy.dtype>`
+on-disk**, and **most of Rasterio functionalities also do not support {class}`bool` {class}`dtype<numpy.dtype>`**.
 To address this, during opening, saving and geospatial handling operations, {class}`Masks<geoutils.Mask>` are automatically converted to and from {class}`numpy.uint8`.
 The {class}`~geoutils.Raster.nodata` of a {class}`~geoutils.Mask` can now be defined to save to a file, and defaults to `255`.
6 changes: 3 additions & 3 deletions doc/source/raster_class.md
@@ -35,7 +35,7 @@ A first category includes georeferencing attributes directly derived from {attr}
 {attr}`~geoutils.Raster.height`, {attr}`~geoutils.Raster.width`, {attr}`~geoutils.Raster.res`, {attr}`~geoutils.Raster.bounds`.

 A second category concerns the attributes derived from the raster array shape and type: {attr}`~geoutils.Raster.count`, {attr}`~geoutils.Raster.bands` and
-{attr}`~geoutils.Raster.dtypes`. The two former refer to the number of bands loaded in a {class}`~geoutils.Raster`, and the band indexes.
+{attr}`~geoutils.Raster.dtype`. The two former refer to the number of bands loaded in a {class}`~geoutils.Raster`, and the band indexes.

 ```{important}
 The {attr}`~geoutils.Raster.bands` of {class}`rasterio.io.DatasetReader` start from 1 and not 0, be careful when instantiating or loading from a
@@ -150,14 +150,14 @@ might result in larger memory usage than in the original {class}`~geoutils.Raste
 Thanks to the {ref}`core-array-funcs`, **NumPy functions applied directly to a {class}`~geoutils.Raster` will respect {class}`~geoutils.Raster.nodata`
 values** as well as if computing with the {class}`~numpy.ma.MaskedArray` or an unmasked {class}`~numpy.ndarray` filled with {class}`~numpy.nan`.
-Additionally, the {class}`~geoutils.Raster` will automatically cast between different {class}`dtypes<numpy.dtype>`, and possibly re-define missing
+Additionally, the {class}`~geoutils.Raster` will automatically cast between different {class}`dtype<numpy.dtype>`, and possibly re-define missing
 {class}`nodatas<geoutils.Raster.nodata>`.
 ```

 ## Arithmetic

 A {class}`~geoutils.Raster` can be applied any pythonic arithmetic operation ({func}`+<operator.add>`, {func}`-<operator.sub>`, {func}`/<operator.truediv>`, {func}`//<operator.floordiv>`, {func}`*<operator.mul>`,
-{func}`**<operator.pow>`, {func}`%<operator.mod>`) with another {class}`~geoutils.Raster`, {class}`~numpy.ndarray` or number. It will output one or two {class}`Rasters<geoutils.Raster>`. NumPy coercion rules apply for {class}`dtypes<numpy.dtype>`.
+{func}`**<operator.pow>`, {func}`%<operator.mod>`) with another {class}`~geoutils.Raster`, {class}`~numpy.ndarray` or number. It will output one or two {class}`Rasters<geoutils.Raster>`. NumPy coercion rules apply for {class}`dtype<numpy.dtype>`.

 ```{code-cell} ipython3
 # Add 1 and divide raster by 2
33 changes: 16 additions & 17 deletions geoutils/raster/raster.py
@@ -175,7 +175,6 @@ def _default_nodata(dtype: DTypeLike) -> int:
 "count",
 "crs",
 "driver",
-"dtypes",
 "height",
 "indexes",
 "name",
@@ -556,7 +555,7 @@ def __init__(
 self._is_modified = True
 self._disk_shape: tuple[int, int, int] | None = None
 self._disk_bands: tuple[int] | None = None
-self._disk_dtypes: tuple[str] | None = None
+self._disk_dtype: str | None = None
 self._disk_transform: affine.Affine | None = None
 self._downsample: int | float = 1
 self._area_or_point: Literal["Area", "Point"] | None = None
@@ -614,7 +613,7 @@ def __init__(

 self._disk_shape = (ds.count, ds.height, ds.width)
 self._disk_bands = ds.indexes
-self._disk_dtypes = ds.dtypes
+self._disk_dtype = ds.dtypes[0]
 self._disk_transform = ds.transform

 # Check number of bands to be loaded
@@ -746,11 +745,11 @@ def is_loaded(self) -> bool:
 return self._data is not None

 @property
-def dtypes(self) -> tuple[str, ...]:
-"""Data type for each raster band (string representation)."""
-if not self.is_loaded and self._disk_dtypes is not None:
-return self._disk_dtypes
-return (str(self.data.dtype),) * self.count
+def dtype(self) -> str:
+"""Data type of the raster (string representation)."""
+if not self.is_loaded and self._disk_dtype is not None:
+return self._disk_dtype
+return str(self.data.dtype)

 @property
 def bands_on_disk(self) -> None | tuple[int, ...]:
@@ -1092,7 +1091,7 @@ def to_rio_dataset(self) -> rio.io.DatasetReader:
 height=self.height,
 width=self.width,
 count=self.count,
-dtype=self.dtypes[0],
+dtype=self.dtype,
 crs=self.crs,
 transform=self.transform,
 nodata=self.nodata,
@@ -1729,8 +1728,8 @@ def set_nodata(
 raise ValueError("Type of nodata not understood, must be float or int.")

 if new_nodata is not None:
-if not rio.dtypes.can_cast_dtype(new_nodata, self.dtypes[0]):
-raise ValueError(f"nodata value {new_nodata} incompatible with self.dtype {self.dtypes[0]}")
+if not rio.dtypes.can_cast_dtype(new_nodata, self.dtype):
+raise ValueError(f"nodata value {new_nodata} incompatible with self.dtype {self.dtype}")

 # If we update mask or array, get the masked array
 if update_array or update_mask:
@@ -1838,8 +1837,8 @@ def data(self, new_data: NDArrayNum | MArrayNum) -> None:
 dtype = str(self._data.dtype)
 orig_shape = self._data.shape
 # If filename exists
-elif self._disk_dtypes is not None:
-dtype = self._disk_dtypes[0]
+elif self._disk_dtype is not None:
+dtype = self._disk_dtype
 if self._out_count == 1:
 orig_shape = self._out_shape
 else:
@@ -2027,7 +2026,7 @@ def info(self, stats: bool = False, verbose: bool = True) -> None | str:
 f"Modified since load? {self.is_modified} \n",
 f"Grid size: {self.width}, {self.height}\n",
 f"Number of bands: {self.count:d}\n",
-f"Data types: {self.dtypes}\n",
+f"Data types: {self.dtype}\n",
 f"Coordinate system: {[self.crs.to_string() if self.crs is not None else None]}\n",
 f"Nodata value: {self.nodata}\n",
 f"Pixel interpretation: {self.area_or_point}\n",
@@ -2596,7 +2595,7 @@ def reproject(
 # Set output dtype
 if dtype is None:
 # Warning: this will not work for multiple bands with different dtypes
-dtype = self.dtypes[0]
+dtype = self.dtype

 # --- Set source nodata if provided -- #
 if force_source_nodata is None:
@@ -3000,7 +2999,7 @@ def to_xarray(self, name: str | None = None) -> xr.DataArray:
 """

 # If type was integer, cast to float to be able to save nodata values in the xarray data array
-if np.issubdtype(self.dtypes[0], np.integer):
+if np.issubdtype(self.dtype, np.integer):
 # Nodata conversion is not needed in this direction (integer towards float), we can maintain the original
 updated_raster = self.astype(np.float32, convert_nodata=False)
 else:
@@ -4012,7 +4011,7 @@ def polygonize(
 gpd_dtypes = ["uint8", "uint16", "int16", "int32", "float32"]
 list_common_dtype_index = []
 for gpd_type in gpd_dtypes:
-polygonize_dtype = np.promote_types(gpd_type, self.dtypes[0])
+polygonize_dtype = np.promote_types(gpd_type, self.dtype)
 if str(polygonize_dtype) in gpd_dtypes:
 list_common_dtype_index.append(gpd_dtypes.index(gpd_type))
 if len(list_common_dtype_index) == 0:
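
For downstream callers, the effect of the rename can be sketched with a small stand-in. This is a hedged illustration, not GeoUtils code: `OldRaster`, `NewRaster` and `raster_dtype` are hypothetical names introduced only for this example. The stand-ins mirror only the loaded-array branch of the removed `dtypes` property and of the new `dtype` property in `geoutils/raster/raster.py`.

```python
from __future__ import annotations


class OldRaster:
    """Hypothetical stand-in for a Raster before this commit (not GeoUtils code)."""

    def __init__(self, dtype: str, count: int) -> None:
        self._dtype = dtype
        self.count = count

    @property
    def dtypes(self) -> tuple[str, ...]:
        # Loaded case of the removed property: the same string repeated per band
        return (self._dtype,) * self.count


class NewRaster:
    """Hypothetical stand-in for a Raster after this commit (not GeoUtils code)."""

    def __init__(self, dtype: str, count: int) -> None:
        self._dtype = dtype
        self.count = count

    @property
    def dtype(self) -> str:
        # A single string for the whole raster
        return self._dtype


def raster_dtype(raster: object) -> str:
    """Return the dtype string from either interface (hypothetical helper)."""
    if hasattr(raster, "dtype"):
        return str(raster.dtype)
    # Fall back to the first entry of the pre-commit tuple
    return raster.dtypes[0]


old = OldRaster("float32", count=3)
new = NewRaster("float32", count=3)
print(old.dtypes)  # ('float32', 'float32', 'float32')
print(new.dtype)   # float32
print(raster_dtype(old) == raster_dtype(new))  # True
```

Code that previously indexed `raster.dtypes[0]` can read `raster.dtype` directly after this commit. A helper like the one above is only useful if both versions must be supported at the same time.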
