Refactor for boundary lat/lon extraction #546

ghiggi · 2023-10-10T21:23:53Z

This PR introduces a consistent interface for dealing with area boundaries in pyresample.
It simplify the codebase and will enable to significantly simplify the code of geometry.py towards pyresample 2.0

The PR arises from my understanding that users and developers often needs:

the boundary sides (top, right, bottom, left) of the coordinate(s) array(s)
the concatenated boundary sides (without duplicates at the corners) in the form of
- as 2D vertices array: np.array([x,y]) or np.array([lon, lat])
- as a tuple : (x, y), or (lon,lat)
- only a specific geographic or projection coordinate: x , y , lon, lat
- with the last vertex equal to the first for i.e. correct shapely Polygon creation
- with the last vertex not equal to the first for i.e correct pyresample spherical operations
- with the order of the vertices to be clockwise for i.e. correct pyresample spherical operations
- with the order of the vertices to be counterclockwise for i.e correct shapely operations
- and being able to change the vertices order dynamically

The user need to be able to retrieve such coordinates in the geographic space (available for all areas) or, for specific cases (AreaDefinition), in the projection coordinates. A simple and easy to use interface for that is currently missing.

Currently we had methods which returned tuple (x,y) or sides with unconsistent behaviour regarding:

function naming
last vertex behaviour,
duplicated consecutive coordinates vertices,
(correct !!!) vertices order,
...

With this PR, I aim to make all boundary related operations consistent across a unique interface.
To this end I introduced to classes: SphericalBoundary and PlanarBoundary:

SwathDefinition objects only have the SphericalBoundary class
AreaDefinition objects:
- if the CRS is geographic, they have only the SphericalBoundary class
- if the CRS is a planar projection, they have a PlanarBoundary and SphericalBoundary

To retrieve AreaDefinition boundary projection coordinates, I added the area.projection_boundary(). This method returns a SphericalBoundary if the CRS is geographic, otherwise a PlanarBoundary class.
To retrieve the SphericalBoundary, SwathDefinition and AreaDefinition share the area.boundary() method.

To deal with boundary geographical coordinates, 4 different boundary classes existed: AreaBoundary, AreaDefBoundary, SimpleBoundary, Boundary.
In the PR, I introduced the SphericalBoundary which inherits methods from AreaBoundary
(to guarantee backward compatibility), and I deprecated the use of the other existing boundary classes.
The SphericalBoundary can be retrieved using area.boundary().

Here below I document the deprecations and the proposed replacement:

area.get_boundary_lonlats() --> DEPRECATED 
area.get_edge_lonlats()  -->  area.boundary().contour() 
get_geostationary_bounding_box_in_lonlats(area) -->  area.boundary().contour() 

area.get_edge_bbox_in_projection_coordinates() --> self.projection_boundary().contour()
get_geostationary_bounding_box_in_proj_coords(area) --> area.projection_boundary().contour() 

area.get_bbox_lonlats() -->  area.boundary().sides
area._get_geo_boundary_sides() --> area.boundary().sides

<previously missing> -->  area.projection_boundary().sides

For the SphericalBoundary and PlanarBoundary there are the following common methods:

set_clockwise()
set_counterclockwise()
sides
- Return the (sides_x, and sides_y) or (sides_lons, sides_lats) tuple depending on the class
- Each of the sides_* is an object of the new class BoundarySides
- Each of the sides_* has the properties: sides.top, sides.bottom, sides.left, sides.right, sides.vertices
vertices
- Return the concatenated boundary sides 2D vertices array.
- Depending on the class it will be np.array([x,y]) or np.array([lon, lat])
contour(closed=False)
- return the (x, y) or (lon, lat) concatenated boundary sides tuple
- if closed=True, closes the vertices so that can be passed to shapely Polygon directly
if closed=False, the vertices are not closed, and they can be passed directly to pyresample SPolygon
contour(closed=False)
to_shapely()
- returns a shapely polygon with the correct vertices orientation
plot()
- Plot the area in the geographic or projection space respectively

The SphericalBoundary has the unique properties:

boundary.x, boundary.y (returns the concatenated sides)
boundary.sides_x, boundary.sides_y (returns the single coordinate sides)
polygon

The PlanarBoundary has the unique properties:

boundary.lons, boundary.lats (returns the concatenated sides)
boundary.sides_lons, boundary.sides_lats (returns the single coordinate sides)

This PR fixes several long-standing issues:

It now ensure the correct vertices order from SwathDefinition and AreaDefinition and therefore fix the downstream spherical operations performed in get_area_slices
For AreaDefinition whose corner lies outside of the Earth disk (geostationary, polar projections, global projections others than PlateeCaree), it now enable to retrieve the correct Earth internal boundary (in spherical and planar coordinates), thus allowing data reduction in pykdtree and gradient resampling !
For geostationary AreaDefinition:
- if the area is entirely within the Earth disk, it now use the classical approach of selecting the coordinate array sides
- if some of the coordinate arrays coordinates are out of Earth disk, it uses the previously existing ad-hoc approach and
  - it now returns a correctly boundary for partial sides out of Earth disk (like CONUS)
  - it now returns the correct ordering of such geostationary partial areas.
It ensure a fixed ordering of boundary sides (top, right, bottom, left) that was previously messed up when doing force_clockwise=True in get_bbox_lonlats

This PR introduces the following backward-incompatibilities:

The number of vertices of geostationary area, for a given vertices_per_side, are now different compared to before
The pykdtree and gradient resampling now does data reduction for AreaDefinition whose corner lies outside of the Earth disk
The boundary method now returns a SphericalBoundary object instead of AreaBoundary object. The SphericalBoundary deprecates the countour_poly method in favour of the polygon attribute and deprecates the use of the draw method in favour of a plot method using cartopy. Moreover, the boundary sides object of a coordinate (sides_lons and sides_lons) are now returned as a BoundarySides objects instead of a list of 4 ordered arrays [top, right, left, bottom]. However the BoundarySides has an __iter__ method and can be treated as the previous list except that it does not allow to modify/assign list element values.

Ongoing bug in get_geostationary_bounding_box_in_proj_coords(nb_points)

The function does not return the number of nb_points defined as argument
I temporary added a fix to deal with the changing number of vertices returned
I provide here below a MRE:

 import xarray as xr
from pyproj import CRS, Proj
from shapely.geometry import Polygon
import pyresample
from pyresample import geo_filter, parse_area_file
from pyresample import AreaDefinition
from pyresample.geometry import _get_geostationary_bounding_box_in_lonlats, get_full_geostationary_bounding_box_in_proj_coords
    
def create_test_area(crs, width, height, area_extent, **kwargs):
    """Create an AreaDefinition object for testing."""
    args = (crs, (height, width), area_extent)
    args = (crs, width, height, area_extent)
    attrs = kwargs.pop("attrs", {})
    area_id = attrs.pop("name", "test_area")
    args = (area_id, "", "") + args
    area = AreaDefinition(*args, **kwargs)
    return area


def truncated_geos_area():
    """Create a truncated geostationary area."""
    projection = {'a': '6378169', 'h': '35785831', 'lon_0': '9.5', 'no_defs': 'None', 'proj': 'geos',
                  'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
    area_extent = (5567248.0742, 5570248.4773, -5570248.4773, 1393687.2705)
    width = 3712
    height = 1392
    geos_area = create_test_area(projection, width, height, area_extent)
    return geos_area


geos_area = truncated_geos_area()
self = geos_area
vertices_per_side = 20


lons, lats = _get_geostationary_bounding_box_in_lonlats(self, nb_points=vertices_per_side)
lons.shape # 11 points ... not 20 !
lats.shape # 11 points

lons, lats = _get_geostationary_bounding_box_in_lonlats(self, nb_points=50)
lons.shape # 22 points  ... not 50 !
lats.shape # 22 points 

# The problem arise in get_geostationary_bounding_box_in_proj_coords
x, y = _get_geostationary_bounding_box_in_lonlats(self, 50)
x.shape # 22 ... ... not 50 !
geo_p = Polygon(np.vstack((x, y)).T)

# The nb_points is just used to define the nb_points of the full disc boundary coords
x, y = get_full_geostationary_bounding_box_in_proj_coords(self, 50)
x.shape # 50 
geo_full = Polygon(np.vstack((x, y)).T)

# This is what happen within get_geostationary_bounding_box_in_proj_coords
x, y = get_full_geostationary_bounding_box_in_proj_coords(self, 50)
ll_x, ll_y, ur_x, ur_y = geos_area.area_extent

geo_bbox = Polygon(np.vstack((x, y)).T)
area_bbox = Polygon(((ll_x, ll_y), (ll_x, ur_y), (ur_x, ur_y), (ur_x, ll_y)))
intersection = area_bbox.intersection(geo_bbox)
try:
    x, y = intersection.boundary.xy  # here random number of points (not granted that is even !)
except NotImplementedError:
    return [], []
return np.asanyarray(x[:-1]), np.asanyarray(y[:-1])

pyresample/geometry.py

ghiggi · 2023-10-11T11:24:28Z

@mraspaud this PR is ready for review. There are 3 test units failing related the gradient.__init__ code ... but I can't manage to find the root cause because of the complexity of the code. Can you start to review it and try to see what is going on?
Or alternatively, can you point me on how you debug such test units interactively? Thanks in advance !

pyresample/geometry.py

djhoese

I don't think I'm against anything that you've done here, but it makes me nervous to change some of these methods that have been around for so long. We'll need to triple check that Satpy or maybe even pyspectral isn't importing/using some of these methods and functions. I do agree that you've made things cleaner and this probably puts us in a good place to refactor this into a separate helper class in the future.

You asked about the geos boundary lonlats methods and yeah I'm not sure if that makes a ton of sense for geostationary bounds, but if there is an option for the user to get the x/y bounds then yeah it might make sense...oh wait, for non-full disk geostationary areas the lon/lats wouldn't be NaNs. You mentioned get_boundary_lonlats being used in resampling, could you point to where exactly that is?

I don't think the geostatonary bounding box functions need to be made private, but if you feel that it saves us headaches in future refactorings them I'm OK with it (as long as they aren't used in other pytroll packages).

Lastly, I just had an idea perhaps just for future interfaces: what if all of these functions that have _lonlats and _proj_coords variants of each other had just a single version of the function that returned a single object and that object had properties (or methods) for getting access to the x/y, lon/lat, or image coordinate versions of some of the information? You'd end up with something like:

my_area.get_boundary(vertices_per_side=20).get_lonlats(chunks=1000)

Or something like that.

pyresample/geometry.py

ghiggi · 2023-10-19T10:30:18Z

I try to answer to all the elements of discussion point by point here below.

Geostationary x/y sides

Sides order: [top, right, bottom, left]
Currently, for all geostationary area we have "dummy" right and left sides of shape 2
I recognize that for geostationary areas, we should implement a function that check if the image has nan edge values (out of Earth disk pixels). If some edge pixels are out of disk, we retrieve the geo sides with the current geo routine (ending up with dummy right/left sides, otherwise we use the classical approach of selecting the sides !

get_geo_<utility> functions

To be safe, I will keep the geostationary utility function as public.
However, I checked ... and satpy, pyspectral and the pyorbital packages does not use any of these functions.
In satpy, in the satpy/readers/utils.py there a some utility to extract specialized geostationary area information from AreaDefinition objects.
Some functions looks quite old code like get_geostationary_bounding_box, _lonlat_from_geos_angle.
Maybe could be useful, in a future PR, to move all geostationary utilities in a single separate module in pyresample.

Future Boundary Interface

If we want to have _lonlats and _proj_coords functions, it means there we would need a boundary class for AreaDefinition and one for SwathDefinition (without _proj_coords) objects.
Right now, from the AreaBoundary object (returned by area.boundary) we have the properties vertices, lons, lats, sides_lon, sides_lat.
We also have the method lons, lats = boundary.contour(). This method could be made a property !
We could add the property lons, lats = boundary.lonslats but I looks a bit strange to me.
When I will finalize the new spherical interface, I would like to add the to_line and to_polygon methods.
As a personal a taste, I find more readable code without too much _get_<stuffs>. _get_<stuffs> … and I guess most of relevant stuffs can be defined as properties.
Regarding the chunks option, are you sure that is needed when retrieving the lon/lat edges? One can always .rechunk or dask.array.from(array, chunks)

Boundary Sides Consistency

The order of sides can be non consistent !
Here we clearly mention that sides are defined from upper left in clockwise order: [top, right, bottom, left].
However, I noticed that when we extract a boundary, if the original sides are not clockwise, and we force_clockwise=True, then the side position currently returned by get_bbox_lonlats become [left, bottom, right, top] (see here)I think this should be fixed ... as it's prone to introduce bugs (I don't tell you the time wasted to discover this point when implementing the spherical tools for my projects ... ).
I believe that the clockwise/anticlockwise reordering should be applied only when returning the contours/vertices/lons/lats in the boundary object and not at the sides list definition in get_bbox_lonlats !
We could add to AreaBoundary the method set_clockwise() and set_anticlockwise(), which regulates the order of the returned contours/vertices/lons/lats arrays.
For clarity, we could also rename get_bbox_lonlats as get_bbox_sides. Across the repo, I often see lons, lats = self.get_bbox_lonlats which I interpret as the lons / lats arrays, while instead are the lon_sides, lat_sides list !
Regarding the boundary sides, it’s not always clear to me when I see such name, what is the inner structure of the object.
Most often is [side1, side2, side3, side4] with side = (lon, lat), sometimes we have sides_lon and sides_lats (i.e. in AreaBoundary.
Maybe we could create a BoundarySides class … so that we can store and extract easily the information.
We could have for example the properties: sides.top, sides.bottom, sides.left, sides.right, sides.vertices, sides.lons, sides.lats
Note that in the code we often use the terms side1, side2, side3, side4
The AreaBoundary methods could extract and modify the BoundarySides instance of the class.
Currently the SimpleBoundary object just accept the sides of a single array of coordinates (lat or lon) and is only used by get_boundary_lonlats. The class has no methods and just the properties side1, side2, side3, side4.

get_boundary_lonlats and SimpleBoundary usage

Return two objects: the longitude and latitudes SimpleBoundary objects.
From each SimpleBoundary object, which we can extract the side1, side2, side3, side4.
The function is used in pykdtree.py, bilinear.py and gradient.py. See here the search result
In pykdtree.py and bilinear,py is just used to call the get_valid_index_from_lonlat_boundaries method. And if we look at the code of get_valid_index_from_lonlat_boundaries, it just extracts the lon and lat of each side to pass the single arrays to another function. Thus could be easily refactored with the new interface.
In gradient.py, the method is just used to retrieve the lon, lat arrays. Could be replaced by lon, lat = area.boundary().contour()
Note also, that in data_reduce.py, the function swath_from_lonlat_boundaries expects as input the two SimpleBoundary objects, but it’s not used at all across the codebase (neither in satpy). Should be removed?
So I think we could easily remove/deprecate SimpleBoundary and the use of get_boundary_lonlats. SimpleBoundary is just used in the above described cases, and unused in satpy, pyorbital, etc.

Conclusion
Lot of stuffs to digest. I guess any of these points can be addressed in a separate PR 😄
In meanwhile, I will take care of your review comments.
Let me know your opinion for the bug (and MRE) provided in the PR description, and then let’s try to understand the cause of the failing test in gradient.py; I struggle to debug such test units ... and I look forward for your feedbacks 😄

djhoese · 2023-10-19T14:11:27Z

If we want to have _lonlats and _proj_coords functions, it means there we would need a boundary class for AreaDefinition and one for SwathDefinition (without _proj_coords) objects.

I'm not so sure. I think it is one of the failings of our past understandings that all latitude/longitude degrees are the same. That is just not true. There can be prime meridian shifts or different Earth models/datums used. So there is some idea of "native" CRS and a preferred geographic/geodetic CRS. I could easily see a helper object (as I mentioned in my previous comment) that gets you x or y or lon or lat, but allows you to convert that object to another version with a different preferred geodetic CRS.

As a personal a taste, I find more readable code without too much get. get … and I guess most of relevant stuffs can be defined as properties.
Regarding the chunks option, are you sure that is needed when retrieving the lon/lat edges? One can always .rechunk or dask.array.from(array, chunks)

Is it the get that you don't like? Or the many small methods/functions? I think hiding some of this functionality in helper classes makes it irrelevant. I also think in a lot of cases we could get rid of the get, but I'm fearful of turning most things into properties ever since we started playing with dask/xarray. With dask some of these methods end up being able to produce dask arrays and there is a real benefit performance-wise of knowing/defining what that chunk size is from the creation point versus re-chunking after.

I'm sure there are some use cases where we define polygons or boundaries where this idea of needing to define dask arrays is unnecessary, but I'd want to be careful. In a lot of cases I don't think I'd want to generate a polygon and then not immediately use it. However, that polygon may be generated from large lon/lat arrays (this is the worst case scenario I think of all the scenarios) and in that case there may be a benefit to computing the resulting polygon at the same time as all other computations (if possible) to reuse the compute lon/lat arrays (dask would share the result of the tasks). But that being the worst case maybe we just "deal with it". Lon/lat arrays are already the slowest form of computation we use.

However, I noticed that when we extract a boundary, if the original sides are not clockwise, and we force_clockwise=True, then the side position currently returned by get_bbox_lonlats become [left, bottom, right, top] (see here)I think this should be fixed ... as it's prone to introduce bugs (I don't tell you the time wasted to discover this point when implementing the spherical tools for my projects ... ).

I'm not sure I agree. "top" and "bottom" are relative. Is the top of the swath the last scan of the data or the first? Is it the most northern? What if we have a non-polar orbiting satellite? If the sides are always generated from a polygon maybe that will always be more clear (shapely or other library determines what that means)?

I believe that the clockwise/anticlockwise reordering should be applied only when returning the contours/vertices/lons/lats in the boundary object and not at the sides list definition in get_bbox_lonlats !

I'd have to check how I used this method in my Polar2Grid project. It is likely that I put it in get_bbox_lonlats because that is used internally in the area boundary class (if I recall correctly). But also, other polygon libraries probably need clockwise coordinates.

We could add to AreaBoundary the method set_clockwise() and set_anticlockwise(), which regulates the order of the returned contours/vertices/lons/lats arrays.

I don't think it is that people need the counter-clockwise (CCW) version of the coordinates, it is that the polygon/boundary math requires the clockwise (CW) version. Besides wasted performance I don't think I see a reason to never not return CW.

it’s not always clear to me when I see such name, what is the inner structure of the object.

Yep. Could probably return a namedtuple at the very least. This was just old interfaces before anyone knew what they were doing I think.

...Ah I just read your BoundarySides idea. Same thing. If it is a full class instead of a namedtuple then I might suggest we try using __slots__ to keep the "weight" of the class small.

Conclusion

I think in general I like where you're going, but I still think we're 1-2 steps away from a design that really "wows" me. I think the idea of something like an area_def.boundary.left.lonlats is exciting as it opens up a usage pattern that works for other use cases too.

djhoese · 2023-10-19T14:20:49Z

For this failing test:

pyresample/pyresample/test/test_gradient.py

Lines 246 to 254 in c0025ed

    
           def test_resample_swath_to_area_2d(self): 
        
               """Resample swath to area, 2d.""" 
        
               data = xr.DataArray(da.ones(self.src_swath.shape, dtype=np.float64), 
        
                                   dims=['y', 'x']) 
        
               with np.errstate(invalid="ignore"):  # 'inf' space pixels cause runtime warnings 
        
                   res = self.swath_resampler.compute( 
        
                       data, method='bil').compute(scheduler='single-threaded') 
        
               assert res.shape == self.dst_area.shape 
        
               assert not np.all(np.isnan(res))

Can you print out self.swath_resampler.coverage_status? My guess is this method is filtering out all chunks of the source data:

pyresample/pyresample/gradient/__init__.py

Lines 223 to 235 in c0025ed

    
           for i, covers in enumerate(self.coverage_status): 
        
               if covers: 
        
                   if is_src: 
        
                       y_start, y_end, x_start, x_end = self.src_slices[i] 
        
                   else: 
        
                       y_start, y_end, x_start, x_end = self.dst_slices[i] 
        
                   try: 
        
                       val = data[:, y_start:y_end, x_start:x_end] 
        
                   except IndexError: 
        
                       val = data[y_start:y_end, x_start:x_end] 
        
               else: 
        
                   val = None 
        
               data_out.append(val)

And then the list passed to np.max is empty.

That coverage_status is filled here:

pyresample/pyresample/gradient/__init__.py

Lines 164 to 211 in c0025ed

    
           def get_chunk_mappings(self): 
        
               """Map source and target chunks together if they overlap.""" 
        
               src_y_chunks, src_x_chunks = self.src_x.chunks 
        
               dst_y_chunks, dst_x_chunks = self.dst_x.chunks 
        
               coverage_status = [] 
        
               src_slices, dst_slices = [], [] 
        
               dst_mosaic_locations = [] 
        
               src_x_start = 0 
        
               for src_x_step in src_x_chunks: 
        
                   src_x_end = src_x_start + src_x_step 
        
                   src_y_start = 0 
        
                   for src_y_step in src_y_chunks: 
        
                       src_y_end = src_y_start + src_y_step 
        
                       # Get source chunk polygon 
        
                       src_poly = self._get_src_poly(src_y_start, src_y_end, 
        
                                                     src_x_start, src_x_end) 
        
                       dst_x_start = 0 
        
                       for x_step_number, dst_x_step in enumerate(dst_x_chunks): 
        
                           dst_x_end = dst_x_start + dst_x_step 
        
                           dst_y_start = 0 
        
                           for y_step_number, dst_y_step in enumerate(dst_y_chunks): 
        
                               dst_y_end = dst_y_start + dst_y_step 
        
                               # Get destination chunk polygon 
        
                               dst_poly = self._get_dst_poly((x_step_number, y_step_number), 
        
                                                             dst_x_start, dst_x_end, 
        
                                                             dst_y_start, dst_y_end) 
        
                               covers = check_overlap(src_poly, dst_poly) 
        
                               coverage_status.append(covers) 
        
                               src_slices.append((src_y_start, src_y_end, 
        
                                                  src_x_start, src_x_end)) 
        
                               dst_slices.append((dst_y_start, dst_y_end, 
        
                                                  dst_x_start, dst_x_end)) 
        
                               dst_mosaic_locations.append((x_step_number, y_step_number)) 
        
                               dst_y_start = dst_y_end 
        
                           dst_x_start = dst_x_end 
        
                       src_y_start = src_y_end 
        
                   src_x_start = src_x_end 
        
               self.src_slices = src_slices 
        
               self.dst_slices = dst_slices 
        
               self.dst_mosaic_locations = dst_mosaic_locations 
        
               self.coverage_status = coverage_status

But I'm just using what I get from reading the code. I didn't code this so I could be completely missing something.

ghiggi · 2023-10-19T15:22:50Z

Just to better clarify a couple of thoughts @djhoese :

I'm not sure I agree. "top" and "bottom" are relative. Is the top of the swath the last scan of the data or the first? Is it the most northern? What if we have a non-polar orbiting satellite? If the sides are always generated from a polygon maybe that will always be more clear (shapely or other library determines what that means)?

I don’t refer to the orientation in terms of geographic space, but with respect to the 2D lon/lat arrays. The [ŧop, right,bottom, left] sides of the coordinate arrays.

I'd have to check how I used this method in my Polar2Grid project. It is likely that I put it in get_bbox_lonlats because that is used internally in the area boundary class (if I recall correctly). But also, other polygon libraries probably need clockwise coordinates.

Yes the AreaBoundary class (at least in pyresample) does not correct for clockwise/anticlockwise ordering.
The force_clockwise reording was applied in get_bbox_lonlats because the outputs of this function were concatenated to get directly the lon/lat boundary vertices with the correct orientation.
But if we speak of the sides coordinate of an array, clockwise/anticlockwise order make no sense. Is the vertices derived from the sides that must follow the clockwise/anticlockwise assumption.
Right now, doing force_clockwise=True for i.e. retrograde orbits (going from east to west) scanning forward (or prograde orbits scanning backward), cause the output sides to be the [left, bottom, right, top] sides of the coordinates/image arrays.

I don't think it is that people need the counter-clockwise (CCW) version of the coordinates, it is that the polygon/boundary math requires the clockwise (CW) version. Besides wasted performance I don't think I see a reason to never not return CW.

If one want to generate Shapely polygons, it requires CCW order. If the area wraps the pole, we must not modify the ordering because the order determines which pole is wrapped in.

djhoese · 2023-10-19T15:29:44Z

Oh...so our boundary logic (or spherical geometry stuff) needs CW and shapely uses CCW? Ok so there should be an option.

So would/could/should there be a "sides" method on a boundary/polygon or should it be on the area? If there is a "sides" method on the area (and swath) then I think the equivalent of a force_clockwise needs to be on that method, right? Otherwise what if someone wants to do their own polygon math without shapely?

# Conflicts: # pyresample/geometry.py

djhoese · 2023-11-22T02:23:27Z

Ok @ghiggi I've merged this with main which includes my pre-commit changes. There was only one conflicting file on the merge and that was in geometry.py. The biggest conflict was actually just unfortunate where I moved the "edge" method from BaseDefinition to AreaDefinition right under the boundary sides methods and git was confused when it tried to merge them even though they were separate. Otherwise, the area slicing methods/functions are now in _subset.py including _get_area_boundary which you had made a change to and I made sure that was migrated over to that.

Otherwise, my merge just includes a couple things to make pre-commit happy (docstring fixes, indentation/spacing fixes, unused import after it included your changes, etc).

Let me know if you have any problems with this or anything looks confusing. I should be online at my usual times tomorrow.

…des_lats

djhoese · 2023-11-24T15:26:13Z

Very nice job. I re-read your updated PR description and I really like the goal you've set for yourself. It makes a lot of sense and I'm glad someone is trying to tackle the problems and inconsistencies.

I think my general overall concern with the changes you've made so far as that I don't like the separation between projections and geographic versions of things. I will admit I may have a misunderstanding of this, but I don't think so. To me, geographic "projections" are not a single thing. They can differ with their model of the Earth and the datum/reference coordinates used for that Earth. As you know we could also shift the prime meridian of a lon/lat CRS and it is still considered geographic.

With the above in mind, I think I'd like to look at combining the different boundary classes into a single class. Another thing that leads me to this idea is that you point out that the two classes differ in their .x/.lons, .y/.lats, and .sides_X properties. But with my understanding in mind .x == .lons and the same for the other properties. And if the CRS becomes a required parameter maybe that's OK?

I'm also not a fan of the "clockwise" behavior of the boundary classes. Specifically the .set_clockwise() and .set_counterclockwise() methods. These seem like they should either:

a. Return a new instance of the boundary with this behavior enforced/applied.
b. Make it a keyword argument to the class maybe?
c. Leave it as-is maybe?

It just feels odd to me, but I'm not sure I have any better suggestions at this point.

Wild suggestion: Are there enough deprecations and changes here that this should be moved to a single .boundary() method on the future geometry definitions?

ghiggi · 2023-11-24T15:59:50Z

@djhoese I also don't like the separation between projections and geographic versions of things, but it seems to me required when dealing with boundaries.

Here we are not speaking of geographical or projected CRS , but the presence of two types of coordinates: spherical and planar ! Maybe we should call the classes SphericalBoundary and PlanarBoundary !

For SwathDefinition, we can only define a SphericalBoundary.
For AreaDefinition with CRS with planar coordinates, we can define SphericalBoundary and PlanarBoundary.
For AreaDefinition with CRS with geographic coordinates, we can only define SphericalBoundary.

I try to elaborate here below some additional thoughts:

For (projection) planar coordinates, I can use shapely for geometry operations
For (geographic) spherical coordinates, I am on a sphere and I need to use spherical operations
For an AreaDefinition, when extracting the boundary sides projection coordinates (i.e. in meter)
and the equivalent lat/lon coordinates on the sphere I might end up with the case where:
- the concatenation of the area sides with the projection coordinates leads to clockwise vertices
- the concatenation of the area sides with the geographic coordinates leads to counterclockwise vertices
  --> From there one of the reason to define 2 separate objects
The boundary sides refers just to the side position (top, right, bottom, left) of the coordinate array.
When we create the *Boundary class, I check what is the initial order of the concatenated sides.
set_clockwise()and set_counterclockwise() does not modify any of the side data,
but set an internal flag that is used to choose if the concatenated sides must be reversed when
calling x, y, lat, lon, contour or vertices methods.
I defined the *Boundary class to have a order argument defaulting to None, which means
return boundary vertices as the order they are. The user can specify order="clockwise" or order="counterclockwise"
when creating the class, or by calling set_clockwise() and set_counterclockwise() later on, but this does not modify
any internal data of the boundary sides and does not trigger any computation !

Side note: there was already a method boundary() I created 2-3 months ago that returned an AreaBoundary (geographic coordinates)
that now I deprecated in favour of geographic_boundary which is returning the GeographicBoundary.
Maybe we can keep the .boundary(), make it return GeographicBoundary (instead of AreaBoundary) and remove the proposed geographic_boundary method if you are willing to accept that a method return a class with a different name (GeographicBoundary has all methods of AreaBoundary)

…rdinates

djhoese · 2023-11-24T19:21:25Z

Ah the Spherical versus Planar makes this make more sense now.

For an AreaDefinition, when extracting the boundary sides projection coordinates (i.e. in meter)
and the equivalent lat/lon coordinates on the sphere I might end up with the case where:
the concatenation of the area sides with the projection coordinates leads to clockwise vertices
the concatenation of the area sides with the geographic coordinates leads to counterclockwise vertices
--> From there one of the reason to define 2 separate objects

I don't understand why the type of coordinate retrieved (projection or geographic) would effect clockwise or counterclockwise. Don't we have enough cases of swaths (ascending and descending orbits or other old instrument types) and areas (north-up areas, south-up like FCI, and south-up/east-left like SEVIRI) where any coordinate combination could give you any orientation/ordering?

Besides the cases of what data we started with, what are cases where we need to use spherical boundaries even if we have a projected/planar area? Also, are there cases where we have geographic boundaries but can assume they don't cross the anti-meridian and could therefore be treated as planar coordinates? For example, a mercator or eqc projected dataset is not technically valid if it crosses the anti-meridian (it goes outside the coordinate space of the projection) so there is no need to treat it as a sphere. Similarly, if it does cross the anti-meridian in lon/lat space, but we shift the prime meridian we could (in some cases that don't cross the poles) treat the lon/lat coordinates as planar too.

I think I've realized one of the main reasons I don't like set_clockwise is that it puts state into the object and makes it mutable. It really feels like these objects should be immutable and only be containers for the boundary information and not modifiers of it. I realize that the clockwise wish property is not modifying the data until it is returned, but it causes confusion when using it. I would almost feel better if the boundary object knew the order/orientation of the data (either told by the user as a kwarg or determined by the class when needed) and besides automatically converting for things like shapely object creation, it could probably methods for forcing it that return a new copy of the boundary with the modifications done already. If the orientation/order is the same as the Boundary already then return self. If not, convert and return a new object.

This also makes me think, could the pyresample spherical stuff be made to detect the closed/not-closed state of the polygon provided to it and adjust as needed? That way there doesn't need to be special handling...eh I guess this is a nice convenience for any user to have that ability to get closed or open boundary coordinates.

…endency injections

ghiggi · 2023-11-24T20:16:23Z

Thansk for all these discussions @djhoese. They also help me to completely free out my thoughts.
Answering to your questions:

Besides the cases of what data we started with, what are cases where we need to use spherical boundaries even if we have a projected/planar area?

All pykdtree based stuffs currently exploit spherical boundaries for data reduction (see contour_poly in get_area_slices) !
Only gradient use planar boundaries (and fails then in many cases)

Also, are there cases where we have geographic boundaries but can assume they don't cross the anti-meridian and could therefore be treated as planar coordinates?

Yes many cases. With this PR, we will be in a good place to perform data reduction, on the sphere or with shapely, based on the type of source and target area.
We need to perform operation on the sphere if we have:

a swath crossing the antimeridian
a swath overpassing a pole,
a projection area enclosing the pole
a projection area crossing the antimeridian

And we need to stay on the sphere if the two areas CRS bounds does not intersect !

I think I've realized one of the main reasons I don't like set_clockwise is that it puts state into the object and makes it mutable.

I agree with that and I strongly push to deprecate the existing use of force_clockwise.
The order should just be set when creating the shapely Polygon or the spherical Polygon depending on the software requirements.
In this PR, if set_*clockwise is not called, it returns the boundary vertices as they appear in the coordinate array (top, right, bottom, left)

This also makes me think, could the pyresample spherical stuff be made to detect the closed/not-closed state of the polygon provided to it and adjust as needed?

That's just requires to check if the first vertex is equal to the last

…l planar projections, ...

…r projections

djhoese · 2023-11-25T03:37:23Z

We mentioned this on slack, but if backwards compatibility is limiting this design at all then I am very in favor of setting up as a boundary method in the future geometries that use this "perfect" boundary class (or classes if needed). The existing/legacy/old geometry definitions could use a special subclass of these future boundary classes that does something like:

from .future.boundary import Boundary as FutureBoundary

class _OldBoundary:
    ...

class LegacyBoundary(FutureBoundary, _OldBoundary):
    ...

That is, the boundary used by the existing AreaDefinition would be LegacyBoundary which is a combination of the FutureBoundary and the old boundary that you didn't want to hold on to just for backwards compatibility. That way when we switch to Pyresample 2.0 we get the "best" design instead of having to still deal with workarounds and hacks just to deal with backwards compatibility.

All pykdtree based stuffs currently exploit spherical boundaries for data reduction (see contour_poly in get_area_slices) !

For some reason I was sure that the spherical stuff was doing stuff in XYZ geocentric coordinate space, but you're right it is all lon/lat. Interestingly the SphPolygon computes the X/Y/Z coordinates but then doesn't ever use them from what I can tell:

pyresample/pyresample/spherical.py

Lines 547 to 552 in 25d0c72

    
           self.cvertices = np.array([np.cos(self.lat) * np.cos(self.lon), 
        
                                      np.cos(self.lat) * np.sin(self.lon), 
        
                                      np.sin(self.lat)]).T * radius 
        
           self.x__ = self.cvertices[:, 0] 
        
           self.y__ = self.cvertices[:, 1] 
        
           self.z__ = self.cvertices[:, 2]

Lastly, about inheritance versus composition/dependency injection: These classes seem very similar and almost not different at all. Unless you have a strong argument against it, it really seems better to require a crs kwarg associated with the provided boundary coordinates/vertices and then do a few if statements to separate out the if self.crs.is_geographic logic when needed. Otherwise, a "composition" style implementation would be something like doing:

class Boundary:
    def __init__(self, ...):
        if self.crs.is_geographic:
            self._polygon_helper = GeographicPolygonHelper(self.vertices, self.crs)
        else:
            self._polygon_helper = PlanarPolygonHelper(self.vertices, self.crs)

    def to_shapely_polygon():
        return self._polygon_helper.to_shapely_polygon()

Now of course I haven't done all of the necessary checks about how the code differs, but I wanted to give an example of what I was thinking/talking about when mentioning these topics.

I'd also like to rethink this idea of passing the type of polygon class to generate with a string keyword argument. There are a lot of design patterns that exist that could help us with this, but at the moment I don't have time to dive into them. This could easily be one of the last decisions made about this PR so not something we need to do right now.

ghiggi · 2023-11-27T22:53:41Z

pyresample/geometry.py

@@ -250,7 +251,8 @@ def get_lonlats(self, data_slice=None, chunks=None, **kwargs):
            # lons/lats are xarray DataArray objects, use numpy/dask array underneath
            lons = lons.data
            lats = lats.data
-
+        # TODO
+        # --> if data_slice and chunks provided, why here first rechunk all array and then subset?


Maybe here we can increase a bit performance ...

The data_slice case is not really utilized in Satpy so the chunking was implemented separate from that. The idea is/was that the chunking provided by the user is to match some other array (the band data from a Satpy reader for example). I suppose the chunking could be done after...but I'm scared of edge cases and unexpected consequences. If you know of a chunks + data_slice case coming from Satpy please let me know and point it out.

ghiggi · 2023-11-27T23:01:34Z

pyresample/gradient/__init__.py

-        lon_b = np.concatenate((lons.side1, lons.side2, lons.side3, lons.side4))
-        lat_b = np.concatenate((lats.side1, lats.side2, lats.side3, lats.side4))
-
+        vertices_per_side = 3600


This is exagerated and really slow down stuffs! Why has been set so large ?
If it was used to avoid cutting out portions of GEO area ... now I fixed the underlying bug !!!

ghiggi · 2023-11-27T23:04:27Z

pyresample/test/test_geometry/test_area.py

+    """Create an AreaDefinition object for testing."""
+    area = AreaDefinition(crs=crs, shape=shape, area_extent=area_extent, **kwargs)
+    return area
+


Seems that pytest fixture has a bug with nested call when passed to pytest mark parametrize.
Instead of using the fixture defined in conftest, this enable to test stuffs. Maybe @djhoese you want to check it out ... I spend 3-4 hours ... and did find a way to fix it

Fixtures can't be used inside parametrize. You need to use the third-party lazy_fixtures. You can see it's usage in other modules in pyresample (I think) and Satpy. It basically allows you to refer to a fixture in parametrize by it's name as a string.

ghiggi · 2023-11-27T23:21:34Z

The PR is now ready for review @djhoese @mraspaud. I again updated the PR description with the last changes.
I finished to implement the interfaces and the code now it ensure the correct ordering of vertices for SwathDefinition and AreaDefinition, which in turn enable to correct creation of SphPolygon and shapely Polygon.
I also added code to retrieve boundary of AreaDefinition projections with infinite values at the edges because of out-of-Earth locations (i.e. polar projections, global projections like Robinson, Mollweide, etc).

I checked against all cartopy projections and it works perfectly (except from a couple of projections because of a PROJ bug).
If we force the computation of such boundaries (requires reading the entire coordinates array in memory), some tests of slicer and gradient fail, because we did not yet implement the logic to deal with such projections. To this end I introduced the force_boundary_computations=False config to make the current workflow running ... and enable to adapt the pipeline in future separate PRs.

I now wait for your feedbacks before implementing the last test units. I also plan to still add a commit to enable to pass a tuple of vertices_per_side.

djhoese

Thanks again for putting in all this work. However, this is WAY too big of a pull request. The last I saw of this was extracting some boundary logic and reworking some new boundary classes. Now you've added a visualization subpackage and deprecated everything boundary or "sides" related? Very very scary and a lot of work for one PR.

I made a lot of comments to hopefully guide our discussion of this in the future. I skipped over all the tests except your one question about fixtures in test_area.py. I also ignored a lot of the boundary sides changes in geometry.py as it looked very complex and not necessarily complete or completely cleaned up.

correct ordering of vertices

What does "correct" mean? If the word "correct" is mentioned in docstrings anywhere it should be explained.

I also plan to still add a commit to enable to pass a tuple of vertices_per_side.

I wouldn't worry about it unless it is needed in this PR. Make an issue as a reminder if necessary.

We could use getattr to handle deprecations: https://peps.python.org/pep-0562/

I've mentioned it before and in multiple comments, but the more and more you change the more obvious it is that this work should be moved to the future geometry classes. At the same time, while I'm sure Martin and I are not giving you the rapid type of feedback you'd like, I'd really like more discussion before you spend hours or days working on a complete rewrite of some of this stuff. For example the test_area.py fixture issue, you mentioned you spent 3-4 hours on that. If it was the problem I mention in my comment that should have been a quick question and quick answer on slack. Maybe Martin and I don't have a good enough grasp on your vision for this stuff or on the existing boundary logic to give you the discussion you need or expect, but a PR this size (the second or third of yours like this) needs to consist of more discussion and less large scale rewrites and deprecations.

Some of your more recent changes seemed to stem from trying to get the tests to pass, but besides your own TDD tests that may be leading your development, I'm not sure this is needed until we've all agreed on the design/interfaces/layout. Otherwise you risk wasting your time rewriting these tests multiple times.

My overall request for changes on this are:

Move changes to future geometry classes where every possible. Utilize AreaDefinition.__getattr__ if needed to prevent access to the deprecated methods. In a future PR of mine I will remove the "LegacyAreaDefinition" subclass of the future AreaDefinition and only access legacy methods through __getattr__ and warnings/errors.
Combine some of this logic if at all possible. As mentioned elsewhere, the Boundary classes seem very close to be a single class with smart tricks for handling the special cases or delegating to pyproj's CRS classes for the information they seek. Similarly, I noticed some try/excepts or if/else in various places in this PR that "feel" like they should be an internal decision at a lower-level of the code or a decision at the higher level...some sort of abstraction. I see None used in a couple places where a more concise and intentional solution seems possible (or at least better defined and documented).

djhoese · 2023-12-01T02:07:45Z

pyresample/__init__.py

+__all__ = [
+    'grid', 'image', 'kd_tree', 'utils', 'plot', 'geo_filter', 'geometry', 'CHUNK_SIZE',
+    'load_area', 'create_area_def', 'get_area_def', 'parse_area_file', 'convert_def_to_yaml',
+    "_root_path",


Why is this listed in __all__. The all variable is meant to deal with from pyresample import *. No need for _root_path to be there.

My bad. I didn't know the purpose of __all__ was for that :D

On this topic, from pyresample import *is not recommended in python this a while, should we just remove __all__?

djhoese · 2023-12-01T02:11:03Z

pyresample/__init__.py

@@ -56,8 +56,13 @@

 from .version import get_versions  # noqa

-__all__ = ['grid', 'image', 'kd_tree', 'utils', 'plot', 'geo_filter', 'geometry', 'CHUNK_SIZE',
-           'load_area', 'create_area_def', 'get_area_def', 'parse_area_file', 'convert_def_to_yaml']
+_root_path = os.path.dirname(os.path.realpath(__file__))


from pathlib import Path pkg_root_path = Path(__file__).resolve().parent

And is this used outside of the test modules? If not, I'd prefer this go in pyresample/test/__init__.py or maybe in conftest.py but I think it'd have to be a fixture to be useful there and that probably isn't worth it. You'd have to do the right amount of .parent.parent to get the correct directory.

djhoese · 2023-12-01T02:11:45Z

pyresample/_config.py

@@ -40,6 +40,7 @@
        "features": {
            "future_geometries": False,
        },
+        "force_boundary_computations": False,


If we keep this it will need documentation, but that can be the last thing in this PR.

djhoese · 2023-12-01T02:12:39Z

pyresample/bilinear/_base.py

+    except Exception:
+        valid_indices = np.ones(source_lons.size, dtype=bool)


I'm guessing this was done to make tests pass? What's going on here?

No @djhoese. This try-except is used to deal with out of Earth disk projections !!!
Before this PR, the boundary was created using get_boundary_lonlats (see

pyresample/pyresample/geometry.py

Line 276 in 294e6ea

def get_boundary_lonlats(self):

). The method was just naively taking the image border lon/latcoordinates.
So for the geostationary full disk and other projections with out-of-earth disk at the border, the boundary sides were all Inf. And inside the .get_valid_index_from_lonlat_boundaries if there was any invalid side coordinate, it was returning an array of ones (no data reduction done !). This means that for full disc GEO projections we were not doing data reduction !!!

In this PR, I replaced get_boundary_lonlats with target_geo_def.boundary().sides. So:

if the area is geostationary, now we finally reduce data (and this should speed up stuffs !!!)

if the area boundary has out-of-Earth locations and the flag force_boundary_computations is True, the boundary method will retrieve the actual boundary coordinate inside the area, and we will reduce data ...

if the area boundary has out-of-Earth locations and the flag force_boundary_computations is False (to keep the old behaviour), the boundary method now fails because if all sides are Inf it will raise a ValueError within _filter_bbox_nans. If this happen, we return what we returned before: the np.ones array

pyresample/boundary/sides.py

djhoese · 2023-12-01T02:37:02Z

pyresample/geometry.py

+        warnings.warn("'get_boundary_lonlats' is deprecated. Please use "
+                      "'area.boundary().sides'.", DeprecationWarning, stacklevel=2)


This type of deprecation and all others like it seem like something that should be done in a future version of the geometry. This is perhaps too large of a change to request users make while still calling it a 1.x release...or is this method relatively new?

djhoese · 2023-12-01T02:39:27Z

pyresample/geometry.py

+                                                              coordinates="geographic")
+            # Polar Projections, Global Planar Projections (Mollweide, Robinson)
+            # - Retrieve dummy right and left sides
+            if config.get("force_boundary_computations", False):


What does this mean? What is being forced? How does this result differ from when it isn't forced/computed?

This config regulates the behaviour when an AreaDefinition has boundaries on out-of-Earth regions.

force_boundary_computations == False keep the old behaviour of not searching for the actual internal boundary cooordinates, but just select the coordinates at the image border.
For global projections, (i.e. Mollweide, Robinson, ...), polar projections, ... :

if all image sides are out of Earth, the sides would only have Inf values and a ValueError is raised further downstream if all values are Inf.

If some of the image sides are out of Earth, some sides values would be Inf, such Inf value are discarded further downstream in the processing. If not all values are Inf, a "partial boundary" is returned, however it does not represent the true Earth boundary of the area.

With force_boundary_computations == True, we actually search the actual internal coordinates of the boundary.
To do so, we need to compute in memory one the coordinates array , from here the name force_*_computations.

See my other comment answering to the reason of the try-except logic to understand the rationale of this choice.

Maybe we'll keep this discussion on this thread instead of the 3 other comments I made related to the same topic.

Is another way of seeing this as "produce the best possible boundary possible or do the quick/shortcut method"? Where the quick/shortcut is the old way, right? Sorry if I'm getting confused, but this only applies to generating the spherical/lonlat boundaries right? The x/y for AreaDefinitions should always be valid in terms of the x/y coordinates, right? Should we include swaths in this current discussion or wait to confuse me later?

Can we use information from the PROJ/pyproj CRS class (like axis bounds) if they exist to do an even faster method of this where maybe we clip the lon/lats to the defined limits of the CRS? They don't always exist...hhmm but the result is always Inf isn't it? Not -Inf and Inf...kind of hard to clip then. I could also see us hardcoding some specializations of this for specific projections like we do for geostationary. That would make things a little smarter and faster wouldn't it?

Perhaps a better name for this would be something related to doing the "extra" work to get the best answer rather than phrasing it as "forcing" something. If we can speed up the common case maybe this could default to "on"/True?

Should we include swaths in this current discussion

SwathDefinition are not impacted by force_boundary_computations flag.

force_boundary_computations only impacts AreaDefinition where Earth coordinates only lies inside an internal region of the coordinates array (i.e. geostationary ... but for geostationary we have a smart ad-hoc trick to retrieve the actual internal Earth boundary coordinates without loading the coordinates array into memory).

I looked deeply into PROJ/pyproj CRS and I didn't find another way to determine the actual valid internal coordinate boundary without the trick I implemented in the PR.
Theoretically one could design ad-hoc methods for each CRS to infer the boundary as we did for geostationary (angle + satellite height) ... but it's a huge, tedious, work.

This only applies only applies to generating the spherical/lonlat boundaries right?

force_boundary_computations=True is required to perform correct spherical polygon operations

force_boundary_computations=True will likely fix issues in get_area_slices (because of previous bad polygon intersections)

force_boundary_computations=True speeds up some pykdtree operations when reduce_data=True. Currently resampling geostationary full disc (i.e. with nearest) using reduce_data=True does not reduce the data (because the boundary returned is all Inf ...)

The x/y for AreaDefinitions should always be valid in terms of the x/y coordinates, right ?

force_boundary_computations=True is also used in the area.projection_boundary method and impacts the gradient resampling, where data reduction is (assumed to be done) in planar projection coordinates. area.projection_boundary return the x/y coordinates corresponding to Earth valid coordinates, and so enable the transform of i.e. the actual internal GEO boundary to another CRS --> reduce_data=True with gradient was failing for GEO FD and this PR we might have solved the issue (not tested ... I didn't want to also refactor gradient logic :P)

If we can speed up the common case maybe this could default to "on"/True

I think that the boundary and projection_boundary methods should return the correct boundaries.
This will also benefit get_area_slices in some cases (remapping to polar projections was causing issues ...)

Then the decision of what should be done in pykdtree and gradient it's a choice to be based on the following points:

Is it fine for some special AreaDefinition projections to load the entire coordinates in memory? We go a bit away from the idea of only load the required data with dask ... but on the other side in such special cases ... we were later on loading all coordinates anyway to discover which was valid and which not ...

In pykdtree, we will just gain in performance and I don't expect any surprise

In gradient, there is somewhat lot of assumptions in how data reduction is done and it might be there will be some surprise with some projections, but on the other hand we move forward to fix several issues ...

Theoretically one could design ad-hoc methods for each CRS to infer the boundary as we did for geostationary (angle + satellite height) ... but it's a huge, tedious, work.

Yeah...I figured. Shoot.

force_boundary_computations=True speeds up some pykdtree operations when reduce_data=True. Currently resampling geostationary full disc (i.e. with nearest) using reduce_data=True does not reduce the data (because the boundary returned is all Inf ...)

With reduce_data=True, this is still get_area_slices functionality, right (so the same as your previous bullet point)? I thought we had the custom geostationary boundary logic so what about it has Inf coordinates? Or are the x/y of the boundary resulting in Inf when converted to lon/lats?

Then the decision of what should be done in pykdtree and gradient it's a choice to be based on the following points:

Hhhmm this is difficult. I haven't looked at your full implementation but I'm wondering if in the dask-case (and numpy I guess), for the default, we could be smart about choosing a limited set of chunks to compute completely. Maybe it wouldn't be as perfect as the full lookup but could be really close.

djhoese · 2023-12-01T02:41:09Z

pyresample/geometry.py

+    ll_x, ll_y, ur_x, ur_y = area_extent
+    bottom = [(x, ll_y) for x in np.linspace(ll_x, ur_x, nb_points + 2)]
+    right = [(ur_x, y) for y in np.linspace(ll_y, ur_y, nb_points + 2)][1:]
+    top = [(x, ur_y) for x in np.linspace(ur_x, ll_x, nb_points + 2)][1:]
+    left = [(ll_x, y) for y in np.linspace(ur_y, ll_y, nb_points + 2)][1:-1]


This seems like something numpy can do in a vectorized way (numpy array operations only)...not the most important part of this PR by far.

djhoese · 2023-12-01T02:45:01Z

pyresample/kd_tree.py

+            # - If invalid sides, return np.ones
+            try:
+                sides_lons, sides_lats = target_geo_def.boundary().sides
+                # Combine reduced and legal values
+                valid_input_index &= \
+                    data_reduce.get_valid_index_from_lonlat_boundaries(
+                        sides_lons,
+                        sides_lats,
+                        source_lons, source_lats,
+                        radius_of_influence)
+            except Exception:
+                valid_input_index = np.ones(source_lons.size, dtype=bool)


How did this work before? This feels like this should happen somewhere else and the except Exception needs to be figured out.

See my comment for the other try-except ;)

djhoese · 2023-12-01T02:49:05Z

pyresample/test/test_geometry/test_area.py

+    """Create an AreaDefinition object for testing."""
+    area = AreaDefinition(crs=crs, shape=shape, area_extent=area_extent, **kwargs)
+    return area
+


Fixtures can't be used inside parametrize. You need to use the third-party lazy_fixtures. You can see it's usage in other modules in pyresample (I think) and Satpy. It basically allows you to refer to a fixture in parametrize by it's name as a string.

mraspaud

Thanks for the extensive work!
Unfortunately, I think this PR is too big as it is and need to be split into multiple PR, if just for the reviewers sanity. My suggestion would we to have:

One PR for TEST_FILES_PATH
One PR for replacing frequency with vertices_per_side
One PR for refactoring the visualisation part
One PR for renaming AreaDefBoundary and SwathDefBoundary to PlanarBoundary and SphericalBoundary (and adding corresponding deprecations)
One PR for fixing the gradient resampler
One PR for fixiing the kd_tree resampler
One PR for refactoring the boundary code and adding the new functionality
One PR for fixing the data reducing
etc...

Also, a lot of the new code does not seem to be tested. This makes it very difficult to understand the use cases. Tests are not only for testing, but also for illustrating the use cases of the code, and that helps me as reviewer understand the code and the intention.

I cannot accept this PR for now, as it is not possible for me to understand all cases this covers and all the implication within a reasonable amount of time.
I will however gladly review smaller, incremental PRs to introduce all the work you have done here.

mraspaud · 2023-12-07T12:56:42Z

pyresample/test/test_geometry/test_swath.py

+        expected_lons = [
+            -90.67900085, 79.11000061,  # 81.26400757,
+            81.26400757, 29.67200089,   # 10.26000023,
+            10.26000023, -5.10700035,  # -21.52500153,
+            -21.52500153, -21.56500053,  # -90.67900085,
+        ]
+        expected_lats = [
+            85.23900604, 80.84000397,  # 67.07600403,
+            67.07600403, 54.14700317,  # 30.54700089,
+            30.54700089, 34.0850029,  # 35.58000183,
+            35.58000183, 62.25600433,  # 85.23900604,
+        ]
+        np.testing.assert_allclose(lons, expected_lons)
+        np.testing.assert_allclose(lats, expected_lats)


Why are one third of the values commented out?

mraspaud · 2023-12-07T12:57:13Z

pyresample/test/test_geometry/test_swath.py

+        expected_lons = [
+            -45., -90.,  # -135.,
+            -135., -180.,  # 135.,
+            135., 90.,  # 45.,
+            45., 0.,  # -45.
+        ]
+        expected_lats = [
+            80., 80.,  # 80.,
+            80., 80.,  # 80.,
+            80., 80.,  # 80.,
+            80., 80.,  # 80.
+        ]


What are the comment values for?

mraspaud · 2023-12-07T12:58:57Z

pyresample/test/test_gradient.py

-        # Swath defs raise AttributeError, and False is returned
-        get_polygon.side_effect = AttributeError
-        self.resampler._get_dst_poly('idx2', 0, 10, 0, 10)
-        assert self.resampler.dst_polys['idx2'] is False


Why has this part of the test been removed? do we loose functionality with this PR?

mraspaud · 2023-12-07T13:00:48Z

pyresample/test/test_gradient.py

    """Test polygon creation."""
-    from pyresample.gradient import get_polygon
+    from pyresample.gradient import _get_polygon


Why is this a private (_) function if it needs to be imported?

mraspaud · 2023-12-07T13:02:20Z

pyresample/test/test_kd_tree.py

+TEST_FILES_PATH = os.path.join(_root_path, "test", 'test_files')
+


This is also defined in at least one other file, it should be factorised.

mraspaud · 2023-12-07T13:23:51Z

pyresample/boundary/base_boundary.py

+
+class BaseBoundary:
+    """Base class for boundary objects."""
+    __slots__ = ["_sides_x", "_sides_y"]


Can you motivate the use of __slots__ here? I don't see any real benefit in this class.

mraspaud · 2023-12-07T13:24:47Z

pyresample/boundary/base_boundary.py

+    def _compute_boundary_sides(self, area, vertices_per_side):
+        """Compute boundary sides."""
+        raise NotImplementedError()


This looks like the base class should be abstract, right?

mraspaud · 2023-12-07T13:31:15Z

pyresample/boundary/planar_boundary.py

+    return not polygon.exterior.is_ccw
+
+
+class PlanarBoundary(BaseBoundary):


This class is not sufficiently tested.

mraspaud · 2023-12-07T13:34:23Z

pyresample/boundary/planar_boundary.py

+    @classmethod
+    def _compute_boundary_sides(cls, area, vertices_per_side):
+        sides_x, sides_y = area._get_projection_sides(vertices_per_side=vertices_per_side)
+        return sides_x, sides_y


this should be a static method.

mraspaud · 2023-12-07T13:35:13Z

pyresample/boundary/planar_boundary.py

+    def __init__(self, area, vertices_per_side=None):
+        super().__init__(area=area, vertices_per_side=vertices_per_side)
+
+        self.sides_x = self._sides_x
+        self.sides_y = self._sides_y
+        self.crs = self._area.crs
+        self.cartopy_crs = self._area.to_cartopy_crs()


This should be at the top of the class

ghiggi added 3 commits October 10, 2023 23:05

Add refactor

e203899

Fix wrong import

95decd3

Lint

ae730f7

ghiggi mentioned this pull request Oct 10, 2023

Convert AreaDefinitions to odc geoboxes #545

Merged

4 tasks

ghiggi marked this pull request as draft October 10, 2023 21:42

ghiggi added 6 commits October 11, 2023 11:05

Fix typo

edcfcb4

Fix issues

eae1873

Fix issues

1662d2d

Fix issues

655fb72

Improve doc

b7df9be

Improve doc

a5322a6

ghiggi commented Oct 11, 2023

View reviewed changes

pyresample/geometry.py Outdated Show resolved Hide resolved

Remove old use of frequency argument

88065d7

ghiggi marked this pull request as ready for review October 11, 2023 11:18

ghiggi commented Oct 11, 2023

View reviewed changes

pyresample/geometry.py Outdated Show resolved Hide resolved

djhoese added enhancement backwards-incompatibility Causes backwards incompatibility or introduces a deprecation labels Oct 18, 2023

djhoese self-assigned this Oct 18, 2023

djhoese reviewed Oct 18, 2023

View reviewed changes

pyresample/geometry.py Outdated Show resolved Hide resolved

djhoese reviewed Oct 18, 2023

View reviewed changes

mraspaud reviewed Oct 19, 2023

View reviewed changes

pyresample/geometry.py Outdated Show resolved Hide resolved

Merge branch 'main' into refactor-get_bbox-lonlat

0e992a2

# Conflicts: # pyresample/geometry.py

Address refactor boundary sides

8eff9b4

ghiggi added 6 commits November 23, 2023 23:54

Add test units for visualizion utilities

b9cda35

Consistent naming across repo of lon/lat sides with sides_lons and si…

d41973a

…des_lats

Add geographic_boundary and projection_boundary methods

c6a9378

Deprecate boundary() for geographic_boundary()

6323814

Pass AreaDef crs to ProjectionBoundary crs

6210bea

Solve wasted polygon computations in gradient for GEO FD

28b5527

ghiggi added 3 commits November 24, 2023 17:37

Deprecate get_boundary_lonlats and SimpleBoundary

0eb7a6b

Remove use of __file__ and relative paths in test units

0f1b578

Improve clarity of sides extraction for geographic and projection coo…

059d496

…rdinates

Refactor Geographic and Projection Boundary using composition and dep…

d35f1b5

…endency injections

ghiggi added 2 commits November 25, 2023 02:12

Deal with AreaDefinition with unvalid sides: polar projections, globa…

b3273b8

…l planar projections, ...

Add code structure for boundary extraction for polar and global plana…

e3f4b24

…r projections

ghiggi mentioned this pull request Nov 27, 2023

Wrong coordinates returned by AreaDefintion.get_lonlats for some projections in out-of-Earth locations #558

Open

ghiggi added 3 commits November 27, 2023 22:56

Fix boundary ordering logic for swath and projections !

915e34f

Fix formatting warnings

61240eb

Renaming to SphericalBoundary and PlanarBoundary

7069197

ghiggi commented Nov 27, 2023

View reviewed changes

Some cleanout and notes

e638557

ghiggi requested review from mraspaud and djhoese November 27, 2023 23:11

djhoese requested changes Dec 1, 2023

View reviewed changes

mraspaud requested changes Dec 7, 2023

View reviewed changes

		except Exception:
		valid_indices = np.ones(source_lons.size, dtype=bool)

		warnings.warn("'get_boundary_lonlats' is deprecated. Please use "
		"'area.boundary().sides'.", DeprecationWarning, stacklevel=2)

		TEST_FILES_PATH = os.path.join(_root_path, "test", 'test_files')

		return not polygon.exterior.is_ccw


		class PlanarBoundary(BaseBoundary):

Refactor for boundary lat/lon extraction #546

Are you sure you want to change the base?

Refactor for boundary lat/lon extraction #546

Conversation

ghiggi commented Oct 10, 2023 • edited

ghiggi commented Oct 11, 2023

djhoese left a comment

Choose a reason for hiding this comment

ghiggi commented Oct 19, 2023 • edited

djhoese commented Oct 19, 2023

Conclusion

djhoese commented Oct 19, 2023

ghiggi commented Oct 19, 2023

djhoese commented Oct 19, 2023

djhoese commented Nov 22, 2023

djhoese commented Nov 24, 2023

ghiggi commented Nov 24, 2023 • edited

djhoese commented Nov 24, 2023

ghiggi commented Nov 24, 2023 • edited

djhoese commented Nov 25, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghiggi commented Nov 27, 2023 • edited

djhoese left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mraspaud left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghiggi commented Oct 10, 2023 •

edited

ghiggi commented Oct 19, 2023 •

edited

ghiggi commented Nov 24, 2023 •

edited

ghiggi commented Nov 24, 2023 •

edited

ghiggi commented Nov 27, 2023 •

edited