Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add geo-traits crate #1157

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kylebarron
Copy link
Collaborator

@kylebarron kylebarron commented Feb 26, 2024

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Updated version of #1019 that's clean against the latest main. Aside from having no extra diff from main, it also has:

Geometry::MultiPolygon(p) => GeometryType::MultiPolygon(p),
Geometry::GeometryCollection(p) => GeometryType::GeometryCollection(p),
Geometry::Rect(p) => GeometryType::Rect(p),
_ => todo!(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bloop

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because I didn't add Line and Triangle traits yet, but we can do that before merging if you'd like

geo-traits/Cargo.toml Outdated Show resolved Hide resolved
Copy link
Member

@frewsxcv frewsxcv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for driving this initiative @kylebarron! I'm excited to use these traits in my own projects.

This pull request looks mergeable to me! With the understanding it is not yet ready to be published. I want to hear from @michaelkirk and @urschrei before merging though.

@kylebarron
Copy link
Collaborator Author

kylebarron commented Feb 27, 2024

In the medium term (in 2024) I would love to add 3d support to geoarrow, so I want to keep thinking about xyz geometry traits, but I'm pretty happy with these for 2d geometries. They've worked well so far in geoarrow.

@michaelkirk
Copy link
Member

Thanks for keeping this up to date WRT to main @kylebarron!

I said in #1011 (and still believe) that having some examples of algorithms that use these trait definitions would be key to evaluating them.

Are there examples anywhere that utilize these traits? @frewsxcv - have you tried, or would you be willing to see how this code works in your project?

@frewsxcv
Copy link
Member

Thanks for keeping this up to date WRT to main @kylebarron!

I said in #1011 (and still believe) that having some examples of algorithms that use these trait definitions would be key to evaluating them.

Are there examples anywhere that utilize these traits? @frewsxcv - have you tried, or would you be willing to see how this code works in your project?

Are the usages in geoarrow-rs sufficient? Or are you looking for something else? Also would love to hear why you find this a blocker to merging these in to the repository.

@frewsxcv
Copy link
Member

In rgis I would like to be able to read geospatial data and perform operations on the underlying data without having to convert through intermediary representations like geo-types. geo-traits with geozero would make this a lot simpler.

@kylebarron
Copy link
Collaborator Author

kylebarron commented Feb 27, 2024

Examples of how they're used today in geoarrow-rs:

@JosiahParry has also implemented these traits in serde_esri, and I believe he's tested against geoarrow.

@michaelkirk
Copy link
Member

Thanks for the examples @kylebarron! I'll take a look.

And relevant to @frewsxcv question:

Also would love to hear why you find this a blocker to merging these in to the repository.

I continue to think it's reasonable that code in main is intended to be useful, and that seeing code actually used in pursuit of the problem it purports to solve seems like a reasonable bar. I look forward to looking at @kylebarron's examples to that end.

geo-traits with geozero would make this a lot simpler.

I understand that's the theoretical goal, and a worthy one. I'm asking to see if this actually works towards that goal. Care to give it a shot? Is there something that keeps you from building against this branch, for example?

@michaelkirk
Copy link
Member

@JosiahParry has also implemented these traits in serde_esri, and I believe he's tested against geoarrow.

@JosiahParry - can you elaborate on what these traits enable you to do? Presumably it's useful to you or you wouldn't have done it 😉

@frewsxcv
Copy link
Member

Nope! There is nothing blocking me from building on-top off a branch.

The point of merging the code into main is to make it easier to iterate on, as I find working with remote git branches to be unnecessarily tedious and hard to reason about when I'm using other geo dependencies in my projects. Another point is to prevent staleness that happened often with the previous separate branch approach, which is a concern I had from the start.

@frewsxcv
Copy link
Member

@michaelkirk Correct me if this is wrong, but it sounds like you are skeptical of the usefulness of these traits without actually saying that. Am I mind reading correctly? I feel like there's an unusual and surprising amount of resistance to merging this in.

@michaelkirk
Copy link
Member

Correct me if this is wrong, but it sounds like you are skeptical of the usefulness of these traits without actually saying that. Am I mind reading correctly? I feel like there's an unusual and surprising amount of resistance to merging this in.

I feel a bit attacked! But it's true - I am skeptical of the traits code. Truly though, I am skeptical of all code. I am admittedly bit extra skeptical of the geo-traits code in that it's not just a single isolated algorithm, it's a cross cutting concern. I feel like I've said similar things before, but maybe not clearly enough. I appreciate @kylebarron patience with me.

An implementation of BoundingRect in terms of traits.

This implementation looks good to me! You're using the trait accessors to implement a useful algorithm and it doesn't seem overly ceremonious compared to the implementation on geo-types.

FrechetDistanceLineString / Area

Ugh, these ones on the other hand, seem a bit unfortunate. Is there a way to make these look more like the BoundingRect one? Or is it reasonable to assume that most algorithms implemented on geo-traits will need to have a separate trait definition (DistanceLineString, DistancePolygon, etc.) for each geometry type?

I haven't yet grokked why these implementations can't look more like the BoundingRect one - something about having a dedicated struct for the algorithm (i.e. struct BoundingRect) maybe? Can we do this with our other algorithms?

Also, am I reading it correctly that the "trait based" implementation for FrechetDistance is simply converting to a geo-types and then calling FrechetDistance on that? Like we're not actually implementing the algorithm using the trait here in any meaningful way - or am I misunderstanding? Is there something preventing a geo-types free implementation?


All the conversion API's seem reasonable at first blush, but can you connect the dots for me a little bit?

let line_string_1_wkb: &[u8] = "..."
let line_string_bbox = ???

let line_string_2_wkb: &[u8] = "..."
let frechet_distance = ???

@kylebarron
Copy link
Collaborator Author

Or is it reasonable to assume that most algorithms implemented on geo-traits will need to have a separate trait definition (DistanceLineString, DistancePolygon, etc.) for each geometry type?

All algorithm traits from geo need a separate implementation in geoarrow because geo's traits always return scalar objects while geoarrow's traits need to return arrays or chunked arrays.

I would love to have only one algorithm trait, e.g. just Distance but as mentioned in #1113 I don't think it's possible to implement blanket traits for multiple traits without specialization. I think impl<G: LineStringTrait> Distance for G {} and impl<G: PolygonTrait> Distance for G {} will always fail without specialization? So my first thought was to separate the traits to be able to implement impl<G: PolygonTrait> DistancePolygon for G {}. But I'd love for there to be a better way 🙂.

Also, am I reading it correctly that the "trait based" implementation for FrechetDistance is simply converting to a geo-types and then calling FrechetDistance on that? Like we're not actually implementing the algorithm using the trait here in any meaningful way - or am I misunderstanding? Is there something preventing a geo-types free implementation?

In BoundingRect I'm reimplementing the core algorithm in a native way on the traits. I do this for BoundingRect because getting bboxes is common enough that removing overhead is good and it's quite easy to implement. In FrechetDistanceLineString I don't want to touch the core algorithm implementation. In general, geoarrow's role is to provide the glue between high-level APIs in JS and Python to core algorithms implemented in other crates, such as geo.

The copy to a geo object is short-term glue that allows the trait to be called either on geoarrow scalars (e.g. a positional reference in an array) or geo scalar objects. In the long run, the underlying FrechetDistance trait in geo could be implemented in terms of LineStringTrait and the geoarrow glue wouldn't need to copy the input into geo::LineString.


All the conversion API's seem reasonable at first blush, but can you connect the dots for me a little bit?

let line_string_1_wkb: &[u8] = "..."
let line_string_bbox = ???

let line_string_2_wkb: &[u8] = "..."
let frechet_distance = ???

I added a test for the first one in this example PR: geoarrow/geoarrow-rs#542

// A builder for a columnar WKB arrays
let mut wkb_builder = WKBBuilder::<i32>::new();
// Add a geo polygon to the WKB array
// This uses geo-traits to serialize to WKB and adds the binary to the array
wkb_builder.push_polygon(Some(&p0()));

// Finish the builder, creating an array of logical length 1.
let wkb_arr = wkb_builder.finish();

// Access the WKB scalar at position 0
// This is a reference onto the array. At this point the WKB is just a "blob" with no other
// information.
let wkb_scalar = wkb_arr.value(0);

// This is a "parsed" WKB object. The [WKBGeometry] type is an enum over each geometry
// type. WKBGeometry itself implements GeometryTrait but we need to unpack this to a
// WKBPolygon to access the object that has the PolygonTrait impl
let wkb_object = wkb_scalar.to_wkb_object();

// This is a WKBPolygon. It's already been scanned to know where each ring starts and ends,
// so it's O(1) from this point to access any single coordinate.
let wkb_polygon = match wkb_object {
    WKBGeometry::Polygon(wkb_polygon) => wkb_polygon,
    _ => unreachable!(),
};

// Add this wkb object directly into the BoundingRect
let mut bounding_rect = BoundingRect::new();
bounding_rect.add_polygon(&wkb_polygon);

assert_eq!(bounding_rect.minx, -111.);
assert_eq!(bounding_rect.miny, 41.);
assert_eq!(bounding_rect.maxx, -104.);
assert_eq!(bounding_rect.maxy, 45.);

Run with

cargo test --lib -- test2::test_wkb_to_bbox --nocapture

@kylebarron
Copy link
Collaborator Author

I haven't yet grokked why these implementations can't look more like the BoundingRect one - something about having a dedicated struct for the algorithm (i.e. struct BoundingRect) maybe? Can we do this with our other algorithms?

I think part of why these implementations look the way they do is because I want to have compile-time safety across geometry types. E.g. I don't want to have fn frechet_distance(array: GeometryArray) -> Float64Array because frechet distance is not implemented for every geometry type. In this way, I think I've adopted the same use of traits as geo itself has done. The trait impl describes what operations are allowed on different combinations of objects. But this idea, that the trait describes what is allowed, ends up significantly more complex when you want to handle broadcasting across scalars, arrays, and chunked arrays.

(The chunked array abstraction, where you might have 10 million geometries in 100 separate chunks where each of those chunks is a contiguous buffer holding 100,000 geometries, is quite useful and is currently my unit of "automatic parallelism". I.e. a rayon par_map is called over each chunk.)

But the largest area of complexity in geoarrow-rs is handling these combinations. I think it's useful for UX to be able to call an operation like Distance against either a geoarrow scalar or a geo scalar, and thus it's necessary either to copy my impls twice, which is a lot of duplicated code, or implement them against a shared blanket trait.

@JosiahParry
Copy link
Contributor

JosiahParry commented Feb 29, 2024

I would quite like this! I think it would be really neat if the geo-type traits also had a new() method that would allow you to write algorithms entirely with trait methods.

Take this super derived example (i adjusted @kylebarron's geo-traits/src/line_string.rs file) https://gist.github.com/JosiahParry/f3469f8df4706a4d36f8a47db9ae3e09)

  trait StartEnd where Self: LineStringTrait {
      fn start_end(&self) -> Self::Owned {
          let start = self.coord(0).unwrap();
          let end = self.coord(self.num_coords() - 1).unwrap();
          Self::new(vec![start, end])
      }
  }

impl<T: CoordNum> StartEnd for LineString<T> {}

 #[test]
    fn test_start_end() {
        let coords = vec![
            Coord {x: 1.0, y: 2.0},
            Coord {x: 3.0, y: 4.0},
            Coord {x: 5.0, y: 6.0},
        ];

        let line_string = LineString::new(coords);
        let start_end = line_string.start_end();

        println!("{:?}", start_end);
        assert_eq!(start_end.num_coords(), 2);
    }
running 1 test
LineString([Coord { x: 1.0, y: 2.0 }, Coord { x: 5.0, y: 6.0 }])
test line_string::tests::test_start_end ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.00s

@kylebarron
Copy link
Collaborator Author

kylebarron commented Feb 29, 2024

I think that trait definition has the same downsides as mentioned elsewhere, where you'll need a different trait definition for each geometry type. (You'd need a StartEndPolygon that's implemented on PolygonTrait).

I also think that it's just extra unnecessary complexity, because I argue that the implementation of the algorithm should choose the most appropriate return type, as long as the geometry trait is also implemented on that return type. That means, for any geo-implemented algorithms, the internal, temporary data objects will always be geo objects, and so the lowest-cost return object will always be a geo object. Instead of always incurring the cost to convert back to the original trait representation, that conversion should be deferred to the called for when it's optimal for them. For many libraries, calling convert_to_my_repr(line_string.start_end()) where convert_to_my_repr is defined as

fn convert_to_my_repr(geom: &LineStringTrait) -> MyLineStringStruct {}

But for other libraries, especially in the context of libraries like geoarrow where you want to optimize the vectorized execution and not the scalar execution, it's much better to defer the data transformation until you have an array of geometries.

@busstoptaktik
Copy link

busstoptaktik commented Mar 19, 2024

I need something like this for Geodesy, but obviously only need the CoordTrait there, so my comments may be a bit narrow-sighted.

First, given your comments, @kylebarron, about wanting to extend towards 3D, I find the name of the x_y method unfortunate, and would prefer xy.

Both because for my TeX-eyes, x_y looks like x with subscript y, and because with 3D, 4D and/or Measure extensions, it will be too much of an eyecatcher to spot x_y_z_t_m (and its truncated siblings) in my source code. xy, xyz, xyzt, etc. would suffice.

Second, I find it reasonable to extract the CoordTrait in a separate crate, since coordinates and features (in the OGC Simple Features sense) are two very different things. Then the geo-traits crate could start by implementing CoordTrait for the Point feature, and build the world from there.

Also, if I understand correctly, most of the reservations expressed above would not apply to CoordTrait in itself, so it could probably be merged separately. Which (full disclosure) would be extremely useful for me, especially if somewhat extended.

Third, as a general remark of the Georust naming conventions x and y are rather unfortunate names for something that may most likely be longitudes or eastings in the former, latitudes or northings in the latter case.

Geodetically speaking, coordinates are ordered, and can be referred to as 1st, 2nd, 3rd and 4th, and as you never know what is the CRS specific convention regarding (N,E) vs (E,N), any kind of implied (mis)understanding is unfortunate. I suggest to let the trait implementer provide x(), y(), z(), t(), and let the trait autoprovide first(), second(), third(), fourth() (or preferably the other way round), and then just postulate that the naming is an internal convention, not to be confused with any external conventions.

Evidently, trying to extend this to the entire georust ecosystem would neither be valuable or reasonable - wherever the culture descends from an "earth-is-flat-as-my-monitor" computer graphics world, x and y are obviously the more useful convention. But in a CoordTrait it would make much sense to support a more geodetic world view - it wouldn't harm, and it would not make implementation harder or more verbose: It's just a few autoprovided aliases in the trait definition.

EDIT 2: On second thoughts, I actually think the Third item can be ignored: It doesn't matter how the "this is the internal convention" is communicated, and first()...fourth() is probably a unnecessary complication, that impedes communication.

Fourth, I have implemented an extended version of CoordTrait over at Geodesy, confined to the Coordinate module (with impls here), and the Geodesics functionality of the Ellipsoid implementation.

As you can see, I took the liberty to extend to 4D+M, but probably not in a very elegant way. Also, I need the xy_as_f64 etc. functionality (cf. geodesic_fwd) and geodesic_inv), but have probably implemented it in an idiotic, rather than idiomatic way, so any comments and/or improvements would be much appreciated.

I would find it awesome to be able to implement a tighter integration between Geodesy and the Georust universe. And a somewhat geodesy-aware CoordTrait would be a straightforward and sensible way of doing it.

I know, however, that I am stomping on your grass, but please consider my suggestions - and I would be very happy for any comments regarding how I am wrong, and how I could work further towards a Georust<->Geodesy bridge.

@kylebarron
Copy link
Collaborator Author

I'm interested in multi-dimensional geometry support, and think that handling it in traits could be a good way to incrementally add support. But I think it would probably be ideal if the dimension could be a generic, so that you'd statically know what type of coordinate you have, and which type of operations could be done on it.

I'm happy to switch x_y to xy.

Then the geo-traits crate could start by implementing CoordTrait for the Point feature, and build the world from there.

geo has had some discussion about whether to merge Point and Coord, and so it's not 100% clear that we do want both PointTrait and CoordTrait (in this PR I implemented PointTrait on Coord and CoordTrait on Point for simplicity). In any case CoordTrait/PointTrait is intricately linked to the rest of the traits here. A LineStringTrait will yield objects that implement CoordTrait, a PolygonTrait will yield objects that implement LineStringTrait and so on. So it's not clear to me that stabilizing only CoordTrait is the best, if we have to modify that later to support geometries.

@busstoptaktik
Copy link

But I think it would probably be ideal if the dimension could be a generic, so that you'd statically know what type of coordinate you have, and which type of operations could be done on it.

That would require a good deal more genericity than just the dimension.

In PROJ, we check the pipelines upon construction, such that the output type of step n fits with the input type of step n+1, but that's static features of each operator, not of the coordinate (the coordinate types are all just untagged unions), so the coordinate is supposed to fit the pipeline its fed into, while the pipeline is checked for internal consistency. And the internal consistency is still just at a very general level (any kind of projected coordinate will be accepted between a step producing kind A and a step expecting kind B)

That's why I entirely dropped the "different coordinate types" in Rust Geodesy, and only support different dimensionalities: It just doesn't make sense - coordinates are interpreted by operators which expect a specific input type, and the gamut of possible types is infinite.

No reason to try to make the type system reject having me feeding a utm zone 32 coordinate into a pipeline expecting some other type of projected coordinate (or even geographical or geocentric cartesians): The output becomes garbage, but its consistent garbage :-) and there are just too many kinds of parametrizations, each of which would lead to principally different gamuts for valid operations.

So it's a tough job!

@busstoptaktik
Copy link

I'm interested in multi-dimensional geometry support, and think that handling it in traits could be a good way to incrementally add support. But I think it would probably be ideal if the dimension could be a generic

@kylebarron My current, and much improved version, takes off from a much less modified version of your original CoordTrait, but with support for up to 4 dimensions, and a M(easure). Is it something like this ↓ you intend? That would fit very well with my plans

pub trait CoordTrait {
    type T: CoordNum;
    const DIMENSION: usize;
    const MEASURE: bool;

    /// Accessors for the coordinate tuple components
    fn x(&self) -> Self::T;
    fn y(&self) -> Self::T;
    fn z(&self) -> Self::T;
    fn t(&self) -> Self::T;
    fn m(&self) -> Self::T;

    /// Returns a tuple that contains the two first components of the coord.
    fn x_y(&self) -> (Self::T, Self::T) {
        (self.x(), self.y())
    }
}

@kylebarron
Copy link
Collaborator Author

I had been thinking more along the lines of the proposal in this PR, where the struct is generic over a CoordNum that can be NoValue.

In particular, when DIMENSION is 2, what should z and t return? T::default()? I was thinking it would be lovely if the CoordTrait could somehow be parameterized so that z and t didn't exist when DIMENSION was 2.

Or maybe the better way of looking at this is that z() would return NoValue. That I suppose would be the best way of linking your proposal and that PR.

@busstoptaktik
Copy link

when DIMENSION is 2, what should z and t return?

Essentially, I believe it is up to the person implementing the trait for a concrete data type to decide: It's your trait, but their data!

In Geodesy, I use NaN for t and 0 for z, because that fits well with geodetical reasoning: If you have no height information, your coordinate is probably assumed to be placed directly on the ellipsoid, whereas if you have no time coordinate, but accidentally call a time dependent operation on the coordinate, you actually want the NaN value to spill out all over your coordinate, to indicate its invalidity ("stomping on it with the x-large NaN boots").

In a sense, NaN is the IEEE 754 equivalent of None, and we want to be reminded when doing something stupid.

@kylebarron
Copy link
Collaborator Author

when DIMENSION is 2, what should z and t return?

Essentially, I believe it is up to the person implementing the trait for a concrete data type to decide: It's your trait, but their data!

In general, I disagree because I think it's important to have a well-specified data contract for these traits. If geo or geodesy or any other crate implements algorithms based on these traits, they need to know what they'll receive.

In Geodesy, I use NaN for t and 0 for z, because that fits well with geodetical reasoning

In this case I think it would be better to have z, t, and m return Option<Self::T> and then in geodesy you could use coord.t().unwrap_or(T::NaN) and coord.z().unwrap_or(0). That way the default data values are informed by the consumer and not the producer.

@michaelkirk
Copy link
Member

whereas if you have no time coordinate, but accidentally call a time dependent operation on the coordinate, you actually want the NaN value to spill out all over your coordinate, to indicate its invalidity ("stomping on it with the x-large NaN boots").

I think the dream is that rather than NaN spilling out all over at runtime, you have the type system refuse to even compile the code, saying "hey dummy, you're trying to use a time-dependent algorithm with non-time coords".

Maybe in practice it's not always possible when you instantiate your coordinates to know a priori what dimensions they'll need to have, but it seems like it would be possible some of the time.

@kylebarron
Copy link
Collaborator Author

I think the dream is that rather than NaN spilling out all over at runtime, you have the type system refuse to even compile the code, saying "hey dummy, you're trying to use a time-dependent algorithm with non-time coords".

💯

Maybe in practice it's not always possible when you instantiate your coordinates to know a priori what dimensions they'll need to have, but it seems like it would be possible some of the time.

In geoarrow this is where dynamic downcasting comes in, and where there's both geometry-specific types like LineStringArray but also the trait object Arc<dyn GeometryArrayTrait>. If a user already has a LineStringArray, they'll see only traits implemented on LineStringArray available to them. If they have an &dyn GeometryArrayTrait, then they can either downcast it themselves to a LineStringArray and catch that error, or pass the &dyn GeometryArrayTrait into an algorithm directly, which will fail if the algorithm isn't implemented for the underlying type.

@busstoptaktik
Copy link

busstoptaktik commented Mar 22, 2024

@kylebarron said:

In general, I disagree because I think it's important to have a well-specified data contract for these traits. If geo or geodesy or any other crate implements algorithms based on these traits, they need to know what they'll receive.

I, on the contrary, stipulate, that it is the implementer of the trait for a concrete type, who knows how best to adapt the concrete type to the (well documented, but not necessarily perfectly obtainable) expectations of the trait

In Geodesy, I provide implementations of the CoordinateSet trait for the built-in coordinate types, Coor2D, Coor32, Coor3D, Coor4D, providing z=0 and t=NaN, but (as you know) for user supplied data types, this is entirely user selectable

In this case I think it would be better to have z, t, and m return Option<Self::T> and then in geodesy you could use coord.t().unwrap_or(T::NaN) and coord.z().unwrap_or(0). That way the default data values are informed by the consumer and not the producer.

In that case, I believe all should be optional. Partially for consistency, partially to support "only height" or "only M" systems.

My unfortunate knack for premature optimization is also concerned by the fact that making things optional will introduce branching right in the inner coordinate loop. In general, it will probably be of minor influence, though

@busstoptaktik
Copy link

busstoptaktik commented Mar 22, 2024

@michaelkirk said

I think the dream is that rather than NaN spilling out all over at runtime, you have the type system refuse to even compile the code, saying "hey dummy, you're trying to use a time-dependent algorithm with non-time coords".

Maybe in practice it's not always possible when you instantiate your coordinates to know a priori what dimensions they'll need to have, but it seems like it would be possible some of the time.

As I also indicated wrt. PROJ pipelines in a previous message, that will be virtually impossible: You can never know what run-time pipelines a user may enter into the system, so compile-time checks will at best be spotty.

Also note that in both PROJ and Geodesy, the data flow model is such that conversions are done in place, so the actual memory location holding the coordinate, will change coordinate type between the steps of a transformation pipeline. So the only thing you can reasonably say about a coordinate tuple is that it is a collection of 0 or more numbers. Their interpretation is entirely up to the associated CRS.

While geoinformatics (and hence Geo) is, broadly speaking, a branch of mathematics, where a CRS is interpreted as a mathematical platonic ideal, geodesy (and hence Geodesy, and PROJ) is a branch of geophysics, and a CRS is an entirely empirical contraption, encompassing all the noise and messy realities of the physical world.

@kylebarron
Copy link
Collaborator Author

My unfortunate knack for premature optimization is also concerned by the fact that making things optional will introduce branching right in the inner coordinate loop

I do worry about this, though I'd love to see some numbers/data on the ballpark overhead to expect here.

In general, I also wonder whether there exists a trait that matches up to the expectations of geo and geodesy. It seems to me that geo tries to follow the Simple Features spec as closely as possible, which I believe, for example, stipulates that x and y always exist. It seems that geodesy needs a fundamentally different model of coordinates?

I suggest to let the trait implementer provide x(), y(), z(), t(), and let the trait autoprovide first(), second(), third(), fourth() (or preferably the other way round), and then just postulate that the naming is an internal convention, not to be confused with any external conventions.

I've been coming around to this. Some people will want to use three-dimensional coords to mean XYM instead of XYZ, and maybe that's ok to allow in the trait? Maybe the trait should define structure more than intent?

I've been thinking more about additional dimensions in geoarrow-rs, because I want to potentially use geoarrow-rs as a dependency in some Python projects where 3D coordinates are allowed. I started outlining some thoughts but to support more dimensions I think a const generic for the structural size is the way to go. Then the interpretation of XYM vs XYZ and XYZM vs XYZT are a function of the associated metadata, not the array types.

@kylebarron
Copy link
Collaborator Author

In particular, thinking about something like

pub trait CoordTrait<const DIM: usize> {
    type T: CoordNum;

    /// x component of this coord
    fn x(&self) -> Self::T;

    /// y component of this coord
    fn y(&self) -> Self::T;

    fn third_unchecked(&self) -> Self::T;

    fn third_opt(&self) -> Option<Self::T> {
        if DIM >= 3 {
            Some(self.third_unchecked())
        } else {
            None
        }
    }

    fn fourth_unchecked(&self) -> Self::T;

    fn fourth_opt(&self) -> Option<Self::T> {
        if DIM >= 4 {
            Some(self.fourth_unchecked())
        } else {
            None
        }
    }

    /// Returns a tuple that contains the x/horizontal & y/vertical component of the coord.
    fn xy(&self) -> (Self::T, Self::T) {
        (self.x(), self.y())
    }
}

then e.g. a LineStringTrait would become

pub trait LineStringTrait<const DIM: usize>: Sized {
    type T: CoordNum;
    type ItemType<'a>: 'a + CoordTrait<DIM, T = Self::T>
    where
        Self: 'a;

The unchecked variants allow for no-overhead access. I'm not sure if there's any way to make that smarter. Maybe having a method like

    fn array(&self) -> [Self::T; DIM];

makes sense. (Though I don't know when the compiler will catch an out of bounds access on that array).

@frewsxcv
Copy link
Member

frewsxcv commented Apr 9, 2024

If we were to introduce a trait Coord, does that mean we could change our algorithms from being parameterized by the scalar type (i.e the type of X/Y, geo::GeoNum) to being parameterized by the trait Coord?

Something like:

trait Coord {
    type Scalar: GeoNum;

    fn x(&self) -> Self::Scalar;
    fn y(&self) -> Self::Scalar;
}

pub trait Area<C: Coord> {
    fn signed_area(&self) -> Coord::Scalar;
    fn unsigned_area(&self) -> Coord::Scalar;
}

@urschrei
Copy link
Member

urschrei commented Apr 9, 2024

Won't you have to be generic over the Scalar type? Not all traits will be able to use GeoNum, right?

@frewsxcv
Copy link
Member

frewsxcv commented Apr 9, 2024

Something like this: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=4ade74994f39a0e5fd995f2e237670e6

Won't you have to be generic over the Scalar type? Not all traits will be able to use GeoNum, right?

Does this code answer your question? The Coord trait is generic over the associated type Scalar, but the Scalar generic doesn't need to be specified in the algorithm trait implementation

@michaelkirk
Copy link
Member

michaelkirk commented Apr 9, 2024

If we were to introduce a trait Coord, does that mean we could change our algorithms from being parameterized by the scalar type (i.e the type of X/Y, geo::GeoNum) to being parameterized by the trait Coord?

I think so!

To me, the Coord trait seems the most imminently useful, if only to paper over our current split-brain struct Coord vs struct Point.

Is that the use case you were thinking of, or did you have some other idea where parameterizing over a trait Coord would be helpful?

@frewsxcv
Copy link
Member

frewsxcv commented Apr 13, 2024

Bikeshed time– should the one-dimensional trait unit be called geo_traits::Coord or geo_traits::Point? See also #1169. It looks like @kylebarron used Coord in this pull request (the one I'm commenting in now)

@kylebarron
Copy link
Collaborator Author

Bikeshed time– should the one-dimensional trait unit be called geo_traits::Coord or geo_traits::Point?

I have both coord and point traits at this point to mirror geo

@busstoptaktik
Copy link

For whatever reason, GitHub stopped reminding me of updates on this PR, so I have not been following the discussion the last month or more. But just to follow up on @kylebarron's observation that:

In general, I also wonder whether there exists a trait that matches up to the expectations of [both] geo and geodesy.

You are probably right about that @kylebarron, and realizing this, I reworked Geodesy internals to be more trait based, and have recently merged trait implementations for both coordinate tuples (skip to line 110, to ignore macro based implementations of index, add, sub etc.), and ellipsoids. I believe this has led to huge code maintainability improvements.

I have, however, not checked what it means for the speed. But Geodesy is still amply fast for my use cases.

@michaelkirk
Copy link
Member

michaelkirk commented May 17, 2024

Bikeshed time– should the one-dimensional trait unit be called geo_traits::Coord or geo_traits::Point?

I have both coord and point traits at this point to mirror geo

@kylebarron - I can understand why you'd do it for symmetry, but if you weren't worried about symmetry, do you anticipate any complications arising from getting rid of one of them? They seem functionally redundant from what I understand.

Provided that there will be either a Point trait xor a Coord trait (but not both):

re: name bikeshedding: I think it mostly doesn't matter, but I'd vote for Point. I think it's a little more natural:

For example:

better worse
A MultiPoint consists of many Points A MultiPoint consists of many Coords
This method takes a Point geometry as input This method takes a Coord geometry as input

Being confused by the second example might be a stretch... but somehow "Coordinate Geometry" feels to me like it could be referring to any geometry that contains coordinates (any of Point, LineString, Polygon, etc.), as opposed to "Point Geometry" which feels like it's more likely talking about only a single spot.

I'd also say "Point" is maybe slightly more approachable of a term. And we don't need to abbreviate it!

Any counter examples where it's worse? It wouldn't be the end of the world for me either way.

@kylebarron
Copy link
Collaborator Author

@kylebarron - I can understand why you'd do it for symmetry, but if you weren't worried about symmetry, do you anticipate any complications arising from getting rid of one of them? They seem functionally redundant from what I understand.

I agree they're functionally redundant, and only exist to be semantically explicit.

re: name bikeshedding: I think it mostly doesn't matter, but I'd vote for Point.

That's fine with me.

Any counter examples where it's worse? It wouldn't be the end of the world for me either way.

Not that I can think of right now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants