Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for spherical geometry (S2) in Python #10

Open
jorisvandenbossche opened this issue Jan 27, 2022 · 38 comments
Open

Support for spherical geometry (S2) in Python #10

jorisvandenbossche opened this issue Jan 27, 2022 · 38 comments

Comments

@jorisvandenbossche
Copy link
Member

General issue to track the idea of improving the support for working with spherical geometries (instead of planar) in GeoPandas and more generally the Python geospatial ecosystem.

Some background links:

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Jan 27, 2022

And the previous community meeting, we had a discussion about this with @benbovy. Pasting the notes from that meeting here:

  • see this thread for context
  • https://github.com/benbovy/pys2index
  • overview from Benoit:
    • lightweight wrapper for S2 point index with API similar to scipy.spatial.cKDTree
    • performance in benchmarks so far is quite fast
    • would like to have vectorized wrappers for S2
    • S2 appears to be actively maintained and about to get additional functionality soon
    • used python-xtensor and pybind11 to work with S2 and numpy arrays
  • Two possible approaches to integrate:
    • own way to store geometries specfic to backend engine; convert geometries on the fly to S2 objects as part of specific operation (e.g., predicate)
    • Wrapper classes:
      • pygeos uses Python C extension wrapper for GEOS geometries so that GEOS objects are managed according to Python object lifecycles
  • consider putting in a request for NumFOCUS small development grant to start building out some of this support (next cycle may open early 2022): https://numfocus.org/programs/small-development-grants

@benbovy
Copy link

benbovy commented Jan 28, 2022

Some updates on this:

I've tried using pybind11 and xtensor-python to implement some vectorized functions working on numpy object arrays where array elements are Python-wrapped s2 objects (i.e., an approach similar to the one used in pygeos). Without success so far, unfortunately (see benbovy/pys2index#4).

There has been some effort to support numpy object arrays and/or ufuncs in pybind11 (e.g., pybind/pybind11#1152 and also this fork). I think it would really help, although it is not yet clear to me if/how we can adapt it in this case. I left a comment there: pybind/pybind11#1152 (comment).

@brendan-ward
Copy link
Member

Thanks for the update @benbovy !

I haven't been able to fully get my head around the use of pybind11 / xtensor-python parts here, but a couple thoughts came to mind:

  1. a common use would be in GeoPandas, where a geometry array is 1-dimensional. So supporting n-dimensional ufuncs is maybe a nice-to-have rather than must have? Thus the essential bit is really just a loop over a 1D array of objects (pointers to wrapped S2 entities). It seems like we could leverage numpy to do the broadcasting before calling into the pybind11 / C++ tier? (not sure, haven't really thought this out)

  2. maybe it would be easier to leverage numpy C API (i.e., meet it where it's at), as is done in pygeos, and then build this as a Python C extension a bit more manually (like pygeos) rather than via pybind11? To do this, I think we'd have to create a C API around S2, so in addition to the boilerplate required for Python C extension there would also be some maintenance burden on wrapping S2 as a C API. One advantage of this approach is that it should enable numpy ufuncs in the C tier, much like pygeos. This would need some investigation into how S2 might be best wrapped as a C API, but we could theoretically leverage some of the recent patterns worked out in GEOS (passing coordinate buffers instead of setting individually). Plus there might be some advantage of having similar patterns at the lower (C) level between the two implementations on top of underlying geometry engines (GEOS and S2); this might be able to better leverage contributors between the pygeos / shapely side and S2 bindings side. (though might be trying too hard to make this analogous to pygeos)

@martinfleis
Copy link
Member

@benbovy on a less technical note, if you still have a capacity and an interest in applying for the NumFOCUS SDG, the timeline for the next round is following:

  • Call for Proposals Announcement: February 4, 2022
  • Proposal Submission Deadline: March 4, 2022
  • Committee Selection Deadline: March 18, 2022
  • Notification to Applicants Deadline: April 15, 2022

If we should go ahead, I suggest organising a call in the first half of February to discuss the details.

@benbovy
Copy link

benbovy commented Jan 29, 2022

Thanks for your feedback @brendan-ward !

A common use would be in GeoPandas, where a geometry array is 1-dimensional. So supporting n-dimensional ufuncs is maybe a nice-to-have rather than must have? Thus the essential bit is really just a loop over a 1D array of objects (pointers to wrapped S2 entities)

I agree, we could certainly start with 1-dimensional arrays. That said, longer term I'd be interested in exploring the possibility of reusing n-dimensional vectorized S2 Python wrappers (as well as pygeos) with Xarray, which is undergoing some heavy refactoring to allow custom indexes. Xarray + geospatial indexes would be great for processing outputs of climate / ocean / landscape evolution / earth system models.

I think we'd have to create a C API around S2

That's my main concern. Unfortunately, there's no s2_c.h equivalent of geos_c.h as pointed out by @paleolimbot (https://youtu.be/zqRhF2XM1CE?t=1197). See also r-spatial/s2#160. Long term it would be great to have a common S2 C API that can be maintained by both the Python and R (and maybe other) geospatial communities, but this represents quite a big task IMHO. Also, it's best if we can avoid vendoring S2 in a C/Python wrapper.

On the other hand, if we can make this possible without much effort, simply using pybind11 (and xtensor-python) to create the Python wrappers would allow us to develop new features / iterate very quickly and would avoid the burden of maintaining an intermediate C API + dealing with the low-level Numpy C API (+ vendoring S2). It not yet clear to me what is needed to make this approach work, though. It would be great to have thoughts from folks at QuantStack. @wolfv, have you ever thought about adding vectorized functions / ufuncs working with numpy arrays of wrapped CGAL objects (as np.object dtype) in scikit-geometry?

@benbovy
Copy link

benbovy commented Jan 29, 2022

@martinfleis that sounds good. I'm happy to discuss the details within the next two weeks.

Before going further with a NumFOCUS application, I'd like to clarify the technical challenges of the C API vs. pybind11 approaches. TBH I'm not sure to have the full capacity of going deep into either the design of a S2 C API or heavy customization of pybind11 internals, especially within the context of a NumFOCUS small development grant.

@brendan-ward
Copy link
Member

@paleolimbot I'd be curious about your thoughts on a shared C API against S2 that could be leveraged for both Python, sf, and anything else that can use a C API. It looks like some of the code already in sf could be leveraged to get this started (per your thinking in r-spatial/s2#160).

Specifically, if this is something of interest to the broader community, should we try and take this up directly with S2? Long term, it seems ideal that the C API lives inside that project, as it does with GEOS, and given the approach with GEOS it seems also reasonable that it not be expected to have full coverage of the underlying C++ API, at least not initially.

However, that might lead to other challenges:

  • slower iteration speed in downstream libraries (vs vendor + wrap in those libraries) since it means waiting on C API releases
  • since our target audience here is also using GEOS for planar spatial operations, it might be ideal to make the C API around S2 a bit more similar to GEOS (if such is possible), in order to better simplify the calling patterns from Python / R / etc. As opposed to going with a C API that follows the S2 C++ API more closely.

Since it looks like sf has already taken the vendor and partially wrap with C API approach, we'd be curious about the maintenance effort there as we think through our approach here.

@jorisvandenbossche
Copy link
Member Author

From watching the FOSS4G talk of @paleolimbot mentioned above (https://www.youtube.com/watch?v=zqRhF2XM1CE), it is my understanding that the S2 Geometry library itself it quite low level, and all the functionality to work on "simple features" (as what we get from GEOS for planar features) actually lives in the R s2 package.

If that's correct, we shouldn't put effort in directly interfacing with S2, but rather with the C or C++ layer in R's s2. Which would then indeed require making this part of s2 more standalone as outlined in r-spatial/s2#160 / mentioned by Brendan in the comment above.

@paleolimbot
Copy link

Thanks all for looping me in here! @jorisvandenbossche is right that anything resembling a GEOS-like interface lives in the R package. More specifically, I copied the BigQuery Geography API (which copies the PostGIS API more or less), because I could test it as I went.

I'm conflicted about a C API for S2 because it robs S2 of some of the things that make it so great (despite the fact that I opened that ticket for myself). That said, S2 C++ is complicated to use and an awesome library that nobody has access to in Python is not doing anybody any good. I have some ideas about how to organize a C API and will give it a try since it seems like there may be interest in helping if I get it started. The nice part about doing this within the S2 R package is that there's near 100% test coverage, although eventually a C API should live elsewhere.

Another approach I've been thinking about is one that operates on vectors of geometries (as ABI-stable C ArrowArrays) rather than individual geometries. Operating on chunks of geometries is nice because you don't have to sacrifice performance on points to support multipolygons (et al.)...you just send more features in a chunk and the implementation can loop without a bunch of tiny allocs or virtual method calls. This depends on the somewhat experimental stuff that Joris, I, and others have been working on with how geometry can be passed around as a struct ArrowArray.

Or a combination of both! I'll start my own C API ticket and see what happens.

I probably missed a few things from the thread...let me know if I didn't catch something important!

@brendan-ward
Copy link
Member

one that operates on vectors of geometries (as ABI-stable C ArrowArrays)

This sounds very interesting! I've been wondering where some of the boundaries might be with those (mostly a separate topic), namely serialization to / from file storage vs on the compute side between a client library (e.g., something in Python or R) and the underlying geometry implementation (e.g., GEOS or S2).

Is the idea that the core object that is used in the client library becomes the ArrowArray, which is then passed to a layer in C/C++ that translates from those to GEOS or S2 objects? Is that limited to geometry construction / serialization from the underlying library (GEOS / S2), and otherwise the client holds on to wrapped pointers to GEOS or S2 objects?

you just send more features in a chunk and the implementation can loop without a bunch of tiny allocs or virtual method calls

I was thinking about this too in terms of trying to figure out which operations in a hypothetical C API for S2 would need to work on scalar geometries vs 1D arrays of geometries and have the underlying implementation do the looping rather than the caller (passing arrays seems very appealing). The part where I got hung up was that on the Python side (in pygeos/shapely), we're leveraging looping mechanics from numpy to give us the inner loop over multi-dimensional arrays and then operating on scalar geometries. I.e., it is the client that knows how to do the looping rather than the underlying implementation. Though our most common use cases so far are likely 1D arrays of geometries (though @benbovy 's comment re: n-dimensional arrays is well taken), and I might not be thinking about this correctly at all.

I have some ideas about how to organize a C API and will give it a try since it seems like there may be interest in helping if I get it started

Yes - interested! (may not have tons of capacity to throw at this, but interested in being involved). I was thinking of drafting a Google Doc or similar to outline some thoughts on the C API, but your thoughts on this are going to be a much better starting point. Is there someplace where we (community) can start collaborating around details about the API more specifically?

Is there interest in a meeting to discuss this topic, and shared needs in the S2 / ArrowArray space for both Python and R?

@benbovy
Copy link

benbovy commented Jan 31, 2022

From this discussion it seems that the best path forward would be to develop a common, GEOS-like C API for S2 that can be leveraged from different "client" ecosystems such as R or Python geospatial libraries. Although you guys certainly have much more experience than I have with GEOS, Arrow and/or the Numpy C API, I'd still be happy to help somehow if I'm able to do that, as I would really like to see more S2 capabilities accessible from within Python.

Some naive questions:

@paleolimbot how much effort do you think it would require to refactor bits from the S2 R package into a target-language agnostic C API? Do you think that short or mid-term we could already have something (even with a few features only) so that we can start building Python wrappers on it?

Perhaps a very naive question: would it be possible to separate the building of a GEOS-like C API from the building of the S2 library itself? @brendan-ward's comment re: ideally the C API would live inside the S2 project, although I don't know if there's interest in that from the S2 maintainers point of view and this would also probably further slower the iteration speed. Note: there are conda packages for the S2 library.

Could we easily reuse with Numpy arrays a C API that is working with C Arrow arrays? I.e., perform broadcasting, convert to Arrow array, call the C API and then convert back to Numpy array at little overhead cost? As a user, I indeed find the pygeos/shapely approach based on Numpy ufuncs quite appealing :), even though I acknowledge that it is probably overkill for 80% of use cases.

I'm conflicted about a C API for S2 because it robs S2 of some of the things that make it so great

I was also wondering about that, but I guess that S2-specific wrappers may live alongside (and more-or-less independently from) a GEOS-like "standardized" S2 C API, depending on what we want to achieve using Python, R,...?

@brendan-ward
Copy link
Member

@benbovy (speaking for myself here) - I don't think we've mady any decisions about which direction to pursue or even recommend, we're still gathering info to weigh pros / cons. So please don't take the suggestions around a C API for S2 as trying to shut down your work on the pybind11 / xtensor-python approach!

It does seem that in elaborating how we'd want a C API to work would also inform how we'd want an API via pybind11/xtensor-python to work (though a good bit of that would be inside the C++ tier rather than API exposed to python).

Question: if for the short term we wanted to stick to the 1D approach to get this implemented most quickly, and manage all the looping ourselves as a simple loop over that array, would that sidestep the issues around how to handle object arrays in xtensor-python that you were bumping into? I.e., sidestep the ufunc related issues specifically. Or is the issue more at a fundamental level of trying to have an object array of S2 objects at any dimension? (sorry - not entirely following your box example linked above)

Yet another idea, no idea if it would work. I wonder if there is middle ground of using pybind11 but with a more C style interface rather than object-oriented interface (i.e., don't use pybind to expose C++ classes with methods).

So in pygeos our core geometry object is just a struct that holds a couple pointers and is defined as a type:

typedef struct {
  PyObject_HEAD void* ptr;
  void* ptr_prepared;
} GeometryObject;

(other mechanics around managing the python type wrapping around this omitted here)

Which are then always passed around as PyObject pointers for use in numpy arrays or scalars; numpy doesn't need to know that . The ptr just points to a GEOSGeometry, which (if I follow) is itself just a struct around a GEOS C++ geometry object.

So (if I follow correctly), it seems like it would be possible (in theory, not up on pybind11 yet) to do something similar: constructors return a pointer to a struct (let's call this PyS2Geometry for example) that is what is stored into numpy arrays and passed back to Python; this has has an underlying C++ class underneath (let's call this S2Geometry) that is not visible to pybind11 specifically, but is accessible in your C++ tier by following the pointer. Note that the S2Geometry could just be a shell that manages more specific types underneath or via inheritance.

Then I think you could expose functions like intersects(PyS2Geometry a, PyS2Geometry b) (scalar example) to Python, and in your C++ tier get back the underlying S2Geometry, and call methods or functions using those.

So in this purely hypothetical example, pybind11 is really just helping with the boilerplate around creating a Python C extension, but allowing you to more easily have a C++ implementation underneath rather than having to define a C API in the middle. No idea what that means for working with numpy at the C++ level though.

@benbovy
Copy link

benbovy commented Feb 1, 2022

No worries at all! I'm really open about which direction to pursue and I'm enjoying the discussion here! Sorry if my tone suggested otherwise.

The box example is also confusing, sorry. I was messing around with pybind11 (and dumped some messy code in a branch), trying to see if we can create vectorized functions with custom types, both exposed to Python. I defined a box class wrapping the S2 equivalent but it could have been any other custom type, not even related to S2.

Note that the S2Geometry could just be a shell that manages more specific types underneath or via inheritance.

That is what I had in mind too (instead of exposing more specific S2 derived types to Python).

pybind11 is really just helping with the boilerplate around creating a Python C extension, but allowing you to more easily have a C++ implementation underneath rather than having to define a C API in the middle.

Yes this is what I wanted to check. Taking your example, I was wondering if pybind11 could even help with the boilerplate around the PyS2Geometry struct and the vectorization, i.e.,

  • expose the S2Geometry type in Python: py::class_<S2Geometry>(m, "S2Geometry")
  • register it as Numpy object dtype, e.g., using this macro (not available yet in pybind11): PYBIND11_NUMPY_OBJECT_DTYPE(S2Geometry)
  • expose intersects(const S2Geometry& a, const S2Geometry& b) as a vectorized Numpy (universal) function: m.def("intersects", py::vectorize(interstects)).
  • or do something like m.def("intersects", [](py::array_t<S2Geometry> a, py::array_t<S2Geometry> b){ ... })

However, I couldn't get it to work with the box example above. It is likely that I'm thinking about it too naively and that I'm missing something important. For example, py::vectorize doesn't work with non-POD types. It is not clear either to me how we can separate Numpy stuff from Eigen stuff implemented in this PR and in this pybind11 fork.

@brendan-ward
Copy link
Member

@benbovy I did some testing with pybind11 around returning numpy arrays of custom objects here.

Let me preface this by saying I am not at all up to speed on modern C++, esp. pointer semantics, still quite unfamiliar with pybind11, and basically don't know what. I'm doing here. I think this is working for a hypothetical example of wrapping a foreign object into a class that we then return as a pointer. I only tried the 1D array case and did not use xtensor-python.

The things that seemed to make this work were using PYBIND11_NUMPY_OBJECT_DTYPE from the pybind #1152 issue to register our custom type as an object dtype for numpy, and then returning a unique_ptr (should it be shared_ptr??) to our custom class as each element in the array (so that instances weren't copied when assigning into the array for return). Also had to register the custom class with pybind11, so that it knew how to __repr__ it, for example.

For the example I construct a custom Point object from x and y input arrays (looping ourselves), where a Point holds an instance of a foreign object, meant to simulate an S2 object, which allocates and deallocates memory (I haven't yet profiled for memory leaks); it seems like it is calling the destructors properly and should release it.

Point was just meant to stand in for a wrapper around S2 objects, not a specific concrete Point type. Not sure if this solves the specific issues you were bumping into above.

@paleolimbot
Copy link

Hi all...sorry for the radio silence, I wanted to put some fingers to keyboard before I said that something was going to be easy or hard!

This thread has helped me a lot, in particular helping clarify that S2 doesn't necessarily need a C API, it really just needs the simple features compatability layer to live outside the code that interfaces with R. The real crux of it is that there is no S2Geometry class like there is with GEOS, which is what I had to invent when I wrote the bindings the first time. Handling the C linking stuff (catching exceptions, defining opaque pointer types) is an orthogonal problem that sounds like it won't be necessary. Anything Python-related in compiled code is over my head, so correct me if I didn't get the gist of that right.

I've opened a PR to start the process of extracting out a standalone S2Geography class and functions (e.g., s2_num_points() that operate on them ( r-spatial/s2#165 ). Feel free to comment, particularly on the files in src/s2-geography/, since that would theoretically be the folder that a Python package could copy and use. My C++ is not awesome and I would be interested in any feedback on the design! Perhaps a good place to start would be geography.hpp for those keen.

With respect to the possibility of an S2Geography that sits on top of an Arrow array, I wrote the class in such a way that this could be implemented (i.e., an S2Geography does not have to own its data, although all the current subclasses do). That is perhaps a future battle.

@martinfleis
Copy link
Member

Just a reminder that if we want to apply for the SDG grant in this round, the Proposal Submission Deadline is March 4, 2022.

Cc @benbovy

@benbovy
Copy link

benbovy commented Feb 25, 2022

Hello everyone, getting back on this thread after a few busy weeks, sorry!

@brendan-ward, @paleolimbot, many thanks for sharing what you've done!

@brendan-ward - Your test was very helpful. I could also make work the 1D array case with xtensor-python, using std::unique_ptr<MyClass> instead of MyClass for the type of the array elements. I couldn't make a universal function, though (maybe because of this?) I still don't have a full understanding of what I'm doing TBH, but this is already promising.

@paleolimbot - I like your approach in geography.hpp. I wonder if we couldn't use a template class (i.e., S2Geography<S2Point>, S2Geography<S2Polyline>, S2Geography<S2Polygon>, etc.) instead of runtime polymorphism, but it's probably not a big deal.

@martinfleis - We're close to the dealine, indeed :/ I'll try to see if I can draft a proposal early next week, using as a reference the former proposals that you mentioned during the community meeting. I'll keep you updated! We can have a call to discuss the details if needed and if you're available next Monday or Tuesday.

@martinfleis
Copy link
Member

martinfleis commented Feb 25, 2022

@benbovy we can always postpone it to another round (June 3). You can also check our GSoC project idea on S2 and salvage some text from that (https://github.com/geopandas/geopandas/wiki/Google-Summer-of-Code-2022#s2---bringing-spherical-geometry-to-python).

I am available on Monday after 5pm UTC and Tuesday at almost any time feasible for Europe. But while I am happy to help organising, I am of no use technically in this.

@brendan-ward
Copy link
Member

@benbovy I should be available after 16:00 UTC on Monday/Tuesday if it would be helpful for me to join, but definitely do not hold it up on my account. Happy to help review proposal / ideas.

@paleolimbot
Copy link

Ok, it's done! There's a long way to go, but at least the simple features <-> s2geometry mapping is now in its own repo: https://github.com/paleolimbot/s2geography . I'm not awesome at CMake but it seems to work locally and on GitHub actions!

@jorisvandenbossche
Copy link
Member Author

That's really great @paleolimbot !

@jorisvandenbossche
Copy link
Member Author

We had a chat with Dewey, Benoît and Brendan yesterday, and some (partial/unstructured) notes can be found at https://hackmd.io/FiNFZ6wkQCufZyyxBewFYA?both

@benbovy
Copy link

benbovy commented Nov 28, 2022

Hi all,

I’m happy to report some good progress on this front.

s2geometry

  • version 0.10.0 packaged on conda-forge for Linux, MacOS (x86, arm64) and Windows
  • provide cmake config file for the conda-forge package so that it is easier to integrate in downstream cmake projects

s2geography

s2shapely (Python bindings)

https://github.com/benbovy/s2shapely/

  • Full support for Numpy ufuncs (*) with geographies handled via the np.object dtype 🎉
    • after digging into pybind11 internals and a lot of trial-and-error, I’ve found some tricks that allow reusing pybind11::vectorize without the need to re-implement all the broadcasting logic! It is a bit hacky but reasonably safe IMHO and hopefully this clears up the path to seamless implementation of a larger set of Python bindings🤞
    • There a potential important caveat: [BUG] pybind11::cast is not thread safe pybind/pybind11#2765. I haven't tested releasing the GIL in vectorized functions yet, though. (**)
  • A small subset of shapely classes and ufuncs counterparts already implemented (with unit tests): Geography, Point, LineString, GeographyType, is_geography(), get_type_id(), get_dimensions(), prepare(), is_prepared(), destroy_prepared(), interesects(), equals().
  • provide more bindings (copy shapely’s API)
  • add support for empty geography
  • documentation (+ type stubs generation, e.g., with mypy?)
  • early release
  • conda-forge packages
  • wheels

What do you think about the name “s2shapely”? I kind of like it, especially if we are able to replicate most of shapely’s API. Perhaps it would make sense to eventually move the repository into the shapely Github organization?

The pybind11 workarounds used here still need to prove their "robustness", but if it works well in the general case (I'm optimistic 🙂) then I don’t see any major blocker for the next steps. I’m going to work on implementing more things in s2shapely by copying shapely’s API.

(*) EDIT: not real Numpy ufuncs but pybind11 vectorized functions.
(**) EDIT2: unfortunately, pybind11::vectorize doesn't seem to support releasing the GIL? benbovy/spherely#2

@martinfleis
Copy link
Member

@benbovy this is great!

What do you think about the name “s2shapely”?

I understand the choice but can already imagine a situation when debugging some geopandas bug:

  • "do you have s2shapely installed?"
  • "yes, I do have shapely"
  • "no, I mean s2shapely, not shapely"

I would probably try to pick another name that does not cause a confusion. The similar issue may happen when you are talking about it. It will always require clarification that this is the one with s2, not a shapely you already know.

@benbovy
Copy link

benbovy commented Nov 29, 2022

@martinfleis I agree this may be confusing in such situation. On the other hand, if the goal is to closely replicate shapely's API using a different geometry backend, choosing a name that is too unrelated may be confusing too for newcomers.

To help clarifying between the two packages maybe we could provide different class names, e.g., S2Point vs. Point, S2Polygon vs. Polygon, etc (benbovy/spherely#1)? The base classes already have different names: Geography vs. Geometry. Also, calling s2shapely API with shapely.Geometry object or vice-versa is not possible and clear error messages would already help avoiding too much confusion.

Some alternative names:

  • scikit-sphere (import sksphere): I initially picked this name but I'm not a great fan of different package vs. import names, also I always make typos while trying to type the first "sksph" letters :)
  • scikit-s2 (import sks2): easier to type but non consistent with most of other scikit-xxx packages where xxx is rather a common name (learn, image, build, geometry, etc.) than a given library
  • pys2 (like pygeos): already taken on pypi.org, unfortunately (and pygeos will be soon deprecated)
  • s2: already taken on pypi.org (also confusing: is it the s2geometry library or python bindings?)
  • shapely-sphere (import ???)
  • s2-shapely (import s2shapely)

@paleolimbot
Copy link

I think s2shapely is probably accurate for what you're trying to do here and I doubt it will be all that confusing; however, you could also use the name 's2geography'.

@jorisvandenbossche
Copy link
Member Author

@benbovy thanks for the overview of todo items and progress!

To help clarifying between the two packages maybe we could provide different class names,

I would personally lean to rather use a more distinct package name, but then the same class names to make the API as similar as possible (except things like base class geography vs geometry, etc).

I was thinking the same as what @paleolimbot already mentioned as well: we could just use "s2geography", to keep it close to the C++ library it is wrapping? And since it is python vs C++, there is no naming conflict. Or do you think it is confusing because it might not be clear which of the two you are talking about? (most users will just use the python bindings, and so won't run into that confusion though)

@jorisvandenbossche
Copy link
Member Author

we could just use "s2geography", to keep it close to the C++ library it is wrapping? And since it is python vs C++, there is no naming conflict

Although there is of course some conflict. For a package manager like conda, we would then need to use python-s2geography as the install name (for import s2geography), while on pip it could be just s2geography .. Yeah, that's certainly a downside of this option.

@paleolimbot
Copy link

paleolimbot commented Nov 30, 2022

Also, 's2geography' for the C++ library is a name I pulled out of my bum one day when my daughter was napping and could totally be changed (although maybe it's already a conda package?)

@benbovy
Copy link

benbovy commented Nov 30, 2022

I like the name "s2geography" for the C++ library!

Some other options:

arcly
curvely
spherely
(EDIT) roundly

Those names are short, clear, at the same time close and distinct enough from shapely. They are also available on pypi / conda.

@benbovy
Copy link

benbovy commented Nov 30, 2022

Those names are short, clear, at the same time close and distinct enough from shapely. They are also available on pypi / conda.

Maybe a bit hard to pronounce, though? Any other (curved-shape-word)ly idea?

@brendan-ward
Copy link
Member

Another option: spherically
(if we're trying to stick to a ly suffix)

@jorisvandenbossche
Copy link
Member Author

From the -ly options, I think I might like spherely the most (I can see the tagline "like shapely, but on the sphere")

I would personally still consider s2geography as an option as well, though (also for the python import name)

@martinfleis
Copy link
Member

I'm on the same side as @jorisvandenbossche.

@paleolimbot
Copy link

+0.001 for "s2geography" (just because it opens the Python package up to doing more than just providing a compatible shapely interface).

@benbovy
Copy link

benbovy commented Dec 9, 2022

I like spherely (also the tagline!). spherically is nice too, although put in a sentence without any highlight it might be slightly confusing.

I'd prefer spherely over s2geography for the Python library because from the user point of view it is closer to shapely (similar API) and also to avoid being it too tied to the C++ package which is more an implementation detail IMHO (possible that the Python library provides additional features or includes other dependencies in the future?). There's also the conda package issue mentioned by @jorisvandenbossche, although that's a minor concern.

@benbovy
Copy link

benbovy commented Dec 9, 2022

+0.001 for "s2geography" (just because it opens the Python package up to doing more than just providing a compatible shapely interface).

Yes that makes sense too! Although I think it is OK if "spherely" vs. "shapely" do not share the same exact API and features.

That is actually my concern with using the same name "s2geography" for both the C++ and Python libraries: it restricts more the Python package to just providing Python bindings to the C++ library.

@paleolimbot
Copy link

Fair enough! 'spherely' is nicer to say too 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants