Build on top of arrow Extension datatype #63

Open
nmandery opened this issue Oct 19, 2022 · 5 comments

Comments

@nmandery
Owner

After some discussion with @kylebarron on the GeoRust Discord, we came to the conclusion that this crate could be implemented on top of the arrow Extension datatype. Support in arrow2 appears to be finished; support in polars is still to be implemented.

@allixender

Great seeing you guys @nmandery @kylebarron working on the Rust geo ecosystem

@kylebarron

An H3Array could use an implementation similar to what I do in geoarrow, which is to make a wrapper array like my PointArray.

Since H3 cells can be represented as raw uint64s, you could define an H3 array as:

pub struct H3Array(PrimitiveArray<u64>);

Then the From implementation could convert from a PrimitiveArray or from an extension array.
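As a rough illustration of the wrapper idea, here is a minimal sketch. To keep it self-contained it does not depend on arrow2: the `PrimitiveArray` below is a hypothetical stand-in backed by a `Vec<Option<T>>`, and the `from_options`, `len`, and `get` helpers are assumptions made for this example, not arrow2's API. Only the newtype and the `From` conversions mirror what is described above.

```rust
/// Hypothetical stand-in for arrow2's `PrimitiveArray<T>` (assumption:
/// a nullable column modelled as a plain vector of options).
#[derive(Debug, Clone, PartialEq)]
pub struct PrimitiveArray<T> {
    values: Vec<Option<T>>,
}

impl<T: Copy> PrimitiveArray<T> {
    pub fn from_options(values: Vec<Option<T>>) -> Self {
        Self { values }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Returns `None` for null slots and for out-of-bounds positions.
    pub fn get(&self, i: usize) -> Option<T> {
        self.values.get(i).copied().flatten()
    }
}

/// Newtype wrapper: H3 indexes stored as raw u64 values.
pub struct H3Array(PrimitiveArray<u64>);

/// Wrapping is free: the underlying buffer is moved, not copied.
impl From<PrimitiveArray<u64>> for H3Array {
    fn from(array: PrimitiveArray<u64>) -> Self {
        H3Array(array)
    }
}

/// Unwrapping hands the plain u64 array back to generic arrow code.
impl From<H3Array> for PrimitiveArray<u64> {
    fn from(array: H3Array) -> Self {
        array.0
    }
}

impl H3Array {
    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn get(&self, i: usize) -> Option<u64> {
        self.0.get(i)
    }
}
```

With a real arrow2 dependency, a second `From` implementation taking an extension array would presumably look much the same, unwrapping the extension's storage array before wrapping it.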

geoarrow is also relevant because your polyfill implementation could return a PolygonArray and stay in arrow memory. Maybe an arrow-efficient implementation of polyfill would first see how many pentagons exist in the polyfill output before actually running the polyfill (is that possible?) and then you'd only have to make one allocation in theory.

@nmandery
Owner Author

I got to look at the more primitive arrow2 types like the ones you are using here. It looks quite straightforward.

In the end there will probably be different H3CellArray, H3DirectedEdgeArray, ... structs to represent the different types of H3 indexes with type safety. This should also help to avoid repeated validations of the contents.
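The "validate once, then trust the type" idea could be sketched roughly as follows. This is a simplified illustration, not the crate's implementation: the arrays are backed by a plain `Vec<u64>` instead of an arrow array, `InvalidIndex` and `check_mode` are hypothetical names, and the check only inspects the reserved high bit and the 4-bit mode field of the H3 index (mode 1 for cells, mode 2 for directed edges). A full validation would also have to check resolution, base cell, and digits.

```rust
use std::convert::TryFrom;

// Mode values from the H3 index bit layout.
const MODE_CELL: u64 = 1;
const MODE_DIRECTED_EDGE: u64 = 2;

/// Extracts the 4-bit mode field (bits 59..=62) of an H3 index.
fn index_mode(index: u64) -> u64 {
    (index >> 59) & 0xF
}

/// Hypothetical error type: the first offending value and its position.
#[derive(Debug, PartialEq)]
pub struct InvalidIndex {
    pub position: usize,
    pub value: u64,
}

/// Partial check only: reserved high bit unset and the expected mode.
fn check_mode(values: &[u64], expected: u64) -> Result<(), InvalidIndex> {
    for (position, &value) in values.iter().enumerate() {
        if value >> 63 != 0 || index_mode(value) != expected {
            return Err(InvalidIndex { position, value });
        }
    }
    Ok(())
}

/// Holds only values that passed the cell check at construction time.
pub struct H3CellArray(Vec<u64>);

/// Holds only values that passed the directed-edge check.
pub struct H3DirectedEdgeArray(Vec<u64>);

impl TryFrom<Vec<u64>> for H3CellArray {
    type Error = InvalidIndex;

    fn try_from(values: Vec<u64>) -> Result<Self, Self::Error> {
        check_mode(&values, MODE_CELL)?;
        Ok(Self(values))
    }
}

impl TryFrom<Vec<u64>> for H3DirectedEdgeArray {
    type Error = InvalidIndex;

    fn try_from(values: Vec<u64>) -> Result<Self, Self::Error> {
        check_mode(&values, MODE_DIRECTED_EDGE)?;
        Ok(Self(values))
    }
}

impl H3CellArray {
    /// No re-validation needed: the constructor already checked every value.
    pub fn values(&self) -> &[u64] {
        &self.0
    }
}
```

Because the only way to obtain an `H3CellArray` is through the checked conversion, functions that accept one can skip validating the contents again.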

Combining that with geoarrow for everything geometry-related is the way I want to go. The only thing missing for this currently is time ;)

@kylebarron

> This should also help to avoid repeated validations of the contents.

That and repeated downcasting were the main reasons I stored e.g. a PolygonArray as its constituent parts instead of directly as an arrow2::ListArray, because with a ListArray you'd have to downcast on every row to access a single polygon.
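To make "constituent parts" more concrete, here is a simplified, hypothetical sketch of a polygon array stored as one flat coordinate buffer plus two offset buffers. It uses plain `Vec`s instead of arrow buffers, and the field and method names are assumptions for illustration rather than geoarrow's actual definitions. The point is that every buffer already has a concrete type, so reading a single polygon is just slicing, with no per-row downcast.

```rust
/// Hypothetical polygon array stored as typed, flat buffers.
pub struct PolygonArray {
    /// All vertices of all rings of all polygons, back to back.
    coords: Vec<[f64; 2]>,
    /// Ring `r` covers `coords[ring_offsets[r]..ring_offsets[r + 1]]`.
    ring_offsets: Vec<usize>,
    /// Polygon `p` covers rings `geom_offsets[p]..geom_offsets[p + 1]`.
    geom_offsets: Vec<usize>,
}

impl PolygonArray {
    pub fn new(
        coords: Vec<[f64; 2]>,
        ring_offsets: Vec<usize>,
        geom_offsets: Vec<usize>,
    ) -> Self {
        Self { coords, ring_offsets, geom_offsets }
    }

    /// Number of polygons (one fewer than the number of offsets).
    pub fn len(&self) -> usize {
        self.geom_offsets.len().saturating_sub(1)
    }

    /// Iterates over the rings of one polygon as coordinate slices.
    /// Panics if `polygon` is out of bounds.
    pub fn rings(&self, polygon: usize) -> impl Iterator<Item = &[[f64; 2]]> + '_ {
        let start = self.geom_offsets[polygon];
        let end = self.geom_offsets[polygon + 1];
        (start..end).map(move |r| {
            &self.coords[self.ring_offsets[r]..self.ring_offsets[r + 1]]
        })
    }
}
```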

@nmandery
Owner Author

In the meantime I started working on this arrow integration in https://github.com/nmandery/h3arrow. It lives in a new repository because it is now based on h3o instead of the H3 C library.
