PackedCoordinateSequence separated arrays variant #672

Open
wants to merge 1 commit into
base: master

bjornharrtell
Contributor

This PR implements a third PackedCoordinateSequence variant, named Double2 for lack of a better name.

This variant explicitly uses three arrays (XY, Z and M respectively) instead of a single stride-based array, as the other two implementations do.
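
To make the layout concrete, here is a minimal sketch of the separated-arrays idea. It is not the class added by this PR and not the JTS API: the name `SeparatedArraysSketch` and all of its methods are hypothetical, and using NaN for a missing ordinate is an assumption made for illustration.

```java
// Hypothetical illustration only; not the class from this PR and not JTS API.
final class SeparatedArraysSketch {
    private final double[] xy; // interleaved pairs: x0, y0, x1, y1, ...
    private final double[] z;  // one value per coordinate, or null if absent
    private final double[] m;  // one value per coordinate, or null if absent

    SeparatedArraysSketch(double[] xy, double[] z, double[] m) {
        if (xy.length % 2 != 0) {
            throw new IllegalArgumentException("xy must hold x/y pairs");
        }
        int n = xy.length / 2;
        if ((z != null && z.length != n) || (m != null && m.length != n)) {
            throw new IllegalArgumentException("z and m must have one value per coordinate");
        }
        this.xy = xy;
        this.z = z;
        this.m = m;
    }

    int size() { return xy.length / 2; }
    double getX(int i) { return xy[2 * i]; }
    double getY(int i) { return xy[2 * i + 1]; }
    // Assumption: NaN marks a missing ordinate in this sketch.
    double getZ(int i) { return z == null ? Double.NaN : z[i]; }
    double getM(int i) { return m == null ? Double.NaN : m[i]; }

    // For comparison, the stride-based layout keeps every ordinate in one array.
    static double strideOrdinate(double[] coords, int dimension, int i, int ordinate) {
        return coords[i * dimension + ordinate];
    }
}
```

With this layout, adding or dropping Z or M never changes how X and Y are indexed, whereas in the stride-based layout the stride itself depends on which ordinates are present.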

The motivation for adding this variant is to support a more direct and efficient way to interoperate with memory models that use this layout. Popular examples of this memory model are GDAL and QGIS. It's also the memory model I chose for my own FlatGeobuf format, which is the main reason I want this (to be able to use FlatGeobuf more efficiently with JTS or ports of it).
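
As a rough illustration of the overhead in question: when data arrives as separated arrays but the sequence needs a single stride-based array, every coordinate has to be copied, roughly as sketched below. The class and method names are hypothetical, and filling a missing Z with NaN is an assumption of this sketch.

```java
// Hypothetical helper; shows the copy a separated-arrays sequence could avoid.
final class InterleaveSketch {
    // Copies separated XY and Z arrays into one stride-3 array: x, y, z, x, y, z, ...
    static double[] toStrideXYZ(double[] xy, double[] z) {
        int n = xy.length / 2;
        double[] out = new double[n * 3];
        for (int i = 0; i < n; i++) {
            out[3 * i] = xy[2 * i];
            out[3 * i + 1] = xy[2 * i + 1];
            // Assumption: NaN stands in for a missing Z value.
            out[3 * i + 2] = (z == null) ? Double.NaN : z[i];
        }
        return out;
    }
}
```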

Signed-off-by: Björn Harrtell <bjorn@wololo.org>
@bjornharrtell
Contributor Author

@dr-jts can I get your opinion on this?

@jagill

jagill commented Feb 7, 2021

Aside: Have you had success with this memory format? I was trying it out recently, in the hopes I could vectorize some operations, but I found significant overhead that swamped any SIMD benefits.

@bjornharrtell
Contributor Author

@jagill when designing FlatGeobuf I was choosing between a single array with the dimensions as stride and the layout I described here. I couldn't find any clear pros or cons for either, and the choice was ultimately decided by the fact that GDAL uses that model and that 3D wasn't a primary goal at the time. I'm perhaps regretting it now, but I still don't feel there is a clear winner.

I do not have any experience with SIMD. Which memory model would you say is ideal?

@jagill

jagill commented Feb 8, 2021

I don't know which model is ideal. A flat strided array [x, y, z, x, y, z, ...] so far has been fastest for me. In Java, it is significantly faster than an array of points [Point(x, y, z), Point(x, y, z), ...] (this is not surprising). In theory, if you have separate arrays for each dimension [x, x, ...], [y, y, ...], [z, z, ...] you can:

  1. Be more agnostic as to whether you have z or m dimensions, and
  2. use SIMD more easily, because all your xs are in one place.

But so far I have not succeeded in making separate arrays faster than a strided array.
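
For reference, the three layouts mentioned above can be sketched side by side, using a sum over x as the example operation. All names here are hypothetical, and the sketch measures nothing: an actual comparison would need a proper benchmark harness (for example JMH).

```java
// Hypothetical illustration of three coordinate layouts; not a benchmark.
final class LayoutComparisonSketch {
    // Array-of-objects layout: one heap object per point.
    static final class Point {
        final double x, y, z;
        Point(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    static double sumXObjects(Point[] points) {
        double sum = 0;
        for (Point p : points) sum += p.x;
        return sum;
    }

    // Flat strided layout [x, y, z, x, y, z, ...]: step by the stride of 3.
    static double sumXStrided(double[] xyz) {
        double sum = 0;
        for (int i = 0; i < xyz.length; i += 3) sum += xyz[i];
        return sum;
    }

    // Separate-arrays layout: all x values are contiguous.
    static double sumXSeparate(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum;
    }
}
```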

I suspect -- since most computations only use x and y -- that using a strided array for x and y but separate arrays for z/m would be better than having a single array with all of x y z and m.
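
A small sketch of that idea, with hypothetical names: a 2D computation such as path length only walks the interleaved xy array and never touches z or m, wherever those are stored.

```java
// Hypothetical illustration: a 2D operation that reads only the interleaved xy array.
final class HybridLayoutSketch {
    // Sums the Euclidean distance between consecutive (x, y) pairs.
    static double length2D(double[] xy) {
        double length = 0;
        for (int i = 2; i + 1 < xy.length; i += 2) {
            double dx = xy[i] - xy[i - 2];
            double dy = xy[i + 1] - xy[i - 1];
            length += Math.hypot(dx, dy);
        }
        return length;
    }
}
```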

@bjornharrtell
Contributor Author

@jagill thanks for your thoughts, they give me some comfort that perhaps the memory model I've chosen isn't bad. :)

@bjornharrtell
Contributor Author

I just remembered another rationale for the paired XY layout: X is more or less useless without Y, whereas the other dimensions and/or measures are more standalone.
