PackedCoordinateSequence separated arrays variant #672

bjornharrtell · 2021-01-21T22:03:52Z

This PR implements a third PackedCoordinateSequence variant, named Double2 for lack of a better name.

This variant explicitly uses three arrays (XY, Z and M respectively) instead of the single stride-based array that the other two implementations use.
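To make the difference between the two layouts concrete, here is a minimal sketch of how coordinates are addressed in each. The class names `SeparatedArraysSequence` and `StridedSequence` are hypothetical and this is not the PR's actual Double2 code; it only assumes what the description above states: XY interleaved in one array with Z and M in their own arrays, versus all ordinates of a point stored together with the dimension as stride.

```java
// Hypothetical sketch of the two layouts; these are NOT the JTS or PR classes.

// Separated layout: XY interleaved in one array, Z and M in their own arrays.
final class SeparatedArraysSequence {
    private final double[] xy; // x0, y0, x1, y1, ...
    private final double[] z;  // z0, z1, ... (null when there is no Z)
    private final double[] m;  // m0, m1, ... (null when there is no M)

    SeparatedArraysSequence(double[] xy, double[] z, double[] m) {
        if (xy.length % 2 != 0) {
            throw new IllegalArgumentException("xy must hold an even number of values");
        }
        int n = xy.length / 2;
        if ((z != null && z.length != n) || (m != null && m.length != n)) {
            throw new IllegalArgumentException("z and m must have one value per point");
        }
        this.xy = xy;
        this.z = z;
        this.m = m;
    }

    int size() { return xy.length / 2; }
    double getX(int i) { return xy[2 * i]; }
    double getY(int i) { return xy[2 * i + 1]; }
    // Missing ordinates are reported as NaN in this sketch.
    double getZ(int i) { return z == null ? Double.NaN : z[i]; }
    double getM(int i) { return m == null ? Double.NaN : m[i]; }
}

// Strided layout: all ordinates of a point stored together, e.g. [x, y, z, x, y, z, ...].
final class StridedSequence {
    private final double[] coords;
    private final int dimension; // stride: number of ordinates per point

    StridedSequence(double[] coords, int dimension) {
        if (dimension < 2 || coords.length % dimension != 0) {
            throw new IllegalArgumentException("coords length must be a multiple of dimension >= 2");
        }
        this.coords = coords;
        this.dimension = dimension;
    }

    int size() { return coords.length / dimension; }
    // Ordinate k of point i lives at index i * dimension + k.
    double getOrdinate(int i, int k) { return coords[i * dimension + k]; }
}
```

In the separated layout, adding or dropping Z or M only means passing or omitting an array; the XY indexing never changes. In the strided layout, every index depends on the dimension.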

The motivation for adding this variant is to support a more direct and efficient way to interoperate with software that uses this memory layout. Popular examples are GDAL and QGIS. It is also the memory model I chose for my own FlatGeobuf format, which is the main reason I want this: to be able to use FlatGeobuf more efficiently with JTS or its ports.
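As a rough sketch of why the layout matters for interop, assume an external source hands over XY and Z as two separate raw buffers of little-endian doubles (FlatBuffers, which FlatGeobuf builds on, stores scalars little-endian; the exact buffer shapes here are an assumption). With the separated layout, each buffer becomes its own array in one bulk read; a strided layout needs an extra interleaving pass over every point. `InteropSketch` and its methods are hypothetical names; only the `java.nio` calls are real API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helpers; not part of JTS, GDAL or FlatGeobuf.
final class InteropSketch {
    // Bulk-reads a buffer of little-endian doubles into a double[] without
    // moving the caller's buffer position (duplicate() has its own position).
    static double[] readDoubles(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        double[] out = new double[view.remaining() / Double.BYTES];
        view.asDoubleBuffer().get(out);
        return out;
    }

    // Separated layout: XY and Z buffers map directly onto their own arrays.
    static double[][] toSeparated(ByteBuffer xyBuf, ByteBuffer zBuf) {
        return new double[][] { readDoubles(xyBuf), readDoubles(zBuf) };
    }

    // Strided layout: the same data must be interleaved point by point into
    // [x, y, z, x, y, z, ...], an extra pass over every coordinate.
    static double[] toStridedXYZ(double[] xy, double[] z) {
        if (xy.length != 2 * z.length) {
            throw new IllegalArgumentException("xy must hold two values per z value");
        }
        double[] out = new double[3 * z.length];
        for (int i = 0; i < z.length; i++) {
            out[3 * i] = xy[2 * i];
            out[3 * i + 1] = xy[2 * i + 1];
            out[3 * i + 2] = z[i];
        }
        return out;
    }
}
```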

Signed-off-by: Björn Harrtell <bjorn@wololo.org>

bjornharrtell · 2021-01-22T13:05:03Z

@dr-jts can I get your opinion on this?

jagill · 2021-02-07T20:10:25Z

Aside: have you had success with this memory format? I tried it out recently, hoping I could vectorize some operations, but I found significant overhead that swamped any SIMD benefits.

bjornharrtell · 2021-02-07T20:50:05Z

@jagill when designing FlatGeobuf I was choosing between a single array with the dimensions as stride and the layout I described here. I couldn't find any clear pros or cons for either, and the choice was ultimately decided by the fact that GDAL uses that model and that 3D wasn't a primary goal at the time. Perhaps I regret it now, but I still don't feel there is a clear winner.

I do not have any experience with SIMD. Which memory model would you say is ideal?

jagill · 2021-02-08T13:25:14Z

I don't know which model is ideal. A flat strided array [x, y, z, x, y, z, ...] has so far been the fastest for me. In Java, it is significantly faster than an array of points [Point(x, y, z), Point(x, y, z), ...], which is not surprising. In theory, if you have separate arrays for each dimension [x, x, ...], [y, y, ...], [z, z, ...], you can:

- be more agnostic as to whether you have z or m dimensions, and
- use SIMD more easily, because all your x values are in one place.

But so far I have not succeeded in making separate arrays faster than a strided array.
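To make the three layouts concrete, here is the same reduction (minimum x) written against each of them. It is an illustrative sketch with hypothetical names, not a benchmark, and it says nothing about which layout is faster; as noted above, that has to be measured.

```java
// Illustrative only: the same reduction (minimum x) over three layouts.
final class LayoutComparison {
    // Array of objects: each point is its own heap object.
    static final class Point {
        final double x, y, z;
        Point(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    static double minXObjects(Point[] pts) {
        double min = Double.POSITIVE_INFINITY;
        for (Point p : pts) {
            min = Math.min(min, p.x); // one reference dereference per point
        }
        return min;
    }

    // Flat strided array [x, y, z, x, y, z, ...]: x of point i is at i * stride.
    static double minXStrided(double[] coords, int stride) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i < coords.length; i += stride) {
            min = Math.min(min, coords[i]);
        }
        return min;
    }

    // Separate arrays: all x values are contiguous, so the loop reads
    // consecutive doubles, the access pattern SIMD code would want.
    static double minXSeparate(double[] xs) {
        double min = Double.POSITIVE_INFINITY;
        for (double x : xs) {
            min = Math.min(min, x);
        }
        return min;
    }
}
```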

I suspect, since most computations only use x and y, that using a strided array for x and y plus separate arrays for z and m would be better than a single array holding all of x, y, z and m.
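A minimal sketch of that hybrid layout (interleaved XY plus separate Z, which is also the layout this PR's variant uses): a 2D operation such as line length reads only the XY array, while a 3D variant pulls Z from its own array. The helper names are hypothetical.

```java
// Hypothetical helpers over the hybrid layout: xy = [x0, y0, x1, y1, ...], z = [z0, z1, ...].
final class HybridLayoutOps {
    // 2D length: reads only the interleaved xy array; z (and m) are never touched.
    static double length2D(double[] xy) {
        double len = 0.0;
        for (int i = 2; i + 1 < xy.length; i += 2) {
            double dx = xy[i] - xy[i - 2];
            double dy = xy[i + 1] - xy[i - 1];
            len += Math.hypot(dx, dy);
        }
        return len;
    }

    // 3D length: the same xy walk, plus Z read from its own array.
    static double length3D(double[] xy, double[] z) {
        if (xy.length != 2 * z.length) {
            throw new IllegalArgumentException("xy must hold two values per z value");
        }
        double len = 0.0;
        for (int i = 1; i < z.length; i++) {
            double dx = xy[2 * i] - xy[2 * (i - 1)];
            double dy = xy[2 * i + 1] - xy[2 * i - 1];
            double dz = z[i] - z[i - 1];
            len += Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
        return len;
    }
}
```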

bjornharrtell · 2021-02-08T22:14:47Z

@jagill thanks for your thoughts, they give me some comfort that perhaps the memory model I've chosen isn't bad. :)

bjornharrtell · 2021-02-09T20:14:15Z

I just remembered another rationale for the paired XY layout: X is more or less useless without Y, whereas the other dimensions and measures are more standalone.

bjornharrtell force-pushed the packed-sequence-double2 branch from 6ea7a4b to 6681658 · January 21, 2021 22:05

PackedCoordinateSequence separated arrays variant

4541012

Signed-off-by: Björn Harrtell <bjorn@wololo.org>

bjornharrtell force-pushed the packed-sequence-double2 branch from 6681658 to 4541012 · January 21, 2021 22:06
