RasterRegion should produce LazyMultibandTile #199

Open
3 tasks
echeipesh opened this issue Jun 27, 2019 · 0 comments · May be fixed by #223
echeipesh commented Jun 27, 2019

Currently RasterRegion.raster triggers a read when the raster is requested:

require(bounds.intersects(source.gridBounds), s"The given bounds: $bounds must intersect the given source: $source")
@transient lazy val raster: Option[Raster[MultibandTile]] =
  for {
    intersection <- source.gridBounds.intersection(bounds)
    raster <- source.read(intersection)
  } yield {
    if (raster.tile.cols == cols && raster.tile.rows == rows)
      raster
    else {
      val colOffset = math.abs(bounds.colMin - intersection.colMin)
      val rowOffset = math.abs(bounds.rowMin - intersection.rowMin)
      require(colOffset <= Int.MaxValue && rowOffset <= Int.MaxValue, "Computed offsets are outside of RasterBounds")
      raster.mapTile { _.mapBands { (_, band) => PaddedTile(band, colOffset.toInt, rowOffset.toInt, cols, rows) } }
    }
  }

This is a particular problem when writing tiles sourced from the RasterSource API with a GeoTrellis LayerWriter, because the first action taken is to groupBy the records by their index:

https://github.com/locationtech/geotrellis/blob/474ed9019b1281ce9e134167e7f7f3b0fc3e2eae/s3-spark/src/main/scala/geotrellis/spark/store/s3/S3RDDWriter.scala#L81

Since the read is triggered before this groupBy, the result is a shuffle of all of the raster pixels, which is quite expensive.

What would be preferable is an instance of MultibandTile that contains a RasterRegion but does not read the pixels until they're explicitly requested by one of its functions. This would allow the groupBy to be performed on metadata only, which is expected to greatly improve the performance of ingests.
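
To make the idea concrete, here is a minimal, self-contained sketch of deferred pixel reads. It does not use GeoTrellis: GridBounds here is a simplified stand-in, and CountingSource, LazyTile and LazyTileDemo are hypothetical names invented for illustration. It only shows that grouping records by key need not touch pixel data when the tile holds metadata plus a deferred read.

```scala
// Simplified stand-in for a pixel window; NOT the GeoTrellis GridBounds type.
final case class GridBounds(colMin: Int, rowMin: Int, colMax: Int, rowMax: Int) {
  def cols: Int = colMax - colMin + 1
  def rows: Int = rowMax - rowMin + 1
}

// Hypothetical pixel source that counts how many reads were performed.
final class CountingSource(val cols: Int, val rows: Int) {
  private var reads = 0
  def readCount: Int = reads

  // Each pixel value is its row-major index in the full source grid.
  def read(bounds: GridBounds): Array[Int] = {
    reads += 1
    Array.tabulate(bounds.cols * bounds.rows) { i =>
      val col = bounds.colMin + i % bounds.cols
      val row = bounds.rowMin + i / bounds.cols
      row * cols + col
    }
  }
}

// Hypothetical lazy tile: dimensions come from metadata alone,
// pixels are read on first access and then cached.
final class LazyTile(source: CountingSource, val bounds: GridBounds) {
  def cols: Int = bounds.cols
  def rows: Int = bounds.rows
  private lazy val pixels: Array[Int] = source.read(bounds)
  def get(col: Int, row: Int): Int = pixels(row * cols + col)
}

object LazyTileDemo {
  val source = new CountingSource(cols = 4, rows = 4)

  // Records keyed by a (col, row) index, as a layer writer would see them.
  val records: Seq[((Int, Int), LazyTile)] = Seq(
    (0, 0) -> new LazyTile(source, GridBounds(0, 0, 1, 1)),
    (1, 0) -> new LazyTile(source, GridBounds(2, 0, 3, 1))
  )

  // Grouping inspects only the keys, so no pixels are read yet.
  val grouped = records.groupBy { case (key, _) => key }
  val readsAfterGroupBy: Int = source.readCount

  // The first pixel access triggers exactly one read for that tile.
  val firstPixel: Int = grouped((1, 0)).head._2.get(0, 0)
  val readsAfterAccess: Int = source.readCount
}
```

In a Spark job the tile would also have to serialize as metadata only for the shuffle to stay cheap. The snippet above already marks raster as @transient lazy val, which suggests one way the pixel data could be kept out of the serialized form.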

This behavior would also help in similar situations where the tiles need to be sorted, filtered, or joined before they're actually used.

I'm not sure if this should be default behavior (probably?) or if we should provide both behaviors as part of the RasterRegion interface: eagerRaster and lazyRaster.
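
As a rough illustration of the "both behaviors" option, the sketch below shows one possible shape for such an interface. Only the names eagerRaster and lazyRaster come from this issue; RegionReads, Deferred, SimpleRegion and force are hypothetical, and the real RasterRegion signature may well differ.

```scala
// Hypothetical handle for a deferred read; the result is cached after the first force().
final class Deferred[A](thunk: () => A) {
  private lazy val cached: A = thunk()
  def force(): A = cached
}

// Hypothetical interface exposing both behaviors side by side.
trait RegionReads[A] {
  /** Performs the read immediately and returns the result. */
  def eagerRaster: A

  /** Returns a handle; nothing is read until force() is called on it. */
  def lazyRaster: Deferred[A]
}

// Minimal implementation over an arbitrary read function.
final class SimpleRegion[A](read: () => A) extends RegionReads[A] {
  def eagerRaster: A = read()
  def lazyRaster: Deferred[A] = new Deferred(read)
}

object RegionReadsDemo {
  private var reads = 0

  // The "read" here just bumps a counter and returns dummy pixel values.
  val region = new SimpleRegion(() => { reads += 1; Vector(1, 2, 3) })

  val handle = region.lazyRaster
  val readsBeforeForce: Int = reads // obtaining the handle reads nothing

  val pixels: Vector[Int] = handle.force()
  val again: Vector[Int] = handle.force() // served from the cache
  val readsAfterForce: Int = reads
}
```

Making the lazy variant the default would amount to having the existing raster accessor return the deferred form, at the cost of changing when read failures surface.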

  • Implement LazyMultibandTile
  • RasterRegion produces LazyMultibandTile
  • Benchmark a sample ingest with eager vs lazy tile read to validate assumption and document
@echeipesh echeipesh added this to the GT 3.0 milestone Jun 27, 2019
@echeipesh echeipesh self-assigned this Aug 2, 2019
@echeipesh echeipesh linked a pull request Aug 5, 2019 that will close this issue
@rossbernet rossbernet removed this from the GT 3.0 milestone Aug 22, 2019