Partial composites with a subset of tiles from a full tile_grid? #12

Open
sheffe opened this issue Mar 15, 2019 · 3 comments

Comments

@sheffe

sheffe commented Mar 15, 2019

I'm a few days into heavy use of slippymath and really enjoying it -- thank you! FWIW, I'm using it to rapidly construct training sets for some image analysis prototyping -- I don't know if you had that use-case in mind when writing the package, but I'm writing <50 lines of code to get a fantastic training set with zero overhead. Mind-blowing.

With that application in mind, I started pulling and caching huge numbers of tiles at relatively high zoom. Even with only ~10k tiles pulled, compositing everything back into a raster (to recover the spatial information) is not trivial -- that amounts to a 25,600px square image. For larger areas I'm doubtful that it's a good idea to try. The next step for many types of image analyses would be to take many bite-size chunks from that large image and crop/rotate/coarsen/etc to preprocess, so compositing the full raster is a bottleneck that isn't really needed other than to return the image tile files to a raster object with the CRS/extent/resolution embedded correctly.
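For a sense of scale, the arithmetic behind that figure can be sketched in base R. The tile size (256 px), the square grid, the three colour bands and the 8 bytes per cell are all assumptions made for illustration, not values taken from slippymath.

```r
# Rough size of a composite built from n_tiles square tiles.
# Assumptions: 256 px tiles laid out in a square grid, 3 colour bands,
# and 8 bytes per cell once the values are held in memory as doubles.
composite_size <- function(n_tiles, tile_px = 256, bands = 3, bytes_per_cell = 8) {
  side_tiles <- ceiling(sqrt(n_tiles))   # tiles along one edge
  side_px <- side_tiles * tile_px        # pixels along one edge
  gib <- side_px^2 * bands * bytes_per_cell / 1024^3
  list(side_px = side_px, gib = gib)
}

size <- composite_size(10000)
size$side_px   # 25600
size$gib       # roughly 14.6
```

Under those assumptions a 10k-tile composite would need on the order of 15 GiB if fully loaded, which is why working tile by tile looks attractive.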

I've experimented with the function compose_tile_grid and subsets of the tile_grid object, and I run into trouble when I don't pass in the complete tile_grid object in order to accurately staple tiles back into a raster with the spatial information correctly attributed.

I'm working through package internals and think I understand what's going on here. (I think) the bbox_to_tile_grid object transformation is exactly invertible. If that's true, then (borrowing from the README) we could probably take a statement like this:

    tile_grid <- bbox_to_tile_grid(uluru_bbox, max_tiles = 15)

and come up with some corresponding process that turns a tile_grid into (eg) an sf polygon dataframe describing the precise boundaries of each component tile. That would allow for rasterizing each tile independently, with full spatial information for the specific tile that can be merged (or not) into a larger raster.
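As a rough illustration of why per-tile boundaries are recoverable, here is the standard slippy-map tile arithmetic written in base R. The helper names are hypothetical and this is not slippymath's internal code; it works in lon/lat degrees, whereas the package's bounding boxes may use a projected CRS.

```r
# Standard slippy-map tile arithmetic in base R (hypothetical helpers,
# not slippymath's own implementation).

# North-west corner of tile (x, y) at a given zoom, in degrees.
tile_to_lonlat <- function(x, y, zoom) {
  n <- 2^zoom
  lon <- x / n * 360 - 180
  lat <- atan(sinh(pi * (1 - 2 * y / n))) * 180 / pi
  c(lon = lon, lat = lat)
}

# Tile containing a lon/lat point at a given zoom.
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom
  lat_rad <- lat * pi / 180
  x <- floor((lon + 180) / 360 * n)
  y <- floor((1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n)
  c(x = x, y = y)
}

# Bounding box of one tile: its own NW corner plus the NW corner of the
# tile diagonally below and to the right.
tile_bounds <- function(x, y, zoom) {
  nw <- tile_to_lonlat(x, y, zoom)
  se <- tile_to_lonlat(x + 1, y + 1, zoom)
  c(xmin = nw[["lon"]], ymin = se[["lat"]],
    xmax = se[["lon"]], ymax = nw[["lat"]])
}

b <- tile_bounds(3, 5, 4)
centre <- c(lon = (b[["xmin"]] + b[["xmax"]]) / 2,
            lat = (b[["ymin"]] + b[["ymax"]]) / 2)
lonlat_to_tile(centre[["lon"]], centre[["lat"]], 4)  # back to tile 3, 5
```

Going from a tile to its box is exact. Going from a point to a tile uses `floor()`, so any point inside the box maps back to the same tile.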

That would also permit informed subsetting of which tiles to look at in a subsample of a larger slippymath pull. For example, this kind of process would be broadly useful:

  • take a random subset of points within an area of interest,
  • find the tile polygons covering the points,
  • grab the tiles surrounding the covering tile in an NxN box
    => we get an enriched training set of many partially-overlapping images.
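The three steps above can be sketched in base R as follows. The area of interest, the zoom level and the helper names are invented for illustration. Tile lookup uses the standard slippy-map formula rather than any slippymath function.

```r
# Tile indices covering lon/lat points (standard slippy-map formula, vectorised).
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom
  lat_rad <- lat * pi / 180
  data.frame(
    x = floor((lon + 180) / 360 * n),
    y = floor((1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n)
  )
}

# n x n block of tiles centred on tile (x, y); n must be odd.
# Tiles that would fall off the edge of the map are dropped.
neighbourhood <- function(x, y, zoom, n = 3) {
  stopifnot(n %% 2 == 1)
  half <- (n - 1) / 2
  max_index <- 2^zoom - 1
  grid <- expand.grid(x = (x - half):(x + half), y = (y - half):(y + half))
  keep <- grid$x >= 0 & grid$x <= max_index & grid$y >= 0 & grid$y <= max_index
  grid[keep, , drop = FALSE]
}

set.seed(1)
# Made-up area of interest in lon/lat degrees.
aoi <- c(xmin = 131.0, ymin = -25.4, xmax = 131.1, ymax = -25.3)
pts <- data.frame(lon = runif(5, aoi[["xmin"]], aoi[["xmax"]]),
                  lat = runif(5, aoi[["ymin"]], aoi[["ymax"]]))

zoom <- 14
centres <- lonlat_to_tile(pts$lon, pts$lat, zoom)
samples <- Map(neighbourhood, centres$x, centres$y,
               MoreArgs = list(zoom = zoom, n = 3))
```

Each element of `samples` is a small data frame of tile indices. Nearby points share tiles, which is what produces the partially overlapping images.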

It looks like you've done a lot of work to remove the sf dependency before CRAN release, so this line of thinking may be better interpreted in spirit than implementation. Does the overall concept make sense? My use-case might be too idiosyncratic for a PR into the package, but if my premises seem directionally correct to you, then I'll plan to write up some logic to implement this independently. I can link a gist in case you're interested.

Thanks again for all of your work on this.

@MilesMcBain
Owner

MilesMcBain commented Mar 27, 2019

Hey thanks for raising this, I am glad you are getting some use out of slippymath!

Compositing large numbers of tiles is definitely not a great idea with the package as it stands now. compose_tile_grid uses a simple call to raster::merge() which has some undesirable memory management practices that can really blow out processing time.

Regarding rasterising individual tiles: There's probably a function missing to do that. Right now you can get a list of sf bounding boxes for every tile in a grid using tile_grid_bboxes, and you have your list of tile images you've pulled down, but you're kind of on your own from there to combine those pieces into individual spatially referenced rasters.

However, the code to make those individual rasters already lives in compose_tile_grid. I mean this bit:

    raster_img <-
        raster::brick(image,
                      crs = attr(bbox, "crs")$proj4string)
    raster::extent(raster_img) <-
        raster::extent(bbox[c("xmin", "xmax", "ymin", "ymax")])
    raster_img

So if we pull that out into a new function it can be used to combine the pieces.
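A minimal sketch of what that extraction might look like, wrapping the quoted lines unchanged. The function name `tile_to_raster` is hypothetical and not part of slippymath. The sketch assumes `image` is whatever `raster::brick()` accepted in the original snippet, and that `bbox` is one element of the list returned by `tile_grid_bboxes`.

```r
# Hypothetical helper: the lines quoted from compose_tile_grid, wrapped in a
# function so a single tile can be turned into a spatially referenced raster.
# Assumes `bbox` carries a "crs" attribute with a proj4string, as in the
# quoted snippet.
tile_to_raster <- function(image, bbox) {
  if (!requireNamespace("raster", quietly = TRUE)) {
    stop("the 'raster' package is required")
  }
  raster_img <- raster::brick(image, crs = attr(bbox, "crs")$proj4string)
  raster::extent(raster_img) <-
    raster::extent(bbox[c("xmin", "xmax", "ymin", "ymax")])
  raster_img
}

# Usage sketch, assuming `tile_grid` exists and `images` is a list of tile
# images in the same order as the tiles in the grid:
# bboxes  <- slippymath::tile_grid_bboxes(tile_grid)
# rasters <- Map(tile_to_raster, images, bboxes)
```

Keeping the rasters as a list leaves the choice of merging some, all or none of them to the caller.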

The one thing this wouldn't give you that you mentioned are polygons. Do you just need them as a step toward the individual rasters or are they important in their own right?

Edit: Have you checked out ceramic? It builds on slippymath and has some nice features like caching, and fetching tiles given many different kinds of spatial objects - not just bounding boxes.

@sheffe
Author

sheffe commented Mar 27, 2019

> Edit: Have you checked out ceramic?

If I had a nickel for every time Michael Sumner had already written the thing I needed but didn't know the correct search terms to describe... This looks great. I'm hoping to use non-Mapbox sources but the framework looks spot-on. (I want to download a ton of tiles to semi-permanent storage, which runs against Mapbox ToS.)

With your answer above and the ceramic pointer, I can see two ways to solve the issue I raised -- will experiment further. Worth checking first: @mdsumner do you anticipate extending the ceramic package to sources outside of Mapbox?

Answering your other question:

> The one thing this wouldn't give you that you mentioned are polygons. Do you just need them as a step toward the individual rasters or are they important in their own right?

The pointer to compose_tile_grid is enough, I think -- I can make a polygon from the bbox of each tile and proceed from there. I just need to work tile-by-tile, but it's a small modification of the existing logic. If I get something working, would you want this feature in the package? (Totally cool if you want to keep the footprint small, especially since ceramic does a lot of the same job.)

The background use-case -- I'm definitely interested in the polygons in their own right. It might be an idiosyncratic workflow. I do lots of raster computations using data stored on S3, so it's hard to merge together a huge raster once, leave it out of memory, and load cropped subsets for processing. (Most of my work uses many small and frequently-overlapping subsets, and I process them in parallel, so I rarely need to merge more than a few tiles at a time.) I usually leave the raw data in small tiles and create an index file for loading only what I need in a batch.

The basic idea: if I have a directory of N tiles, representing some combination of metadata (zoom-level, map-type, bounding box, etc), I store the filename/S3 key, metadata columns, and sf POLYGONs from the raster extents/bboxes. I can leave all of the raw data on S3 and use the index for (eg) intersecting to focal areas or defining a point sampling scheme. Arbitrary metadata can be added this way -- for example, maybe we want a balanced training set on a range of population densities, so stratify our image sampling accordingly.

It also scales pretty well to large sets of tiles indexed in a DB. Having a PostGIS table of raster boundary polygons pointing to spatial data in S3 can seem odd, but S3 is dirt-cheap and fast, and S3/EC2 transfer is free.
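A toy version of that index, sketched in base R. It stores plain bounding-box columns instead of sf POLYGONs or a PostGIS table, purely to stay self-contained. The keys, coordinates and function name are invented for illustration.

```r
# Toy tile index: one row per stored tile. Keys and extents are made up.
tile_index <- data.frame(
  s3_key = c("tiles/14/100/200.png", "tiles/14/101/200.png", "tiles/14/102/200.png"),
  zoom = 14,
  xmin = c(0, 10, 20), ymin = c(0, 0, 0),
  xmax = c(10, 20, 30), ymax = c(10, 10, 10),
  stringsAsFactors = FALSE
)

# Rows whose extent overlaps a focal bounding box (boxes that only touch
# along an edge are not counted as overlapping).
tiles_intersecting <- function(index, focal) {
  hit <- index$xmin < focal[["xmax"]] & index$xmax > focal[["xmin"]] &
    index$ymin < focal[["ymax"]] & index$ymax > focal[["ymin"]]
  index[hit, , drop = FALSE]
}

focal <- c(xmin = 12, ymin = 2, xmax = 25, ymax = 8)
needed <- tiles_intersecting(tile_index, focal)
needed$s3_key  # only these keys would need to be fetched for this batch
```

Extra metadata columns (population density, map type and so on) can be added to the same table and used to stratify sampling before any tile is downloaded.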

@mdsumner
Contributor

I actually just added a 'general' branch to ceramic, which now includes an AWS source, the one used by elevatr.

The fastest way to merge is via VRT and use of GDAL's lazy tools. I'm going to separate that out in ceramic soon.
