Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose transform and CRS in /metadata #163

Open
emilhe opened this issue Feb 1, 2020 · 28 comments
Open

Expose transform and CRS in /metadata #163

emilhe opened this issue Feb 1, 2020 · 28 comments
Labels
enhancement New feature or request

Comments

@emilhe
Copy link

emilhe commented Feb 1, 2020

I am visualizing raster data in a Leaflet-based map component. Prior to using Terracotta, i was using a kind of ImageLayer, which emitted click events specifying the raster index of the click. I need this index to lookup some other data.

To accommodate larger rasters, i have switched to serving the data via Terracotta, i.e. i add a TileLayer to the Leaflet map with the appropriate Terracotta URL. Now, the TileLayer can obviously not tell me the raster index of a click, as it knows nothing about the underlying data. Hence i am left with the "general" map click event, which contains just the (lat,lon) of the click.

I can calculate the index of the nearest point in the raster from the (lat,lon) of the click, but this approach yields only an accurate result within +/- 2 pixel in each direction. I found that transforming the click and raster coordinates to EPSG:3857 prior to searching brings down the uncertainly to +/- 1. However, i would really like to get to zero, i.e. an exact match.

Since it is Terracotta, which is actually rendering which pixels goes where, i guess that Terracotta should be able to determine raster index of a (lat,lon) pair? If this could be done, think that the functionality should be exposed via a new API endpoint.

@dionhaefner dionhaefner added the enhancement New feature or request label Feb 1, 2020
@dionhaefner
Copy link
Collaborator

Could you please expand a bit on your use case? Why do you need pixel indices from lat / lon?

I am trying to determine whether this is a useful feature for a majority of users that warrants its own API endpoint.

@emilhe
Copy link
Author

emilhe commented Feb 1, 2020

In my typical uses cases i have multiple variables [a, b, c, ...] defined on the same coordinate grid. I then calculate some derived variable that indicates feasibility and plot it. When i click on the map to select a location, i would then like to get all of the variables [a, b, c, ...] that belongs to that point. To do this, i need the raster index.

A concrete example could be assessment of new sites for wind turbines. For each grid point i have many meteorological parameters (wind speed, air density, pressure, temperature, ...). To illustrate feasibility, i would plot e.g. the mean wind speed. When the user clicks the map to select a location for further analysis, i need to get the actual values of all of the variables; hence one needs index.

From a user perspective it is very annoying if the index mapping is not perfect. Imagine that you need to locate the pixel with the heighest wind speed within your region of interest. Hence, you locate the pixel on the map and click on it. Now, if the index mapping is not perfect, the resulting selection will show a lower value. To get the "correct site", i.e. the one with the highest wind speed, you will need to play around clicking on the neighbouring pixels until your get it right. That is a poor user experience :/

Using Terracotta for data visualization, i would say that this is a killer feature. It is really common that you make a selection on a map and need the underlying data for extended analysis. However, for visualization of "pure" images, i agree that the feature is not that useful. Still, i think that the usefulness for the data visualization case (which is pretty broad) is enough to warrant its own API endpoint.

@j08lue
Copy link
Collaborator

j08lue commented Feb 3, 2020

Pixel drilling

We have for quite while been planning to build a data extraction ("pixel drilling") tool for data used with Terracotta. Very similar to your use case:

  1. You have a collection of imagery nicely organized and indexed with Terracotta that you display on a map
  2. You want to be able to click on the map and get all the values along some key - be it a date key for time series or a variable/band key as in your case.
  3. You want the visual representation of the imagery on the map to be consistent with the numeric values you get.
  4. You (obviously) do not want to duplicate data or metadata, to save time and so you can be sure that your record is consistent across the two representations.

We considered different options:

  1. build this functionality into Terracotta
  2. make a plug-in interface for Terracotta where you could attach that functionality, or
  3. just make a stand-alone tool that perhaps can work off Terracotta's database and discover the rasters from there.

Earlier, we considered this to be out of scope (#79). But we can open this discussion again.

Accuracy

Regarding your concern with pixel accuracy: You have to be careful here, because Terracotta reprojects your data on the fly. Could be from a different coordinate reference system to Web-Mercator, or to match the exact xyz tile bounds:

https://github.com/DHI-GRAS/terracotta/blob/3c9feff9910382da1c827f11d8a84e7f08a44970/terracotta/xyz.py#L41-L42

So the pixels you see on your map are almost never the ones you have stored in your raster.

So if you want to accurately compute stuff and stay true to the original raster data, you will need an API where you can post your point (or feature) coordinates and that will accurately translate them into the nearest pixel(s) by using the original image transform and not the XYZ tiles. We are currently testing that concept. The performance is really nice with async retrieval of pixels from many rasters with Lambdas etc. But we do not yet have enough experience to conclude on how to implement this best - and connect it to Terracotta.

You see there are a lot of considerations to make. It would be great if you designed and prototyped something and presented your concept and considerations.

@emilhe
Copy link
Author

emilhe commented Feb 3, 2020

@j08lue That sounds great! My use case fits perfectly with your concept of "pixel drilling". I didn't know that the feature had been discussed before. Regarding the different options,

  1. build this functionality into Terracotta
    This was also my immediate idea. I image that a single endpoint would suffice. Something like

/pixeldrill/{keys}/{lat}/{lon}

or alternatively with the arguments (keys, lat, lon) as query parameters.

  1. make a plug-in interface for Terracotta where you could attach that functionality, or
    I generally like the idea about plugins. However, as i see it, pixel drilling is a core functionality, and thus i would prefer it to be available built-in.

  2. just make a stand-alone tool that perhaps can work off Terracotta's database and discover the rasters from there.
    This would also work, but i still think that (1) would provide the most coherent interface.

About accuracy, the lines of code that you quote was in fact the reason that i came up with the idea to do a projection of my coordinate grid to EPSG:3857 prior to searching for the coordinate. However, as noted in the initial post, while this approach did indeed increase accuracy, it's still not perfect.

In my previously look at the code, i kind of got stuck in the _get_raster_tile method,

https://github.com/DHI-GRAS/terracotta/blob/3c9feff9910382da1c827f11d8a84e7f08a44970/terracotta/drivers/raster_base.py#L451

trying to figure out the index mapping. I will give it another try. I guess that as soon as the correct transformation (lat,lon) -> pixel coordinate is determined, the implementation should be doable.

@j08lue
Copy link
Collaborator

j08lue commented Feb 3, 2020

Well, lat lon to pixel is simple:

import rasterio
import rasterio.warp
import rasterio.windows

lat = 55.871610
lon = 12.493728

with rasterio.open('/my/target/raster.tif') as ds:
    # transform to right CRS
    x, y = rasterio.warp.transform({'init': 'epsg:4326'}, ds.crs, [lon], [lat])

    # inverse-transform from projection coordinates to indexes
    ii, jj = ~ds.transform * (x, y)

    # round indexes to integers and turn them into a window
    i = int(ii[0] + 0.5)
    j = int(jj[0] + 0.5)
    window = rasterio.windows.Window.from_slices((j, j+1), (i, i+1))

    # extract the data
    pixel_value = ds.read(window=window)

Or use window = ds.window(x-dx, y-dy, x+dx, y+dy) as a short-cut, where dx and dy are half the pixel size.

@emilhe
Copy link
Author

emilhe commented Feb 3, 2020

I imagine that it's simple if one knows how to do it ;). In your code, could you share the imports? And is the "~transform" operation an actual function of just pseudo code?

I was thinking that there would be some rounding of indices when the pixels are drawn on the tiles other than the coordinate transformation, so that depending on the zoom level and the resulting tiles, the same (lat,lon) might map to different pixels. However, i would be very happy if this is not the case :)

@j08lue
Copy link
Collaborator

j08lue commented Feb 3, 2020

so that depending on the zoom level and the resulting tiles, the same (lat,lon) might map to different pixels

No, a lat lon position should map to the same source image pixel always. When you render map tiles of any zoom level, though, pixels will change size and position and their values will be interpolated accordingly. So the user will not see the exact same values on the map as in the point-extracted data. But the extracted data will be the "true" pixel data from the source image.

I added the imports to the above code. Have not tested it but it should work exactly like this. ~ is a valid inversion operator on affine.Affine. 😄

@emilhe
Copy link
Author

emilhe commented Feb 3, 2020

Ah, i see, thanks. With slight modifications (it seems that the transform does not work on lists of coordinates),

import rasterio
import rasterio.warp
import rasterio.windows

lat = 55.871610
lon = 12.493728

with rasterio.open('/my/target/raster.tif') as ds:
    # transform to right CRS
    x, y = rasterio.warp.transform({'init': 'epsg:4326'}, ds.crs, [lon], [lat])

    # inverse-transform from projection coordinates to indexes
    ii, jj = ~ds.transform * (x[0], y[0])

    # round indexes to integers and turn them into a window
    i = int(ii + 0.5)
    j = int(jj + 0.5)
    window = rasterio.windows.Window.from_slices((j, j+1), (i, i+1))

    # extract the data
    pixel_value = ds.read(window=window)

the code runs. Comparing the result, i.e. the resulting indices, to my previous approach, your suggested approach seems to be similar in accuracy, i.e. +/- 1 pixel of what i see on the screen.

So the user will not see the exact same values on the map as in the point-extracted data. But the extracted data will be the "true" pixel data from the source image.

Is guess this is the very "source of the problem". Intuitively, a user would expect to get exactly the value that they clicked on. I understand why this cannot be achieved when the zoom is low as all pixels are not rendered. But shouldn't it be possible if the zoom level is large enough that all pixels are rendered?

@emilhe emilhe closed this as completed Feb 3, 2020
@emilhe emilhe reopened this Feb 3, 2020
@emilhe
Copy link
Author

emilhe commented Feb 3, 2020

(the close action was unintentional)

@j08lue
Copy link
Collaborator

j08lue commented Feb 3, 2020

But shouldn't it be possible if the zoom level is large enough that all pixels are rendered?

It's not necessarily the same pixels. The client (Leaflet) defines which image transform the XYZ tiles have, so pixels might be displaced compared to the original image. Maybe web-optimization can help here? #151

It could also be a round-off issue. How many decimals do your lat/lon coordinates have and how large are your pixels?

@dionhaefner
Copy link
Collaborator

dionhaefner commented Feb 3, 2020

May I suggest something else to help you work around the issue now.

During ingestion, I suggest you add some additional metadata to your rasters:

with rasterio.open(myraster) as src:
    crs = src.crs
    transform = src.transform

metadata = driver.compute_metadata(
    myraster,
    extra_metadata=dict(crs=crs, transform=transform)
)
driver.insert(..., metadata=metadata)

Then, you get original CRS and transform of the raster when doing /metadata queries. This should enable you to do the index lookup like this:

  1. Project clicked (lat, lon) to raster CRS
  2. Apply inverse transform to this value

(this is what @j08lue suggested, but in the frontend)

Projecting both to another CRS is prone to off-by-one errors and not unique, especially at high latitudes. I suggest you stay in the native projection of the raster.

@j08lue
Copy link
Collaborator

j08lue commented Feb 4, 2020

You may have to get the transform and crs ready for serialization. list(transform) and crs.wkt might work.

@emilhe
Copy link
Author

emilhe commented Feb 4, 2020

Thank you for some great inputs! I am now realizing that what i was initially asking might not be possible - at least not when my data grid is in a different projection than the visualization (which is 'EPSG:3857' according to Leaflet documentation). To overcome this issue, i am not trying out a new workflow,

  1. Create a grid in 'EPSG:3857', i.e. something like
# Define coordinate region, this is approximately DNK.
xs, ys = np.linspace(850000, 1500000), np.linspace(7000000, 8000000)
# Create raster with dummy data.
Xs, Ys = np.meshgrid(xs, ys)
raster = Xs ** 2 + Ys ** 2
  1. Write the raster data to a cog file with the web_optimized flag enabled,
def raster_to_cog(xs, ys, raster, path):
    # Setup meta data.
    xmin, ymin, xmax, ymax = [xs.min(), ys.min(), xs.max(), ys.max()]
    nrows, ncols = np.shape(raster)
    xres = (xmax - xmin) / float(ncols)
    yres = (ymax - ymin) / float(nrows)
    transform = Affine.translation(xmin, ymin) * Affine.scale(xres, yres)
    src_profile = dict(
        driver='GTiff',
        height=nrows,
        width=ncols,
        count=1,
        dtype=raster.dtype,
        crs='EPSG:3857',
        transform=transform,
        nodata=np.nan,
    )
    # Write the data in could optimized format.
    with MemoryFile() as memfile:
        with memfile.open(**src_profile) as mem:
            # Populate the input file with numpy array
            mem.write(raster.astype(rasterio.float64), 1)
            dst_profile = cog_profiles.get("raw")
            cog_translate(
                mem,
                path,
                dst_profile,
                in_memory=True,
                quiet=True,
                web_optimized=True
            )
  1. Get the pixel value using the previous code, with the slight modification that the "+ 0.5" shift has been removed,
def pixel_drill(lat, lon, cog_path):
    with rasterio.open(cog_path) as ds:
        # transform to right CRS
        x, y = rasterio.warp.transform({'init': 'epsg:4326'}, ds.crs, [lon], [lat])
        # inverse-transform from projection coordinates to indexes
        ii, jj = ~ds.transform * (x[0], y[0])
        # round indexes to integers and turn them into a window
        i = int(ii)
        j = int(jj)
        window = rasterio.windows.Window.from_slices((j, j + 1), (i, i + 1))
        # extract the data
        pixel_value = ds.read(window=window)

        return pixel_value

My initial tests seem to indicate that this approach does yield exact pixel mapping (!), but i will need to do some more testing to be sure.

@dionhaefner
Copy link
Collaborator

Index lookup

I think what you are trying to do should definitely be possible, despite the numerous reprojection steps. Writing a lookup raster is not necessary. I'm not sure whether it should be i = int(ii) or i = round(ii), but there should be a reference for that.

If you consistently experience off-by-one errors with both rounding schemes, I would be grateful for a reproducible test case. It could be that Terracotta is actually off by a fraction of a pixel, so that would be a bug that needs to be fixed.

Again, the recipe should be:

  1. Transform lat, lon to native CRS
  2. Apply inverse transform to obtain pixel index

You can do this perfectly fine in the frontend, so I would be strongly opposed an API endpoint that does this. Such a function belongs into a potential Terracotta client (frontend) library.

Pixel drilling

The pixel drilling discussion is more general, where we don't just want indices, but the actual pixel values and time series. Doing this efficiently is not possible with our current data model. Solving this problem involves a different approach to storing the data, vanilla COG won't do it.

That is because COG can only be read one (internal) tile at a time. With an internal tile size of 256 x 256, a single-pixel read has an overhead of 65536, i.e., it takes tens-of-thousands times longer to read than it needs to. So this naive approach will not scale.

But if you are motivated you are still welcome to try. Maybe the performance is still usable for simple applications.

@emilhe
Copy link
Author

emilhe commented Feb 4, 2020

Index lookup

In my proposed workflow above i am not using a lookup raster. Rather i am interpolating my data onto a uniform grid in 'EPSG:3857' prior to creating the raster (this step is not shown, for brevity i just created some dummy data) so that my data grid and my view (pixel grid) are in the same coordinate system. Does that make sense?

With i = int(ii), the pixel mapping seems to be working, i.e. so far i have no reason to suspect that Terracotta is doing anything wrong.

As a understand, to do the transformation, the raster CRS must be known. Hence the client would need access to this information somehow, but i guess that could be within the scope of a (potential) terracotta client.

Pixel drilling

I see that with the current data structure, the pixel drilling cannot be perfectly efficient. However, i think that a more interesting question is whether it is effictient enough? To quantify the efficiency, i did a quick benchmark of the read code above, and on my PC the execution time is 1-2 milliseconds. For single-point request (as in my case), i would say that this is good enough.

So this naive approach will not scale.

When you talk about scaling, do you mean to non-point requests, e.g. points within a polygon?

@j08lue
Copy link
Collaborator

j08lue commented Feb 4, 2020

As a understand, to do the transformation, the raster CRS must be known. Hence the client would need access to this information somehow

Yes, that was the idea behind storing that information (CRS and transform) in the metadata, which the client can retrieve (/metadata).

When you talk about scaling, do you mean to non-point requests, e.g. points within a polygon?

No, the problem is actually bigger with point requests that with polygons. The thing is that, even when you request a single point, GDAL/rasterio has to read the full containing block of data from the tiled GeoTIFF, and then discard all but one pixel. Hence you pay for 60,000 pixels with time and data access, but only get one. Requesting a million single pixels like that, you will effectively read 240 GB, when the data you actually need is only 4 MB.

On the other hand, if you only want to "drill" a tiny fraction of your rasters occasionally, the overhead might be worth the convenience that lies in deploying the data only once.

I agree with @dionhaefner that, considering the inefficiency, this is not a feature we should include with Terracotta. But a separate API that builds on top of Terracotta's database might be a good solution for occasional usage.

An alternative is to look into implementing an alternative raster driver for a format that supports more efficient multi-dimensional data extraction. I do not know of a format that is fast enough, though.

@vincentsarago
Copy link

just an FYI becuse it wasn't mentioned but rasterio is already shipped with a sample method

https://github.com/developmentseed/cogeo-tiler/blob/master/cogeo_tiler/handler.py#L389-L402

with rasterio.open(src_path) as src_dst:
            lon_srs, lat_srs = warp.transform("epsg:4326", src_dst.crs, [lon], [lat])
            if not (
                (src_dst.bounds[0] < lon_srs[0] < src_dst.bounds[2])
                and (src_dst.bounds[1] < lat_srs[0] < src_dst.bounds[3])
            ):
                raise Exception("Point is outside the raster bounds")

            values = list(src_dst.sample([(lon_srs[0], lat_srs[0])], indexes=indexes))[
                0
            ].tolist()

@j08lue
Copy link
Collaborator

j08lue commented Feb 5, 2020

Nice, thanks for the pointer, @vincentsarago! That nicely simplifies the code, but does not solve the problem of read overhead. Have you encountered efficient solutions for that - also with a completely different tech stack?

@vincentsarago
Copy link

@j08lue I think you are mentioning @dionhaefner comment:

That is because COG can only be read one (internal) tile at a time. With an internal tile size of 256 x 256, a single-pixel read has an overhead of 65536, i.e., it takes tens-of-thousands times longer to read than it needs to. So this naive approach will not scale.

Sadly I don't think we can do iner tile value retrieval without fetching the full tile, and I'm not sure if its possible because we can easily retrieve tile based on the internal tile index but at the tile level I don't think it's possible to determine the index of the data for a specific pixel

@dionhaefner dionhaefner changed the title Get index of (lat, lon) for raster Expose transform and CRS in /metadata Apr 1, 2020
@emilhe
Copy link
Author

emilhe commented Oct 2, 2020

For anyone else with this issue, I got it working using a small extension that fetches point data from the geotiff. You can find it here,

https://pypi.org/project/terracotta-toolbelt/

Be aware that it's also necessary to,

  • Interpolate the data onto a grid that is regular in the projection in which the data is viewed, i.e. epsg:3857

  • The pixels of the data grid must be aligned with the map tiles. This can be achieve via the web_optimized=True flag

  • Terracotta post processing (e.g. interpolation) must be disabled, i.e. set REPROJECTION_METHOD="nearest"

I have created a small example that demonstrates how to use Terracotta for data visualization including pixel drilling,

https://github.com/thedirtyfew/terracotta-dash-example

When i get the time, I plan to add a hosted verison of the example on Heroku :)

@dionhaefner
Copy link
Collaborator

👋 @emilhe, thanks for sharing! I think it's great to have a full example deployment like yours, and your map looks gorgeous.

I'm not sure whether this is how I would solve the pixel drilling, but whatever gets the job done :)

FWIW, I would actually recommend to side-step the issue and do the value lookup in the frontend. You can retrieve the RGB value from the tile images in leaflet and map them back to the original pixel values via /colormap. Similar to what is being done here: https://johngravois.com/lerc-leaflet/

Since this just operates on the tile images, you can guarantee WYSIWYG compliance and don't need any additional API calls or special preprocessing. You just need to make sure to enable CORS for tiles so you can access the canvas.

@emilhe
Copy link
Author

emilhe commented Oct 4, 2020

Thanks for the example, @dionhaefner ! It looks really cool. In terms of raw performance, it's definitly better than my approach, and it's also great to see an approach that's indepedent of data pre processing. However, i also see a few drawbacks,

  1. You can only get the "real" value if there is a 1:1 mapping between color and value. Hence, in most cases you'll probably loose some accuracy (since the mapping is not strictly 1:1), while in other cases (such as cyclic colormaps) you get multiple possible values, i.e. it won't work.

  2. You can't get the value of off-screen data.

While (1) is only a problem if high accuracy is needed (or special colorscales are used), (2) is a deal breaker for the use cases i am typically facing. As an example, consider an application that displays metereological data. The user will be able to select a single parameter, say wind speed, for visualization on the map. When a point is clicked, information on other paramers (e.g. air density, pressure, ...), should appear. Do you see any way your approach could be adopted to this kind of use case?

@dionhaefner
Copy link
Collaborator

dionhaefner commented Oct 5, 2020

  1. Yes, this only works in 8 bit color space, so if your raster is in 16 or 32 bit you will lose information (that you can regain by re-scaling, though). Regarding cyclic maps, we don't have any build-in colormaps that are not injective so that shouldn't be a huge concern.
  2. It is possible to solve, but a bit more awkward indeed. The simplest solution would be: Find out which tile your off-screen data belongs to, request that tile, extract the information, throw the tile away.

@dionhaefner
Copy link
Collaborator

dionhaefner commented Oct 5, 2020

Another drawback is that, due to resampling of overviews and interpolation, you can get values that are not present or not possible in the original raster, since it is truly WYSIWYG.

I'm not saying this is the right approach in every case, just putting it out there as the first thing I would try in this situation :)

@emilhe
Copy link
Author

emilhe commented Oct 5, 2020

  1. Ah, i didn't know that. But i guess that it could still be an issue for custom color maps. That being said, I agree that in most cases, it's not really an issue.
  2. That was also the only approach that i could think off at the top of my head. But since it requires sending the whole tile from the server to the client, this approach will be much slower than what i am currently doing.

I think that "the best approach" depends on the use case at hand :). In general, if you only need access to on-screen data, the client side color-to-value approach seems to be the most elegant (and performant). On the other hand, the server side approach seems more suitable when off-screen data are needed.

@dionhaefner
Copy link
Collaborator

Regarding performance: Did you do any benchmarks how long it takes for your custom endpoint to return a value? (just curious)

@j08lue
Copy link
Collaborator

j08lue commented Oct 5, 2020

Side note on this already pretty long thread - there is talk going on about data tiles where you pull the binary data into the client... Just FYI.

@emilhe
Copy link
Author

emilhe commented Oct 5, 2020

Yes, i did a simple benchmark requesting a few thousand random (to avoid caching) points. With my current implementation, the mean request time is around 2 ms on my work station, but the speed will of course vary depending on HW and connection speeds.

@j08lue - That looks really cool! But won't it be a problem for large data sets (i assume what the data must be held in the browser)? The reason i started using Terracotta was that my data files could be in the ~ GB range (covering the globe), which was not handled well by the browser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants