Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mapnik does not use overviews in the same way gdal does #3966

Open
cpaulik opened this issue Aug 7, 2018 · 24 comments · May be fixed by #3975
Open

Mapnik does not use overviews in the same way gdal does #3966

cpaulik opened this issue Aug 7, 2018 · 24 comments · May be fixed by #3975

Comments

@cpaulik
Copy link
Contributor

cpaulik commented Aug 7, 2018

As discussed in #3960 (comment) I create a separate issue.

Current mapnik versions have a special logic to select overviews from the gdal dataset. This results in problems when trying to use already existing and tested gdal datasets with mapnik.

When I use the gdal plugin of mapnik I would expect it to behave like gdal.

Maybe the mapnik logic that selects overviews in a special way should go into a separate plugin or maybe the Raster plugin? Or maybe it can be a setting in the plugin?

My original use case is the following:

I have a sparse dataset that is updated for a region based on user demand. So I create 5x5 degree VRT files with overview levels 4, 8, 16 etc. and then I create a global vrt file with the higher level overviews (32, 64). So If I update a 5x5 degree region I only have to update the higher level overviews which is much less expensive.

It seems to me that current mapnik versions do not use the overviews of the 5x5 degree VRT files. Older mapnik versions did that without problems. I can provide an example if the issue is unclear.

@talaj
Copy link
Member

talaj commented Aug 7, 2018

After looking into GDAL code the overview logic seems similar to that in Mapnik. I was thinking there needs to be some recursion to go through nested datasets, but I didn't find such.

It would really help if you can make a simple test case which I can run locally.

I agree that the way how Mapnik GDAL plugin handles overviews is not how some users would expect.

@cpaulik
Copy link
Contributor Author

cpaulik commented Aug 7, 2018

I've created an example. You can find it at https://gist.github.com/cpaulik/34853bebc07174edc78233be51fc8bb6

@talaj
Copy link
Member

talaj commented Aug 8, 2018

Thanks for the test case, I can reproduce it.

Mapnik can see only one overview, as well as gdalinfo. Other overviews are hidden behind abstraction of another data set.

$ gdalinfo global.vrt
Driver: VRT/Virtual Raster
Files: global.vrt
       global.vrt.ovr
       5x5.vrt
Size is 6000, 6000
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-10.000000000000000,40.000000000000000)
Pixel Size = (0.001000000000000,-0.001000000000000)
Corner Coordinates:
Upper Left  ( -10.0000000,  40.0000000) ( 10d 0' 0.00"W, 40d 0' 0.00"N)
Lower Left  ( -10.0000000,  34.0000000) ( 10d 0' 0.00"W, 34d 0' 0.00"N)
Upper Right (  -4.0000000,  40.0000000) (  4d 0' 0.00"W, 40d 0' 0.00"N)
Lower Right (  -4.0000000,  34.0000000) (  4d 0' 0.00"W, 34d 0' 0.00"N)
Center      (  -7.0000000,  37.0000000) (  7d 0' 0.00"W, 37d 0' 0.00"N)
Band 1 Block=128x128 Type=Float64, ColorInterp=Gray
  Overviews: 750x750

Mapnik loops through all accessible overviews (just one in this case). Finds no suitable overview and instead of requesting raster in resolution of the map (256 x 256), requests full resolution of the raster for given extent.

The following python snippet is equivalent to the RasterIO called from Mapnik:

from osgeo import gdal
ds = gdal.Open('global.vrt')
band = ds.GetRasterBand(1)
data = band.ReadRaster(xoff=0, yoff=0,
                       xsize=257*4, ysize=257*4,
                       buf_xsize=257*4, buf_ysize=257*4,
                       buf_type=gdal.GDT_Float64)

Just a note: Mapnik adds 4 pixels margin, which is the reason why there is 257 * 4 instead of 256 * 4.

This is really bad for your use case. What options do we have? Make this behaviour optional?

What do you think, @flippmoke ?

@flippmoke
Copy link
Member

I would have to look at the code for why there is a margin like this but my assumption would be that it is a slightly bigger buffer so that we can do better resampling if necessary on other code paths. I am not certain here at all though, it could simply be a bug. Do we know what line of code this comes from?

@talaj
Copy link
Member

talaj commented Aug 8, 2018

@flippmoke No, no, the margin is not the problem here at all. Margin is correct as it is. I just tried to explain those values.

The problem here is that Mapnik does not use overviews, instead requesting raster 16 times bigger than needed. There is no way how Mapnik can do that with current implementation as those overviews are hidden in nested VRT. But GDAL itself can use those overviews, the old implementation of Mapnik GDAL plugin as well.

@flippmoke
Copy link
Member

Okay sorry, I was slightly confused here...

I put together a PR that might address this and gives me at least something visual to represent the problem. I have not tested it at all against this testcase and we likely should add a test case here.

Finally, we might want to consider if it is possible to make an endless loop here? Can you have a VRT overview that points to a VRT? This could be a security concern if that is the case.

@cpaulik
Copy link
Contributor Author

cpaulik commented Aug 9, 2018

@flippmoke I doubt that your PR addresses this problem since I think it is not possible to get at the nested overviews using a GetOverview call on the first dataset. It is that by using the correct size and buffer combination in RasterIO the nested gdal drivers will use overviews of all levels as appropriate.

I've heard that mapnik wants to do it's own resampling in some cases and this is the reason for implementing all this logic in mapnik. But as I said above. I think most users would expect the gdal plugin to work like gdal would.

@talaj
Copy link
Member

talaj commented Aug 9, 2018

Thanks for the pull request, @flippmoke, but it doesn't work, unfortunately. As @cpaulik suggests, we should not to try to outsmart GDAL and rather call RasterIO with expected parameters.

The current logic doesn't solve memory issues nor it solves resampling as overviews are already resampled out of Mapnik's control.

@flippmoke
Copy link
Member

I am fine with doing resampling via GDAL, we will just need to have some sort of conversion between the different expected resampling types so that the correct one is chosen. Additionally, I am not sure how in the later steps if there are any concerns.

@rouault do you have any suggestions here?

@rouault
Copy link
Contributor

rouault commented Aug 9, 2018

@rouault do you have any suggestions here?

This is a bit vague as a question ;-) You can specify a resampling method to the extra argument of GDALRasterIOEx()

@cpaulik
Copy link
Contributor Author

cpaulik commented Aug 9, 2018

@flippmoke The resampling type is chosen when the overview is created so you have not choice there on mapniks side if I understand correctly.

If there are no overviews then it is a different story of course. But how would a mapnik user choose a resampling method? I don't see any parameter listed in the wiki https://github.com/mapnik/mapnik/wiki/GDAL

EDIT: Ah sorry, it seems to be in the RasterSymbolyzer https://github.com/mapnik/mapnik/wiki/RasterSymbolizer

@flippmoke
Copy link
Member

@rouault is there a way to access an overview in a TIF from an overview in a VRT? Should we just trust GDAL in this case? Would it be more wise for us to simply have GDAL do all the band selection and resampling?

@rouault
Copy link
Contributor

rouault commented Aug 9, 2018

is there a way to access an overview in a TIF from an overview in a VRT?

Not directly since VRT abstracts away those details, but if you issue a downsampling RasterIO() request on the VRT, it will forward it to the TIF, and the TIF will then use its overviews when appropraite.

@cpaulik
Copy link
Contributor Author

cpaulik commented Aug 13, 2018

@flippmoke I've only found the three resampling types in the RasterSymbolizer documentation. Are those the ones we have to map to gdal resampling methods?

mapnik gdal
fast nearest
bilinear billinear
bilinear8 billinear ?

And we should probably add some kind of warning in the docs that this setting will be ignored if overviews exist.

@flippmoke
Copy link
Member

@cpaulik
Copy link
Contributor Author

cpaulik commented Aug 13, 2018

That makes it more interesting 😄

mapnik gdal
SCALING_NEAR nearest
SCALING_BILINEAR billinear
SCALING_BICUBIC cubic
SCALING_SPLINE16 cubicspline
SCALING_SPLINE36 cubicspline
SCALING_HANNING -
SCALING_HAMMING -
SCALING_HERMITE -
SCALING_KAISER -
SCALING_QUADRIC -
SCALING_CATROM -
SCALING_GAUSSIAN gaussian
SCALING_BESSEL -
SCALING_MITCHELL -
SCALING_SINC -
SCALING_LANCZOS lanczos
SCALING_BLACKMAN -

That would be a first mapping without looking up all the resampling methods in detail.

Would you be ok in saying the resampling methods from mapnik that are not supported by gdal are not supported by the gdal plugin? Or would you prefer a different data access logic based on chosen resampling method?

What I mean is that if e.g. SCALING_BESSEL is chosen in mapnik then mapnik will request the full resolution dataset and then resample itself but if e.g. SCALING_NEAR is chosen then mapnik would call RasterIO with the final buffer size so that gdal resamples / or uses overviews if present.

@flippmoke
Copy link
Member

@cpaulik just some history

The chart above is part of the reason we have not done resampling in GDAL in the past the expected result if you are using Mapnik resampling can be very different then GDAL. Therefore the choice was made to try to resample using Mapnik where possible and then search through the overviews to find the best overview to select the source data from for resampling. Part of the goal has been for the raster plugin and the GDAL plugin to reproduce the same results if possible.

I am worried that in solving one problem we could be creating another and do something that is another big shift in the GDAL plugin/mapnik. Should we do this?

@cpaulik
Copy link
Contributor Author

cpaulik commented Aug 14, 2018

@flippmoke Thank you for the background.

My thoughts on this are that the current solution does not solve the problem. If any resampling method is selected that does not produce the same result as gdal resampling then mapnik will produce inconsistent results based on zoom level. Especially if overviews are present. In addition it is very hard to find out why mapnik does not work as expected with a gdal datasource. I've spend several hours figuring out why mapnik 3 does not work properly with our VRT files.

There are several approaches to this problem that I can think of that would make it clearer from a user perspective:

  1. The gdal plugin does exactly what gdal would do. So the users can test with gdal, or any other library/application that uses gdal and then switch to mapnik without a lot of surprises.
  2. The gdal plugin is smart enough to use the gdal RasterIO differently based on the selected settings. So if a resampling method is selected that is not supported by gdal then the resampling is done in mapnik and overviews are ignored.
  3. A combination of 1 and 2 where the user can select a e.g. pure-gdal mode or a mapnik mode.

@lightmare
Copy link
Contributor

Part of the goal has been for the raster plugin and the GDAL plugin to reproduce the same results if possible.

I think it'd be better for the GDAL plugin to delegate as much as possible to libgdal. It will be less surprising for users if it works similar to other gdal tools, rather than an unrelated raster plugin.

If a resampling filter unsupported by GDAL is selected, the plugin should issue a warning and select a supported filter with similar characteristics (e.g. when a Sinc-family filter is selected, fallback to Lanczos).

If there is actual demand for resampling a GDAL datasource on mapnik side, it should be opt-in via parameters. And then perhaps another parameter could provide information about nested overviews.

@cpaulik
Copy link
Contributor Author

cpaulik commented Aug 15, 2018

If a resampling filter unsupported by GDAL is selected, the plugin should issue a warning and select a supported filter with similar characteristics (e.g. when a Sinc-family filter is selected, fallback to Lanczos).

That's a good compromise solution and would work beautifully for our use case.

If I can be any help in implementing this please tell me. I don't know the mapnik code base very well so I don't feel comfortable with making a PR for this at the moment.

@flippmoke
Copy link
Member

I don't know the mapnik code base very well so I don't feel comfortable with making a PR for this at the moment.

@cpaulik updating PRs until it works is how most everyone in this ticket learned the mapnik code base 😄

@cpaulik
Copy link
Contributor Author

cpaulik commented Aug 15, 2018

@flippmoke Ok, If you are patient with me then I can try.

@cpaulik
Copy link
Contributor Author

cpaulik commented Aug 22, 2018

I was trying to get started and I'm curious to know if there is some documentation about the mapnik internals and how the different classes work with each other. There also does not seem to be any documentation in the code about what all the parameters mean. Am I missing something or does it really not exist?

@talaj
Copy link
Member

talaj commented Aug 27, 2018

Am I missing something or does it really not exist?

Unfortunately there really is no documentation about Mapnik's internals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants