Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No Possibility of Effectively Limiting Data Query Results When Using Time Dimensions #6952

Open
MathewNWSH opened this issue Oct 19, 2023 · 21 comments

Comments

@MathewNWSH
Copy link

Expected behavior and actual behavior.

This ticket is based on my experience described in the mailing archive: https://marc.info/?t=169770835500001&r=1&w=2

To summarize, I've discovered that it's not possible to limit the DATA query result when the TIME dimension is enabled. The layer data source is constrained when opening the data source, and by using the LIMIT clause as described here: https://mapserver.org/input/vector/postgis.html#data-access-connection-method, it limits the source data at the beginning. Subsequent filtration based on dimensions only operates on this already limited dataset.

I can think of two potential solutions:

  1. Avoid using the TIME dimension and, instead, use two other dimensions: start and stop. Then, include them in the DATA query like this:
DATA "geom from (select * from plua_overview WHERE timestamp >= '%start%' AND timestamp <=  '%stop%') subquerry using unique unique_id"

However, this approach might limit the types of requests you can use, as mentioned here:
image

  1. Another solution is to create a proxy, but I believe this problem can be more easily resolved for all useres on the MapServer side. You could utilize the existing warper, as mentioned in the point above.

Attach simple test case (drag/drop it here)

Mapfile example:

 DEBUG 5
 STATUS OFF
 NAME "time_idx"
 TYPE POLYGON

 CONNECTIONTYPE postgis
 CONNECTION "***"

 DATA 'geometry from (select * from mrc order by maxcc desc limit 10) as subquery using unique unique_id'

 PROJECTION
   "init=epsg:3857"
 END

 VALIDATION
    'maxCC' '^[0-9]{1,3}$'
    'default_maxCC' '100'
 END

 METADATA
   "wms_title" "tile-index-cloud"
   "wms_timeextent" "2022-02-01/2023-10-10/P1D"
   "wms_timeitem" "timestamp"
   "wms_timedefault" "2023-10-10"
   "wms_enable_request" "!*"
 END
END

example request:

&TIME=2023-08-01/2023-08-31&SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&BBOX=1559074.156322684372,6519475.858540875837,1573681.021033628611,6535087.428231937811&CRS=EPSG:3857&WIDTH=756&HEIGHT=808&LAYERS=Cloudless%20Mozaik&STYLES=&FORMAT=image/png&DPI=157&MAP_RESOLUTION=157&FORMAT_OPTIONS=dpi:157&TRANSPARENT=TRUE

And mapserver LOG which I get after receiving NoData layer:

[Thu Oct 19 10:53:43 2023].184724 CGI Request 1 on process 4107971
[Thu Oct 19 10:53:43 2023].186013 msDrawMap(): rendering using outputformat named png (AGG/PNG).
[Thu Oct 19 10:53:43 2023].186021 msDrawMap(): WMS/WFS set-up and query, 0.000s
[Thu Oct 19 10:53:43 2023].186030 msDrawRasterLayerLow(S2 Masking_02): entering.
[Thu Oct 19 10:53:43 2023].187613 msPostGISLayerOpen called: geometry from (select * from mrc order by maxcc desc limit 10) as subquerry using unique unique_id
[Thu Oct 19 10:53:43 2023].187619 msPostGISLayerOpen: No connection in pool, creating a fresh one.
[Thu Oct 19 10:53:43 2023].205466 msConnPoolRegister(***)
[Thu Oct 19 10:53:43 2023].211101 msPostGISLayerOpen: Got PostGIS version 30300.
[Thu Oct 19 10:53:43 2023].211106 msPostGISLayerOpen: Forcing 2D geometries: no.
[Thu Oct 19 10:53:43 2023].211110 msPostGISLayerFreeItemInfo called.
[Thu Oct 19 10:53:43 2023].211132 msPostGISLayerInitItemInfo called.
[Thu Oct 19 10:53:43 2023].217863 msPostGISParseData called.
[Thu Oct 19 10:53:43 2023].217874 msPostGISParseData: unique_column=unique_id, srid=, geom_column_name=geometry, table_name=(select * from mrc order by maxcc desc limit 10) as subquerry
[Thu Oct 19 10:53:43 2023].217878 msPostGISLayerTranslateFilter. String: (`[timestamp]` >= `2023-08-01` AND `[timestamp]` <= `2023-08-31`).
[Thu Oct 19 10:53:43 2023].217880 msPostGISLayerTranslateFilter. There are tokens to process... 
[Thu Oct 19 10:53:43 2023].218287 msPostGISLayerWhichShapes called.
[Thu Oct 19 10:53:43 2023].218290 msPostGISParseData called.
[Thu Oct 19 10:53:43 2023].218294 msPostGISParseData: unique_column=unique_id, srid=, geom_column_name=geometry, table_name=(select * from mrc order by maxcc desc limit 10) as subquerry
[Thu Oct 19 10:53:43 2023].218297 msPostGISBuildSQL called.
[Thu Oct 19 10:53:43 2023].218299 msPostGISBuildSQLItems called.
[Thu Oct 19 10:53:43 2023].218302 msPostGISBuildSQLItems: 3 items requested.
[Thu Oct 19 10:53:43 2023].218305 msPostGISBuildSQLFrom called.
[Thu Oct 19 10:53:43 2023].218308 msPostGISBuildSQLWhere called.
[Thu Oct 19 10:53:43 2023].218310 msPostGISBuildSQLSRID called.
[Thu Oct 19 10:53:43 2023].218312 msPostGISBuildSQLSRID: Building find_srid line.
[Thu Oct 19 10:53:43 2023].218315 msPostGISBuildSQLBox called.
[Thu Oct 19 10:53:43 2023].218327 msPostGISLayerWhichShapes query: select "timestamp"::text,"cloudless_60"::text,"epsg"::text,ST_AsBinary(("geometry"),'NDR') as geom,"unique_id"::text from (select * from mrc order by maxcc desc limit 10) as subquerry where "geometry" && ST_GeomFromText('POLYGON((2791286.85068837 5622573.79066471,2791286.85068837 5638166.03910615,2805874.3941497 5638166.03910615,2805874.3941497 5622573.79066471,2791286.85068837 5622573.79066471))',find_srid('','mrc','geometry')) and (("timestamp" >= date_trunc('day',date '2023-08-01') and "timestamp" <= (date_trunc('day',date '2023-08-31') + interval '1 day' - interval '1 second')))
[Thu Oct 19 10:53:43 2023].262344 msPostGISLayerWhichShapes query status: PGRES_TUPLES_OK (2)
[Thu Oct 19 10:53:43 2023].262350 msPostGISLayerWhichShapes got 0 records in result.
[Thu Oct 19 10:53:43 2023].262353 msPostGISLayerNextShape called.
[Thu Oct 19 10:53:43 2023].262355 msPostGISLayerFreeItemInfo called.
[Thu Oct 19 10:53:43 2023].262358 msPostGISLayerClose called: geometry from (select * from mrc order by maxcc desc limit 10) as subquerry using unique unique_id
[Thu Oct 19 10:53:43 2023].262361 msConnPoolRelease(***)
[Thu Oct 19 10:53:43 2023].262364 msConnPoolClose(***)
[Thu Oct 19 10:53:43 2023].262408 msDrawMap(): Layer 3 (S2 Masking_02), 0.076s
[Thu Oct 19 10:53:43 2023].262414 msDrawMap(): Drawing Label Cache, 0.000s
[Thu Oct 19 10:53:43 2023].262417 msDrawMap() total time: 0.077s
[Thu Oct 19 10:53:43 2023].273102 msSaveImage(stdout) total time: 0.011s
[Thu Oct 19 10:53:43 2023].274001 mapserv request processing time (msLoadMap not incl.): 0.089s
[Thu Oct 19 10:53:43 2023].274007 msFreeMap(): freeing map at 0x560808b493d0.
[Thu Oct 19 10:53:43 2023].274029 freeLayer(): freeing layer at 0x560808e75570.
[Thu Oct 19 10:53:43 2023].274032 msPostGISLayerIsOpen called.
[Thu Oct 19 10:53:43 2023].274041 freeLayer(): freeing layer at 0x560808e8b2c0.
[Thu Oct 19 10:53:43 2023].274044 msPostGISLayerIsOpen called.

Operating system

Ubuntu 20.04 LTS

MapServer version and installation method

Mapserver 7.6.4 and Mapserver 8

@MathewNWSH
Copy link
Author

I think that creating new class called LIMIT within Layer would be a good idea. Then users will be able to define maximal number of rendered features within layer based on data after filtrations.

@geographika
Copy link
Member

Does setting MAXFEATURES to 10 give you the result you want?

@MathewNWSH
Copy link
Author

MathewNWSH commented Oct 20, 2023

No, I believe that MAXFEATURES defines the number of images to be drawn, so if I will have my data i descending cloud cover order then on top of my service I will have the Tenth worst image. If I will have my data i ascending cloud cover order then the one on top would be the Tenth best image.

t. In my case, as you can see:

DATA 'geometry from (select * from mrc order by maxcc desc) as subquery using unique unique_id'

I'm selecting all of the images, but I only need to render the top ten, meaning the last ten images in the order, with the one on top being the best.

@geographika
Copy link
Member

It seems similar to an issue I had in #5008. To resolve it an ORDER BY was added to the outer query to guarantee order of the filtered data. This in combination with MAXFEATURES might work. The fix was only for the SQL Server driver, AFAIK this does not exist in the Postgres one: e653b9b

@MathewNWSH
Copy link
Author

As I wrote, the "order by" in its current form returns the column in the following order:

maxcc
90%
90%
90%
90%
90%
90%
90%
85%
75%
65%
50%
10%
0%

Let's assume I need the last 5, which are:

75%
65%
50%
10%
0%

And they must be in the order so that the image with 0% cloud cover is at the top.

In the example you mentioned, the column is sorted by the column name. Let's assume that my column will be ordered in the following way:

Descending, just like above. In this case, MAXFEATURES 5 will return:

90%
90%
90%
90%
90%

Ascending. In this case, MAXFEATURES 5 will return:

0%
10%
50%
65%
75%

So in both situations, the best image will not be returned at the very top.
Am I right?

@jratike80
Copy link

Is your use case somewhat similar than the one in #5899?

Would that serve also your needs? Start filling pixels of the output image starting from the best source image within the time window until all the pixels are filled. Then stop.

@geographika
Copy link
Member

You currently have your ORDER BY within the subquery. The SQL Server driver allows this to be moved to the outer query, so records are ordered after the time filter has been applied and not before. In this case you'd have filtered out the 90% values and then records would be ordered as follows:

75%
65%
50%
10%
0%

@MathewNWSH
Copy link
Author

@geographika
Okay but how to add ORDER BY to outer query? From what I understand config keyword SORTBY mentioned here #5041 (comment) is not implemented?

I guess for now this is how it should be done? #5008 (comment)

@MathewNWSH
Copy link
Author

@jratike80
Yup saw this one before.
But from what I understand it is based on timestamp parameter passed to DATA query, but this parameter is not TIME dimension, so in fact I will not be able to use WMS-T full potential?
So how to let Mapserver know which image is the best one, and where is NoData to be filled by second best? -- Really don't get this part, please explain if possible :)

@jratike80
Copy link

It is a feature request, not implemented. If it was implemented it would work with any sorted tileindex features. User decides what is the best just like you want to do by sorting the images with ORDER BY maxcc DESC. The difference would be that there would be no need to use LIMIT for shortening the list. Mapserver would take all pixels which are not nodata from 0% and then automatically try to fill the holes from 10%, 50% and so on.

Because this is not implemented we are not using the native TIME support of Mapserver. We have additional time attributes in the tile index. The WMS GetMaps from standard WMS-T clients contain the &TIME= parameter but we modify the request on-the-fly with a frontal service. &TIME= is dropped and new query parameters &START= and &END= are created. In the mapfile the DATA is something like this

DATA "SELECT path||location as location, crs, resolution, time, GEOMETRY FROM ortokuva where (end_year is null or end_year>=%END%) and (start_year>=%START% and start_year<=%END%) order by time,resolution desc"

Order by is still needed because there may be several images from the same time and then we want to render the one with smallest pixel size last. Isn't that the same use case that you have? We just prefer small pixels but you would like to prefer low cloud coverage.

@MathewNWSH
Copy link
Author

This is practically the same situation.

Okay, now I fully understand the process of selecting images for rendering. You work around the lack of implementation of this functionality by using a time parameter that is not the TIME dimension. But from my experience with MapServer, the last rendered image is the one on top. I understand how to make MapServer recognize the best image, but how do you make it recognize NoData and use the second best image to fill in the gaps and also do not overlap actual data from the first/best image? Or Is it also an unimplemented feature, and just an idea?

@geographika
Copy link
Member

@geographika Okay but how to add ORDER BY to outer query? From what I understand config keyword SORTBY mentioned here #5041 (comment) is not implemented?

I guess for now this is how it should be done? #5008 (comment)

Correct - the ORDER BY isn't as yet implemented in the PostGIS driver, It would need similar changes to e653b9b added.

@jratike80
Copy link

jratike80 commented Oct 20, 2023

But from my experience with MapServer, the last rendered image is the one on top.

My experience since this https://lists.osgeo.org/pipermail/mapserver-users/2008-April/055317.html is the opposite. Mapserver takes the first image that it gets from the tileindex (the one on top) renders it, takes the next image, renders that above the previous etc. What remain on the top of the final rendered image is the one that was on the bottom of the image list from the tileindex. order by time shows the newest image, not the oldest.

You can see Mapserver in action here https://kartta.paikkatietoikkuna.fi/?zoomLevel=8&coord=433861.1118164062_6947879.803305684&mapLayers=801+100+default,3400+100+ortokuva:indeksi&timeseries=1950&uuid=90246d84-3958-fd8c-cb2c-2510cccca1d3&noSavedState=true&showIntro=false

I can see from your debug info above that when time dimension is used with WMS-T Mapserver generates a query like this
...and (("timestamp" >= date_trunc('day',date '2023-08-01') and "timestamp" <= (date_trunc('day',date '2023-08-31') + interval '1 day' - interval '1 second')))

However, Mapserver does not sort the selected tileindex rows. I think it should do ORDER BY timestamp because now what gets rendered into the final image when the time range is large depends on the native order of entries in the data source.

@MathewNWSH
Copy link
Author

Yes I agree thank you for the clarification. The query I should aim for should contain information on two parameters: start and stop. This would select the rasters acquired in the period I'm interested in (so I don't know if ORDER BY timestamp is necessary for this), and then I would need an order BY cloud cover desc/asc so that the raster with the least cloud cover is at the top.

Right?

But there is one thing that still puzzles me. Is it possible for Mapserver to know when it can stop rendering tiles? Let's assume that I have such a raster:
image

How to make the mapserver generate data only where there is NoData in first raster, and when all the pixels are full (based on the rasters in the right order using order by) it will stop rendering, even if there were other rasters (because all of the pixels for requested extent are already associated with values)?

@jratike80
Copy link

jratike80 commented Oct 20, 2023

There are two separate things: how Mapserver behave now and how it could behave for being more efficient. The current way is to use the painters algorithm.

so that the raster with the least cloud cover is at the top.

The pixels from this raster should go to the top of the resulting raster which means that by the painters method it must be rendered last, and that means that the image must appear last in the list that is returned by the "select" from the tileindex.

stop rendering, even if there were other rasters

It is not possible now. Painters algorithm keeps on rendering till the end of the image list. What Mapserver does is

  • Render the first image from the tileindex stack; nodata remains nodata in the result
  • Render the second image from the stack; pixels which are not nodata in image 2 are overwriting all the existing pixels in the rendered interim raster. Now the result contains pixels from image 2 where image 2 has valid data. In places which are nodata in image 2 the pixels originate from image 1. If the pixel was nodata in both image 1 and image 2 then the result has also nodata there.
  • Render the 3rd image on top of the rendered image etc.
  • Render the last image in the stack.

The painters algorithm is making a good end result: every pixel comes from the most valuable source image but it is heavy and slow if there are many overlapping images in the source image list that is selected from the tileindex.

@jratike80
Copy link

For comparison, Geoserver has an option named ExcessGranuleRemoval for avoiding unnecessary rendering. I found documentation from https://docs.geoserver.geo-solutions.it/edu/en/raster_data/mosaic_pyramid.html. The system filters out images (granules in Geoserver language) based on their footprints. Documentation about how Geoserver handles the footprints in https://docs.geoserver.geo-solutions.it/edu/en/raster_data/imagemosaic_footprint.html#geoserver-imagemosaic-footprint.

@MathewNWSH
Copy link
Author

@geographika Is there any possibility of implementing the LIMIT class in the near future?

@MathewNWSH
Copy link
Author

For comparison, Geoserver has an option named ExcessGranuleRemoval for avoiding unnecessary rendering. I found documentation from https://docs.geoserver.geo-solutions.it/edu/en/raster_data/mosaic_pyramid.html. The system filters out images (granules in Geoserver language) based on their footprints. Documentation about how Geoserver handles the footprints in https://docs.geoserver.geo-solutions.it/edu/en/raster_data/imagemosaic_footprint.html#geoserver-imagemosaic-footprint.

The possibility of stopping rendering data when every pixel is covered seems like an amazing performance improvement opportunity. I'm rooting for this to happen someday!

@geographika
Copy link
Member

@geographika Is there any possibility of implementing the LIMIT class in the near future?

@MathewNWSH - I'm not currently a user of the PostGIS drive so no plans on implementing this in the near future. You could try making the modifications and submitting a pull request, or sponsoring adding this feature to MapServer.

@sdlime
Copy link
Member

sdlime commented Nov 1, 2023

Is the algorithm associated with the solution well known?

@j-musial
Copy link

j-musial commented Nov 2, 2023

Would it be possible to add MAXFEATURES_FIRST MAXFEATURES_LAST along with the MAXFEATURES keyword in order to select first/last N features? This would resolve the problem mentioned by @MathewNWSH

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants