Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MapService::LoadMissingTileData() produces more data than expected #1506

Open
frank-aurich opened this issue Sep 22, 2023 · 7 comments
Open
Labels
database For issues regarding the core database performance For issues regarding performance question For issues of type 'question'

Comments

@frank-aurich
Copy link

Hi,

I'm trying to use libosmscout's MapService to fetch map data (specifically, ways) in a tile-based manner.
But, results show that the method not only produces features within the bounds of the requested tile, but also outside of it.
In fact, there are way more features returned outside the bounds of the requested tile than inside.

I used this OSM dataset: http://download.geofabrik.de/europe/germany/sachsen-latest.osm.pbf
and converted it with the Import tool using the default settings (i.e. the maps.ost from the repository).

Simplified code example:

const osmscout::Magnification magnification(osmscout::MagnificationLevel(14));
tiles.push_back(osmscout::TileKey(magnification, osmscout::TileId(8814, 12838)));
tiles.push_back(osmscout::TileKey(magnification, osmscout::TileId(8814, 12839)));
osmscout::MapService::TypeDefinition types;
m_styleConfig->GetWayTypesWithMaxMag(magnification, types.wayTypes);

auto data = std::make_shared<osmscout::MapData>();
auto tile = m_mapService->LookupTile(tileId);
m_mapService->LoadMissingTileData(searchParameter, magnification, types, tiles);
m_mapService->AddTileDataToMapData(tiles, *data);
[...]

When I iterate over the resulting ways, and skip all ways that start and/or end in any of the requested tiles, I get this image:
(the red bounding box shows the requested tiles)
image

Now, there are a few ways that neither start nor end in the requested tiles, but go through them.
But why are all the other features there, kilometers aways from the requested region?

Is this by design? Or can it somehow be configured to reduce the amount of produced ways?

@Framstag
Copy link
Owner

See AreaWayIndex and AreaIndex. Libosmscout uses bitmap based index per type. That means, that for each type there is a flat tile-based index. Each tile has a list of containing objects. The size of a tile depends on the actual type, especially distribution. The "right" magnification is decided based on AreaIndexGenerator::FitsIndexCriteria(). Type with many object instances should have smaller tiles than type with less data.

Idea is to optimize the size of the index on disk with too much performance loss. The mechanism could be influenced to get smaller tiles with less data overhead but bigger size on disk, though currently values seem to be hard coded.

So, indexes are not perfect, it is expected that you get possibly more data than selected.

The painter should filter them out, if the corresponding parameter is set (which the index cannot, since it does not hold the actual data and thus has no boundary to filter against).

@Framstag Framstag added question For issues of type 'question' database For issues regarding the core database performance For issues regarding performance labels Sep 22, 2023
@frank-aurich
Copy link
Author

it is expected that you get possibly more data than selected.

But I'm getting WAY MORE data then selected. That doesn't seem right?

The painter should filter them out, if the corresponding parameter is set

What parameter is that, and how does it do the filtering?

@Framstag
Copy link
Owner

it is expected that you get possibly more data than selected.

But I'm getting WAY MORE data then selected. That doesn't seem right?

The painter should filter them out, if the corresponding parameter is set

What parameter is that, and how does it do the filtering?

Checked the code. There is no such parameter. The code in MapPainter.cpp checks for "visible in bounding box" using the local IsVisibleWay() and IsVisibleArea() methods.

@Framstag
Copy link
Owner

it is expected that you get possibly more data than selected.

But I'm getting WAY MORE data then selected. That doesn't seem right?

You can exchange the given code in AreaIndexGenerator.h starting at line 180 witht he following code:

    GeoBox boundingBox=typeData.tileBox.GetCenter().GetBoundingBox(typeData.indexLevel);

    progress.Info("Writing map for "+
                  typeInfo.GetName()+" , "+
                  ByteSizeToString(1.0*dataOffsetBytes*typeData.tileBox.GetCount()+dataSize)+" ("+
                  GetEllipsoidalDistance(boundingBox.GetTopLeft(),boundingBox.GetBottomRight()).AsString()+")");

This will dump the diagonal in Km of the center tile of the given index for the given type.

On my system it returns 15,757647 km for index level 11, 7,881098 km for level 12 and 3,939981 km for level 12. Yes these are huge areas and thus may return some data. Heuristics should make sure that you do not get too much data in relation to the limits (per tile).

By changing the following params:

                     parameter.GetAreaRouteMinMag(),
                     parameter.GetAreaRouteIndexMaxLevel(),

You should be able to enforce higher starting level and thus smaller tiles. You should get less overhead data, but it would be interesting if database decreases or increased and what influence on performance you will get.

@Framstag
Copy link
Owner

Corresponding changes in code improved and committed

Framstag added a commit that referenced this issue Oct 3, 2023
Improve code and indexing strategy
Framstag added a commit that referenced this issue Oct 3, 2023
@frank-aurich
Copy link
Author

I can see an improvement with the changes done to the import process:
Before:
image
After:
image

Feature (way) count was reduced from ~2800 to ~1800, which is definitely better than before.

I still find it curious that there are tiny ways of lower class that are sprinkled all over the map, and are far away from the requested tile.

I'll see how I can work with this.

@Framstag
Copy link
Owner

Framstag commented Oct 5, 2023

As I explained. The code looks for each type that is active in the style sheet in the index. Depending on the type, its data count, the index tile may be differently sized...and thus may contain more or less data. And it at least load one tile but depending on the bonding box even multiple tiles.

It would be interesting to know which types have a high count.

You can check the output of the index generator yourself (step 16 for example):

highway_footway 0 | 2919 < 1236 < 136 *121* - 4412
   WW highway_footway has more than 1% cells with much too high entry count, will use smaller tile size

There are 0 tiles that are empty
There are 2919 tiles, that have a low fill count
1236 has an "ok" count
136 have a higher than requested fill count
121 have a much higher fill count
In sum we have 4412 tiles

As you can see, distribution is not even. Nevertheless, for ways we still use a bitmap count, that ties the most effective tile size. An alternative would be a tree-based index - however these are normally bigger than more expensive. In the perfect world, one would analyze distribution and would decide differently depending on the type. However, this is not implemented.

The sprinkle effect is created because likely these values still are contained din one big or a few big tiles.

If you make TypeData.tileBox accessible from AreaIndex.h during rendering, you should be able to render the various tiles of each type.

My changes resulted in reducing the limit for tile size (now smaller tiles are allowed) and in a slightly improved distribution handling. For further optimization, you likely have to further increase the magnification limit. See

ImportParameter.cpp

      areaWayIndexMinMag(10), // Should not be >= than areaWayIndexMaxMag
      areaWayIndexMaxMag(15),

If this does not help enough, we would have to change the "good distribution" criteria in AreaIndexGenerator,h:

    size_t tooLowValue=4*average/10;
    size_t tooHighValue=64+32;
    size_t muchTooHighValue=128+64;
...
    if (double(muchTooHighCount) / double(allCount) >= 0.01) {
      progress.Warning(typeInfo.GetName() + " has more than 1% cells with much too high entry count, will use smaller tile size");
      return false;
    }

    if (double(tooHighCount) / double(allCount) >= 0.05) {
      progress.Warning(typeInfo.GetName() + " has more than 5% cells with too high entry count, will use smaller tile size");
      return false;
    }

    if (double(tooLowCount) / double(allCount) >= 0.2) {
      progress.Warning(typeInfo.GetName() + " has more than 20% cells with <40% of average filling");
    }
...

@Karry : Note that my changes may already have influence on current database size and performance.

Framstag added a commit that referenced this issue Oct 8, 2023
Add options "--areaWayIndexMinMag" and "--areaWayIndexMaxMag"
Framstag added a commit that referenced this issue Oct 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database For issues regarding the core database performance For issues regarding performance question For issues of type 'question'
Projects
None yet
Development

No branches or pull requests

2 participants