Missing QA tiles create empty labels at country boundaries #106

joshwapiano · 2018-08-19T10:31:12Z

After seeing really poor results from my ResNet50 model, I manually inspected the data and labels, and it appears that the background images label_maker is producing are labelled almost randomly.

Here's the config that I'm using:

{
  "country": "singapore",
  "bounding_box": [103.5917,1.132909,104.10712,1.487815],
  "zoom": 14,
  "classes": [{ "name": "human construction", "filter": ["any", ["has", "man_made"], ["has", "military"], ["has", "building"], ["has", "aerialway"], ["has", "aeroway"], ["in", "landuse", "allotments", "brownfield", "cemetery", "commercial", "construction", "depot", "farmland", "farmyard", "garages", "greenhouse_horticulture", "industrial", "landfill", "military", "orchard", "plant_nursery", "port", "quarry", "railway", "recreation_ground", "religious", "residential", "retail", "village_green", "vineyard"] ] } ],
  "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=###(removed)",
  "background_ratio": 1,
  "ml_type": "classification"
}

Essentially, what I've attempted to do with the 'classes' above is identify tiles with any sign of human activity.
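When checking why a tile ended up in the class or in the background, it can help to evaluate the filter by hand against a feature's OSM tags. The sketch below is not label-maker's implementation: `matches` is a hypothetical evaluator for the subset of the Mapbox GL filter syntax this config uses ("any", "has", "in", plus "all" and "=="), applied to a shortened version of the class filter. It assumes a tile gets the class when at least one of its features matches.

```python
from typing import Any, Dict, List


def matches(flt: List[Any], props: Dict[str, Any]) -> bool:
    """Hypothetical evaluator for a small subset of Mapbox GL filter syntax."""
    op = flt[0]
    if op == "any":
        return any(matches(sub, props) for sub in flt[1:])
    if op == "all":
        return all(matches(sub, props) for sub in flt[1:])
    if op == "has":
        return flt[1] in props
    if op == "in":
        key, values = flt[1], flt[2:]
        return key in props and props[key] in values
    if op == "==":
        return props.get(flt[1]) == flt[2]
    raise ValueError(f"unsupported filter operator: {op}")


# Shortened version of the "human construction" filter from the config.
human_construction = [
    "any",
    ["has", "man_made"],
    ["has", "building"],
    ["in", "landuse", "residential", "industrial", "farmland"],
]

print(matches(human_construction, {"building": "yes"}))    # True
print(matches(human_construction, {"landuse": "forest"}))  # False
print(matches(human_construction, {"natural": "water"}))   # False
```

A tile whose features all fail the filter, or a tile with no features at all, ends up in the background pool.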
At zoom 14, around 300 images are packaged into data.npz.
Once we have the data.npz file, a quick inspection of some of the imagery with the following code (I used a Jupyter Notebook) shows that a high proportion of the images are labelled as not in the class above despite very clearly containing man-made structures.

import numpy as np
import matplotlib.pyplot as plt

# load the packaged data
npz = np.load('final_singa/data.npz')
x_train = npz['x_train']
y_train = npz['y_train']
x_test = npz['x_test']
y_test = npz['y_test']

# show the first 25 training images with their labels
# (column 0 of each one-hot label is the background class)
for i in range(25):
    print('row:', i)
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 10))
    ax.imshow(x_train[i])
    if y_train[i][0] == 1:
        ax.set_title('No man made')
    else:
        ax.set_title('Yes man made')
    fig.tight_layout()
    plt.show()
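Besides eyeballing images, a quick count of the labels shows how the package is balanced. This is a minimal sketch assuming, as in the loop above, that column 0 of each one-hot label marks the background class; `class_balance` is a hypothetical helper name. With `background_ratio` set to 1, roughly half of the tiles would be expected to be background.

```python
import numpy as np


def class_balance(y: np.ndarray) -> dict:
    """Summarise one-hot labels, assuming column 0 is the background class."""
    y = np.asarray(y)
    n = len(y)
    background = int((y[:, 0] == 1).sum())
    return {
        "total": n,
        "background": background,
        "positive": n - background,
        "background_fraction": background / n if n else 0.0,
    }


# Synthetic example: two background tiles and one "human construction" tile.
y_demo = np.array([[1, 0], [0, 1], [1, 0]])
print(class_balance(y_demo))
```

On real data this would be called as `class_balance(npz['y_train'])`.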

I also tried zoom level 15 and the same issue exists.
I'm wondering a few things:
Is the class I'm using too complex for label-maker to handle?
Is the OSM tiling simply unreliable?
Is there a bug in label maker?
Am I using background_ratio incorrectly?
Am I making a stupid mistake somewhere?
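On the `background_ratio` question: label-maker's README describes it, for single-class classification, as selecting `background_ratio` times the number of images with matching classes as background images. The sketch below illustrates that selection under stated assumptions: labels keyed by tile ID, column 0 marking background, random sampling of the background pool. `select_tiles` and its sampling details are hypothetical, not the library's code.

```python
import random
from typing import Dict, List


def select_tiles(labels: Dict[str, List[int]], background_ratio: float,
                 seed: int = 0) -> List[str]:
    """Keep every tile with the class, plus background_ratio times as many
    randomly sampled background tiles (label column 0 == 1 means background)."""
    positives = [t for t, lab in labels.items() if lab[0] == 0]
    backgrounds = [t for t, lab in labels.items() if lab[0] == 1]
    n_bg = min(len(backgrounds), int(background_ratio * len(positives)))
    rng = random.Random(seed)
    return positives + rng.sample(backgrounds, n_bg)


labels = {"a": [0, 1], "b": [0, 1], "c": [1, 0], "d": [1, 0], "e": [1, 0]}
print(select_tiles(labels, background_ratio=1))  # both positives + 2 backgrounds
```

Under this reading, `background_ratio` only controls how many background tiles are sampled. It cannot fix a tile that was labelled background because its OSM data is missing, since such a tile sits in the background pool from the start.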

Any advice greatly appreciated!

Thanks
Josh

drewbo · 2018-08-22T15:37:19Z

@joshwapiano can you use a tool like mbview to inspect the underlying OSM data? Without knowing that, it's hard to tell whether there is a bug in label-maker or whether the data simply isn't in OSM. This tool can't really control the latter issue and relies on good input data to create the package.

joshwapiano · 2018-08-29T15:55:15Z

Hi @drewbo, I appreciate the fast response!
I haven't managed to get mbview working, but I've inspected the GeoJSON classification file in QGIS and the tiles appear sensible. I won't be able to investigate further until October, but I tried to use a small example above to make it easier for you and others to find the bug (or mistake)!

Based on your response, I'm assuming the "classes" syntax I've used above isn't more complex than label-maker supports?

Many thanks
Josh

drewbo · 2018-08-30T11:42:13Z

@joshwapiano the class syntax looks correct (and is supported). The classification looks generally correct except near the country boundary.

[Map image: green shows human construction tiles]

It looks like what's happening is that, because of the country boundary, certain tiles (belonging to Malaysia) are left out of the OSM QA tiles. The underlying data for those tiles isn't present, so label-maker can't create the correct labels.
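One way to look for affected tiles is to list every tile in the bounding box that has no row in the QA tiles file. This sketch assumes the file follows the MBTiles convention (an SQLite `tiles` table with `zoom_level`, `tile_column` and `tile_row`, where rows are stored in TMS order and so need flipping to XYZ) and that it contains tiles at the zoom you pass; the helper names are hypothetical. A tile can also be absent simply because OSM has no features there (open sea, for example), so the result is a list of candidates to inspect, not proof of the boundary problem.

```python
import math
import sqlite3
from typing import List, Sequence, Tuple

Tile = Tuple[int, int, int]


def lonlat_to_tile(lon: float, lat: float, z: int) -> Tuple[int, int]:
    """Web Mercator (XYZ) tile indices containing a point, clamped to the grid."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(bbox: Sequence[float], z: int) -> List[Tile]:
    """All XYZ tiles covering a [west, south, east, north] bounding box."""
    west, south, east, north = bbox
    x0, y0 = lonlat_to_tile(west, north, z)  # top-left corner
    x1, y1 = lonlat_to_tile(east, south, z)  # bottom-right corner
    return [(x, y, z) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


def missing_tiles(conn: sqlite3.Connection, bbox: Sequence[float], z: int) -> List[Tile]:
    """Tiles inside bbox that have no row in the MBTiles `tiles` table."""
    rows = conn.execute(
        "SELECT tile_column, tile_row FROM tiles WHERE zoom_level = ?", (z,))
    # MBTiles uses TMS rows: flip them to XYZ before comparing.
    present = {(col, (2 ** z - 1) - row) for col, row in rows}
    return [t for t in tiles_in_bbox(bbox, z) if (t[0], t[1]) not in present]
```

For the Singapore config this might be run as `missing_tiles(sqlite3.connect("data/singapore-z14.mbtiles"), [103.5917, 1.132909, 104.10712, 1.487815], 14)`, and the output compared against the map in mbview.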

I'm going to retitle this issue to reflect this and will inquire with the maintainers of the upstream data to see if there's a good workaround. For now, I'd leave out boundary tiles from training.
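Leaving boundary tiles out could be done by filtering the labels before packaging. This is a minimal sketch assuming that labels.npz stores one array per tile under an "x-y-z" tile ID key and that `label-maker package` only packages the tiles listed there; `drop_tiles` and the excluded IDs are hypothetical. The excluded set could come from a list of suspected boundary tiles, joined into IDs with `"-".join(map(str, tile))`.

```python
from typing import Iterable

import numpy as np


def drop_tiles(labels_path: str, out_path: str, exclude: Iterable[str]) -> int:
    """Write a copy of labels.npz without the tile IDs in `exclude`.

    Returns the number of tiles kept."""
    excluded = set(exclude)
    with np.load(labels_path) as labels:
        kept = {k: labels[k] for k in labels.files if k not in excluded}
    np.savez(out_path, **kept)
    return len(kept)
```

Writing to a separate output file keeps the original labels.npz intact, so the filter can be adjusted and rerun.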

joshwapiano · 2018-08-30T16:20:29Z

@drewbo Thanks for looking into this. I should have time to make changes based on your findings before my project deadline.

My previous approach was to first run:

$ label-maker download --dest <example> --config <example>.json
$ label-maker labels --dest <example> --config <example>.json
$ label-maker images --dest <example> --config <example>.json

I would then manually inspect the downloaded images, deleting those that were cloud-covered or of poor quality (a high proportion when requesting data at zoom level 14!), to create a cleansed folder of images.
My next step was to run the final CLI command to create a data.npz file:

$ label-maker package --dest <example> --config <example>.json

This has taken me quite some time, so could you confirm whether I could copy the cleansed image folders into a different directory, change the config file to reduce the risk of boundary overlap, and then simply run the following:

$ label-maker download --dest <example_2> --config <example_2>.json
$ label-maker labels --dest <example_2> --config <example_2>.json

$ label-maker package --dest <example_2> --config <example_2>.json

Or would I need to re-download and re-cleanse the images for each country?
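Before packaging with a copied image folder, it may be worth checking which labelled tiles still have an image. This sketch assumes images are saved as "{x}-{y}-{z}.jpg" in a tiles folder and that tile IDs are the keys of labels.npz; `split_by_image` and the example paths are hypothetical.

```python
import os
from typing import Iterable, List, Tuple


def split_by_image(tile_ids: Iterable[str], image_dir: str,
                   ext: str = ".jpg") -> Tuple[List[str], List[str]]:
    """Split tile IDs into (with_image, without_image) for a folder of tiles."""
    ids = list(tile_ids)  # materialise so a generator can be scanned twice
    on_disk = {os.path.splitext(f)[0] for f in os.listdir(image_dir)
               if f.endswith(ext)}
    have = [t for t in ids if t in on_disk]
    missing = [t for t in ids if t not in on_disk]
    return have, missing
```

Usage might look like `with np.load("example_2/labels.npz") as labels: have, missing = split_by_image(labels.files, "example_2/tiles")`. Tiles in `missing` are either ones deliberately deleted during cleansing or new tiles the copied folder does not cover yet.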

Many thanks
Josh

P.S. One other thought: could the issue you've identified also affect land/sea borders? The full set of bounding boxes I've used covers the following:
Philippines, Indonesia (Lombok), Indonesia (Borneo), Malaysia, Vietnam, Brunei, China, Taiwan. I combine these into one data.npz file for ResNet50 training, and then apply the trained network to Sentinel-2 data.
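Combining per-country packages into one file can be sketched as a concatenation of the four arrays in each data.npz. This assumes every package was built with the same classes and zoom, so image shapes and label widths match; `merge_packages` and the file names are hypothetical. The merged arrays are in country order, so they would still need shuffling before training.

```python
from typing import Dict, Sequence

import numpy as np

KEYS = ("x_train", "y_train", "x_test", "y_test")


def merge_packages(paths: Sequence[str], out_path: str) -> Dict[str, tuple]:
    """Concatenate the train/test arrays of several data.npz files into one."""
    parts = {k: [] for k in KEYS}
    for path in paths:
        with np.load(path) as npz:
            for k in KEYS:
                parts[k].append(npz[k])
    merged = {k: np.concatenate(v, axis=0) for k, v in parts.items()}
    np.savez(out_path, **merged)
    return {k: v.shape for k, v in merged.items()}
```

For example, `merge_packages(["philippines/data.npz", "vietnam/data.npz"], "combined/data.npz")` returns the merged shapes, which is a quick check that nothing was dropped.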

joshwapiano · 2018-09-01T20:55:36Z

@drewbo If I can find time, I'll look into this in more detail over the next few days too. Would you mind sharing the code you used to produce the black/purple map above? Was it made with Mapnik?
Did you hear back from your inquiry with the maintainers of the upstream data to see if there's a good workaround?

drewbo · 2018-09-03T20:34:39Z

@joshwapiano

I think you can follow the procedure you outlined above.
The issue could affect land/sea borders as well. There isn't good documentation on which file is used to divide the QA tiles.
The black/purple map was produced by running: mbview data/singapore-z14.mbtiles
I don't think there's a workaround for the background issue yet.

drewbo changed the title ~~Labelling accuracy exceptionally poor for 'background images'~~ Missing QA tiles creates empty labels at country boundaries Aug 30, 2018

drewbo changed the title ~~Missing QA tiles creates empty labels at country boundaries~~ Missing QA tiles create empty labels at country boundaries Aug 30, 2018
