Missing QA tiles create empty labels at country boundaries #106

Open
joshwapiano opened this issue Aug 19, 2018 · 6 comments


@joshwapiano

After seeing really poor results from my ResNet50, I manually inspected the data and labels, and it appears that the background images label-maker produces are labelled almost randomly.

Here's the config that I'm using:

{
  "country": "singapore",
  "bounding_box": [103.5917,1.132909,104.10712,1.487815],
  "zoom": 14,
  "classes": [{ "name": "human construction", "filter": ["any", ["has", "man_made"], ["has", "military"], ["has", "building"], ["has", "aerialway"], ["has", "aeroway"], ["in", "landuse", "allotments", "brownfield", "cemetery", "commercial", "construction", "depot", "farmland", "farmyard", "garages", "greenhouse_horticulture", "industrial", "landfill", "military", "orchard", "plant_nursery", "port", "quarry", "railway", "recreation_ground", "religious", "residential", "retail", "village_green", "vineyard"] ] } ],
  "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=###(removed)",
  "background_ratio": 1,
  "ml_type": "classification"
}

Essentially what I've attempted to do in the above 'classes' is to identify tiles with any sign of human activity.
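
To make the intent of that filter concrete, here is a minimal sketch of how an "any" / "has" / "in" expression could be evaluated against a feature's OSM tags. It is an illustration only, not label-maker's implementation: the `matches` function is hypothetical and handles just the three operators that appear in the config above.

```python
def matches(filter_expr, tags):
    """Evaluate a small subset of filter operators against a dict of OSM tags.

    Hypothetical helper for illustration; it supports only "any", "has" and "in".
    """
    op, *args = filter_expr
    if op == "any":
        # True if at least one sub-filter matches
        return any(matches(sub, tags) for sub in args)
    if op == "has":
        # True if the tag key is present, whatever its value
        return args[0] in tags
    if op == "in":
        # True if the tag's value is one of the listed values
        key, *values = args
        return tags.get(key) in values
    raise ValueError(f"unsupported operator: {op}")


class_filter = ["any", ["has", "building"], ["in", "landuse", "residential", "retail"]]
print(matches(class_filter, {"building": "yes"}))
print(matches(class_filter, {"landuse": "forest"}))
```

Under this reading, a tile belongs to the class as soon as any one feature in it passes any one of the sub-filters, so the class definition itself is quite permissive.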
At that zoom level around 300 images are packaged into data.npz.
Once we have the data.npz file, a quick inspection of some of the imagery with the following code (I used a Jupyter Notebook) shows that a high proportion of the images are labelled as not in the class above, despite very clearly containing man-made structures.

import matplotlib.pyplot as plt
import numpy as np

# load the packaged data
npz = np.load('final_singa/data.npz')
x_train = npz['x_train']
y_train = npz['y_train']
x_test = npz['x_test']
y_test = npz['y_test']

# check the labels for the first 25 images
for i in range(25):
    print('row:', i)
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 10))
    ax.imshow(x_train[i])
    # column 0 is treated here as the background flag (no class match)
    if y_train[i][0] == 1:
        ax.set_title('No man made')
    else:
        ax.set_title('Yes man made')
    fig.tight_layout()
    plt.show()
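
As a complement to looking at images one by one, the share of background labels can be counted directly. The following is a minimal sketch, assuming (as the inspection code above does) that the first entry of each label row is 1 for a background tile; `background_fraction` is a hypothetical helper, not part of label-maker.

```python
def background_fraction(labels):
    """Fraction of label rows whose first entry is 1 (treated as background)."""
    rows = list(labels)
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[0] == 1) / len(rows)


# Small made-up example: two background rows out of four
example = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(background_fraction(example))  # 0.5
```

Calling it on `y_train` and `y_test` gives a number to compare against what the visual inspection suggests.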

I also tried at zoom level 15 and the same issue exists.
I'm wondering a few things:
Is the class I'm using too complex to use label-maker for this purpose?
Is the OSM tiling simply unreliable?
Is there a bug in label-maker?
Am I using background_ratio incorrectly?
Am I making a stupid mistake somewhere?

Any advice greatly appreciated!

Thanks
Josh

@drewbo
Contributor

drewbo commented Aug 22, 2018

@joshwapiano can you use a tool like mbview to inspect the underlying OSM data? Without knowing that, it's hard to know if there is any bug happening in label-maker or if the issue is that the data isn't entered into OSM. This tool can't really control the latter issue and relies on good input data to create the package.

@joshwapiano
Author

Hi @drewbo , appreciate the fast response!
I haven't used mbview successfully, but I've inspected the GeoJSON classification file in QGIS and the tiles appear sensible. I won't be able to investigate further until October, but I tried to use a small example above to make it easier for you and others to find the bug (or mistake)!

Based on your response, I'm assuming that the "classes" syntax I've used above isn't more complex than label-maker allows?

Many thanks
Josh

@drewbo
Contributor

drewbo commented Aug 30, 2018

@joshwapiano the class syntax looks correct (and is supported). The overall classification looks generally correct except near the country boundary.

[Screenshot (2018-08-30, 2:07:39 pm): green shows human construction tiles]

It looks like, because of the country boundary, certain tiles (belonging to Malaysia) are left out of the OSM QA tiles. The underlying data then isn't present, so label-maker can't create the correct labels.

[Screenshot (2018-08-30, 2:07:08 pm)]

I'm going to retitle this issue to reflect this and will inquire with the maintainers of the upstream data to see if there's a good workaround. For now, I'd leave out boundary tiles from training.
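
One way to find the tiles to leave out is to check, for each imagery tile, whether its ancestor tile exists in the QA tiles file. This is a minimal sketch under stated assumptions: the QA tiles are stored at zoom 12 in a standard MBTiles file (a SQLite database with a `tiles` table in TMS row order), and imagery tiles are given as XYZ coordinates at a zoom of 12 or higher. The function names and the path argument are hypothetical, and this is not something label-maker does itself.

```python
import sqlite3


def qa_tiles_present(mbtiles_path, zoom=12):
    """Return the set of (x, y) XYZ tiles stored at `zoom` in an MBTiles file."""
    con = sqlite3.connect(mbtiles_path)
    try:
        rows = con.execute(
            "SELECT tile_column, tile_row FROM tiles WHERE zoom_level = ?",
            (zoom,),
        ).fetchall()
    finally:
        con.close()
    # MBTiles stores rows in TMS order, so flip y to get XYZ coordinates
    return {(x, (1 << zoom) - 1 - y) for x, y in rows}


def has_qa_parent(x, y, z, present, qa_zoom=12):
    """True if the qa_zoom ancestor of XYZ tile (x, y, z) is in `present`.

    Assumes z >= qa_zoom.
    """
    shift = z - qa_zoom
    return (x >> shift, y >> shift) in present
```

Tiles for which `has_qa_parent` returns False have no underlying QA data, so their labels cannot be trusted and they are candidates for exclusion from training.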

@drewbo changed the title from "Labelling accuracy exceptionally poor for 'background images'" to "Missing QA tiles creates empty labels at country boundaries" on Aug 30, 2018
@drewbo changed the title from "Missing QA tiles creates empty labels at country boundaries" to "Missing QA tiles create empty labels at country boundaries" on Aug 30, 2018

@joshwapiano
Author

joshwapiano commented Aug 30, 2018

@drewbo Thanks for looking into this. I should have time to make changes based on your findings before my project deadline.

My previous approach was to first run:

$ label-maker download --dest <example> --config <example>.json
$ label-maker labels --dest <example> --config <example>.json
$ label-maker images --dest <example> --config <example>.json

I would then manually inspect the downloaded images, deleting those that were either cloud-covered or poor quality (a high proportion when requesting data at zoom level 14!), to create a cleansed folder of images.
My next step was to run the final CLI command to create a data.npz file:

$ label-maker package --dest <example> --config <example>.json 

This has taken me quite some time, so could you confirm whether it would be possible to copy the cleansed image folders already created into a different directory, change the config file to reduce the risk of boundary overlap, and then simply run the following:

$ label-maker download --dest <example_2> --config <example_2>.json
$ label-maker labels --dest <example_2> --config <example_2>.json

$ label-maker package --dest <example_2> --config <example_2>.json 

Or would I need to re-download and re-cleanse the images for each country?

Many thanks
Josh

P.S. One other thought: could the issue you've identified also affect land/sea borders? The full set of bounding boxes I've used is as follows:
Philippines, Indonesia (Lombok), Indonesia (Borneo), Malaysia, Vietnam, Brunei, China, Taiwan. I combine these into one data.npz file for ResNet50 training, and then apply the trained net to Sentinel-2 data.

@joshwapiano
Author

@drewbo If I can find time I'll look into this in more detail over the next few days. Would you mind sharing the code you used to produce the black/purple map above? Was this using Mapnik?
Did you hear back from the maintainers of the upstream data about a good workaround?

@drewbo
Contributor

drewbo commented Sep 3, 2018

@joshwapiano

  • I think you can follow the procedure you outlined above.
  • The issue could affect land/sea borders as well. There isn't good documentation on what file is used to divide the QA tiles.
  • The black/purple map was produced with: mbview data/singapore-z14.mbtiles
  • I don't think there's a workaround for the background issue yet.
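
For the procedure in the first bullet, the copy step could be scripted before re-running the download, labels and package commands against the new directory. This is a minimal sketch, not a confirmed workflow: it assumes the cleansed images sit in a `tiles` subfolder of the `--dest` directory, and `copy_cleansed_tiles` is a hypothetical helper. It needs Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def copy_cleansed_tiles(src_dest, new_dest, subdir="tiles"):
    """Copy the image folder from one --dest directory into another.

    `subdir` is an assumption about where the images are stored.
    Returns the sorted file names now present in the new image folder.
    """
    src = Path(src_dest) / subdir
    dst = Path(new_dest) / subdir
    if not src.is_dir():
        raise FileNotFoundError(f"no image folder at {src}")
    # Merge into the target folder if it already exists
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in dst.iterdir())
```

Checking the returned file names against the cleansed folder is a cheap way to confirm nothing was lost before packaging.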
