A project assessing suitability of random forests for Local Climate Zone classification in Hong Kong.

ericka-howard/masters-project-lcz-classification

Local Climate Zone Classification Using Random Forests

The goal of this project was to replicate aspects of the paper "Comparison between convolutional neural networks and random forest for local climate zone classification in mega urban areas using Landsat images". The report can be found here.

Local Climate Zone (LCZ) classification can help identify microclimates within cities. Those microclimates can then be used to target climate risk adaptation efforts, which can help alleviate the issues created by the Urban Heat Island Effect.

In this work the focus was on random forests, without the inclusion of convolutional neural networks. Rather than four cities, this investigation focused on just Hong Kong. This city was chosen because each LCZ class has at least four polygons. The data were accessed from the 2017 IEEE GRSS Data Fusion Contest and include both Landsat 8 imagery and LCZ reference data. The classification scheme used by the World Urban Database and Access Portal Tools project (S1 in the paper) was recreated with varying values of the tuning parameter ntree, which controls the number of trees in the random forest. Accuracy on the out-of-bag data was then compared to accuracy on the test dataset.
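To make the term "out-of-bag" concrete: each tree in a random forest is grown on a bootstrap sample (drawn with replacement) of the training data, and the samples that were never drawn are out-of-bag for that tree. Out-of-bag accuracy is estimated by predicting each sample using only the trees for which it was out-of-bag. The following is a minimal standard-library sketch of a single bootstrap draw, not the code used in this project; the function name `bootstrap_split` and the sample count are assumptions made for illustration.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample and return (in_bag, out_of_bag) index lists."""
    # Sample n_samples indices with replacement: these train one tree.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    # Indices that were never drawn are out-of-bag for this tree.
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_samples = 10_000       # hypothetical number of training pixels
in_bag, out_of_bag = bootstrap_split(n_samples, rng)
oob_fraction = len(out_of_bag) / n_samples
print(f"out-of-bag fraction for one tree: {oob_fraction:.3f}")
```

For large samples the out-of-bag fraction approaches 1 - 1/e, a bit over a third of the training data per tree, which is why a random forest can report an accuracy estimate without a separate validation set.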

Here are the initial LCZ data and one Landsat scene, both shown over a Google Maps satellite base layer:

LCZ Reference Data | Landsat Scene

Accuracy Metrics

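Accuracy metrics follow the conventions of the remote sensing field. Overall accuracy (OA) is used for overall comparisons:

$$\mathrm{OA} = \frac{\text{number of correctly classified reference samples}}{\text{total number of reference samples}}$$

OA_urb and OA_nat are also used. They are the same as OA but include only the urban or natural classes, respectively. For by-class comparisons the F1 score is used:

$$F1_i = \frac{2 \cdot \mathrm{UA}_i \cdot \mathrm{PA}_i}{\mathrm{UA}_i + \mathrm{PA}_i}$$

where, for class $i$,

$$\mathrm{PA}_i = \frac{\text{number of correctly classified samples of class } i}{\text{number of reference samples of class } i}$$

$$\mathrm{UA}_i = \frac{\text{number of correctly classified samples of class } i}{\text{number of samples classified as class } i}$$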

UA is a measure of user's accuracy, which is also called precision or positive predictive value. PA is the measure of producer's accuracy, also known as recall or sensitivity. The F1 score is the harmonic mean of UA and PA. An F1 score closer to 1 indicates a model that has both low false positives and low false negatives.
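The metrics described above can all be read off a confusion matrix. The sketch below is an illustration rather than the project's own code: the function names, the toy 3-class matrix, and the split of its classes into "urban" and "natural" are invented for the example. It also assumes that OA_urb and OA_nat restrict the calculation to reference samples whose true class is in the subset, which is one possible reading of the definition.

```python
def overall_accuracy(cm, classes=None):
    """OA from a confusion matrix where cm[i][j] counts samples with
    reference class i that were predicted as class j.

    If `classes` is given, only reference samples of those classes are
    counted (the assumed reading of OA_urb / OA_nat)."""
    rows = range(len(cm)) if classes is None else classes
    correct = sum(cm[i][i] for i in rows)
    total = sum(sum(cm[i]) for i in rows)
    return correct / total if total else float("nan")


def per_class_scores(cm):
    """Return a list of {'UA', 'PA', 'F1'} dicts, one per class."""
    n = len(cm)
    scores = []
    for i in range(n):
        correct = cm[i][i]
        ref_total = sum(cm[i])                        # reference samples of class i
        pred_total = sum(cm[r][i] for r in range(n))  # samples classified as class i
        pa = correct / ref_total if ref_total else 0.0   # producer's accuracy (recall)
        ua = correct / pred_total if pred_total else 0.0  # user's accuracy (precision)
        f1 = 2 * ua * pa / (ua + pa) if (ua + pa) else 0.0
        scores.append({"UA": ua, "PA": pa, "F1": f1})
    return scores


# Toy confusion matrix with made-up counts (rows: reference, columns: predicted).
cm = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 8],
]
urban, natural = [0, 1], [2]   # hypothetical class grouping

oa = overall_accuracy(cm)
oa_urb = overall_accuracy(cm, urban)
oa_nat = overall_accuracy(cm, natural)
scores = per_class_scores(cm)
print(oa, oa_urb, oa_nat)
print([round(s["F1"], 3) for s in scores])
```

In this toy matrix the middle class has the lowest PA and UA (0.6 each), so its F1 score sits well below the overall accuracy. That is the kind of per-class weakness OA alone can hide.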

Results

The results from varying the tuning parameter indicate that adding trees improves prediction accuracy only up to a point. Accuracy levels off at around 125 trees for the OA metrics and at around 100 trees for the F1 scores.

OA Metrics when varying ntree from 5 to 500 in intervals of 5. Based on out-of-bag dataset.

F1 Scores when varying ntree from 5 to 500 in intervals of 5. Based on out-of-bag dataset.
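One way to turn "accuracy levels off" into a number is to pick the smallest ntree whose accuracy is within a small tolerance of the best accuracy seen in the sweep. The sketch below shows that idea only. The helper `plateau_point`, the tolerance, and the accuracy values are all invented for illustration and are not this project's results or method.

```python
def plateau_point(curve, tolerance=0.005):
    """Return the smallest ntree whose accuracy is within `tolerance`
    of the best accuracy in `curve` (a dict mapping ntree -> accuracy)."""
    best = max(curve.values())
    return min(n for n, acc in curve.items() if acc >= best - tolerance)


# Made-up accuracy curve, NOT measured values from this project.
example_curve = {10: 0.70, 20: 0.78, 40: 0.818, 80: 0.820, 160: 0.821}
plateau = plateau_point(example_curve)
print(plateau)
```

The tolerance is a judgment call: a looser tolerance favours smaller, faster forests, while a tighter one follows the best score more closely.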

Results also indicate that accuracy on the out-of-bag data does not transfer to the test dataset. This makes sense considering the spatial autocorrelation present in data such as these, but it is concerning nonetheless. Additionally, OA metrics seem to mask low F1 scores in individual classes.

Validation metrics based on test dataset.

A Full Prediction

Finally, here is an example of a full prediction from the best random forest:

Fully Predicted LCZ Map

Legend

If you have any comments or questions feel free to contact me at smith.ericka.b@gmail.com
