Local Climate Zone Classification Using Random Forests

The goal of this project was to replicate aspects of the paper "Comparison between convolutional neural networks and random forest for local climate zone classification in mega urban areas using Landsat images". The report can be found here.

Local Climate Zone (LCZ) classification helps identify microclimates within cities. That information can guide where climate risk adaptation efforts are targeted, which in turn can help alleviate the problems created by the urban heat island effect.

This work focuses on random forest only and does not include convolutional neural networks. Rather than four cities, the investigation covers just Hong Kong, which was chosen because each of its LCZ classes has at least four polygons. The data come from the 2017 IEEE GRSS Data Fusion Contest and include both Landsat 8 imagery and LCZ reference data. The classification scheme used by the World Urban Database and Access Portal Tools (WUDAPT) project (S1 in the paper) is recreated with varying values of the tuning parameter ntree, which controls the number of trees in the random forest. Accuracy on the out-of-bag data is then compared with accuracy on the test dataset.
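The comparison described above (out-of-bag accuracy versus test accuracy as ntree grows) can be sketched as follows. This is an illustration in Python, not the project's R code, and it makes several assumptions: a bagged ensemble of nearest-centroid learners stands in for the random forest, the data are synthetic two-class points rather than Landsat pixels, and every function name is hypothetical. Only the bootstrap and out-of-bag bookkeeping is meant to carry over.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Base learner: one mean vector per class (a stand-in for a single tree)."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_one(centroids, row):
    """Return the class whose centroid is closest to the row."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)))

def majority_accuracy(votes, truth):
    """Accuracy of majority votes, skipping samples that received no vote."""
    scored = [(v.most_common(1)[0][0], t) for v, t in zip(votes, truth) if v]
    return sum(p == t for p, t in scored) / len(scored) if scored else float("nan")

def oob_and_test_accuracy(X_train, y_train, X_test, y_test, ntree, seed=0):
    """Train `ntree` bagged learners; return (OOB accuracy, test accuracy)."""
    rng = random.Random(seed)
    n = len(X_train)
    oob_votes = [Counter() for _ in range(n)]
    test_votes = [Counter() for _ in range(len(X_test))]
    for _ in range(ntree):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        model = fit_centroids([X_train[i] for i in bag], [y_train[i] for i in bag])
        in_bag = set(bag)
        for i in range(n):
            if i not in in_bag:  # only learners that never saw sample i vote on it
                oob_votes[i][predict_one(model, X_train[i])] += 1
        for k, row in enumerate(X_test):
            test_votes[k][predict_one(model, row)] += 1
    return majority_accuracy(oob_votes, y_train), majority_accuracy(test_votes, y_test)

def make_blobs(n_per_class, rng):
    """Synthetic two-class data, purely illustrative (not Landsat pixels)."""
    X, y = [], []
    for label, centre in (("a", 0.0), ("b", 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    rng = random.Random(42)
    X_train, y_train = make_blobs(100, rng)
    X_test, y_test = make_blobs(50, rng)
    for ntree in (5, 25, 125):
        oob, test = oob_and_test_accuracy(X_train, y_train, X_test, y_test, ntree)
        print(f"ntree={ntree:4d}  OOB accuracy={oob:.3f}  test accuracy={test:.3f}")
```

Because the synthetic points are independent of one another, the two accuracies should roughly agree here. With spatially clustered pixels that agreement is not guaranteed.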

Here are the initial LCZ data and one Landsat scene, both shown over a Google Maps satellite base layer:

Accuracy Metrics

Accuracy metrics follow the conventions of the remote sensing field. Overall accuracy (OA), the fraction of all samples that are classified correctly, is used for overall comparisons. OA_urb and OA_nat are also used; they are computed the same way as OA but include only the urban or natural classes, respectively. For by-class comparisons, the F1 score is used.
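As a rough illustration of the overall metrics, the sketch below computes OA, OA_urb and OA_nat from paired reference and predicted labels. It is written in Python rather than the project's R, and it rests on two assumptions that the README does not confirm: classes are encoded as integers 1 to 17 (built types 1 to 10, land cover types 11 to 17), and OA_urb / OA_nat are restricted by the reference label of each sample. The function name and the label lists are made up for the example.

```python
def overall_accuracy(reference, predicted, classes=None):
    """Fraction of samples classified correctly.

    If `classes` is given, only samples whose reference label is in that
    set are counted (an assumed reading of OA_urb / OA_nat).
    """
    pairs = [(r, p) for r, p in zip(reference, predicted)
             if classes is None or r in classes]
    if not pairs:
        raise ValueError("no samples in the requested classes")
    return sum(r == p for r, p in pairs) / len(pairs)

# Assumed encoding: built types as integers 1-10, land cover types as 11-17.
URBAN = set(range(1, 11))
NATURAL = set(range(11, 18))

# Toy labels for illustration only, not project data.
reference = [1, 1, 2, 11, 11, 14, 17, 2]
predicted = [1, 2, 2, 11, 14, 14, 17, 1]

oa = overall_accuracy(reference, predicted)               # 5 of 8 correct
oa_urb = overall_accuracy(reference, predicted, URBAN)    # 2 of 4 correct
oa_nat = overall_accuracy(reference, predicted, NATURAL)  # 3 of 4 correct
print(oa, oa_urb, oa_nat)
```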

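The F1 score is built from two per-class measures. UA is user's accuracy, also called precision or positive predictive value: the fraction of samples assigned to a class that truly belong to it. PA is producer's accuracy, also known as recall or sensitivity: the fraction of reference samples of a class that are classified correctly. The F1 score is the harmonic mean of the two, F1 = 2 * UA * PA / (UA + PA). An F1 score closer to 1 indicates a model with both few false positives and few false negatives.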
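The per-class metrics can be sketched the same way. This Python snippet (illustrative, not the project's R code) counts true positives, false positives and false negatives for one class and derives UA, PA and F1 from them. The function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for the example.

```python
def class_scores(reference, predicted, label):
    """Return (UA, PA, F1) for one class from paired label lists."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == label and p == label for r, p in pairs)
    fp = sum(r != label and p == label for r, p in pairs)
    fn = sum(r == label and p != label for r, p in pairs)
    ua = tp / (tp + fp) if tp + fp else 0.0   # user's accuracy (precision)
    pa = tp / (tp + fn) if tp + fn else 0.0   # producer's accuracy (recall)
    f1 = 2 * ua * pa / (ua + pa) if ua + pa else 0.0
    return ua, pa, f1

# Toy labels for illustration only, not project data.
reference = [1, 1, 1, 1, 2, 2]
predicted = [1, 1, 2, 2, 2, 2]

for label in (1, 2):
    ua, pa, f1 = class_scores(reference, predicted, label)
    print(f"class {label}: UA={ua:.2f} PA={pa:.2f} F1={f1:.2f}")
```

In this toy case class 1 has perfect UA but a PA of 0.5, while class 2 is the reverse, so both end up with the same F1 of about 0.67.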

Results

The results from varying the tuning parameter indicate that adding trees improves prediction accuracy only up to a point. The improvement levels off at around 125 trees for the OA metrics and around 100 trees for the F1 scores.
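One simple way to read such a plateau off a table of accuracies is to take the smallest ntree whose accuracy is within a small tolerance of the best value. The Python sketch below shows that rule. The function name, the tolerance and the numbers are all placeholders for illustration; they are not results from this project, and the README does not say how its thresholds were determined.

```python
def smallest_sufficient_ntree(accuracy_by_ntree, tolerance=0.005):
    """Smallest ntree whose accuracy is within `tolerance` of the best one."""
    best = max(accuracy_by_ntree.values())
    return min(n for n, acc in accuracy_by_ntree.items() if acc >= best - tolerance)

# Placeholder values for illustration only; NOT results from this project.
example = {10: 0.61, 50: 0.68, 100: 0.702, 150: 0.704, 300: 0.705}
print(smallest_sufficient_ntree(example))
```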

Results also indicate that accuracy measured on the out-of-bag data does not transfer to the test dataset. This makes sense given the spatial autocorrelation present in data such as these: neighboring pixels tend to resemble each other, so out-of-bag samples are not independent of the samples used for training. It is concerning nonetheless. Additionally, the OA metrics seem to mask low F1 scores in individual classes.
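A common safeguard against this effect is to hold out whole polygons, so that pixels from one polygon never appear in both the training and the evaluation set. The Python sketch below shows such a grouped split. It is an illustration under assumptions, not a description of how this project built its test dataset: the function name and polygon identifiers are hypothetical, and the split does not balance classes.

```python
import random

def split_by_polygon(polygon_ids, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so that no polygon appears in both."""
    polygons = sorted(set(polygon_ids))
    rng = random.Random(seed)
    rng.shuffle(polygons)
    n_test = max(1, round(len(polygons) * test_fraction))
    held_out = set(polygons[:n_test])  # whole polygons reserved for testing
    train_idx = [i for i, pid in enumerate(polygon_ids) if pid not in held_out]
    test_idx = [i for i, pid in enumerate(polygon_ids) if pid in held_out]
    return train_idx, test_idx

# One hypothetical polygon id per pixel.
pixel_polygons = ["p1", "p1", "p2", "p2", "p2", "p3", "p4", "p4"]
train_idx, test_idx = split_by_polygon(pixel_polygons)
print(train_idx, test_idx)
```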

A Full Prediction

Finally, here is an example of a full prediction from the best random forest:

If you have any comments or questions feel free to contact me at smith.ericka.b@gmail.com
