1.1 Goodness of Fit

Antonio Gomes de Oliveira Junior edited this page Feb 12, 2016 · 21 revisions

#Goodness of Fit metrics#

The Calibration package provides goodness-of-fit metrics for checking the accuracy of a model simulation by comparing its output with real data or other reference data. It offers simple cell-by-cell metrics, which can be used for discrete and continuous data, and a multi-level comparison proposed by Costanza (Costanza, R., 1989).

#PixelByPixel Metric#

Compares two CellularSpace objects pixel by pixel and returns their average precision, a value between 0 and 1. Comparing two identical maps should return 1.

If the cellular spaces are continuous:

Two cells are compared by calculating the absolute difference between their values. The precision of a cell is 1 minus this difference. The final result for the map is the sum of the precisions of all cells, divided by the number of cells.

If the cellular spaces are discrete:

The precision between two cells is 1 if both cells have the same value, and 0 otherwise. The total precision for the maps is the sum of the cell precisions, divided by the number of cells.
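The two rules above can be sketched in plain Lua, independently of TerraME. This is only an illustration under stated assumptions, not the package's implementation: the maps are flat arrays of equal length, continuous values are assumed to lie between 0 and 1, and `pixelPrecision` is a hypothetical helper name.

```lua
-- Illustrative sketch only: maps are plain arrays, not CellularSpace objects.
-- pixelPrecision is a hypothetical helper, not part of the calibration package.
local function pixelPrecision(map1, map2, continuous)
	assert(#map1 == #map2, "maps must have the same number of cells")
	local sum = 0
	for i = 1, #map1 do
		if continuous then
			-- continuous: precision is 1 minus the absolute difference
			sum = sum + (1 - math.abs(map1[i] - map2[i]))
		elseif map1[i] == map2[i] then
			-- discrete: precision is 1 for equal cells, 0 otherwise
			sum = sum + 1
		end
	end
	return sum / #map1
end

-- Discrete: three of the four cells match.
print(pixelPrecision({1, 2, 2, 3}, {1, 2, 3, 3}, false))
-- Continuous: the cell precisions are 1 and 0.5.
print(pixelPrecision({0.5, 0.25}, {0.5, 0.75}, true))
```

Both calls average the per-cell precision over the map, so each of them yields 0.75 here.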

Parameters:

pixelByPixel(cs1, cs2, attribute1, attribute2, continuous)

  • 1st cs1 First CellularSpace.
  • 2nd cs2 Second CellularSpace.
  • 3rd attribute1 Attribute of the first CellularSpace that should be compared.
  • 4th attribute2 Attribute of the second CellularSpace that should be compared.
  • 5th continuous Boolean that indicates whether the model is continuous (default: false, discrete model).

```lua
import("calibration")

-- Every cell of the 10x10 cellular space carries attributes a = 0.8 and b = 0.7.
local cell = Cell{a = 0.8, b = 0.7}
local cs = CellularSpace{xdim = 10, instance = cell}

-- Compare attribute "a" with attribute "b" (discrete comparison by default).
pixelByPixel(cs, cs, "a", "b")
```

#MultiLevel Costanza Metric#

In Costanza's method, the percentage difference between the CellularSpaces is calculated.

Parameters: The function receives a single table with the described values.

local result = multiLevel{cs1, cs2, attribute, continuous, graphics}

  • cs1 First CellularSpace.
  • cs2 Second CellularSpace.
  • attribute An attribute present in both CellularSpaces, whose values should be compared.
  • continuous Boolean that indicates whether the model to be calibrated is continuous.
  • graphics Boolean that indicates whether or not to draw a Chart with the fitness of each square (default: false).

```lua
-- Example described in the Costanza paper.
import("calibration")

local cs12 = CellularSpace{
	database = file("Costanza.map", "calibration"),
	attrname = "Costanza"
}

local cs22 = CellularSpace{
	database = file("Costanza2.map", "calibration"),
	attrname = "Costanza"
}

-- Discrete comparison, then continuous comparison of the same attribute.
local result = multiLevel{cs1 = cs12, cs2 = cs22, attribute = "Costanza", continuous = false, graphics = true}
local result2 = multiLevel{cs1 = cs12, cs2 = cs22, attribute = "Costanza", continuous = true, graphics = true}
```

Initially, this difference is calculated with a 1x1 pixel window using the pixelByPixel function; after each iteration, the window area is doubled.

[Figure: a 2x2 window comparison of two discrete CellularSpaces.]

The fit between the scenes (Fw) vs. the size of the sampling window (w) is plotted only if the 'graphics' argument of multiLevel is set to true.

The fit for each sampling window is estimated as 1 minus the proportion of cells that would have to be changed to make the sampling windows each have the same number of cells in each category, regardless of their spatial arrangement. For example:

  • In a discrete model: If a particular 2 × 2 window had two cells of forest and two of marsh in both scenes, the fit would be 100% regardless of how the cells were arranged within the windows. If one sampling window had one forest and three marsh, while the other had two of each category, the fit would be 75% (since one cell out of four would have to be changed to make the fit 100%).
  • In a continuous model: If the sum of the values in a particular 2 × 2 window is 10 in both scenes, the fit would be 100% regardless of how the values were arranged within the windows. If one sampling window had a sum of 7.5 while the other had a sum of 10, the fit would be about 86% (since 100% - (|7.5 - 10| / 17.5) × 100 ≈ 85.7%).

The fit for the whole scene for a particular sampling window size is the average fit over all the sampling windows of that size. The sampling window is moved through the scene one cell at a time until the entire image is covered. This is calculated using the following formula:

$$F_w = \frac{\sum_{s=1}^{t_w} \left[ 1 - \frac{\sum_{i=1}^{p} \left| a_{1i} - a_{2i} \right|}{2w^2} \right]_s}{t_w}$$

where $F_w$ is the fit for sampling window size $w$, $w$ the dimension of one side of the (square) sampling window, $a_{ki}$ the number of cells of category $i$ in scene $k$ in the sampling window, $p$ the number of different categories (e.g., habitat types) in the sampling windows, $s$ the sampling window of dimension $w$ by $w$ which slides through the scene one cell at a time, and $t_w$ the total number of sampling windows in the scene for window size $w$.
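A minimal sketch of the window comparison in plain Lua, independent of TerraME, may help make the definitions concrete. It assumes the fit of one window is 1 - Σ|a1i - a2i| / (2w²), following Costanza (1989), that grids are square tables of category labels, and that only windows lying entirely inside the scene are counted; the package may treat edges differently. The helper names `countCategories`, `windowFit`, `sceneFit` and `continuousWindowFit` are hypothetical.

```lua
-- Illustrative sketch only; these helpers are not part of the calibration package.

-- Count the cells of each category inside the w x w window whose
-- top-left corner is at (row, col).
local function countCategories(grid, row, col, w)
	local counts = {}
	for r = row, row + w - 1 do
		for c = col, col + w - 1 do
			local category = grid[r][c]
			counts[category] = (counts[category] or 0) + 1
		end
	end
	return counts
end

-- Fit of one window: 1 - sum_i |a1i - a2i| / (2 * w^2).
local function windowFit(grid1, grid2, row, col, w)
	local a1 = countCategories(grid1, row, col, w)
	local a2 = countCategories(grid2, row, col, w)
	local diff = 0
	for category, n in pairs(a1) do
		diff = diff + math.abs(n - (a2[category] or 0))
	end
	-- categories that appear only in the second window
	for category, n in pairs(a2) do
		if a1[category] == nil then diff = diff + n end
	end
	return 1 - diff / (2 * w * w)
end

-- Fw: average fit over every window position, sliding one cell at a time.
local function sceneFit(grid1, grid2, w)
	local size = #grid1 -- assumes a square grid
	local sum, windows = 0, 0
	for row = 1, size - w + 1 do
		for col = 1, size - w + 1 do
			sum = sum + windowFit(grid1, grid2, row, col, w)
			windows = windows + 1
		end
	end
	return sum / windows
end

-- Continuous case as described in the text: 1 - |sum1 - sum2| / (sum1 + sum2).
local function continuousWindowFit(sum1, sum2)
	return 1 - math.abs(sum1 - sum2) / (sum1 + sum2)
end

local scene1 = {{"forest", "marsh"}, {"marsh", "marsh"}}
local scene2 = {{"marsh", "forest"}, {"forest", "marsh"}}
print(sceneFit(scene1, scene2, 1)) -- cell by cell: only one of four cells matches
print(sceneFit(scene1, scene2, 2)) -- one 2x2 window: one cell of four would have to change
print(continuousWindowFit(7.5, 10))
```

With these two scenes the 1x1 comparison is much stricter than the 2x2 comparison, which only looks at the number of cells per category inside the window.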

At the end of this process, the fits calculated for all window sizes are combined into the final fitness value using the following formula:

$$F_t = \frac{\sum_{w} F_w \, e^{-k(w-1)}}{\sum_{w} e^{-k(w-1)}}$$

where $F_t$ is a weighted average of the fits over all window sizes, $F_w$ the fit for sampling windows of linear dimension $w$, $k$ a constant, and $w$ the linear dimension of a sampling window. This formula gives exponentially less weight to the fit at lower resolution. The value of $k$ determines how much weight is given to small vs. large sampling windows. If k = 0, all window sizes are given the same weight. At k = 1, only the first few window sizes will be important.
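The effect of k can be illustrated with a short plain Lua sketch. It assumes the weight of window size w is e^(-k(w-1)), as in Costanza (1989); `totalFit` is a hypothetical helper name and the fit values are made up for the example.

```lua
-- Illustrative sketch only; totalFit is not part of the calibration package.
-- fits maps each window size w to its fit Fw; k is the weighting constant.
local function totalFit(fits, k)
	local weighted, weights = 0, 0
	for w, fw in pairs(fits) do
		local weight = math.exp(-k * (w - 1))
		weighted = weighted + fw * weight
		weights = weights + weight
	end
	return weighted / weights
end

-- Made-up fits for window sizes 1 and 2.
local fits = {[1] = 0.25, [2] = 0.75}
print(totalFit(fits, 0))   -- k = 0: plain average of the fits
print(totalFit(fits, 0.1)) -- k = 0.1: the 1x1 fit weighs slightly more
print(totalFit(fits, 1))   -- k = 1: the 1x1 fit dominates
```

As k grows, the result moves from the plain average towards the fit of the smallest window.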

The goal of this process is to consider the proportions between the data during the evaluation of a model. For example, in a deforestation model we want to consider both the precise positions of deforested regions and the deforested area ratios of different districts when assessing the accuracy of a model. The relative importance of matching the patterns precisely vs. crudely must be decided in the context of the model's objectives and the quality of the data. For the purposes of matching spatial patterns of land use, it was found that a value of k = 0.1 gives an 'adequate' amount of weight to the larger window sizes, so it is the default value.

The full description of the Costanza method can be read in the Costanza paper on the link below.


Costanza paper: Costanza, R., 1989. Model goodness of fit: a multiple resolution procedure. Ecol. Modelling, 47: 199-215. https://www.pdx.edu/sites/www.pdx.edu.sustainability/files/Costanza%20EM%201989.pdf