updated documentation

JannisHoch committed Nov 17, 2020
1 parent cc865b6 commit a660f97

Showing 3 changed files with 27 additions and 17 deletions.
README.rst (5 changes: 4 additions & 1 deletion)
@@ -67,12 +67,15 @@ Runner script
To run the model from the command line, a command-line script is provided.
All data and settings are retrieved from the settings-file, which needs to be provided as an inline argument.

There are two settings-files, one for evaluating the model for the reference situation, and another one for additionally making projections.

.. code-block:: console

    $ cd path/to/copro/scripts
    $ python runner.py ../example/example_settings.cfg
    $ python runner.py ../example/example_settings_proj.cfg
By default, output is stored to the output directory specified in the respective settings-file.

Documentation
---------------
docs/Execution.rst (9 changes: 4 additions & 5 deletions)
@@ -47,11 +47,6 @@ Projection runs
If projections are also computed, multiple additional cfg-files can be provided.
For each projection, one individual cfg-file is required.

Since the projections are based on the reference run, at least two cfg-files are needed.
The command would then look like this:

@@ -60,6 +55,10 @@ The command would then look like this:
$ cd path/to/copro/scripts
$ python runner.py ../example/example_settings.cfg -proj ../example/example_settings_proj.cfg
.. note::

    Multiple projections can be made by specifying multiple cfg-files with the -proj flag.

Help
^^^^^^^^^^^^^^^^
For further help on how to use the script, try this:
docs/Output.rst (30 changes: 19 additions & 11 deletions)
@@ -30,6 +30,8 @@ During model execution, data is sampled per polygon and time step.
This data contains the geometry and ID of each polygon as well as unscaled variable values (X) and a boolean identifier indicating whether conflict took place or not (Y).
If the model is re-run without making changes to the data and how it is sampled, the resulting XY-array is stored to ``XY.npy``. This file can be loaded again with ``np.load()``.

If making projections, the Y-part is not available. The remaining X-data is still written to a file ``X.npy``.

.. note::

    Note that ``np.load()`` returns an array. This can be further processed with e.g. pandas.
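
As a minimal sketch (the exact array layout is defined by CoPro and not shown here; ``allow_pickle=True`` is assumed to be needed because the array also holds polygon geometries), the file can be wrapped in a pandas dataframe like this:

.. code-block:: python

    import numpy as np
    import pandas as pd

    # load the sampled data; np.load() returns a plain numpy array
    XY = np.load('XY.npy', allow_pickle=True)

    # wrap the array in a dataframe for further processing;
    # column names are not stored in the npy-file
    df = pd.DataFrame(XY)
    print(df.shape)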
@@ -45,15 +47,21 @@ Per model run, a fraction of the total XY-data is used to make a prediction.
To be able to analyse model output, all predictions (stored as pandas dataframes) made per run are appended to a main output-dataframe.
This dataframe is the basis of all further analyses.
When stored to file, it can become rather large.
Therefore, the dataframe is converted to a npy-file (``raw_output_data.npy``). This file can be loaded again with ``np.load()``.

.. note::

    Note that ``np.load()`` returns an array. This can be further processed with e.g. pandas.
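
Loading it follows the same pattern as above; an illustration only, since the column names are not stored in the npy-file:

.. code-block:: python

    import numpy as np
    import pandas as pd

    # main output-dataframe, stored as a plain numpy array
    raw = np.load('raw_output_data.npy', allow_pickle=True)
    df = pd.DataFrame(raw)

    # every row is one prediction made in one of the model runs
    print('{} predictions collected over all runs'.format(len(df)))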

Evaluation metrics
-----------------------
Per model run, a range of metrics are computed to evaluate the predictions made.
They are all appended to a dictionary and saved to the file ``evaluation_metrics.csv``.
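
The individual metrics are not listed here; as a hedged sketch of the general pattern, a dictionary of scikit-learn classification metrics could be filled per run and written to csv along these lines (the function and key names are illustrative, not CoPro's actual API):

.. code-block:: python

    import pandas as pd
    from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

    def append_metrics(y_test, y_pred, y_prob, eval_dict):
        """Append the metrics of one model run to a dictionary (illustrative only)."""
        eval_dict.setdefault('accuracy', []).append(accuracy_score(y_test, y_pred))
        eval_dict.setdefault('precision', []).append(precision_score(y_test, y_pred))
        eval_dict.setdefault('recall', []).append(recall_score(y_test, y_pred))
        eval_dict.setdefault('roc_auc', []).append(roc_auc_score(y_test, y_prob))
        return eval_dict

    # after all runs, the collected metrics can be stored to csv, e.g.
    # pd.DataFrame(eval_dict).to_csv('evaluation_metrics.csv', index=False)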

ROC-AUC
--------
To be able to determine the mean of the ROC-AUC score plus its standard deviation, the required data is stored to csv-files.
``ROC_data_tprs.csv`` contains the true positive rates per evaluation, and ``ROC_data_aucs.csv`` the area-under-curve values per run.
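
Assuming the csv-files can simply be read with pandas (their exact layout is not documented here), the mean and standard deviation of the ROC-AUC score could be derived like this:

.. code-block:: python

    import pandas as pd

    # one area-under-curve value per model run
    aucs = pd.read_csv('ROC_data_aucs.csv')

    # spread of the ROC-AUC score over all runs
    print('mean ROC-AUC: {:.3f}'.format(aucs.values.mean()))
    print('std ROC-AUC:  {:.3f}'.format(aucs.values.std()))

    # the stored rates per evaluation can be averaged to one mean ROC-curve
    tprs = pd.read_csv('ROC_data_tprs.csv')
    mean_tpr = tprs.values.mean(axis=0)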

Model prediction per polygon
---------------------------
@@ -66,6 +74,13 @@ Three main output metrics are calculated per polygon:
2. The total number of conflicts in the test (*NOC*);
3. The chance of conflict (*COC*), defined as the ratio of the number of conflict predictions to the overall number of predictions made (see the sketch below).
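
A minimal sketch of how *COC* could be derived from the main output-dataframe; the column names ``ID`` and ``y_pred`` are assumptions for illustration:

.. code-block:: python

    import pandas as pd

    def chance_of_conflict(df, id_col='ID', pred_col='y_pred'):
        """Ratio of conflict predictions (1) to all predictions made, per polygon."""
        return df.groupby(id_col)[pred_col].mean()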

all data
^^^^^^^^^

All output metrics (CCP, NOC, COC) are determined based on the entire data set at the end of the run, i.e. without splitting it into chunks.

The data is stored to ``output_per_polygon.shp``.
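
The shp-file can be inspected with, for instance, geopandas; a sketch, assuming the metric abbreviations appear as column names:

.. code-block:: python

    import geopandas as gpd

    # per-polygon output metrics; column names are assumed to match the metric abbreviations
    gdf = gpd.read_file('output_per_polygon.shp')
    print(gdf[['CCP', 'NOC', 'COC']].describe())

    # quick visual check of the chance of conflict per polygon
    gdf.plot(column='COC', legend=True)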

k-fold analysis
^^^^^^^^^^^^^^^^
The model is repeated several times to eliminate the influence of how the data is split into training and test samples.
@@ -74,14 +89,7 @@ As such, the accuracy per run and polygon will differ.
To account for that, the resulting data set containing all predictions at the end of the run is split into k chunks.
Subsequently, the mean, median, and standard deviation of CCP are determined from the k chunks.

The resulting shp-file is named ``output_kFoldAnalysis_per_polygon.shp``.
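
A sketch of the chunking step, assuming the main output-dataframe and illustrative column names (the actual implementation in CoPro may differ):

.. code-block:: python

    import numpy as np
    import pandas as pd

    def kfold_ccp_stats(df, k, id_col='ID', correct_col='correct_pred'):
        """Split the prediction dataframe into k chunks and compute
        mean, median, and standard deviation of CCP per polygon."""
        chunk_idx = np.array_split(np.arange(len(df)), k)
        # CCP per polygon and chunk: share of correct predictions
        ccp = pd.concat([df.iloc[idx].groupby(id_col)[correct_col].mean()
                         for idx in chunk_idx], axis=1)
        return pd.DataFrame({'CCP_mean': ccp.mean(axis=1),
                             'CCP_median': ccp.median(axis=1),
                             'CCP_std': ccp.std(axis=1)})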

.. note::

