-
Notifications
You must be signed in to change notification settings - Fork 23
/
guide.Rmd
265 lines (195 loc) · 13.1 KB
/
guide.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
---
title: "Overview: Creating FFTs with FFTrees"
author: "Nathaniel D. Phillips and Hansjörg Neth"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: fft.bib
csl: apa.csl
vignette: >
%\VignetteIndexEntry{Overview: Creating FFTs with FFTrees}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, echo = FALSE}
knitr::opts_chunk$set(collapse = FALSE,
comment = "#>",
prompt = FALSE,
tidy = FALSE,
echo = TRUE,
message = FALSE,
warning = FALSE,
# Default figure options:
dpi = 100,
fig.align = 'center',
fig.height = 6.0,
fig.width = 6.5,
out.width = "580px")
```
```{r pkgs, echo = FALSE, message = FALSE, results = 'hide'}
library(FFTrees)
library(dplyr)
library(testthat)
library(tidyselect)
library(magrittr)
library(knitr)
```
```{r urls, echo = FALSE, message = FALSE, results = 'hide'}
# URLs:
url_pkg_CRAN <- "https://CRAN.R-project.org/package=FFTrees"
url_pkg_GitHub <- "https://github.com/ndphillips/FFTrees"
url_pkg_issues <- "https://github.com/ndphillips/FFTrees/issues"
url_JDM_issue <- "https://journal.sjdm.org/vol12.4.html"
url_JDM_html <- "https://journal.sjdm.org/17/17217/jdm17217.html"
url_JDM_pdf <- "https://journal.sjdm.org/17/17217/jdm17217.pdf"
url_JDM_doi <- "https://doi.org/10.1017/S1930297500006239"
doi_JDM <- "10.1017/S1930297500006239"
email_contact <- "Nathaniel.D.Phillips.is@gmail.com"
url_contact <- "https://www.linkedin.com/in/nathanieldphillips/"
```
<!-- FFTrees logo: --->
<!-- <img src = "../inst/logo.png" align = "right" alt = "FFTrees" width = "160" /> -->
<!-- Brief intro: -->
The R package **FFTrees** [@phillips2017FFTrees; @FFTrees-pkg] makes it easy to create, visualize, and evaluate fast-and-frugal decision trees\ (FFTs).
FFTs are simple and transparent decision algorithms for solving binary classification problems in an\ effective and efficient fashion.
## Fast-and-Frugal Trees (FFTs)
<!-- Defining FFTs: -->
A _fast-and-frugal tree_ (FFT) [@martignon2003naive] is a set of hierarchical rules for solving binary classification tasks based on very little pieces of information (usually using\ 4 or fewer cues).
In contrast to more complex decision trees, each node of an\ FFT has exactly two branches.
A branch can either contain another cue (i.e., ask another question) or lead to an exit (i.e., yield a decision or prediction outcome).
Each non-final node of an\ FFT has one exit branch and the final node has two exit branches.
<!-- Characteristics and benefits of FFTs: -->
FFTs are simple and effective decision strategies that use minimal information for making decisions in binary classification problems [see @gigerenzer1999fast;@gigerenzer1999good].
FFTs are often preferable to more complex decision strategies (such as logistic regression, LR) because they rarely over-fit data [@gigerenzer2009homo] and are easy to interpret, implement, and communicate in real-world settings [@marewski2012heuristic].
FFTs have been designed to tackle many real world tasks from making fast decisions in emergency rooms [@green1997alters] to detecting depression [@jenny2013simple].
<!-- Emphasize transparency: -->
Whereas their performance and success are empirical questions, a key theoretical advantage of FFTs is their _transparency_ to decision makers and anyone aiming to understand and evaluate the details of an algorithm.
In the words of @burton2020, "human users could interpret, justify, control, and interact with a fast-and-frugal decision aid" (p.\ 229).
## Using the FFTrees package
The **FFTrees** package makes it easy to produce, display, and evaluate FFTs [@phillips2017FFTrees].
The package's main function is `FFTrees()` which takes formula\ `formula` and dataset\ `data` arguments and returns several FFTs that attempt to classify training cases into criterion classes.
The FFTs created can then be used to predict new data to cross-validate their performance.
Here is an example of using the main `FFTrees()` function to fit FFTs to `heart.train` data:
```{r fft-example, message = FALSE, results = 'hide'}
# Create a fast-and-frugal tree (FFT) predicting heart disease:
heart.fft <- FFTrees(formula = diagnosis ~.,
data = heart.train,
data.test = heart.test,
main = "Heart Disease",
decision.labels = c("Healthy", "Diseased"))
```
The resulting `FFTrees` object `heart.fft` contains 7\ FFTs that were fitted to the `heart.test` data.
To evaluate a tree's predictive performance, we compare its predictions for the un-trained `heart.test` data with their true criterion values.
Here is how we can apply the best training FFT to the `heart.test` data:
```{r fig-1, fig.width = 6.5, fig.height = 6.0, out.width = "600px", fig.align = 'center', fig.cap = "A fast-and-frugal tree (FFT) to predict heart disease status."}
# Visualize predictive performance:
plot(heart.fft, data = "test")
```
## Getting started
To start using the **FFTrees** package, we recommend studying the [Tutorial: Creating FFTs for heart disease](FFTrees_heart.html).
The tutorial illustrates the basics steps of creating, visualizing, and evaluating fast-and-frugal trees (FFTs).
The scientific background of FFTs and the development of **FFTrees** are described in @phillips2017FFTrees (doi\ [10.1017/S1930297500006239](`r url_JDM_doi`) | [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)).
The following vignettes provide details on related topics and corresponding examples.
### Vignettes
<!-- Table of all vignettes: -->
Here is a complete list of the vignettes available in the **FFTrees** package:
| | Vignette | Description |
|--:|:------------------------------|:-------------------------------------------------|
| | [Main guide: FFTrees overview](guide.html) | An overview of the **FFTrees** package |
| 1 | [Tutorial: FFTs for heart disease](FFTrees_heart.html) | An example of using `FFTrees()` to model heart disease diagnosis |
| 2 | [Accuracy statistics](FFTrees_accuracy_statistics.html) | Definitions of accuracy statistics used throughout the package |
| 3 | [Creating FFTs with FFTrees()](FFTrees_function.html) | Details on the main `FFTrees()` function |
| 4 | [Manually specifying FFTs](FFTrees_mytree.html) | How to directly create FFTs without using the built-in algorithms |
| 5 | [Visualizing FFTs](FFTrees_plot.html) | Plotting `FFTrees` objects, from full trees to icon arrays |
| 6 | [Examples of FFTs](FFTrees_examples.html) | Examples of FFTs from different datasets contained in the package |
### Datasets
The **FFTrees** package contains several datasets ---\ mostly from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/)\ --- that allow you to address interesting questions when exploring FFTs:
- `blood` -- Which people donate blood? [source](https://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center)
- `breastcancer` -- Which patients suffer from breast cancer? [source](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic))
- `car` -- Which cars are acceptable? [source](http://archive.ics.uci.edu/ml/datasets/Car+Evaluation)
- `contraceptive` -- Which factors determine whether women use contraceptives? [source](https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice)
- `creditapproval` -- Which factors determine a creditcard approval? [source](https://archive.ics.uci.edu/ml/datasets/Credit+Approval)
- `fertility` -- Which factors predict a fertile sperm concentration? [source](https://archive.ics.uci.edu/ml/datasets/Fertility)
- `forestfires` -- Which environmental conditions predict forest fires? [source](https://archive.ics.uci.edu/ml/datasets/Forest+Fires)
- `heartdisease` -- Which patients suffer from heart disease? [source](https://archive.ics.uci.edu/ml/datasets/Heart+Disease)
- `iris.v` -- Which iris belongs to the class "virginica"? [source](https://archive.ics.uci.edu/ml/datasets/Forest+Fires)
- `mushrooms` -- Which features predict poisonous mushrooms? [source](https://archive.ics.uci.edu/ml/datasets/Mushroom)
- `sonar` -- Did a sonar signal bounce off a metal cylinder (or a rock)? [source](https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks))
- `titanic` -- Which passengers survived the Titanic? [source](https://www.encyclopedia-titanica.org)
- `voting` -- How did U.S. congressmen vote in 1984? [source](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records)
- `wine` -- What determines ratings of wine quality? [source](https://archive.ics.uci.edu/ml/datasets/Wine)
#### Details about the datasets {-}
When preparing data to be predicted by FFTs, we usually distinguish between several (categorical or numeric) predictors and a (binary) criterion variable.
**Table\ 1** provides basic information on the datasets included in the **FFTrees** package (see their documentation for additional details).
**Table\ 1:** Key information on the datasets included in **FFTrees**.
```{r dataframe for overview data, echo = FALSE}
## Preparations for applying the describe_data() function to all data sets
## When new data sets are included, add their info so that they will also be shown in the vignette-table!
# List all data sets:
data_list <- list(blood, breastcancer, car, contraceptive, creditapproval, fertility, forestfires,
heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine)
# Vector with all names of the data sets:
data_names <- c("blood", "breastcancer", "car", "contraceptive", "creditapproval", "fertility", "forestfires",
"heartdisease", "iris.v", "mushrooms", "sonar", "titanic", "voting", "wine")
# Vector with all criterion names:
criterion_names <- c("donation.crit", "diagnosis", "acceptability", "cont.crit", "crit",
"diagnosis","fire.crit", "diagnosis", "virginica", "poisonous", "mine.crit", "survived", "party.crit", "type")
# Vector with criterion values of interest:
baseline_values <- c(1, TRUE, "acc", TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, "red")
# Use combined lists/vectors and apply describe_data() to each:
result_list <- mapply(describe_data, data = data_list,
data_name = data_names, criterion_name = criterion_names,
baseline_value = baseline_values, SIMPLIFY = FALSE)
# Combine results in df:
combined_result <- do.call(rbind, result_list)
# Round baseline and NA pct values for brevity:
combined_result$Baseline_pct<- round(combined_result$Baseline_pct, 1)
combined_result$NAs_pct<- round(combined_result$NAs_pct, 2)
# Rename columns:
colnames(combined_result) <- c("Dataset name",
"Number of cases",
"Criterion name",
"Baseline (`TRUE`,\\ in\\ %)",
"Number of predictors",
"Number of NAs",
"NAs (in\\ %)")
# Render the table from the data frame
# use as many items per page as we have data sets
# redefine column names as we like them:
knitr::kable(combined_result, format = "html")
```
## Citing **FFTrees**
We had a lot of fun creating **FFTrees** and hope you like it too!
For an accessible introduction to FFTs, we recommend reading our article in the journal _Judgment and Decision Making_ ([2017](`r url_JDM_doi`)), entitled _FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees_ (available in [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)\ ).
**Citation** (in APA format):
- Phillips, N. D., Neth, H., Woike, J. K. & Gaissmaier, W. (2017).
FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees.
_Judgment and Decision Making_, _12_ (4), 344–368.
doi\ [`r doi_JDM`](`r url_JDM_doi`)
<!-- PDF available at `r url_JDM_pdf` -->
When using **FFTrees** in your own work, please cite our article and spread the word, so that we can continue developing the package.
**BibTeX Citation**:
```{r bibtex-citation, eval = FALSE, highlight = FALSE}
@article{FFTrees,
title = {FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees},
author = {Phillips, Nathaniel D and Neth, Hansjörg and Woike, Jan K and Gaissmaier, Wolfgang},
year = 2017,
journal = {Judgment and Decision Making},
volume = 12,
number = 4,
pages = {344--368},
url = {https://journal.sjdm.org/17/17217/jdm17217.pdf},
doi = {10.1017/S1930297500006239}
}
```
## Contact
- The latest release of **FFTrees** is available at [`r url_pkg_CRAN`](`r url_pkg_CRAN`).
- The latest developer version is available at [`r url_pkg_GitHub`](`r url_pkg_GitHub`).
- For comments, tips, and bug reports, please post at [`r url_pkg_issues`](`r url_pkg_issues`) or contact Nathaniel at `r email_contact` or [`r url_contact`](`r url_contact`).
<!-- Logo: -->
```{r logo, echo = FALSE, fig.align = "center", out.width="40%"}
knitr::include_graphics("../inst/FFTrees_Logo.jpg")
```
<!-- Automatic references: -->
## Bibliography
<!-- eof. -->