Merge pull request #197 from hneth/master

Revise vignettes and clean up code
ndphillips · Jun 3, 2023 · fed34f2
2 parents 010f4ab + a3113de

Showing 6 changed files with 50 additions and 60 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: FFTrees
 Type: Package
 Title: Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees
-Version: 1.9.0.9031
-Date: 2023-05-31
+Version: 1.9.0.9032
+Date: 2023-06-01
 Authors@R: c(person("Nathaniel", "Phillips", role = c("aut"), email = "Nathaniel.D.Phillips.is@gmail.com", comment = c(ORCID = "0000-0002-8969-7013")),
              person("Hansjoerg", "Neth", role = c("aut", "cre"), email = "h.neth@uni.kn", comment = c(ORCID = "0000-0001-5427-3141")),
              person("Jan", "Woike", role = "aut", comment = c(ORCID = "0000-0002-6816-121X")),

diff --git a/R/plotFFTrees_function.R b/R/plotFFTrees_function.R
@@ -28,7 +28,7 @@
 #'  }
 #' By default, \code{data = 'train'} (as \code{x} may not contain test data).
 #'
-#' @param what What should be plotted (as a string)? Valid options are:
+#' @param what What should be plotted (as a character string)? Valid options are:
 #' \describe{
 #'   \item{'all'}{Plot the tree diagram with all corresponding guides and performance statistics, but excluding cue accuracies.}
 #'   \item{'cues'}{Plot only the marginal accuracy of cues in ROC space.

diff --git a/README.Rmd b/README.Rmd
@@ -40,13 +40,11 @@ url_JDM_doi <- "https://doi.org/10.1017/S1930297500006239"
 [![R-CMD-check](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml)
 <!-- Devel badges end. -->
 
-
 <!-- Release badges start: -->
 <!-- [![CRAN status](https://www.r-pkg.org/badges/version/FFTrees)](https://CRAN.R-project.org/package=FFTrees) -->
 <!-- [![Total downloads](https://cranlogs.r-pkg.org/badges/grand-total/FFTrees?color='00a9e0')](https://www.r-pkg.org/pkg/FFTrees) -->
 <!-- Release badges end. -->
 
-
 <!-- ALL badges start: --> 
 <!-- [![CRAN status](https://www.r-pkg.org/badges/version/FFTrees)](https://CRAN.R-project.org/package=FFTrees) -->
 <!-- [![Build Status](https://travis-ci.org/ndphillips/FFTrees.svg?branch=master)](https://travis-ci.org/ndphillips/FFTrees) -->
@@ -197,13 +195,11 @@ plot(heart_fft,
 heart_fft$competition$test
 ```
 
-
 <!-- FFTs by verbal description: -->
 
 ### Building FFTs from verbal descriptions 
 
-FFTs are so simple that we even can create them 'from words' and then apply them to data! 
-
+FFTs are so simple that we even can create them 'from words' and then apply them to data.  
 For example, let's create a tree with the following three nodes and evaluate its performance on the `heart.test` data:
 
 1. If `sex = 1`, predict _Disease_.
@@ -235,11 +231,10 @@ plot(my_fft,
 ![An FFT created from a verbal description.](man/figures/README-example-heart-verbal-1.png)
 
 **Figure\ 2**: An FFT predicting heart disease created from a verbal description.  
-
-As we can see, this particular tree is somewhat biased: 
+The performance measures (in the bottom panel of **Figure\ 2**) show that this particular tree is somewhat biased: 
 It has nearly perfect _sensitivity_ (i.e., is good at identifying cases of _Disease_) but suffers from low _specificity_ (i.e., performs poorly in identifying _Healthy_ cases). 
 Expressed in terms of its errors, `my_fft` incurs few misses at the expense of many false alarms. 
-Although the _accuracy_ of our custom tree still exceeds the data's baseline by a fair amount, the FFTs in `heart_fft` (from above) strike a better balance. 
+Although the _accuracy_ of our custom tree still exceeds the data's baseline by a fair amount, the FFTs in `heart_fft` (created above) strike a better balance. 
 
 <!-- A range of options, rather than 1 optimum: -->
 
@@ -249,7 +244,7 @@ To explore this range of options, the **FFTrees** package enables us to design a
 
 ## References
 
-We had a lot of fun creating **FFTrees** and hope you like it too! 
+We had a lot of fun creating the **FFTrees** package and hope you like it too! 
 As a comprehensive, yet accessible introduction to FFTs, we recommend reading our article in the journal _Judgment and Decision Making_ ([2017](`r url_JDM_doi`)), entitled _FFTrees: A toolbox to create, visualize,and evaluate fast-and-frugal decision trees_ (available in [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)\ ).  
 
 

diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 <!-- README.md is generated from README.Rmd. Please only edit the .Rmd file! -->
 <!-- Title, version and logo: -->
 
-# FFTrees 1.9.0.9031 <img src = "./inst/FFTrees_Logo.jpg" align = "right" alt = "FFTrees" width = "225" />
+# FFTrees 1.9.0.9032 <img src = "./inst/FFTrees_Logo.jpg" align = "right" alt = "FFTrees" width = "225" />
 
 <!-- Devel badges start: -->
 
@@ -211,8 +211,7 @@ heart_fft$competition$test
 ### Building FFTs from verbal descriptions
 
 FFTs are so simple that we even can create them ‘from words’ and then
-apply them to data!
-
+apply them to data.  
 For example, let’s create a tree with the following three nodes and
 evaluate its performance on the `heart.test` data:
 
@@ -247,15 +246,15 @@ plot(my_fft,
 description.](man/figures/README-example-heart-verbal-1.png)
 
 **Figure 2**: An FFT predicting heart disease created from a verbal
-description.
-
-As we can see, this particular tree is somewhat biased: It has nearly
-perfect *sensitivity* (i.e., is good at identifying cases of *Disease*)
-but suffers from low *specificity* (i.e., performs poorly in identifying
+description.  
+The performance measures (in the bottom panel of **Figure 2**) show that
+this particular tree is somewhat biased: It has nearly perfect
+*sensitivity* (i.e., is good at identifying cases of *Disease*) but
+suffers from low *specificity* (i.e., performs poorly in identifying
 *Healthy* cases). Expressed in terms of its errors, `my_fft` incurs few
 misses at the expense of many false alarms. Although the *accuracy* of
 our custom tree still exceeds the data’s baseline by a fair amount, the
-FFTs in `heart_fft` (from above) strike a better balance.
+FFTs in `heart_fft` (created above) strike a better balance.
 
 <!-- A range of options, rather than 1 optimum: -->
 
@@ -267,12 +266,12 @@ package enables us to design and evaluate a range of FFTs.
 
 ## References
 
-We had a lot of fun creating **FFTrees** and hope you like it too! As a
-comprehensive, yet accessible introduction to FFTs, we recommend reading
-our article in the journal *Judgment and Decision Making*
-([2017](https://doi.org/10.1017/S1930297500006239)), entitled *FFTrees:
-A toolbox to create, visualize,and evaluate fast-and-frugal decision
-trees* (available in
+We had a lot of fun creating the **FFTrees** package and hope you like
+it too! As a comprehensive, yet accessible introduction to FFTs, we
+recommend reading our article in the journal *Judgment and Decision
+Making* ([2017](https://doi.org/10.1017/S1930297500006239)), entitled
+*FFTrees: A toolbox to create, visualize,and evaluate fast-and-frugal
+decision trees* (available in
 [html](https://journal.sjdm.org/17/17217/jdm17217.html) \|
 [PDF](https://journal.sjdm.org/17/17217/jdm17217.pdf) ).
 
@@ -333,6 +332,6 @@ Examples include:
 
 ------------------------------------------------------------------------
 
-\[File `README.Rmd` last updated on 2023-05-31.\]
+\[File `README.Rmd` last updated on 2023-06-01.\]
 
 <!-- eof. -->
diff --git a/man/plot.FFTrees.Rd b/man/plot.FFTrees.Rd
diff --git a/vignettes/FFTrees_heart.Rmd b/vignettes/FFTrees_heart.Rmd
@@ -142,7 +142,8 @@ For definitions of all accuracy statistics, see the [accuracy statistics](FFTree
 
 ### Step\ 4: Visualise the final FFT
 
-Use `plot()` to visualize an FFT (an `FFTrees` object):
+We use `plot(x)` to visualize an FFT (from an\ `FFTrees` object\ `x`). 
+Using `data = "train"` evaluates an\ FFT for training data (fitting), whereas `data = "test"` predicts the performance of an\ FFT for a different dataset: 
 
 ```{r fft-plot, fig.width = 6.5, fig.height = 6}
 # Plot predictions of the best FFT when applied to test data:
@@ -152,9 +153,12 @@ plot(heart.fft,      # An FFTrees object
 
 #### Other arguments
 
+The `plot()` function for `FFTrees` object 
+
 - `tree`: Which tree in the object should beplotted? To plot a tree other than the best fitting tree (FFT \#1), just specify another tree as an integer (e.g.; `plot(heart.fft, tree = 2)`).
 
-- `data`: For which dataset should statistics be shown? Either `data = "train"` (showing fitting or "Training" performance by default), or `data = "test"` (showing prediction or "Testing" performance).
+- `data`: For which dataset should statistics be shown? 
+Either `data = "train"` (showing fitting or "Training" performance by default), or `data = "test"` (showing prediction or "Testing" performance).
 
 - `stats`: Should accuracy statistics be shown with the tree? To show only the tree, without any performance statistics, include the argument `stats = FALSE`. 
 
@@ -166,7 +170,9 @@ plot(heart.fft, what = "tree")
 
 - `comp`: Should statistics from competitive algorithms be shown in the ROC curve? To remove the performance statistics of competitive algorithms (e.g.; regression, random forests), include the argument `comp = FALSE`. 
 
-- `what`: To show individual cue accuracies (in ROC space), include the argument `what = "cues"`:
+- `what`: Which parts of an `FFTrees` object should be visualized (e.g., `all`, `icontree` and `tree`). 
+Using `what = "roc"` plots tree performance as an ROC\ curve. 
+To show individual cue accuracies (in ROC space), specify `what = "cues"`:
 
 ```{r fft-cues, fig.width = 6, fig.height = 6, out.width = "500px"}
 # Plot cue accuracies (for training data) in ROC space:
@@ -176,20 +182,28 @@ plot(heart.fft, what = "cues")
 See the [Plotting FFTrees](FFTrees_plot.html) vignette for details on plotting FFTs. 
 
 
-### Additional steps
+### Advanced functions
+
+Creating sets of FFTs and evaluating them on data by printing and plotting individual FFTs provides the core functionality of **FFTrees**. 
+However, the package also provides more advanced functions for accessing, defining, using and evaluating FFTs. 
 
-#### Accessing outputs
+#### Accessing outputs 
 
-An `FFTrees` object contains many different outputs, to see them all, run `names()`
+An `FFTrees` object contains many different outputs. 
+Basic performance information on the current data and set of FFTs is available by the `summary()` function. 
+To see and access parts of an `FFTrees` object, use `str()` or `names()`: 
 
 ```{r fft-names}
-# Show the names of all of the outputs in heart.fft:
+# Show the names of all outputs in heart.fft:
 names(heart.fft)
 ```
 
+Key elements of an `FFTrees` object are explained in the vignette on [Creating FFTs with FFTrees()](FFTrees_function.html). 
+
+
 #### Predicting for new data
 
-To predict classifications for a new dataset, use the standard `predict()` function. 
+To predict classification outcomes for new data, use the standard `predict()` function. 
 For example, here's how to predict the classifications for data in the `heartdisease` object (which actually is just a combination of `heart.train` and `heart.test`): 
 
 ```{r fft-predict, eval = FALSE}
@@ -198,32 +212,14 @@ predict(heart.fft,
         newdata = heartdisease)
 ```
 
-#### Defining FFTs in words
 
-To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the `my.tree` argument: 
+#### Directly defining FFTs
 
-```{r fft-my-tree, results = 'hide'}
-# Create an FFT manually (from description):
-my.heart.fft <- FFTrees(formula = diagnosis ~.,
-                        data = heart.train,
-                        data.test = heart.test,
-                        main = "My Heart FFT",
-                        my.tree = "If chol > 350, predict True. 
-                                   If cp != {a}, predict False. 
-                                   If age <= 35, predict False, otherwise, predict True.")
-```
-
-Running this code evaluates `my.tree` for the specified sets of data. 
-A visualization of the resulting tree shows its performance summary 
-(for the training data): 
-
-```{r plot-my-fft, fig.width = 6.5, fig.height = 6}
-plot(my.heart.fft, data = "train")
-```
+To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the `my.tree` argument. 
+Similarly, we can define sets of FFT definitions (as a data frame) and evaluate them on data by using the `tree.definitions` argument of `FFTrees()`. 
+As we often start from an existing set of FFTs, **FFTrees** provides a set of functions for extracting, converting, and modifying tree definitions. 
 
-The resulting tree is actually not too bad, although its first node is pretty useless (as it only classifies 3\ cases, all as false alarms). 
-Thus, omitting the first node will result in an even simpler FFT that cannot be worse. 
-Feel free to verify this ---\ and see the [Manually specifying FFTs](FFTrees_mytree.html) vignette for additional details on defining FFTs from verbal or abstract descriptions. 
+See the vignette on [Manually specifying FFTs](FFTrees_mytree.html) for defining FFTs from descriptions and modifying tree definitions. 
 
 
 ## Vignettes