/
ag_model.rmd
211 lines (174 loc) · 9.71 KB
/
ag_model.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
---
title: "evolution of accessory glands: interaction with parental care and mating system"
date: "`r format(Sys.Date(), '%d %B %Y')`"
author: "Ben Bolker and Lucas Eckert"
output:
html_document:
code_folding: hide
bibliography: corHMM.bib
---
```{r setup, message=FALSE}
source("R/utils.R")
source("R/mcmc.R")
source("R/functions.R")
load_pkgs()
zmargin <- theme(panel.spacing = grid::unit(0, "lines"))
theme_set(theme_bw())
library(targets)
library(phytools)
library(diversitree)
grDevices::X11.options(type = "cairo")
```
(**note** this document is gradually being disassembled/re-structured into the `ag_supp.rmd` (supp material) and `ag_tech.rmd` (technical notes) documents. What's left here is anything that hasn't already migrated.)
```{r mosaic, eval=FALSE, echo=FALSE}
## cool but useless
tar_load(ag_compdata)
(ag_compdata$data
%>% mutate(ag = paste0("ag:",ag),
pc = paste0("pc:",pc),
sc = paste0("sc:",sc))
%>%
ggplot() +
geom_mosaic(aes(x = product(pc, sc), fill = ag)) +
geom_mosaic_text(aes(x = product(pc, sc), fill = ag), as.label = TRUE) +
scale_fill_discrete_qualitative()
)
```
```{r phylo_plot, message=FALSE, fig.width=8, fig.height=8}
## slow: cache?
tar_load(treeblock)
tar_load(ag_compdata_tb)
dd <- ag_compdata_tb$data
rownames(dd) <- dd$species
dd <- dd[,-1]
dd$ag <- factor(dd$ag)
dd[c("sc", "pc")] <- lapply(dd[c("sc", "pc")], factor, levels = c("0","1","?"))
## https://yulab-smu.top/treedata-book/chapter7.html
tt <- drop.tip(treeblock[[1]], setdiff(treeblock[[1]]$tip.label, rownames(dd)))
circ <- ggtree(tt, layout = "fan", open.angle=5)
p1 <- (gheatmap(circ, dd[,"ag", drop=FALSE], width=0.1)
+ scale_fill_manual(name = "ag", values=paste0("grey", c(90,10)))
+ new_scale_fill()
)
p2 <- (gheatmap(p1, dd[,"pc", drop=FALSE], width=0.1, offset=0.001)
+ scale_fill_manual(name = "pc", breaks = c("0", "1", "?"), values=c("darkblue", "lightblue", "grey50"))
+ new_scale_fill()
)
p3 <- (gheatmap(p2, dd[,"sc", drop=FALSE], width=0.1, offset=0.002)
+ scale_fill_manual(name = "sc", breaks = c("0", "1", "?"), values=c("darkred", "pink", "grey50"))
+ new_scale_fill()
)
print(p3)
```
## Priors
According to @pagel_bayesian_2006, the prior distributions on the rate parameters are determined as follows:
> We did not specify the mean or variance of the gamma but rather seeded its two parameters, normally labeled $a$ and $b$, by drawing from a uniform (0–10) hyperprior distribution. The use of a hyperprior allows the investigator to remain relatively uncommitted about the details of the prior distribution, allowing them to be estimated from the data. In most comparative studies, investigators have very little information about the mean and variance of the rate coefficients.
This is a mild overstatement: the hyperparameters induce their own distribution (see below).
@pagel_bayesian_2006 also comment that the prior scale also depends on the units in which branch length is measured. The BayesTraits package provides an option to scale branch lengths so that the *mean* branch length is 0.1 [@meade_bayestraits_2016].
```{r prior_calcs}
n <- 100000
set.seed(101)
r <- rgamma(n, shape = runif(n, 0, 10), scale = runif(n, 0, 10))
## BayesTrait scales *mean* branch length to 0.1
## in our case
tot_edge <- attr(treeblock[[1]], "orig_sumbranches")
n_edge <- length(treeblock[[1]]$edge.length)
## our mean branch length is 1/n_edge, Pagel & Meade's is 0.1
## Pagel & Meade's total branch length is 0.1*n_edge
ntip <- ape::Ntip(treeblock[[1]])
b <- log(c(0.1, 100*ntip))
m <- mean(b)
sd <- (b[2]-m)/3
## scale from log to log10
m <- m / log(10)
sd <- sd / log(10)
r_scaled <- r*(0.1*n_edge)
```
We tried to choose priors more mindfully. We started by scaling the branch lengths to set the *sum* of all the branch lengths (i.e. the total evolutionary time experienced by all lineages to 1). Event rates of discrete transitions are measured on the *hazard* scale (probability density of an event per unit time); a rate of $r$ corresponds to a mean of $r$ transitions per unit time. We therefore set the lower limit of our prior to 0.1, meaning an expectation of only a single transition over the entire tree. For a tree with $N$ species, we set the upper limit of the prior to $100 N$, i.e. an expectation of 100 transitions *per species*. These are obviously extreme values; we used a log-Normal prior with the (geometric) mean set halfway between these values (an expectation of `r round(exp(mean(b)))` events of the course of the tree, or `r round(exp(mean(b))/ntip)` events per species in our phylogeny of `r ntip` species), and the standard deviation set so that the range between the lower and upper limits is 6 SD. This range corresponds to a 0.26% prior probability that the rates are beyond the limits.
(Talk about uniform prior issues? [@yang_bayesian_2006; @carpenter_computational_2017; @thorson_uniform_2017])
**todo**: maybe it's no longer relevant, but does BayesTraits *really* use flat priors by default?? Double-check manual ...
```{r prior_plot, warning=FALSE}
## plot(density(r))
brkvec <- seq(-2,6, by =2)
ggplot(data.frame(r=r_scaled),
aes(x=log10(r))) +
geom_density(fill="grey") +
stat_function(fun = function(x) dnorm(x, m, sd), geom ="area", col="blue", fill="blue", alpha=0.2) +
scale_x_continuous(breaks = brkvec,
labels = 10^brkvec,
limits=c(-3,6),
sec.axis = sec_axis( ~ . - log10(ntip),
name = "transitions per species",
breaks = brkvec-3,
labels = 10^(brkvec-3)
)) +
labs(y="prior probability density", x = "expected events per tree")
```
Here are the 95% quantile ranges of the prior distributions:
```{r prior_tab}
qq <- quantile(r, c(0.025, 0.975))
qqr <- qq*(0.1*n_edge)
rlims <- m + c(-1.96,1.96)*sd
qtab <- matrix(c(qqr, qqr/ntip, 10^rlims, 10^rlims/ntip), nrow=2, byrow=TRUE,
dimnames = list(c("Pagel & Meade", "ours"),
c("min per tree", "max per tree", "min per species", "max per species")))
knitr::kable(as.data.frame(qtab), digits=3)
```
### Gain/loss priors
We also decided to put prior distributions on the ratio of gain rate to loss rate. (Need to put more discussion/justification here.) This is less important since we also chose to fix the root (ancestral state) of the population at "no accessory glands, no paternal care, group rather pair spawning"; specifying this information should orient the tree and make it easier to distinguish losses from gains.
- `pc`: gain/loss ratio from 0.1 to 5
- `sc`: gain/loss ratio from 5 to 10
- `ag`: gain/loss ratio from 0.001 to 10
As with the priors on the rates, each of these ranges is used as the basis for
a +/- 3 SD range (on the log scale) of the ratio.
```{r cifun}
tar_load(ag_model_pcsc)
tar_load(ag_mcmc0)
tar_load(all_ci)
```
Distribution of states in the tree (computed from stochastic character mapping, histograms represent 100 simulations):
```{r states, warning=FALSE}
tar_load(states_df)
pivot_longer(states_df, everything(), names_to = "state") %>%
ggplot(aes(x=value)) + geom_histogram(bins=25) + facet_wrap(~state) + zmargin
## sum columns then divide by total to get sum == 1
states_avg <- apply(states_df, 2, sum, na.rm = TRUE) / sum(states_df, na.rm = TRUE)
```
Need to figure out weights e.g. if we want the intercept for `loss.ag` it should be something like:
```{r wts, eval=FALSE}
tar_load(starts_with("contrast"))
intercept_loss <- sum(occ.ag0_pc{i}_sc{j} * loss.ag_pc{i}_sc{j})
pc_loss <- (sum(occ.ag0_pc1_sc{i} * loss.ag_pc1_sc{i}) - sum(occ.ag0_pc0_sc{i} * loss.ag_pc0_sc{i})) /
sum(occ.ag0_pc{i}_sc{j})
```
etc. ...
**FIXME**: think more about net gain and how to characterize it.
Certainly should *not* be exponentiated, so leave it out here ...
Experiments with weighted contrasts:
```{r wtd_contrasts}
tar_load(contrast_mat)
wts <- rep(NA, nrow(contrast_mat))
names(wts) <- rownames(contrast_mat)
snm <- names(states_avg)
wts["gain.sc"] <- sum(states_avg[grepl("sc0", snm)]) ## can gain sc whenever sc==0
wts["loss.sc"] <- sum(states_avg[grepl("sc1", snm)]) ## " lose " " sc==1
wts["gain.pc"] <- sum(states_avg[grepl("pc0", snm)]) ## can gain pc whenever pc==0
wts["loss.pc"] <- sum(states_avg[grepl("pc1", snm)]) ## " lose pc " pc==1
## can gain ag when ag==0 are missing; fill in values for
gain.ag_states <- grep("^gain.ag", names(wts), value = TRUE)
wts[gain.ag_states] <- states_avg[gsub("^gain.ag_", "ag0_", gain.ag_states)]
loss.ag_states <- grep("^loss.ag", names(wts), value = TRUE)
wts[loss.ag_states] <- states_avg[gsub("^loss.ag_", "ag1_", loss.ag_states)]
## scale contrast values by the time spent in the relevant state
contrast_mat_wt <- sweep(contrast_mat, 1, wts, "*")
## now scale each column (contrast) by the _total_ time spent in the relevant states
pos_mat <- contrast_mat != 0
storage.mode(pos_mat) <- "numeric"
col_vals <- colSums(sweep(pos_mat, 1, wts, "*"))
contrast_mat_wt <- sweep(contrast_mat_wt, 2, col_vals, "/")
##plot_grid(image_plot(contrast_mat), image_plot(contrast_mat_wt))
```
Hmm, not sure about weighted contrasts any more. It's tricky.
- implement weighting by branch state: for a given focal trait (`ag`) and modifier (`sc`), want to weight the elements of the contrast only by the *non-focal* trait (e.g. if `w_pc0` and `w_pc1 = 1-w_pc0` are the overall fractions of time that the branches are in the corresponding states, then we want the `sc_loss` contrast to be `w_pc0*(loss.ag_pc0_sc1-loss.ag_pc0_sc0) + (1-w_pc0)*(loss.ag_pc1_sc1-loss.ag_pc1_sc0)`. (The factors in front of the parenthesized terms are currently 1/2.) Similar expressions would apply to the gain term and to the `pc` analogues.
---
## References