- (BMB) figure out the appropriate scaling of the prior for a mean branch length of 0.1
- sum(branch lengths) == mean(branch lengths)*(number of branches)
- so need to count number of branches
- implement multi-chain BT runs (set seeds differently), calculate R-hat etc.
- fix contrast calculation
- (gain contrasts, loss contrasts) x (with, w/o data) x (rj and regular) x (our priors, default priors)
- HIGH priority:
- fixing known weirdness (prior scaling, contrast calcs)
- figuring out discrepancy between our and BT baseline results
- LOWER priority:
- better Bayes diagnostics (multi-chain etc.)
- RJ
- workflow cleanup
- BMB: run prior-only model to extract log-mean and log-sd for rate-level priors (also look at correlations)
don't need to
prior.mean <- (lb+ub)/2
prior.sd <- (ub-lb)/(2*range)
lb = log(1),
ub = log(10 * ape::Ntip(ag_compdata$phy))
## sum(branch lengths) scaled to 1
- BMB: try to figure out if R-hat or any other standard MCMC diagnostics are available for BT (other than acceptance rate)
- soften description of BT vs our stuff
- LE ??maybe??: run RJMCMC
- LE: try to run with 'priors only' (RJMCMC)
- BMB: weighting contrasts
- LE: what would an example look like of weighted contrasts in a simple case where we had a smaller trait space?
- LE: figure out how to run multistate model with disallowed transitions
- BMB: fight with
btw
, maybe?
- segregate core vs pix packages
- fuss with transition plots -- label on legend?
- subscripts in output table?
- check with Beaulieu et al. about preferred ref
- Lit review/sample of how BayesTraits is used?
- inference?
- default priors/mention of priors?
- caveats para (hard-coded root; 12 vs 24 parameters)? Comment on (lack of) sensitivity?
- PDF/docx output? (code folding only in HTML ...)
- Work harder on getting reliable CIs for MLE fits
- Roll bayesdiag into supplement (or, keep as a separate doc?)
- tech supp
- static picture of
targets
flow?
- static picture of
- rerun with prior on root rather than hard value?
- Compute Bayes Factors???
- Stepping-stone algorithm??? see Baele and Vansteelandt 2013
Baele, Guy, Philippe Lemey, and Stijn Vansteelandt. “Make the Most of Your Samples: Bayes Factor Estimators for High-Dimensional Models of Sequence Evolution.” BMC Bioinformatics 14, no. 1 (March 6, 2013): 85. https://doi.org/10.1186/1471-2105-14-85.
- Weight contrasts by state occupancy?
- dig out/discuss description of BayesTraits priors?
- Mention that losses are uncertain/less well estimated because AGs are absent in most of the tree?
- cleanup/code stuff
- use
renv
for package versions? - priors should be specified externally to
_targets
file - make sure thinning params get passed through to MCMC function (but switch targets to use 10 rather than 20 to match current setup ...)
- describe/check lower/upper bounds in
ag_model_tb
(0.1 to 100N) vsag_model_pscs_prior
(1 to 10N) ?? - double-check thinning? (supposedly ran burn-in of 4000 + 80000 steps with thinning of 20; how come we have n = 8000 in the resulting chains? thinning arg didn't get passed through)
- use
- try all of the following for the two trait model
- BayesTraits and BayesFactors - DONE
- Reversible Jump Models
- Bayesian with contrasts (with tree block)
- Pagels test - DONE
- likelihood ratio test - DONE
- AIC comparison - DONE
- BayesTraits with contrasts
- put it all together in a rmd to share with Sigal
- compute ESS for bayesian stuff - DONE
- sanity check on contrasts
-
weight contrasts by state occupancy (????)
-
likelihood (point) comparison, 4 way (independent, pc-only, sc-only, full {pc+sc})
-
move tech/devel stuff out of
ag_model.rmd
, more text/explanation -
reorder params before Bayes pairs plots?
-
consider run/comparison for spawning vs repro system?
-
why is evidence apparently stronger in 2-trait model?
- joint test on 2 parameters (gain diff, loss diff) ?
- dilution by non-focal parameters?
-
general sanity check of parameters (DONE)
-
sanity check of treeblock runs (DONE)
-
compute improved R-hat (DONE)
-
visualize tip state distribution for non-treeblock/treeblock (DONE)
- run two trait model with care and compare to previous results
- make sure roots are set
- think about tree blocks
- implement priors/scaling (DONE)
- shorten names? (DONE)
- parameter ordering and log-scaling for params plot (DONE)
- adjust
getdata
to allow flexibility in selected trait columns (LOW) - dynamic branching (for primates + ag + ag w/o missing data + .. ? (LOW)
- finish contr - done
- synthesize advice on priors (gain/loss ratios)
- primate comparison mcmc
- think about why some pars are so uncertain (trait combinations)
- shorten names?
- think about tree blocks
- adjust
getdata
to allow flexibility in selected trait columns - dynamic branching (for primates + ag + ag w/o missing data + .. ?
- parameter ordering and log-scaling for params plot
- phylo scaling (pre- or post-hoc)
- discrete vs continuous comparison (Pagel/Meade/etc vs parameter estimation)
- Pagel & Meade
- Beaulieu O'Meara
- your thesis
- list of stuff to do/intentions, classified by importance and difficulty
- figure out where the tree block data comes from ... ?
- interactions
- https://www.embarcadero.com/starthere/xe5/mobdevsetup/ios/en/installing_the_commandline_tools.html
d <- as.data.frame(matrix(rnorm(6000), ncol=6))
library(tidyr)
d_long <- pivot_longer(d, cols=everything(), names_to="var")
library(ggplot2)
ggplot(d_long, aes(x=value)) + facet_wrap(~var) +
geom_histogram(binwidth=0.1)
- timing and venue
- gain/loss priors
- finish naming stuff (done)
- fix missing data printing bug: partially done (no longer breaks, but also doesn't match number of states? collapse vs non-collapse?)
- keep working on
targets
:- dynamic
- target documentation?
- tree blocks?
- gain/loss ratio?
- have a parameter set and you've identified gain/loss pairs: R1, R2, ... Rn
- right now we're passing a full parameter set and sticking parameterwise priors on top
- but we can fairly easily add priors for (R1[1] - R1[2]), etc. : in limit we could say R1[gain] - R1[loss] ~ dnorm(0, sd=0.0001) -> enforcing symmetry (Sigal??) differences in log-hazard rates are ("just") logs of proportional differences in gain vs loss for example, we could specify (lower bound, middle, upper bound) [loss rate is 50% higher than gain, middle = loss 10% higher than gain, upper = gain 20% higher than loss] - range +/- 2 or 3 SD
- priors on branch length:
- figure out ?s for missing data
- build up contrasts by computing intermediate values, e.g.
To identify a parameter we need to know the both the row ('from') and column ('to') name variables as 'gain' and loss'
{ag0_pc0_sc0, ag0_pc0_sc1} -> "gain.sc_ag0_pc0"
{ag0_pc0_sc1, ag0_pc0_sc0} -> "loss.sc_ag0_pc0"
## difference in gain rates for sc1 vs sc0 given pc0
sc.gain.pc0 <- gain.ag_sc1_pc0 - gain.ag_sc0_pc0
sc.gain <- (sc.gain.pc1 + sc.gain.pc0)/2
sc.netgain <- sc.gain - sc.loss
- bivariate plot of
sc.gain
vssc.loss
- histograms or violin plots of
sc.gain
,sc.loss
,sc.netgain
var sample value
sc.gain 1 ...
sc.gain 2 ...
...
sc.loss 1 ...
ggplot(df, aes(x=var, y=value)) + geom_violin()
- is there a parameter contrast for combined effects of PC and SC?
- loss of power due to missing data? (either in phylog or in traits)
## after introducing "?" for missing values and reading ?corHMM
corHMM:::corProcessData(data) ## then figure out what all the bits are
- multi-chain MCMC, HPD regions on pairs plots (done)
- helper function for variable names?
- look into missing-value computations ...
missing data, mostly for parental care/mating system traits (AG info is known)
- maximum fallback: forget it all, go back to BayesTools
- or: run pairwise analyses in augmented corHMM (i.e. get confidence intervals etc.)
- or: single-trait single imputation (continuous hack via phytools or invent something. Use ancestral character estimation/mapping to MRCA of known and unknown taxa/trait combos
- or: multiple imputation? ugh
- or: see if we can roll an 'unknown state' and/or unobserved values into the machinery
- consensus tree from tree block?
- use complete tree?
-
think about multimodal likelihood surface
- MCMC algorithms
- multiple chains?
- run lots longer?
- thin?
-
gain/loss polarity
- clear picture of ideal program
- figure out how to parameterize interactions
- compare monkey/Pagel & Meade?
- power differences?
- biologically
- weighting when averaging across states?
- figure out a good workflow that handles slow bits? Do we need
make
, ortargets
, now? - for now: Rmarkdown with caching, plus explicit caching
- LE: install
styler
add-in, use it to clean up formatting as you go: https://styler.r-lib.org/ - BMB: implement multi-chain/multi-start/parallel M-H (with tuning??) e.g. start with the code here (+ pick random starting values from the priors, use Hessian for scale; adapt scale??) (NIMBLE package is suggested; or the Metropolis function from BayesianTools;
ramcmc
package? - image method for rate matrices? improved parameter naming?
- what about identifiability of gain/loss?
- are they interested in the philosophical/CI-vs-hypothesis-comparison stuff?