This repository has been archived by the owner on Dec 3, 2019. It is now read-only.

Empirical protein models like WAG+F with estimated frequencies are not supported #132

Open
bredelings opened this issue Jul 31, 2018 · 4 comments

Comments

@bredelings
Contributor

bredelings commented Jul 31, 2018

Now that #130 has been fixed, fnWAG() uses the fixed frequencies estimated in the WAG paper, and produces a fixed Q matrix with no parameters.

However, what people usually do is estimate the frequencies while fixing the symmetric exchangeabilities. This would be easy to code; the only question is what kind of syntax we would want and what to name things.

  1. Basically, fnWAG() would keep its current behavior, while fnWAG(pi) would use the frequencies in pi. Then we could place a Dirichlet distribution (or something similar) on pi.

  2. Also, technically, this is a GTR model, with exchangeabilities supplied by WAG and frequencies pi being estimated. If the GTR model could be changed to take a symmetric matrix, then we could make fnWAG() just return the symmetric matrix, and WAG+F would be something like fnGTR(fnWAG(),pi).

  3. A third approach (which seems to work so far) is to define fnWAG(pi) to always take a frequency vector. We then add a fnWAG_freq() to yield the fixed frequencies from the WAG paper. Users would then write fnWAG(pi) to estimate frequencies pi, and would write fnWAG(fnWAG_freq()) to use the fixed frequencies.
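
All three options rest on the same decomposition: a reversible rate matrix is built from symmetric exchangeabilities and a frequency vector, so the frequencies can be swapped out while the exchangeabilities stay fixed. The following is a minimal Python sketch of that construction, not RevBayes code. The function name `build_rate_matrix` and the 4-state toy values are assumptions made for illustration; they are not the published WAG exchangeabilities or frequencies.

```python
def build_rate_matrix(exch, freqs):
    """Return a reversible rate matrix with Q[i][j] = exch[i][j] * freqs[j] for i != j.

    The matrix is rescaled so that the expected rate at stationarity is 1.
    """
    n = len(freqs)
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    for i in range(n):
        for j in range(n):
            if exch[i][j] != exch[j][i]:
                raise ValueError("exchangeabilities must be symmetric")

    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        # The diagonal is still 0.0 here, so this is minus the off-diagonal row sum.
        q[i][i] = -sum(q[i])

    # Expected number of substitutions per unit time at stationarity.
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


# Toy symmetric exchangeabilities (hypothetical values, NOT the WAG estimates).
TOY_EXCH = [
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 0.0, 1.0, 3.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 3.0, 1.0, 0.0],
]

# Same exchangeabilities, two different frequency vectors.
q_fixed = build_rate_matrix(TOY_EXCH, [0.1, 0.2, 0.3, 0.4])
q_other = build_rate_matrix(TOY_EXCH, [0.25, 0.25, 0.25, 0.25])
```

In this sketch the "fixed" model and the "+F" model differ only in the second argument, which is the relationship Option 2 would expose directly through fnGTR.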

Since estimating frequencies is more common than using the fixed frequencies, I would recommend something like approach 3. If RevBayes functions support default values for parameters, we could make fnWAG_freq() the default value of pi for fnWAG(pi), which would be pretty nice.
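
The default-value idea can be illustrated outside RevBayes. The sketch below is Python, and whether Rev functions support defaults in this way is exactly the open question here. `toy_fixed_freqs` and `toy_model` are hypothetical stand-ins for fnWAG_freq() and fnWAG(pi), and the frequencies are placeholders on a 3-state alphabet, not the WAG estimates.

```python
# Placeholder frequencies for a toy 3-state alphabet (NOT the WAG estimates).
TOY_FIXED_FREQS = (0.5, 0.3, 0.2)


def toy_fixed_freqs():
    """Hypothetical stand-in for the proposed fnWAG_freq()."""
    return TOY_FIXED_FREQS


def toy_model(pi=None):
    """Hypothetical stand-in for fnWAG(pi); returns the frequencies the model would use.

    With no argument, the fixed frequencies are used, which mirrors the
    current behavior of fnWAG().
    """
    if pi is None:
        pi = toy_fixed_freqs()
    if abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return tuple(pi)


default_freqs = toy_model()                  # fixed frequencies
custom_freqs = toy_model((0.2, 0.2, 0.6))    # user-supplied frequencies
```

With a default like this, the zero-argument call keeps working unchanged, so approach 3 would look the same to users as approach 1.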

Thoughts?

P.S. Here is a case where someone wants to estimate the amino-acid frequencies, although not with WAG: https://groups.google.com/forum/#!topic/revbayes-users/cmhwuYklecg

@bredelings
Contributor Author

@mlandis @hoehna @jembrown

@jembrown
Member

jembrown commented Aug 2, 2018 via email

@mlandis
Member

mlandis commented Aug 2, 2018

(I emailed this to Ben, but I guess GitHub didn't add it to the thread)

Personally, I like Option 1 the best, since it wouldn't require new users to learn "special" functions to design their models. If we want to allow all empirical rate matrices to accept pi/er parameters, that might require some deeper redesign/reorganization of the empirical rate matrix family. So I'd vote to hold off on that for now.

A variant on Option 2 would be to add a helper function that supplies various empirical rate matrix values, e.g.

bf_WAG <- makeEmpiricalMatrixValues(model="WAG", parameter="frequencies")
er_WAG <- makeEmpiricalMatrixValues(model="WAG", parameter="rates")
Q_const <- fnGTR( exchangeRates=er_WAG, baseFrequencies=bf_WAG )

bf_flat ~ dnDirichlet( simplex(rep(1,20)) )
Q_flat := fnGTR( exchangeRates=er_WAG, baseFrequencies=bf_flat )

bf_emp ~ dnDirichlet( simplex(bf_WAG) )
Q_emp := fnGTR( exchangeRates=er_WAG, baseFrequencies=bf_emp )
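
The two priors in the snippet above differ in where they are centered. The mean of a Dirichlet distribution is its concentration vector divided by its sum, so an all-ones vector centers the prior on uniform frequencies, while a vector proportional to the empirical frequencies centers it on them. The sum of the concentration vector controls how tightly draws cluster around that mean. Here is a minimal Python sketch of this, assuming toy 4-state frequencies (not the WAG values) and a hypothetical scale factor of 50:

```python
import random


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def dirichlet_sample(alpha, rng):
    """Draw one sample by normalizing independent Gamma(alpha_i, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Placeholder "empirical" frequencies for a toy 4-state alphabet.
TOY_FREQS = [0.4, 0.3, 0.2, 0.1]

rng = random.Random(1)

# Flat prior: every frequency vector is equally likely a priori.
flat_alpha = [1.0] * len(TOY_FREQS)
flat_draw = dirichlet_sample(flat_alpha, rng)

# Prior centered on the toy frequencies; the factor 50 is an arbitrary
# illustration of a fairly tight prior.
centered_alpha = [50.0 * f for f in TOY_FREQS]
centered_draw = dirichlet_sample(centered_alpha, rng)
```

One design consideration follows from this: if the concentration vector is itself a simplex, it sums to 1, which gives a diffuse prior even though its mean matches the empirical frequencies. Scaling the vector is one way to make the prior more informative.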

What do you think?

@bredelings
Contributor Author

Hi Michael, I didn't see your e-mail, just the github post.

Anyway, yes, it does seem like Option 1 is nicest and easiest to guess or learn. Does RevBayes allow different functions to have the same name but different numbers of arguments? Alternatively, does RevBayes allow functions to have default values for parameters (Option 3)? If either is true, then I think I see how to implement this.

Your variant on Option 2 is interesting. I like the option to use the bf_WAG but put a prior on it.
