DE, DEzs, DREAM, DREAMzs: settings$nrChains = x results in 2x or more chains #224
This is all expected / intended behaviour, although the documentation could probably be improved. The DE sampler family are so-called population MCMCs: a single MCMC run consists of several internal chains. Note, for example, that the starting condition for all DE family samplers is a matrix, not a single parameter vector. In retrospect, the argument nrChains in the runMCMC function should probably have been called nrIndependentMCMCruns or something like that, because it determines how many independent MCMCs are run, which is 3 in your example.

I know that Cajo ter Braak has suggested that it is sufficient to run only one DE MCMC and check convergence of the internal chains, based on the proof in one of his papers that the internal chains in the DE algorithms are independent when the sampler is at equilibrium. However, when working with the DE MCMCs in practice, you will notice that the internal chains in the DE family often show cross-correlation. I don't know the exact reason for this: we draw the starting conditions and the Z matrix from the prior, which is substantially wider than the posterior, yet we still often observe that the internal chains are more similar to each other than the independent MCMC runs are.

Based on this, my / our recommendation (which I think is also noted somewhere in the help) is to not treat the internal chains of the DE family as independent MCMC chains. From this logic, we derive the following behaviour:
I wanted to suggest some possible documentation improvements based on observations of unexpected behavior and some looking through these samplers' source code.
For context, I'm evaluating several variations of the `runMCMC` call below and noticed trace plots were showing more chains than `nrChains`. Calling coda package methods also shows a mismatch between `nrChains` (3 in this example) and the number of chains returned (9).

Looking in the BayesianTools sources, these samplers seem set up to yield `Npop * nrChains` chains, where `Npop` is the number of rows in the initialization matrix `X`. In the default case where `settings$startValue` isn't populated, `X` is set by these lines of code.

There is also
From looking through the journal papers on DE, DEzs, DREAM, and DREAMzs, I think BayesianTools is designed so that `settings$nrChains` is not actually the intended number of chains. It's the number of populations.

If I'm understanding correctly, this indicates that `settings$nrChains = 1` is a typical default for these samplers and that, if a number of chains different from the algorithm defaults is desired, manipulating `settings$startValue` may be preferable to changing `nrChains`.

Apologies if I've missed something in the package documentation but, assuming the above interpretation is correct, it seems to me it may be helpful to

- `nrChains` in the documentation for `runMCMC()`.
- `nrChains` in the settings documentation for DE, DEzs, DREAM, DREAMzs, and similar.
- `nrChains` in the settings objects for population based samplers.

A related consideration is that these four samplers all calculate the length of a chain as
n.iter <- ceiling(settings$iterations/Npop)
but the current documentation for `iterations` is typically worded in ways that suggest it controls the total number of times the likelihood is evaluated across all chains rather than for a single population of chains. Since `Npop` isn't readily user visible and `Npop = nrChains = 3` happens to be a recommended configuration for some of these samplers, it's easy for a user to mistakenly conclude the chains are of length `ceiling(settings$iterations / settings$nrChains)`. There's also potential for confusion over the total number of likelihood evaluations being `nrChains * iterations` rather than `iterations`.
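To make the bookkeeping concrete, here is a minimal base R sketch of the arithmetic as I read it from the sources. The helper `de_chain_summary()` is hypothetical and not part of BayesianTools, and `iterations = 10000` is an assumed value chosen only for illustration. The total evaluation count is approximate because it ignores rounding from `ceiling()`.

```r
# Hypothetical helper (not a BayesianTools function): summarises the chain
# bookkeeping described in this issue for the DE-family samplers.
de_chain_summary <- function(iterations, nrChains, Npop) {
  list(
    # Chains returned: Npop internal chains per independent run
    chains_returned = Npop * nrChains,
    # Length of each internal chain, as computed in the sampler sources
    chain_length = ceiling(iterations / Npop),
    # Approximate likelihood evaluations across all independent runs
    total_evaluations_approx = nrChains * iterations
  )
}

# Assumed example values: Npop = nrChains = 3, iterations = 10000
s <- de_chain_summary(iterations = 10000, nrChains = 3, Npop = 3)
print(s)
```

With `Npop = nrChains = 3` this gives the 9 chains observed above. Chain length depends on `Npop`, not on `nrChains`, which is easy to miss when the two happen to be equal.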