icer
was designed to assess for interference between Plasmodium
species to see if some species appeared in infections more often than
expected under the assumption of independent random infection events,
i.e. one species type is introduced from one infectious bite.
icer
works out the most likely population frequency of each species
and the most likely number of infection types in individual by
comparison to the observed data. These are then used to test if our data
can be explained by this distribution by comparing the observed data to
the bootstrapped samples from the resultant multinomial distribution
describing the probability of being infected with each infection
compositon type.
The approach is somewhat generic and could be used for any coinfection or cooccurence data in which the observed prevalence of events may not be predictive of the frequency of the infecting entities.
Let’s assume we have some data about occurence of Plasmodium falciparum, malariae and vivax as follows:
data <- c("pf" = 902, "pm" = 200, "pv" = 44,
"pf/pm" = 9, "pf/pv" = 29, "pm/pv" = 1,
"pf/pm/pv" = 1)
We can test to see if any of the coinfection types occur more or less than best explained by the data and the independence model.
library(icer)
#> Registered S3 methods overwritten by 'ggplot2':
#> method from
#> [.quosures rlang
#> c.quosures rlang
#> print.quosures rlang
res <- cooccurence_test(data)
#> iter 10 value 38.971224
#> final value 38.971219
#> converged
#> No id variables; using all as measure variables
The resultant plot suggests that independent infection events do not perfectly explain this data, with the predicted number of pf/pm being too high, with the boostrapped estimates from the best fitting model notincluding the observed data well (5%-95% quantile in blue, median in white and observed data in red).
We can then change what type of model we use to describe the acquisition
of species. For example maybe there is some interference between
falciparum and malariae and the inverse for falciparu and vivax. We can
specify starting values for how the probability of acquiring an
additional strains depends of the first strain acquired. e.g. k_12
is
the multiplication of the probability of the next infection being pm
given the first infection was pf
and vice versa, i.e. next infection
being pf
given the first infection was pm
. (The numbers refer to the
species by alphabetic
order):
res_interf <- cooccurence_test(data, density_func = icer:::interference,
k_12 = 0.5, k_13 = 2, k_23 = 1)
#> iter 10 value 15.580477
#> iter 20 value 13.643069
#> iter 30 value 13.516104
#> iter 40 value 13.188206
#> iter 50 value 13.092270
#> final value 13.082618
#> converged
#> No id variables; using all as measure variables
That’s a lot better! We can see what the best estimate of the frequencies of each species, the mean number of infections (different to number of different species, i.e. you could have 6 infections but be Pf/Pv if you have 4 Pf infections and 2 Pv infections), and the values for the interference.
res_interf$params
#> $params
#> pf pm pv mu size
#> 7.657616e-01 1.755459e-01 5.869248e-02 1.726855e+00 1.000043e+02
#> k_12 k_13 k_23
#> 7.764434e-03 5.867866e-02 1.000000e-02
#>
#> $multinom
#> pf pm pv pf/pm pf/pv
#> 0.7604943863 0.1687349573 0.0367985499 0.0076748673 0.0245925814
#> pm/pv pf/pm/pv
#> 0.0012177302 0.0004869276