
Question on the implementation of the BDF ode solver #7

Open

lamprosbouranis opened this issue Feb 17, 2022 · 6 comments

Comments

@lamprosbouranis

Hi @jriou,

Congrats on the very nice work. This is not an issue, but rather a question on some choices made in your analysis.

Problem statement

My analysis involves an age-structured SEIR transmission model implemented in RStan, with a time-varying effective contact rate and daily observations. The population of the country is on the order of \approx 1.4e+07. I previously attempted to implement the BDF solver assuming that the number of individuals in each compartment is on the original scale. The solver appears to struggle, and this creates a computational bottleneck.

In Section 2.1 of the supplementary appendix of this paper, it is stated that the number of individuals in each compartment is scaled by the total population of the area, so that the sum of all compartments is 1. For reference, I have successfully replicated the analysis for Spain (period of data collection = 46 days) under the same MCMC settings, so I know what to expect in terms of computational time on my machine (e.g. by simply looking at the rstan message "Gradient evaluation took ... seconds"). The same implementation with RK45 on my machine appears to increase the computational cost per iteration by \approx 3 times.

When I work with scaled compartments, the computational time for gradient evaluation is of the same order of magnitude as in the reference analysis, so the computational bottleneck disappears.

I see that version "10_bales" of the model is the first time you switch from the RK45 to the BDF ODE solver, while still working with a compartmental model that assumes homogeneous mixing of the population. Version 11 is the first time the age-structured transmission model is used.

Questions

  1. What is the reason for working with the scaled compartments and then reverting the ODE solutions to the original scale? Let's focus on the dummy compartment for the sake of the discussion.
  2. What is the reason for accounting for the initial values of the (S, E) compartments in the calculation of the derivatives, e.g. `dydt[k] = - f_inf[k] * (y[k]+init[k]);` (file model16.stan, line 76)?
  3. What is the reason for choosing the particular BDF ODE solver control parameters? Was their choice informed by the order of magnitude of the compartments? Did you ever face very small changes between consecutive time points, and was this addressed by experimenting with different control parameters?

All the best,
Lampros

@jriou (Owner)

jriou commented Feb 28, 2022

Dear Lampros,

Thank you for your interest! As you say, most of the things you mention are just modelling choices; I'm not claiming this is the only way to proceed, but I can explain the rationale we had at the time.

  1. Scaling the compartments to sum to 1 is a way to help the sampler. It is general advice I got from the Stan developers: as much as possible, try to have all quantities close to 0 with scale 1. In the case of an ODE model, which has a tendency to "blow up" with the wrong set of parameters, I suspect that this also helps avoid numerical issues with very large numbers.

  2. This is another trick I got from the Stan developers. Basically, it is important if you have several initial values that actually depend on a single parameter (e.g., the parameter you want to estimate is the initial prevalence $\pi$, which determines the initial values of S and I). If you define the initial values of S and I from $\pi$ in the classic way and pass them through init, you will artificially increase the number of parameters the solver has to handle, and thus the computation time. If you reparameterize everything so that init is 0, as in our code, you avoid this problem (see the sketch after this list). This is explained in detail in Section 5.2 of this paper.

  3. The choice between BDF and RK45 is often unclear. BDF is slower in general, but can deal with "stiff" systems better than RK45, which can get stuck. There is no clear definition of stiffness, and some ODE systems may become stiff only in parts of the parameter space. My general approach is to start with the default RK45 and, at some point, try BDF to check whether it brings any gain in performance. As for the tolerances, it also depends on the situation. As you say, they can be informed by the order of magnitude of the compartments. It also depends on the variability in the data: it's a bit silly to solve an ODE system to a precision of 1e-9 if you expect a lot of noise on top of the ODE output. This is also explained in more detail in Section 5.3 of the above paper (a sketch combining these points follows below).
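
To make the three points above concrete, here is a minimal sketch, assuming a simplified, unstructured SIR with placeholder names (`sir_deviation`, `prev0`, `y_hat`); it is not the repository's model16.stan. It combines compartments scaled to sum to 1, the zero-initial-state reparameterization (a single parameter, the initial prevalence, determines the initial conditions inside the derivatives), and a BDF call with explicit rel_tol, abs_tol and max_num_steps, using the older integrate_ode_bdf interface:

```stan
functions {
  // The solver state y holds deviations from the initial conditions, so the
  // initial state passed to the integrator is all zeros; the single parameter
  // that determines the initial conditions enters through theta and is added
  // back inside the derivatives, as in dydt[k] = - f_inf[k] * (y[k] + init[k]).
  real[] sir_deviation(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
    real beta = theta[1];
    real gamma = theta[2];
    real prev0 = theta[3];                   // initial prevalence (a parameter)
    real init[3] = {1 - prev0, prev0, 0.0};  // scaled initial conditions, sum to 1
    real S = y[1] + init[1];
    real I = y[2] + init[2];
    real dydt[3];
    dydt[1] = -beta * S * I;
    dydt[2] = beta * S * I - gamma * I;
    dydt[3] = gamma * I;
    return dydt;
  }
}
data {
  int<lower=1> T;
  real ts[T];          // observation times, strictly increasing and > 0
}
transformed data {
  real x_r[0];
  int x_i[0];
}
parameters {
  real<lower=0> beta;
  real<lower=0> gamma;
  real<lower=0, upper=1> prev0;
}
transformed parameters {
  real theta[3] = {beta, gamma, prev0};
  // BDF solver with explicit control parameters: rel_tol, abs_tol, max_num_steps
  real y_hat[T, 3] = integrate_ode_bdf(sir_deviation, rep_array(0.0, 3), 0.0, ts,
                                       theta, x_r, x_i,
                                       1e-6, 1e-6, 1e3);
}
model {
  beta ~ normal(0.5, 0.5);
  gamma ~ normal(0.2, 0.2);
  // prev0 gets an implicit uniform(0, 1) prior; the likelihood is omitted.
  // Compartments on the original scale: population * (y_hat[t, k] + init[k]).
}
```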

Happy to continue the discussion!
Best regards,
Julien

@lamprosbouranis (Author)

Hello Julien,

Thank you for your thorough answer and for pointing me to the Bayesian workflow paper. Incidentally, I was reading it a few days ago, around the time you replied, and its advice is very useful indeed.

  1. Going through the supplementary material of the covid_adjusted_cfr work, I see that the likelihood uses daily aggregated data on reported cases and deaths. I immediately asked myself why you didn't fit the model to the corresponding daily mortality/reported-case data per age group. Was there simply a lack of information at this granularity at that stage of the pandemic?

  2. In Stan's negative binomial 2 parameterization, instead of providing the overdispersion parameter phi, you provide output_deaths[..]/phi. Is this related to the model definition?
    [In related work, like the "Age groups that sustain resurging COVID-19 epidemics in the United States" paper, the negative binomial likelihood over the age-group-specific deaths simply uses phi.]

All the best,
Lampros

@jriou (Owner)

jriou commented Mar 7, 2022

Hi Lampros,

  1. You're exactly right: at that point we did not have access to daily reports of cases and deaths by age group, but only to the total number of cases and deaths by day and, separately, to the age distribution of cases and deaths over a period. This is why we used this approach of fitting to the marginals (see the sketch below).

  2. Yes, this reparameterization of the negative binomial was something we were considering at the time; the point was that it lets the variance scale linearly with the location rather than with the location squared (see the sketch below). This made it easier to deal with both very low and very high counts. Since then we have reverted to the traditional formulation of using just phi, which is how it is implemented in the Bayesian workflow paper.
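
For illustration, a minimal sketch of both points, with placeholder names and not the exact likelihood used in the paper: the two marginals (daily totals and the age distribution over the period) enter as separate likelihood terms, and the dispersion argument of neg_binomial_2 is replaced by mu / xi. Since Var(y) = mu + mu^2 / phi, passing phi = mu / xi gives Var(y) = mu * (1 + xi), i.e. linear in the mean:

```stan
data {
  int<lower=1> T;                   // number of days
  int<lower=1> A;                   // number of age groups
  int<lower=0> deaths_by_day[T];    // marginal 1: daily totals, all ages
  int<lower=0> deaths_by_age[A];    // marginal 2: age distribution over the period
}
parameters {
  vector<lower=0>[T] mu_deaths;     // stands in for the ODE output in the real model
  simplex[A] p_age;                 // expected age distribution of deaths
  real<lower=0> xi;                 // linear overdispersion factor
}
model {
  mu_deaths ~ exponential(0.01);    // placeholder prior
  xi ~ exponential(1);
  // daily totals: variance mu * (1 + xi) instead of mu + mu^2 / phi
  for (t in 1:T)
    deaths_by_day[t] ~ neg_binomial_2(mu_deaths[t], mu_deaths[t] / xi);
  // age distribution fitted as a separate marginal
  deaths_by_age ~ multinomial(p_age);
}
```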

Thanks again for your interest, feel free to contact me with more questions!
Best
Julien

@bernadette-eu

bernadette-eu commented Mar 18, 2022

Hi Julien,

Thanks for your input. This parameterization of the NB appears to work nicely, especially for the younger age groups, where mortality is lower. Something else I wanted to ask/clarify is the following:

Each element $c_{ij}$ of a contact matrix is the average contact rate of individuals in group $i$ with individuals in group $j$. In Eq. (7), for the calculation of the force of infection for age group $i$, the summation is performed over the $j = 1, \dots, A$ columns of the contact matrix. However, the vectorized contact matrix that is fed to Stan represents the contact matrix in column-major order. Shouldn't we be feeding the vectorized matrix in row-major order instead? The R code from the example of Spain is provided below.

```r
library(dplyr)
library(ggplot2)

# Contact matrix for Spain (9 age groups), stored as a flat vector
contact_matrix_europe = c(5.13567073170732, 1.17274819632136, 0.982359525171638, 2.21715890088845, 1.29666356906914, 0.828866413937242, 0.528700773224482, 0.232116187961884, 0.0975205061876398, 1.01399087153423, 10.420788530466, 1.5084165224448, 1.46323525034693, 2.30050630727188, 1.0455742822567, 0.396916593664865, 0.276112578159939, 0.0867321859134207, 0.787940961549209, 1.39931415327149, 4.91448118586089, 2.39551550152373, 2.08291844616138, 1.67353143324194, 0.652483430981848, 0.263165822550241, 0.107498717856296, 1.53454251726848, 1.17129688889679, 2.06708280469829, 3.91165644171779, 2.74588910732349, 1.66499320847473, 1.02145416818956, 0.371633336270256, 0.112670158106901, 0.857264438638371, 1.7590640625625, 1.71686658407219, 2.62294018855816, 3.45916114790287, 1.87635185962704, 0.862205884832066, 0.523958801433231, 0.205791955532149, 0.646645383952458, 0.943424739130445, 1.62776721065554, 1.87677409215498, 2.21415705015835, 2.5920177383592, 1.10525460534109, 0.472961105423521, 0.282448363507455, 0.504954014454259, 0.438441714821823, 0.77694120330432, 1.40954408148402, 1.24556204828388, 1.35307720400585, 1.70385674931129, 0.812686154912104, 0.270111273681845, 0.305701280434649, 0.420580126969344, 0.432113761275257, 0.707170907986224, 1.04376196943771, 0.798427737704416, 1.12065725135372, 1.33035714285714, 0.322575366839763, 0.237578345845701, 0.24437789962337, 0.326505855457376, 0.396586297530862, 0.758318763302674, 0.881999483055259, 0.688988121391528, 0.596692087603768, 0.292682926829268)

# Plot as a heat map to check the ordering of the vectorized matrix
data.frame(contacts = contact_matrix_europe) %>%
  as_tibble() %>%   # tbl_df() is deprecated
  mutate(age1 = rep(1:9, 9), age2 = rep(1:9, each = 9)) %>%
  ggplot() +
  geom_tile(aes(x = age2, y = age1, fill = contacts))
```

The Stan part is
`for (i in 1:A) ... to_vector(contact[(A * (i-1)+1):(i * A)]) );`

Thanks again and have a nice weekend,
Lampros

@anthonyhauser

Hi Lampros,
This is Anthony, another contributor to the project. Thanks for the question.
In Eq. 7 we used the contact matrix $F$, where the element $F_{k,l}$ represents the average number of contacts of an individual in age group $k$ with individuals in age group $l$.
Therefore, to calculate the force of infection for an individual in group $k$, we need to consider the $k$-th row of the matrix $F$ (see Eq. 7, where the sum runs along a row of $F$ and not down a column).
As you said, the vectorized matrix is then fed to Stan in row-major order, which is consistent with the notation in Eq. 7.
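
For illustration, a minimal sketch with placeholder names (`force_of_infection`, `prev_inf`, `beta`), not the exact code of the repository, showing how row $k$ of the row-major vectorized matrix enters the force of infection:

```stan
functions {
  // Force of infection by age group from a row-major vectorized A x A contact
  // matrix. Row k of the matrix occupies positions (A * (k - 1) + 1) to (k * A)
  // of the flat array `contact`, so the sum in Eq. 7 runs along that slice.
  // prev_inf[l] stands for the infectious prevalence in age group l.
  vector force_of_infection(real beta, real[] contact, vector prev_inf, int A) {
    vector[A] f_inf;
    for (k in 1:A)
      f_inf[k] = beta * dot_product(to_vector(contact[(A * (k - 1) + 1):(k * A)]),
                                    prev_inf);
    return f_inf;
  }
}
```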
Does it help?
Have a nice weekend too.

@bernadette-eu

Hi Anthony,

Thanks for the clarification; it helped. Have a great week ahead.

Best,
Lampros
