
feature request: allow user to pass in a vector of values for the residual error in the gaussian outcomes case #76

Open
mikejacktzen opened this issue Sep 27, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@mikejacktzen

hi @fabsig

From the readme,

https://github.com/fabsig/GPBoost#gpboost-and-lagaboost-algorithms

Focusing on the specific case of Gaussian outcomes, and using the convention

y = F(X) + Zb + xi

where xi is an independent error term and X are predictor variables (aka covariates or features).

Is it possible to enable a feature where the user could supply the values of xi? Then xi would not be estimated; instead, the user would pass in a vector of values, e.g. 'xi = c(1, ...., 10)', whose length equals the length of the outcome,

as,

# in R

gp_model <- fitGPModel(group_data=group_data, 
               xi=c(1,3,4, ... , 10) ,
               likelihood="gaussian",
               y=outcome_xgb, X=X)

If you think this is doable, I could elaborate further on the motivation.

Thank you.


fabsig commented Sep 27, 2022

@mikejacktzen: Thank you for your suggestion.

It seems to me that the easiest way to do this would be to

  1. subtract \xi from y and pass y' = y - \xi as the "new y" to GPBoost, and
  2. set the error variance \sigma^2 to a very low value and not estimate it.

Unfortunately, the second point (telling GPBoost to hold the error variance fixed and not estimate it) is currently not possible. This would require some software engineering. I might implement it in the future, but it is not at the top of my priority list. Contributions are welcome.

As an alternative, you might try my above-suggested approach (1.) without (2.) and check how large the estimated error variance is. Note that the \xi values themselves are not estimated anyway.
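Approach (1.) without (2.) could look roughly like this in R. This is a sketch: the variable names (xi_known, outcome_xgb, group_data, X) are illustrative placeholders, and only the fitGPModel arguments already used in this thread appear.

```r
# Sketch of approach (1.) without (2.): subtract the known xi from y and
# fit the model on the shifted outcome. All data objects are placeholders.
library(gpboost)

y_new <- outcome_xgb - xi_known          # y' = y - xi, the "new y"

gp_model <- fitGPModel(group_data = group_data,
                       likelihood = "gaussian",
                       y = y_new, X = X)

# Then check how small the estimated error variance turns out to be:
summary(gp_model)
```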

Note on the above-suggested approach: even if you assume that xi is known exactly and that the only source of randomness are the random effects Zb, you usually need to add a small number to the diagonal of the random effects covariance matrix in order to ensure that the covariance matrix is "numerically" strictly positive definite and can be inverted with the Cholesky decomposition without numerical problems. And this approach is equivalent to setting the error variance to a very small known value.
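The diagonal "nugget" described in the note above can be illustrated in plain base R. The matrix below is an arbitrary made-up example, not GPBoost internals: without the jitter the covariance matrix is singular and chol() fails; with it, the factorization succeeds.

```r
# Illustration of the note above: add a small value to the diagonal of a
# covariance matrix so the Cholesky factorization succeeds numerically.
Z <- matrix(c(1, 1, 0, 0,
              0, 0, 1, 1), nrow = 4)      # toy random-effects design matrix
K <- Z %*% tcrossprod(diag(2), Z)          # K = Z %*% t(Z): rank-deficient, chol(K) errors

jitter <- 1e-6                             # plays the role of a tiny known error variance
K_jittered <- K + jitter * diag(nrow(K))   # now strictly positive definite

L <- chol(K_jittered)                      # succeeds without numerical problems
```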

fabsig added the enhancement (New feature or request) label on Sep 27, 2022

mikejacktzen commented Sep 27, 2022

Thanks for the thoughtful reply. I understand your points. Those numerical stability checks are always a fun surprise.

I think I rushed my original post (excited about the possibility). Instead of editing it (which might confuse other readers by the time they reach your reply), I'll continue here. My rushed OP confused xi (the residual terms) with the covariance of its distribution.

What I really mean is: for the normal outcomes case only, let the user provide the covariance matrix D of the residual errors xi,

y = F(X) + Zb + xi
xi ~ Normal(0,D)

  • I would only use this for the special case where D is the diagonal matrix of a user-supplied vector s_xi,
    D = diagonal(s_xi) = s_xi .* I

  • The most general case is to let the user provide an arbitrary D. This puts the burden on the user to construct D and do their own positive semi-definite checks (I think a good thing), so you don't get flooded with requests for every new variant structure of D (like I'm doing here).

Maybe this is the limitation you mention in point (2.) of your reply above, which sounds like what I need.

gp_model <- fitGPModel(group_data=group_data, 
               s_xi=c(1,3,4, ... , 10) ,  # if you restrict to the special case in the underlying C code
               likelihood="gaussian",
               y=outcome_xgb, X=X)

gp_model <- fitGPModel(group_data=group_data, 
               D=diag(s_xi),  # if you let the user pass in any D
               likelihood="gaussian",
               y=outcome_xgb, X=X)

# Should only work for the Gaussian case
# if (!is.null(D) && likelihood != "gaussian") stop("D is only supported for likelihood = 'gaussian'")
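The special case described in the first bullet above can be built and validated in plain base R. The vector s_xi below is an illustrative user-supplied vector of per-observation error variances, not part of any existing GPBoost API.

```r
# Building the proposed diagonal covariance D = diag(s_xi) = s_xi .* I and
# checking the positivity the proposal requires. s_xi is illustrative.
s_xi <- c(1, 3, 4, 2, 10)

stopifnot(all(s_xi > 0))     # D = diag(s_xi) is positive definite iff all entries are > 0

D <- diag(s_xi)              # the special-case covariance matrix of xi
```

With D diagonal and strictly positive, the positive-definiteness check the second bullet worries about is trivial, which is what makes this special case attractive as a first feature.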

With this new corrected detail, do your points still apply?

When xi ~ Normal(0,D) ,

Are the GPBoost internal algorithms crucially reliant on working with D = scalar .* Identity (which is how I understand the current setup)?

If not, I hope an easy next feature upgrade would allow D = vector .* Identity, where the vector has only positive entries.

If so, I can continue this via email with some very important use-case contexts that I think are ripe for co-authoring papers.


fabsig commented Sep 28, 2022

Thanks for the clarification. Providing the covariance matrix of the error term instead of the error term itself makes much more sense. This could be implemented but requires some software engineering. I would be willing to work on this if the goal is a joint publication in a collaboration. Just write me an email. You can find my email address on my public university homepage.
