
allow for resampling weights and for data weights #37

Open
sbrockhaus opened this issue Jun 12, 2016 · 0 comments

@sbrockhaus

In the current mboost() implementation there is only one argument, weights, which is used both for data weights and for resampling weights.
Data weights occur, for example, as survey weights or as integration weights for a functional response. Resampling weights matter because resampling is mostly used to find the optimal stopping iteration.
It is somewhat odd that mboost_fit() rescales the weights so that they sum up to 1 only when they are not integers, cf. rescale_weights():
https://github.com/boost-R/mboost/blob/master/R/helpers.R#L29
That is, the rescaling is skipped for integer weights (presumably on the assumption that resampling weights are integers and data weights are not?).
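
To make the behaviour described above concrete, here is a minimal base-R sketch of an "only rescale non-integer weights" rule. The helpers is_whole() and rescale_if_fractional() are hypothetical names made up for illustration and are not the mboost code; the normalization to a sum of 1 simply follows the description in this issue and may differ from what rescale_weights() actually does.

```r
## Hypothetical helpers, NOT the mboost implementation: they only mirror
## the behaviour described in this issue.

## TRUE if all weights are whole numbers (up to numerical tolerance)
is_whole <- function(w, tol = sqrt(.Machine$double.eps)) {
    all(abs(w - round(w)) < tol)
}

rescale_if_fractional <- function(w) {
    ## whole-number weights are returned unchanged
    ## (i.e. implicitly treated like resampling weights)
    if (is_whole(w)) return(w)
    ## any fractional weight triggers rescaling of the whole vector
    ## (assumption taken from the issue text: rescale to sum to 1)
    w / sum(w)
}

w_int <- c(0, 2, 3, 1, 0, 3, 2, 1, 1, 2)
w_frac <- w_int
w_frac[1] <- 0.1

rescale_if_fractional(w_int)        # unchanged
sum(rescale_if_fractional(w_frac))  # rescaled
```

Under such a rule, changing a single entry from 0 to 0.1 switches the treatment of the entire weight vector, which is why one weights argument cannot cleanly serve both purposes.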

One also has to be careful about what cv() does when it creates the folds used in cvrisk() if the model.weights of the fitted object are not all equal to 1, see:

library(mboost)

x <- sort(rnorm(10))
y <- x^2
y <- y - mean(y)
dat <- data.frame(y = y, x = x)

## model fit without weights
m <- mboost(y ~ bbs(x), data = dat)

## model fit with integer weights
myweights <- c(0, 2, 3, 1, 0, 3, 2, 1, 1, 2)
m_int <- mboost(y ~ bbs(x), data = dat, weights = myweights)

## model fit with non-integer weights
myweights3 <- myweights
myweights3[1] <- 0.1
m_int3 <- mboost(y ~ bbs(x), data = dat, weights = myweights3)

## look at model.weights
model.weights(m)
model.weights(m_int) ## weights are as specified
model.weights(m_int3) ## weights are rescaled

## look at cv(), which generates the resampling folds to be used in cvrisk()
## for the bootstrap, the probability of each observation entering a fold is proportional to its weight
set.seed(123)
cv(weights = model.weights(m_int3), type = "bootstrap", B = 3)
## for k-fold cross-validation, the fold indicators are multiplied by the weights
cv(weights = model.weights(m_int3), type = "kfold", B = 3)

@fabian-s
