function to summarize loss reductions achieved by each base-learner #27

fabian-s · 2016-01-18T15:18:07Z

Almond Stöcker und Tobi Kühn have started to write a function to extract the amount of risk reduction contributed by each base-learner from a fitted mboost model.
We'd like to see this included in mboost, with visualisation options, if possible, as it seems to answer a question that comes up frequently in postprocessing and interpreting boosting fits: which base-learners are the most important for the fit?
See code below.

# variable importance
RelRisk <- function( model, ohne1 = FALSE )
{
  baselearner <- names(model$baselearner)
  learner.type = sapply( strsplit(baselearner, "\\(") , "[[", 1)
  learner.type.new = as.vector(sapply(learner.type, FUN = function(x) 
    switch(x, bols = "(linear)",
              bbs  = "(nonlinear)",
              bspatial = "(spatial)")))
  var.names    = as.vector(variable.names(model))  

  baselearner.short = paste(var.names, learner.type.new) 
  n <- length(model$response)

  # Which Baselearners were selected while boosting:
  selected <- model$xselect()

  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # Emp. Risk:
  # Initial Risk for the Intercept Model:
  Risk0 <- with( model, family@risk( response, offset ) )
  # Risk after the Boosting-Steps:
  Risk <- model$risk()
  # Risk loss per step:
  RiskDif <- c(Risk0, Risk[-length(Risk)]) - Risk
  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  # Falls gewuenscht, ohne den ersten Schritt:
  if(ohne1) 
  {
    RiskDif <- RiskDif[-1]
    selected <- selected[-1]
  }

  # Explained Risk attributed to Baselearners
  explained <- rep( 0, length(baselearner) )
  for( i in 1:length(baselearner)) explained[i] <- sum( RiskDif[which(selected==i)] )

  # Selection percentage of the baselearners
  frequence <- rep(0, length(baselearner))
  for( i in 1:length(baselearner)) frequence[i] <- mean( selected == i )

  par(mar = c(5, 13, 4, 2) + 0.1 )
  (b <- barplot(height = explained / n , names.arg = baselearner.short, las = 1, horiz = TRUE, main = paste("Endrisiko =", Risk[length(Risk)] / n) ) )
  text(x = max(explained/n)/10, y = b, labels = frequence)
  par(mar = c(5, 4, 4, 2) + 0.1 )
}

> cars.gb <- gamboost(dist ~ bols(speed) + bbs(speed, center=TRUE), data = cars,
+   control = boost_control(mstop = 50))
> RelRisk(cars.gb)

Any input on what such a function should or should not do would be highly appreciated!

The text was updated successfully, but these errors were encountered:

ja-thomas · 2016-02-04T15:59:09Z

Hi Fabian,

awesome!

Some notes/thoughts:

This does not work for binary response because of the risk implementation in mboost (note that factors are encoded with (-1,1)). Risk0 will be 0 and a negative importance is set for the first base-learner.
Support of glmboost would be nice
Decreasing order of the bars
fixed number of digits for frequence

Here is how it looks for my data:

RelRisk(zero)

RelRisk(zero, ohne1= FALSE)

and to compare my ggplot solution:
zero_importance.pdf

hofnerb added the enhancement label Mar 3, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

function to summarize loss reductions achieved by each base-learner #27

function to summarize loss reductions achieved by each base-learner #27

fabian-s commented Jan 18, 2016

ja-thomas commented Feb 4, 2016

function to summarize loss reductions achieved by each base-learner #27

function to summarize loss reductions achieved by each base-learner #27

Comments

fabian-s commented Jan 18, 2016

ja-thomas commented Feb 4, 2016