Something off with coefitients? #9

DemGrg · 2018-01-02T18:06:40Z

Hi, I don't understand how the broken function calculates the coefficients. Or is something off?

With the lm function, this is my test result:

summary(model)

Call:
lm(formula = TotalCharges ~ ., data = data_in_test)

Residuals:
     Min       1Q   Median       3Q      Max
-1943.33  -453.71   -94.64   490.26  1887.26

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -2162.4583    21.9717 -98.420  < 2e-16 ***
MonthlyCharges    36.1234     0.3080 117.301  < 2e-16 ***
tenure            65.3606     0.3683 177.476  < 2e-16 ***
SeniorCitizen    -86.7050    24.3449  -3.562 0.000371 ***

Test user:
-2162.4583 + (data_in_test[analysed_user,]$MonthlyCharges * 36.1234) +
data_in_test[analysed_user,]$tenure * 65.3606 +
data_in_test[analysed_user,]$SeniorCitizen * (-86.7050)

[1] 721.2045

What you get instead (you can see that the intercept is different):

lm_br
                       contribution
(Intercept)                2283.300
tenure = 3                -1923.025
MonthlyCharges = 74.4       346.850
SeniorCitizen = 0            14.081
final_prognosis             721.206
baseline:  0

Strangely, the final prognosis is the same for both lm and broken, but the contributions reported by broken do not match what I get when applying the coefficients from summary(model).

One would expect the contributions in a waterfall plot to be simply the terms of Y = intercept + beta*value + ... taken from the summary output?

pbiecek · 2018-01-03T00:30:51Z

Yes, actually there is a pretty cool reason why you do not want to have beta*value as separate contributions (see below).

In the broken object, the contributions are calculated as beta*centered(value).

This makes the contributions invariant to shifts of the X variables.
For example, you get the same breakDown plots whether temperature is measured in Celsius or Fahrenheit.
Beta coefficients take care of the scale, but the location has to be handled separately.
Also, since the values are centered, the intercept is shifted as well.

It is easy to get such individual contributions.
In the breakDown package this is implemented through the following call (no extra calculations are needed):

predict.lm(model, newdata, type = "terms")
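
As a rough illustration of the explanation above, the sketch below checks on a toy model that predict(type = "terms") returns beta * (x - mean(x)), and that these contributions do not change when a predictor is shifted by a constant. The mtcars data and the variable names are assumptions made for the example; they are not the data from this thread.

```r
# Toy model on a built-in dataset (an assumption for illustration only)
model   <- lm(mpg ~ wt + hp, data = mtcars)
new_obs <- mtcars[1, , drop = FALSE]

# Per-term contributions as returned by predict(type = "terms")
terms_pred <- predict(model, newdata = new_obs, type = "terms")

# The same numbers by hand: beta * (x - mean of x in the training data)
vars   <- c("wt", "hp")
manual <- coef(model)[vars] * (unlist(new_obs[vars]) - colMeans(mtcars[vars]))

# The shifted intercept is stored in the "constant" attribute;
# constant + contributions reproduces the ordinary prediction
shifted_intercept <- attr(terms_pred, "constant")
full_pred <- shifted_intercept + sum(terms_pred)

# Shift wt by a constant and refit: the contributions stay the same,
# although the raw intercept from coef() changes
shifted <- transform(mtcars, wt = wt + 100)
model_s <- lm(mpg ~ wt + hp, data = shifted)
terms_s <- predict(model_s, newdata = shifted[1, , drop = FALSE], type = "terms")

print(terms_pred)
print(manual)
print(c(raw = unname(coef(model)[1]), shifted_data = unname(coef(model_s)[1])))
```

This would also explain why the intercept row in the broken output differs from the (Intercept) row of summary(model) while the final prognosis agrees.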

alathrop · 2018-02-09T22:03:35Z

Thank you for the explanation! May I suggest giving the user the option to use either the centered or the regular x values, as well as providing some explanation in the documentation? This is a great chart, but it is confusing without any explanation of the use of type = "terms".

pbiecek · 2018-02-09T22:16:16Z

Yes, some documentation is required. Winter semester has just ended so I will have some time to work on it.

larmarange · 2018-03-17T15:49:40Z

Dear @pbiecek,

Following @alathrop, it would be great to have an option to show the direct application of the different terms rather than the centered values.

I completely understand your point of view. But in other contexts such a plot would be relevant, e.g. for pedagogic purposes. When teaching, I often need to explain to my students how a single prediction is obtained from a model, in particular when explaining how to interpret interactions.

Thanks for this package

larmarange · 2018-03-17T18:07:50Z

Maybe some code could be helpful. I have tried the following.

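betas <- function (object, newdata)
{
  # Design matrix for the new observations, without the response
  tt <- terms(object)
  Terms <- delete.response(tt)
  mm <- model.matrix(Terms, newdata)
  ass <- attr(mm, "assign")   # term index of each design column (0 = intercept)
  tl <- attr(Terms, "term.labels")

  # Multiply each design column by its coefficient; sweep() applies co
  # across columns, so this also holds when newdata has several rows
  co <- coef(object)
  pred <- sweep(mm, 2, co, "*")

  ret <- matrix(NA_real_, nrow = nrow(newdata), ncol = length(tl))
  colnames(ret) <- tl
  rownames(ret) <- rownames(newdata)

  # Sum the columns that belong to each term (a factor spans several columns)
  for (i in seq_along(tl)) {
    ret[, i] <- rowSums(pred[, ass == i, drop = FALSE], na.rm = TRUE)
  }
  # Uncentered intercept, stored like the "constant" of predict(type = "terms")
  attr(ret, "constant") <- rowSums(pred[, ass == 0, drop = FALSE], na.rm = TRUE)

  ret
}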

At the beginning of broken.glm, simply use ny <- betas(model, new_observation) instead of predict, and the rest of the function will keep working.

Would you consider adding such options?
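
The uncentered decomposition proposed here can be checked by hand for a single observation. The following is only a minimal sketch on a toy model (mtcars and the variable names are assumptions for illustration); it does not call the package code.

```r
# Toy model on a built-in dataset (an assumption for illustration only)
model   <- lm(mpg ~ wt + hp, data = mtcars)
new_obs <- mtcars[1, , drop = FALSE]

# Design-matrix row for the new observation: (Intercept), wt, hp
mm <- model.matrix(delete.response(terms(model)), new_obs)

# Raw contributions: each coefficient times the uncentered value
raw <- coef(model) * mm[1, ]

# Their sum is the ordinary prediction, i.e. Y = intercept + beta * value + ...
raw_total <- sum(raw)

print(raw)
print(raw_total)
print(predict(model, newdata = new_obs))
```

In this form the intercept row equals the (Intercept) estimate from summary(model), which is the presentation asked for at the start of the thread.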

larmarange · 2018-03-19T13:49:51Z

I have prepared a pull request, just in case

pbiecek · 2018-03-19T21:42:39Z

Thanks, merged.
Rendered examples are here:
https://pbiecek.github.io/breakDown/reference/broken.lm.html

larmarange · 2018-03-20T11:22:25Z

thanks

larmarange mentioned this issue Mar 19, 2018

a betas function compatible with broken #13

Merged
