Something off with coefitients? #9

Open
DemGrg opened this issue Jan 2, 2018 · 8 comments
Comments

@DemGrg

DemGrg commented Jan 2, 2018

Hi, I don't understand how the broken function calculates the coefficients. (Or is something off?)

In the lm function this is my test result:

summary(model)

Call:
lm(formula = TotalCharges ~ ., data = data_in_test)

Residuals:
     Min       1Q   Median       3Q      Max
-1943.33  -453.71   -94.64   490.26  1887.26

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -2162.4583    21.9717 -98.420  < 2e-16 ***
MonthlyCharges    36.1234     0.3080 117.301  < 2e-16 ***
tenure            65.3606     0.3683 177.476  < 2e-16 ***
SeniorCitizen    -86.7050    24.3449  -3.562 0.000371 ***

Test user:
-2162.4583 + (data_in_test[analysed_user,]$MonthlyCharges * 36.1234) +
  data_in_test[analysed_user,]$tenure * 65.3606 +
  data_in_test[analysed_user,]$SeniorCitizen * (-86.7050)

[1] 721.2045

While breakDown gives this (you can see that the intercept is different):

lm_br
                       contribution
(Intercept)                2283.300
tenure = 3                -1923.025
MonthlyCharges = 74.4       346.850
SeniorCitizen = 0            14.081
final_prognosis             721.206
baseline:  0

  • Strangely, the final prognosis is the same for both lm and broken, but broken does not use the same coefficients as summary(model) in its calculation.

One would expect the contributions in a waterfall plot to be simply the terms of Y = intercept + beta*value ... etc., taken from the summary output.
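The two decompositions can be checked by hand from the numbers quoted above. This is only an arithmetic sketch: the values 74.4, 3 and 0 for the test user are read off the lm_br output, and the vector names raw and centered are made up for illustration.

```r
# Raw decomposition: intercept + beta * value, coefficients from summary(model)
raw <- c(
  "(Intercept)"    = -2162.4583,
  "MonthlyCharges" = 36.1234 * 74.4,
  "tenure"         = 65.3606 * 3,
  "SeniorCitizen"  = -86.7050 * 0
)

# Contributions as printed by broken() for the same user
centered <- c(
  "(Intercept)"    = 2283.300,
  "tenure"         = -1923.025,
  "MonthlyCharges" = 346.850,
  "SeniorCitizen"  = 14.081
)

# The individual terms differ, but both sets add up to about 721.2
sum(raw)
sum(centered)
```

So the prediction agrees up to rounding of the printed values; only the way it is split into terms differs.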

@pbiecek
Owner

pbiecek commented Jan 3, 2018

Yes, actually there is a pretty cool reason why you do not want to have beta*value as separate contributions (see below).

In the broken object, contributions are calculated as beta*centered(value).

This makes the contributions invariant to shifts of the X variables.
For example, you will get the same breakDown plots whether temperature is in Celsius or Fahrenheit.
The beta coefficients take care of scale, but location has to be handled separately.
Also, since the values are centered, the intercept is shifted as well.
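A minimal sketch of that invariance, on simulated data. The variable names and the simulated temperatures are assumptions for illustration and have nothing to do with the data in this issue.

```r
set.seed(1)
celsius    <- runif(50, 0, 30)
y          <- 3 + 2 * celsius + rnorm(50)
fahrenheit <- celsius * 9 / 5 + 32   # same information, different scale and location

fit_c <- lm(y ~ celsius)
fit_f <- lm(y ~ fahrenheit)

# Raw contribution beta * value for the first observation: depends on the unit
raw_c <- coef(fit_c)[["celsius"]]    * celsius[1]
raw_f <- coef(fit_f)[["fahrenheit"]] * fahrenheit[1]

# Centered contribution beta * (value - mean(value)): identical in both units
cen_c <- coef(fit_c)[["celsius"]]    * (celsius[1]    - mean(celsius))
cen_f <- coef(fit_f)[["fahrenheit"]] * (fahrenheit[1] - mean(fahrenheit))

all.equal(cen_c, cen_f)
```

The slope absorbs the 9/5 rescaling, while centering removes the +32 shift, which is why only the centered version is the same in both units.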

Such individual contributions are easy to obtain.
The breakDown package gets them from the following call (no extra calculations are needed):

predict.lm(model, newdata, type = "terms")
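As a rough illustration of what that call returns, here is a sketch on the built-in mtcars data (the model and the chosen observation are assumptions, not the data from this issue). The centered terms come back as a matrix, and the shifted intercept is stored in its "constant" attribute.

```r
model   <- lm(mpg ~ wt + hp, data = mtcars)
new_obs <- mtcars[1, ]

# One column per model term, each equal to beta * (x - mean of x in the training data)
contrib <- predict(model, newdata = new_obs, type = "terms")

# Shifted intercept: the prediction at the training means of all predictors
shifted_intercept <- attr(contrib, "constant")

# Manual check of a single term
manual_wt <- coef(model)[["wt"]] * (new_obs$wt - mean(mtcars$wt))

# Shifted intercept plus centered terms reproduces the ordinary prediction
total <- shifted_intercept + sum(contrib)
all.equal(total, unname(predict(model, newdata = new_obs)))
```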

@alathrop

alathrop commented Feb 9, 2018

Thank you for the explanation! May I suggest giving the user the option to use either the centered or the regular x values, and adding some explanation to the documentation? This is a great chart, but it is confusing without any explanation of the use of type = "terms".

@pbiecek
Owner

pbiecek commented Feb 9, 2018

Yes, some documentation is required. Winter semester has just ended so I will have some time to work on it.

@larmarange
Contributor

Dear @pbiecek,

Following @alathrop, it would be great to have an option to show the uncentered terms (beta*value) directly rather than the centered values.

I completely understand your point of view. But in other contexts such a plot would be relevant, e.g. for pedagogical purposes. When teaching, I often need to explain to my students how a single prediction is obtained from a model, in particular when explaining how to interpret interactions.

Thanks for this package.

@larmarange
Contributor

larmarange commented Mar 17, 2018

Maybe some code could be helpful. I have tried the following.

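betas <- function (object, newdata)
{
  # Design matrix for newdata, built from the model terms without the response
  tt <- terms(object)
  Terms <- delete.response(tt)
  mm <- model.matrix(Terms, newdata)
  ass <- attr(mm, "assign")          # term index of each column (0 = intercept)
  tl <- attr(Terms, "term.labels")

  # Multiply each column by its coefficient; sweep() stays correct
  # when newdata has more than one row
  co <- coef(object)
  pred <- sweep(mm, 2, co, "*")

  ret <- matrix(NA_real_, nrow = nrow(mm), ncol = length(tl))
  colnames(ret) <- tl
  rownames(ret) <- rownames(mm)

  # Sum the columns that belong to the same term (e.g. factor levels)
  for (i in seq_along(tl)) {
    ret[, i] <- rowSums(pred[, ass == i, drop = FALSE], na.rm = TRUE)
  }
  # Uncentered intercept, one value per row
  attr(ret, "constant") <- rowSums(pred[, ass == 0, drop = FALSE], na.rm = TRUE)

  ret
}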

At the beginning of broken.glm, simply use ny <- betas(model, new_observation) instead of predict, and the rest of the function will still work.

Would you consider adding such options?
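For comparison, the uncentered decomposition of a single prediction can be sketched without any helper. This is only an illustration on the built-in mtcars data; the model and the object names are assumptions, and this is not the code from the pull request.

```r
model   <- lm(mpg ~ wt + hp, data = mtcars)
new_obs <- mtcars[1, ]

# One-row design matrix for the new observation (response column dropped)
x_row <- model.matrix(delete.response(terms(model)), new_obs)

# Raw decomposition: the intercept and one beta * value entry per coefficient
raw <- drop(x_row) * coef(model)

# The raw terms add up to the ordinary prediction
raw_total <- sum(raw)
all.equal(raw_total, unname(predict(model, newdata = new_obs)))
```

Here the "(Intercept)" entry is the plain intercept from summary(model), which is the form one would write on a blackboard when explaining how a single prediction is built.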

@larmarange
Contributor

I have prepared a pull request, just in case.

@pbiecek
Owner

pbiecek commented Mar 19, 2018

Thanks, merged.
Rendered examples are here:
https://pbiecek.github.io/breakDown/reference/broken.lm.html

@larmarange
Contributor

thanks
