Using variable labels instead of variable names when available #24

Open
larmarange opened this issue Jul 10, 2020 · 19 comments

Comments

@larmarange

Variable labels, stored as a label attribute and easily accessible with labelled::var_label(), are becoming quite common. Many packages that produce graphs or tables (like gtsummary) now follow this rule: if a variable label is defined, use it instead of the variable name.

Adding this to forestmodel would make it easy to customize the variable names displayed on forest plots.
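The rule described above fits in a few lines of base R. This is a minimal sketch, not forestmodel's implementation: `label_or_name()` is a hypothetical helper, and it assumes labels live in the "label" attribute that labelled::var_label() reads and writes.

```r
# label_or_name() is a hypothetical helper, not part of forestmodel.
# It returns each column's "label" attribute (the attribute used by
# labelled::var_label()) and falls back to the column name if none is set.
label_or_name <- function(data, vars = names(data)) {
  vapply(vars, function(v) {
    lab <- attr(data[[v]], "label", exact = TRUE)
    if (is.null(lab) || !nzchar(lab)) v else as.character(lab)
  }, character(1))
}

dat <- data.frame(age = c(30, 45), sex = factor(c("f", "m")))
# Roughly what labelled::var_label(dat$age) <- "Age (years)" does
attr(dat$age, "label") <- "Age (years)"

label_or_name(dat)
# returns c(age = "Age (years)", sex = "sex")
```

A plotting function would then use these strings wherever it currently prints raw variable names.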

@ShixiangWang
Contributor

This package is not in active development. If you are interested in this feature, please implement it, then keep a fork or create a pull request to https://github.com/ShixiangWang/forestmodel

@larmarange
Author

@ShixiangWang is it an official fork?

@NikNakk could you clarify if you still plan to maintain and develop forestmodel?

@ShixiangWang
Contributor

@larmarange Nope, I didn't say that. The author is nice, but from my view he may not be active on GitHub.

@NikNakk
Owner

NikNakk commented Jul 16, 2020

Hi @larmarange, @ShixiangWang,

I haven't been very active in maintaining this package for a while because I've been busy with other things, but I'm still aiming to get to the outstanding queries that have been raised, including yours. There's also now a more pressing reason to attend to the package: it's erroring on CRAN, so it will be delisted if I don't fix that. I'll at least fix the issue that would lead to delisting in the next few days, and if I can, I'll also try to address other outstanding issues and improvements.

@larmarange
Author

Thanks @NikNakk for your feedback.

Regarding the proposed improvement, it should not be very difficult to implement once you've identified where variable names are used.

I haven't had time to look at your code in detail, so I don't yet know how it is organized. But as you are familiar with your package, you should have an idea of where to look.

Best regards

@NikNakk
Owner

NikNakk commented Jul 16, 2020

@larmarange I've made a new branch that has a simple implementation of this at https://github.com/NikNakk/forestmodel/tree/labels. You can test it using remotes::install_github("NikNakk/forestmodel@labels")

@larmarange
Author

Thanks a lot

@NikNakk
Owner

NikNakk commented Jul 17, 2020

@larmarange please let me know when you've had a chance to test this out.

@larmarange
Author

@NikNakk I have done some quick tests. It works well with simple models. Thanks.

When I add interaction terms, labels are not used for them, but that was already the case before (it seems forestmodel does not treat interaction terms in any special way).

```r
library(questionr)
library(forestmodel)
library(labelled)

data(fertility)
women <- unlabelled(women)
mod <- glm(employed ~ age + residency * instruction, data = women, family = binomial())
forest_model(mod, exponentiate = TRUE)
```


Here is a quick example with gtsummary showing how that package handles interaction terms.

```r
library(gtsummary)
tbl_regression(mod)
```
| Characteristic | log(OR) | 95% CI | p-value |
|---|---|---|---|
| Age at last anniversary (in years) | 0.06 | 0.05, 0.07 | <0.001 |
| Urban / rural residency | | | |
| urban | | | |
| rural | 0.28 | 0.00, 0.55 | 0.052 |
| Level of instruction | | | |
| none | | | |
| primary | 0.35 | -0.02, 0.74 | 0.067 |
| secondary | -0.83 | -1.2, -0.50 | <0.001 |
| higher | -0.71 | -1.3, -0.10 | 0.022 |
| Urban / rural residency * Level of instruction | | | |
| rural * primary | -0.16 | -0.67, 0.35 | 0.5 |
| rural * secondary | 0.19 | -0.41, 0.80 | 0.5 |
| rural * higher | -1.5 | -4.5, 0.56 | 0.2 |
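In the gtsummary output, the interaction header appears to be built by joining the labels of its component variables with " * ". A minimal base-R sketch of that idea, assuming labels are stored in the "label" attribute; `term_label()` and the `toy` data are hypothetical and not taken from gtsummary or forestmodel:

```r
# term_label() is a hypothetical helper: it splits a model term such as
# "residency:instruction" on ":", looks up each variable's "label"
# attribute (falling back to the variable name) and joins them with " * ".
term_label <- function(term, data) {
  parts <- strsplit(term, ":", fixed = TRUE)[[1]]
  labs <- vapply(parts, function(v) {
    lab <- attr(data[[v]], "label", exact = TRUE)
    if (is.null(lab)) v else as.character(lab)
  }, character(1))
  paste(labs, collapse = " * ")
}

# Toy data standing in for the fertility example above
toy <- data.frame(
  residency = factor(c("urban", "rural")),
  instruction = factor(c("none", "primary"))
)
attr(toy$residency, "label") <- "Urban / rural residency"
attr(toy$instruction, "label") <- "Level of instruction"

# terms() lists the model terms, including the interaction
tl <- attr(terms(employed ~ residency * instruction), "term.labels")
vapply(tl, term_label, character(1), data = toy)
```

Main effects come out unchanged (just their own label), so the same lookup can be applied to every term of the model.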

@larmarange
Author

But I know that handling interaction terms can be tricky and is beyond the scope of this issue.

Otherwise, it's perfect. Thanks a lot

@NikNakk
Owner

NikNakk commented Jul 19, 2020

I’ll have a look at interaction terms when I get a chance. gtsummary looks like a good starting point. For now I’ve merged the labels branch into master and need to get the latest version on CRAN because otherwise it will be delisted.

@larmarange
Author

Thanks

@NikNakk
Owner

NikNakk commented Jul 20, 2020

FYI, this version is now on CRAN.

@proshano

Variable labels still not showing up

@corneliushennch

> Variable labels still not showing up

Same here: it works fine with gtsummary::tbl_regression but not with forest_model from the forestmodel package, which I just installed from GitHub.

@NikNakk
Owner

NikNakk commented Apr 14, 2021

Sorry for the delayed response, @proshano and @corneliushennch. Could you please give me some example code that doesn't work as expected? I'm still planning to work on interaction terms since they're not currently properly supported with or without labels.

@larmarange
Author

In case it could be useful for you, gtsummary::tbl_regression() now relies on broom.helpers package: https://larmarange.github.io/broom.helpers/

@corneliushennch

corneliushennch commented Apr 22, 2021

EDIT:

The problem occurs with factor and character variables when using coxph(): only the label of the numeric variable gets printed, as you can see in the reprex. All variable types work fine with other models (I just checked glm). So changing factors back to character, which would already be tedious since factors are pretty standard in this kind of analysis, doesn't solve it, as I first thought. I'd very much appreciate it if you could implement proper use of labels for coxph objects too, as there is so far no convenient function that displays hazard ratios in clear forest plots with labels. I formerly used survminer::ggforest, but switched to forestmodel to be able to use labels...

```r
library(survival)
library(dplyr)
library(forestmodel)

set.seed(1) # fixed seed so the simulated data are reproducible

surv_data <- tibble(
  time = abs(rnorm(300, 50, 30)),
  event = sample(c(0,1), 300, prob = c(0.8, 0.2), replace = TRUE),
  gender = sample(c(0,1), 300, prob = c(0.6, 0.4), replace = TRUE),
  rx = sample(c("no","yes"), 300, prob = c(0.5, 0.5), replace = TRUE),
  gene = sample(c(0,1), 300, prob = c(0.9, 0.1), replace = TRUE)
)

surv_data <- surv_data %>% 
  mutate(gender = factor(gender, levels = c(0,1), labels = c("female", "male")))

labelled::var_label(surv_data) <- list(
  gender = "Gender (f/m)", # this variable is a factor -> label doesn't work!
  rx = "Irradiation", # character -> label doesn't work!
  gene = "Gene of Interest" # numeric -> label works...
)

labelled::var_label(surv_data) # checking that labels are assigned
#> $time
#> NULL
#> 
#> $event
#> NULL
#> 
#> $gender
#> [1] "Gender (f/m)"
#> 
#> $rx
#> [1] "Irradiation"
#> 
#> $gene
#> [1] "Gene of Interest"
lapply(surv_data, class) # showing variable classes
#> $time
#> [1] "numeric"
#> 
#> $event
#> [1] "numeric"
#> 
#> $gender
#> [1] "factor"
#> 
#> $rx
#> [1] "character"
#> 
#> $gene
#> [1] "numeric"

# printing the coxph model -> only label of numeric variable works
print(forest_model(coxph(formula = Surv(time, event) ~ gender + rx + 
                           gene, data = surv_data)))

# ok it seems to be a specific problem of the coxph object -> labels get printed correctly 
# with glm...
mod <- glm(gender ~ gene + rx, data = surv_data, family = binomial())
forest_model(mod, exponentiate = TRUE)
```

Created on 2021-04-23 by the reprex package (v0.3.0)
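For factor and character variables, a label cannot be matched to a coefficient by name alone: the coefficients are called `gendermale` and `rxyes`, while a numeric variable's coefficient is simply `gene`. The reprex suggests forestmodel makes this mapping for glm but not for coxph; the cause is not confirmed here. As an illustration only, the sketch below recovers the variable behind each coefficient through the `assign` attribute of `model.matrix()`. `coef_labels()` and the `toy` data are hypothetical and not part of forestmodel.

```r
# coef_labels() is a hypothetical helper: it maps each design-matrix
# column (coefficient name) back to its model term via the "assign"
# attribute of model.matrix(), then looks up that variable's "label".
coef_labels <- function(formula, data) {
  mm <- model.matrix(formula, data)       # character columns become factors
  tl <- attr(terms(formula), "term.labels")
  asgn <- attr(mm, "assign")              # 0 = intercept, k = k-th term
  keep <- asgn > 0
  vars <- tl[asgn[keep]]
  labs <- vapply(vars, function(v) {
    lab <- attr(data[[v]], "label", exact = TRUE)
    if (is.null(lab)) v else as.character(lab)
  }, character(1), USE.NAMES = FALSE)
  setNames(labs, colnames(mm)[keep])
}

toy <- data.frame(
  gender = factor(c("female", "male", "female", "male")),
  rx = c("no", "yes", "yes", "no"),
  gene = c(0, 1, 0, 1)
)
attr(toy$gender, "label") <- "Gender (f/m)"
attr(toy$rx, "label") <- "Irradiation"
attr(toy$gene, "label") <- "Gene of Interest"

coef_labels(~ gender + rx + gene, toy)
# returns c(gendermale = "Gender (f/m)", rxyes = "Irradiation",
#           gene = "Gene of Interest")
```

Going through the term structure rather than matching names is what lets factor, character and numeric variables all be labelled the same way.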

@fabones1

fabones1 commented Jun 22, 2021

I would also love to have the coxph factor label bug fixed as it would save a lot of time in my work.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants