ggcorr does not support "non syntactically valid" column names. #465

Open
trekonom opened this issue May 11, 2023 · 2 comments

trekonom commented May 11, 2023

While answering this question on SO, I stumbled over an issue with ggcorr: when column names are not syntactically valid variable names, they are converted to syntactically valid ones, with every "special" character replaced by a dot.

A minimal reproducible example of the issue:

ex_db <- structure(list(
  SAPS = c(11L, 14L, 14L, 15L, 14L, 13L, 14L, 13L, 12L, 15L),
  `e'` = c(119, 62, 74, 75, 111, 66, 102, 71, 100, 108),
  `E/e'` = c(50, 111, 82, 68, 78, 105, 60, 91, 61, 49)
), class = "data.frame", row.names = c(NA, -10L))

library(GGally)
#> Loading required package: ggplot2
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2

ggcorr(ex_db)

Created on 2023-05-11 with reprex v2.0.2

IMHO the cause is that the correlation matrix is converted to a data frame with data.frame(), which applies the default check.names = TRUE:

m = data.frame(m * lower.tri(m))
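To illustrate the suspected cause outside of GGally, here is a minimal base R sketch. The data frame `toy` and the names `mangled` and `preserved` are made up for this example; only the `data.frame(m * lower.tri(m))` call mirrors the line quoted above.

```r
# Toy data with the same kind of non-syntactic column names as in the reprex
toy <- data.frame(
  SAPS   = c(11, 14, 14, 15),
  `e'`   = c(119, 62, 74, 75),
  `E/e'` = c(50, 111, 82, 68),
  check.names = FALSE
)

# The correlation matrix keeps the original names as dimnames
m <- cor(toy)

# Default check.names = TRUE runs the names through make.names()
mangled <- data.frame(m * lower.tri(m))

# check.names = FALSE leaves the names untouched
preserved <- data.frame(m * lower.tri(m), check.names = FALSE)

names(mangled)    # "SAPS" "e."   "E.e."
names(preserved)  # "SAPS" "e'"   "E/e'"
```

The mangled names match the dotted labels that show up in the plot.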

A "hacky" workaround would be to manipulate the ggplot object returned by ggcorr and to replace the diagLabel column containing the labels with the original column names.

Note: This requires identifying the correct geom_text layer, i.e. the one that adds the labels stored in the diagLabel column (the one with mapping: label = ~diagLabel).

p <- ggcorr(ex_db)

p$layers[[2]]$data[c("diagLabel")] <- names(ex_db)

p

Created on 2023-05-11 with reprex v2.0.2
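Instead of hard-coding the layer index, the label layer could be looked up by its data. The sketch below is still a workaround and rests on an assumption about GGally internals, namely that exactly one layer carries a data frame with a `diagLabel` column. `find_diag_label_layer` is a hypothetical helper, not a GGally function.

```r
# Hypothetical helper (not part of GGally): index of the layer(s) whose
# data frame has a `diagLabel` column.
find_diag_label_layer <- function(layers) {
  has_label <- vapply(
    layers,
    function(layer) {
      is.data.frame(layer$data) && "diagLabel" %in% names(layer$data)
    },
    logical(1)
  )
  which(has_label)
}

if (requireNamespace("GGally", quietly = TRUE)) {
  ex_db <- data.frame(
    SAPS   = c(11L, 14L, 14L, 15L, 14L, 13L, 14L, 13L, 12L, 15L),
    `e'`   = c(119, 62, 74, 75, 111, 66, 102, 71, 100, 108),
    `E/e'` = c(50, 111, 82, 68, 78, 105, 60, 91, 61, 49),
    check.names = FALSE
  )

  p <- GGally::ggcorr(ex_db)
  idx <- find_diag_label_layer(p$layers)

  # Only touch the plot if the assumption about the layer structure holds
  if (length(idx) == 1 && nrow(p$layers[[idx]]$data) == ncol(ex_db)) {
    p$layers[[idx]]$data$diagLabel <- names(ex_db)
  }

  print(p)
}
```

If the lookup finds no layer or more than one, the plot is printed unchanged rather than relabelled incorrectly.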

92amartins (Collaborator) commented

Thanks for bringing up the issue!

I'm not a big fan of the "hacky" way.

Alternatively, can we just let the user specify check.names=FALSE and propagate the arg to the data.frame function?

trekonom (Author) commented Apr 27, 2024

Haha. Not a fan of hacky approaches either. Was just meant as a workaround for people who can't wait until the issue is fixed.

From my understanding it would probably be safe to simply set check.names=FALSE when converting the correlation matrix to a data.frame. But I haven't checked whether this causes some issues downstream.
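For discussion, the idea of letting the caller control check.names could look roughly like the sketch below. `lower_tri_df` is a hypothetical stand-in for the conversion step, not GGally's actual code, and defaulting the argument to TRUE (to keep the current behaviour) is an assumption. It says nothing about what happens downstream of the conversion.

```r
# Hypothetical stand-in for the conversion step; `check.names` is simply
# forwarded to data.frame().
lower_tri_df <- function(m, check.names = TRUE) {
  data.frame(m * lower.tri(m), check.names = check.names)
}

# Small symmetric matrix with non-syntactic dimnames
m <- matrix(
  c(1, 0.5, 0.5, 1),
  nrow = 2,
  dimnames = list(c("a b", "c/d"), c("a b", "c/d"))
)

names(lower_tri_df(m))                       # "a.b" "c.d"
names(lower_tri_df(m, check.names = FALSE))  # "a b" "c/d"
```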
