How to color code cnetplot nodes from enrichrResult data #667

hlnicholls · 2024-02-14T13:29:08Z

Thank you for developing this package, it is very useful.

I am trying to plot a cnetplot with a color scale relating to my foldchange values.

Running the example code works as expected:

set.seed(123)
x <- list(A = letters[1:10], B=letters[5:12], C=letters[sample(1:26, 15)])
d <- setNames(rnorm(26), letters)
p2 <- cnetplot(x, foldChange=d) + 
  scale_color_gradient2(name='associated data', low='darkgreen', high='firebrick')

However, my version of x is not a list but an enrichResult. When I try to run that, the code gets an error:

kegg_organism = "hsa"
kegg_enrich <-  enrichKEGG(gene   = d$entrezgene_id,
                           organism     = 'hsa',
                           pvalueCutoff = 0.05,
                           pAdjustMethod = 'BH')

kegg <- setReadable(kegg_enrich, 'org.Hs.eg.db', 'ENTREZID')
kegg_genes <- kegg[,]
keggtop5 <- kegg_genes[order(kegg_genes$p.adjust),]
keggtop5 <- keggtop5[1:5,]
top5all <- enrichDF2enrichResult(keggtop5)
filtered_kegg_genes <- kegg_genes[!grepl("cancer", kegg_genes$Description, ignore.case = TRUE), ]
top5paths <- head(filtered_kegg_genes[order(filtered_kegg_genes$pvalue), ], 5)

top5all <- enrichDF2enrichResult(top5paths, pAdjustMethod='BH')

scores <- setNames(d$Median_PoPS, d$entrezgene_id)

p = cnetplot(top5all, showCategory = 5,
              foldChange=scores,
              scale_color_gradient2(name='associated data', low='steelblue', high='firebrick'))

Error in `layout_to_table()`:
! Unknown `layout`
Run `rlang::last_trace()` to see where the error occurred.

I have also tried:

> p = cnetplot(top5all,showCategory = 5,
+              foldChange=scores)
Scale for size is already present.
Adding another scale for size, which will replace the existing scale.
Warning message:
In cnetplot.enrichResult(x, ...) :
  Use 'color.params = list(foldChange = your_value)' instead of 'foldChange'.
 The foldChange parameter will be removed in the next version.

> 
> p = cnetplot(top5all,showCategory = 5,
+              color.params = list(foldChange = scores)
+              
+ )
Scale for size is already present.
Adding another scale for size, which will replace the existing scale.

These run but do not color code the nodes.

The text was updated successfully, but these errors were encountered:

guidohooiveld · 2024-02-19T00:00:10Z

Your code in the 2nd code box, starting from the line kegg_genes <- kegg[,], looks very complex to me. So what exactly do you want to achieve here? Could you provide a reproducible example?
Also note that (AFAIK) the function enrichDF2enrichResult is not part of clusterProfiler, but rather taken from a GitHub page called multienrichjam (here)?

Yet, this thread may (still) be useful? https://support.bioconductor.org/p/p133748/

hlnicholls · 2024-02-19T09:19:24Z

Thank you for your reply and for your help. It is taking from multienrichjam. Sorry I should've also mentioned previously that maybe 1-2 years ago I was able to create a cnetplot with color scaling working as I needed for this type of data (I think I used that bioconductor link you sent to set it up with using scale_color_gradientn()). However, I've since reinstalled all the packages on a new computer and now I can't get that old code to work.

Ultimately I just want to color code my genes by their PoPs scores (in my d I have a column of genes and a column of their PoPs scores). I want to see this for my genes that are in the top 5 most enriched pathways.

Here is my full code with example data:

library(data.table)
library(clusterProfiler)
library(tidyverse)
library(multienrichjam)
library(RColorBrewer)
library(ggraph)
library(ggfittext)
library(tidyr)
library(msigdbr)
library(org.Hs.eg.db)
library(biomaRt)

d <- structure(list(Gene = c("ACOT7", "ACTA2", "ACTN2", "ACTN4", "ADCY6", 
"ADO", "ADPRHL1", "AFF4", "AGO2", "AHCY", "ALG10", "ALPK3", "AMOTL2", 
"ANAPC7", "ANKS1A", "ANXA4", "ARFGAP2", "ARHGAP27", "ARIH2", 
"ARMC9", "ARPC3", "ASIP", "ATP1B2", "ATP2B1", "ATXN2", "B3GNT7", 
"BAG3", "BAIAP3", "BANK1", "BCAS3", "BHLHE41", "BMP7", "BMPR2", 
"BRD2", "C1QTNF4", "C1orf21", "C1orf35", "C2orf42", "C2orf69", 
"C3orf18", "CACFD1", "CACNB2", "CACNB3", "CALCB", "CALU", "CAMK2D", 
"CAMK2G", "CASQ2", "CASZ1", "CCDC141"), Median_PoPS = c(0.040097016, 
0.066717199, 0.119115226, 0.565579881, 0.152149101, -0.002287785, 
0.030722876, 0.202042945, 0.121184633, 0.078079842, -0.098154534, 
0.242834185, 0.113019505, 0.022269027, -0.037821713, 0.152854103, 
0.054061764, 0.070453112, 0.112837399, 0.108880234, -0.203012964, 
0.077142703, 0.225070177, -0.01134475, 0.162986282, 0.490838522, 
0.45760657, 0.045157219, 0.114610058, 0.089620962, 0.193743901, 
0.110843458, 0.123453215, 0.264372212, 0.04456751, 0.107923069, 
-0.017319883, 0.08909364, -0.064415409, 0.203052356, 0.033044087, 
0.376682059, 0.049786121, 0.000317271, 0.077634837, 0.171648497, 
0.252097225, 0.440276142, 0.140784241, 1.090303171), entrezgene_id = c(11332L, 
59L, 88L, 81L, 112L, 84890L, 113622L, 27125L, 27161L, 191L, 84920L, 
57538L, 51421L, 51434L, 23294L, 307L, 84364L, 201176L, 10425L, 
80210L, 10094L, 434L, 482L, 490L, 6311L, 93010L, 9531L, 8938L, 
55024L, 54828L, 79365L, 655L, 659L, 6046L, 114900L, 81563L, 79169L, 
54980L, 205327L, 51161L, 11094L, 783L, 784L, 797L, 813L, 817L, 
818L, 845L, 54897L, 285025L)), class = c("data.table", "data.frame"
), row.names = c(NA, -50L), sorted = "Gene")

# Getting KEGG pathways for my genes:

kegg_organism = "hsa"
kegg_enrich <-  enrichKEGG(gene   = d$entrezgene_id,
                           organism     = 'hsa',
                           pvalueCutoff = 0.05,
                           pAdjustMethod = 'BH')

kegg <- setReadable(kegg_enrich, 'org.Hs.eg.db', 'ENTREZID')
kegg_genes <- kegg[,]

# Getting the top 5 most significantly enriched pathways to plot their network:

keggtop5 <- kegg_genes[order(kegg_genes$p.adjust),]
keggtop5 <- keggtop5[1:5,]
top5all <- enrichDF2enrichResult(keggtop5)
gene_list_scores<-na.omit(gene_list_scores)
filtered_kegg_genes <- kegg_genes[!grepl("cancer", kegg_genes$Description, ignore.case = TRUE), ]
top5paths <- head(filtered_kegg_genes[order(filtered_kegg_genes$pvalue), ], 5)
top5all <- enrichDF2enrichResult(top5paths, pAdjustMethod='BH')

scores <- setNames(d$Median_PoPS, d$entrezgene_id)

p <- cnetplot(top5all, showCategory = 5,
             foldChange=scores,
             scale_color_gradient2(name='associated data', low='steelblue', high='firebrick'))

# or

p <- cnetplot(top5all,showCategory = 5, color.params = list(foldChange = scores)

guidohooiveld · 2024-02-20T10:04:12Z

So if I understand you correctly, you would like to visualize the network of the top 5 most significant gene sets in a cnetplot, thereby using your preferred color scale for the values of Median_PoPS?

I would do it like below; note that you can change the default colors using options(enrichplot.colours = c("steelblue","firebrick")). See also: YuLab-SMU/enrichplot#268

Moreover, since the output is a ggplot2 object, this can (even) be further modified if needed.

Also note that the object gene_list_scores is not defined in your code snippet...

> library(clusterProfiler)
> library(ggplot2)
> 
> ## generate input data
> d <- structure(list(Gene = c("ACOT7", "ACTA2", "ACTN2", "ACTN4", "ADCY6", 
+ "ADO", "ADPRHL1", "AFF4", "AGO2", "AHCY", "ALG10", "ALPK3", "AMOTL2", 
+ "ANAPC7", "ANKS1A", "ANXA4", "ARFGAP2", "ARHGAP27", "ARIH2", 
+ "ARMC9", "ARPC3", "ASIP", "ATP1B2", "ATP2B1", "ATXN2", "B3GNT7", 
+ "BAG3", "BAIAP3", "BANK1", "BCAS3", "BHLHE41", "BMP7", "BMPR2", 
+ "BRD2", "C1QTNF4", "C1orf21", "C1orf35", "C2orf42", "C2orf69", 
+ "C3orf18", "CACFD1", "CACNB2", "CACNB3", "CALCB", "CALU", "CAMK2D", 
+ "CAMK2G", "CASQ2", "CASZ1", "CCDC141"), Median_PoPS = c(0.040097016, 
+ 0.066717199, 0.119115226, 0.565579881, 0.152149101, -0.002287785, 
+ 0.030722876, 0.202042945, 0.121184633, 0.078079842, -0.098154534, 
+ 0.242834185, 0.113019505, 0.022269027, -0.037821713, 0.152854103, 
+ 0.054061764, 0.070453112, 0.112837399, 0.108880234, -0.203012964, 
+ 0.077142703, 0.225070177, -0.01134475, 0.162986282, 0.490838522, 
+ 0.45760657, 0.045157219, 0.114610058, 0.089620962, 0.193743901, 
+ 0.110843458, 0.123453215, 0.264372212, 0.04456751, 0.107923069, 
+ -0.017319883, 0.08909364, -0.064415409, 0.203052356, 0.033044087, 
+ 0.376682059, 0.049786121, 0.000317271, 0.077634837, 0.171648497, 
+ 0.252097225, 0.440276142, 0.140784241, 1.090303171), entrezgene_id = c(11332L, 
+ 59L, 88L, 81L, 112L, 84890L, 113622L, 27125L, 27161L, 191L, 84920L, 
+ 57538L, 51421L, 51434L, 23294L, 307L, 84364L, 201176L, 10425L, 
+ 80210L, 10094L, 434L, 482L, 490L, 6311L, 93010L, 9531L, 8938L, 
+ 55024L, 54828L, 79365L, 655L, 659L, 6046L, 114900L, 81563L, 79169L, 
+ 54980L, 205327L, 51161L, 11094L, 783L, 784L, 797L, 813L, 817L, 
+ 818L, 845L, 54897L, 285025L)), class = c("data.table", "data.frame"
+ ), row.names = c(NA, -50L), sorted = "Gene")
> 
> ## check
> head(d)
Key: <Gene>
     Gene  Median_PoPS entrezgene_id
   <char>        <num>         <int>
1:  ACOT7  0.040097016         11332
2:  ACTA2  0.066717199            59
3:  ACTN2  0.119115226            88
4:  ACTN4  0.565579881            81
5:  ADCY6  0.152149101           112
6:    ADO -0.002287785         84890
> 
> ## perform KEGG-based overrepresentation analysis
> kegg_enrich <-  enrichKEGG(gene   = d$entrezgene_id,
+                            organism     = 'hsa',
+                            pvalueCutoff = 0.05,
+                            pAdjustMethod = 'BH')
> 
> ## converts geneids into symbols
> kegg <- setReadable(kegg_enrich, 'org.Hs.eg.db', 'ENTREZID')
> 
> kegg
#
# over-representation test
#
#...@organism    hsa 
#...@ontology    KEGG 
#...@keytype     ENTREZID 
#...@gene        chr [1:50] "11332" "59" "88" "81" "112" "84890" "113622" "27125" "27161" ...
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...19 enriched terms found
'data.frame':   19 obs. of  11 variables:
 $ category   : chr  "Organismal Systems" "Organismal Systems" "Organismal Systems" "Organismal Systems" ...
 $ subcategory: chr  "Circulatory system" "Endocrine system" "Digestive system" "Endocrine system" ...
 $ ID         : chr  "hsa04261" "hsa04925" "hsa04971" "hsa04921" ...
 $ Description: chr  "Adrenergic signaling in cardiomyocytes" "Aldosterone synthesis and secretion" "Gastric acid secretion" "Oxytocin signaling pathway" ...
 $ GeneRatio  : chr  "7/29" "5/29" "4/29" "5/29" ...
 $ BgRatio    : chr  "154/8662" "98/8662" "76/8662" "154/8662" ...
 $ pvalue     : num  5.51e-07 1.60e-05 1.10e-04 1.40e-04 1.78e-04 ...
 $ p.adjust   : num  6.44e-05 9.37e-04 3.63e-03 3.63e-03 3.63e-03 ...
 $ qvalue     : num  4.75e-05 6.91e-04 2.68e-03 2.68e-03 2.68e-03 ...
 $ geneID     : chr  "ADCY6/ATP1B2/ATP2B1/CACNB2/CACNB3/CAMK2D/CAMK2G" "ADCY6/ATP1B2/ATP2B1/CAMK2D/CAMK2G" "ADCY6/ATP1B2/CAMK2D/CAMK2G" "ADCY6/CACNB2/CACNB3/CAMK2D/CAMK2G" ...
 $ Count      : int  7 5 4 5 4 4 4 3 5 4 ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> ## prepare scores
> scores <- setNames(d$Median_PoPS, d$entrezgene_id)
> 
> 
> ####
> #### From now on I diverge from your code:
> ####
> 
> ## Get the top 5 most significantly enriched pathways.
> ## Note that by default the results are already sorted on significance.
> top.n <- 5
> keggtop5 <- as.data.frame(kegg_enrich)$Description[1:top.n]
> 
> 
> ## set/change the color to what you prefer
> options(enrichplot.colours = c("steelblue","firebrick"))
> 
> ## note the new way of what to use as color in the cnetplot
> color.params = list(foldChange = scores)
> p <- cnetplot(kegg,
+               showCategory = keggtop5,  # names of gene sets/pathways
+               color.params=color.params)
Scale for size is already present.
Adding another scale for size, which will replace the existing scale.
> 
> print(p)
> 
> ## since 'p' is a ggplot2 object, this can be further modified if needed.
> ## along the same lines as: https://support.bioconductor.org/p/p133748/
> library(scales)
> 
> min.value <- floor( min(p$data$color, na.rm = TRUE) )
> max.value <- ceiling( max(p$data$color, na.rm = TRUE) )
> 
> 
> #red-green colour scale
> p2 <- p + scale_color_gradientn(name = "fold change",
+           colours = c("green", "red"),
+           values = rescale(c(min.value, 0, max.value)),
+           limits= c(min.value, max.value), 
+           breaks=c(min.value , 0, max.value) )
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
> 
> print(p2)
> 
> #red-white-green colour scale, and legend box labelled 'Median PoPS'.
> p3 <- p + scale_color_gradientn(name = "Median PoPS",
+           colours = c("green", "white","red"),
+           values = rescale(c(min.value, 0, max.value)),
+           limits= c(min.value, max.value), 
+           breaks=c(min.value , 0, max.value) )
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
> 
> print(p3)
> 
>

Outputs:

p (colour scale set to steelblue - firebrick via options(...)):

p2 (colour scale set to red - green via scale_color_gradientn()):

p3 (colour scale set to red - white - green via scale_color_gradientn(), and legend box labelled Median PoPS)

hlnicholls · 2024-02-20T12:39:06Z

This is exactly what I needed, thank you very much for taking the time to look into this! This has completely resolved my issue.

guidohooiveld mentioned this issue Mar 21, 2024

heatplot function; color scale is switched #676

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to color code cnetplot nodes from enrichrResult data #667

How to color code cnetplot nodes from enrichrResult data #667

hlnicholls commented Feb 14, 2024

guidohooiveld commented Feb 19, 2024 •

edited

hlnicholls commented Feb 19, 2024 •

edited

guidohooiveld commented Feb 20, 2024 •

edited

hlnicholls commented Feb 20, 2024

How to color code cnetplot nodes from enrichrResult data #667

How to color code cnetplot nodes from enrichrResult data #667

Comments

hlnicholls commented Feb 14, 2024

guidohooiveld commented Feb 19, 2024 • edited

hlnicholls commented Feb 19, 2024 • edited

guidohooiveld commented Feb 20, 2024 • edited

hlnicholls commented Feb 20, 2024

guidohooiveld commented Feb 19, 2024 •

edited

hlnicholls commented Feb 19, 2024 •

edited

guidohooiveld commented Feb 20, 2024 •

edited