Input structure of Gene Set Enrichment Analysis #10

Open
sajjad6al opened this issue Nov 10, 2016 · 6 comments

Comments

@sajjad6al

I'm trying to write a Shiny app using this package, but I cannot work out the input structure required to run GSEA. Specifically, what should the first argument of gsePathway() be? I understand it must be an ordered, ranked geneList, but how would that translate into a sample .csv file to use as input?

For example, I'm using a list of genes (EntrezID) all in one column, written as .csv file to run enrichPathway() and that works perfectly. How would you prepare an input file for gsePathway()?

As a side note: I am able to run the analysis using the sample dataset embedded in the package.

Best regards,
Sajjad Abedian
New York City College of Technology

@GuangchuangYu
Member

see https://github.com/GuangchuangYu/DOSE/wiki/how-to-prepare-your-own-geneList.
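For readers who want a quick picture of the structure, here is a minimal sketch, assuming a two-column CSV with the gene ID in the first column and a numeric ranking statistic (for example a fold change) in the second. The column names and values are made up for illustration, and the linked page remains the authoritative description.

```r
# Hypothetical input: column 1 = gene ID, column 2 = numeric ranking statistic.
# The inline text stands in for a real file passed to read.csv().
csv_text <- "gene,stat\n4312,4.57\n8318,4.51\n1000,-2.3\n10874,0.9"
d <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Numeric vector of the statistic, named by gene ID
geneList <- d[, 2]
names(geneList) <- as.character(d[, 1])

# Sort in decreasing order so the vector is ranked
geneList <- sort(geneList, decreasing = TRUE)
head(geneList)
```

The resulting named, decreasingly sorted numeric vector is the kind of object the question refers to as the first argument of gsePathway().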

Please let me know if you finished your shiny app, I can add a link in ReactomePA homepage.

@sajjad6al
Author

Dear Guangchuang Yu,

I've developed an app using the package and modified it based on what was needed at the time. The main modification is that I have enabled the users to input gene symbols (instead of EntrezID) when running pathway enrichment analysis or gene set enrichment analysis. I will include the sample input file for each as well to test the app.

I have a couple of questions regarding using the package.

  1. My main problem is that, when the package is used in a Shiny app, the network figures are practically useless if the user chooses to view too many pathway categories. I understand that when it is run locally I can set the parameter "fix" to false and move the nodes around on my computer, but the Shiny app doesn't let me do that online. What do you suggest for over-crowded networks?

  2. When the user downloads GSEA plots generated in Gene Set Enrichment Analysis as png, only one portion of the two plots will be saved. Whereas, when it is downloaded as pdf, both of them will be saved. I'm trying to understand how those plots are generated within the package, and how can I solve this problem.

Please use the input files to test all the functionalities of the app, and let me know about those two problems, and I would love to hear your general idea about how to make the app better.

https://sajjadabedian.shinyapps.io/ReactomePA/

Input files.zip

@GuangchuangYu
Member

GuangchuangYu commented Jan 5, 2017

for Q1, you may refer to YuLab-SMU/DOSE#12. I don't have time to develop D3Network version of these plots, but it can be done.

for Q2, I have no idea since I don't know how you implement that functionality.

The following code works for me in R console.

> require(ReactomePA)
> data(geneList)
> x = gsePathway(geneList)
> png("1474244.png")
> gseaplot(x, 1, title=x$Description[1])
> dev.off()
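
When the same pattern is moved inside a function, such as a download handler, one thing worth checking is that the device stays open for the whole drawing call and is closed afterwards. Below is a minimal sketch using base graphics as a stand-in for the GSEA plot; `save_two_panel_png` is a hypothetical helper and the two panels are placeholders, not the package's own drawing code.

```r
# Hypothetical helper: write a two-panel figure to a PNG file.
save_two_panel_png <- function(path, width = 800, height = 800) {
  png(path, width = width, height = height)
  # Close the device even if a plotting call fails
  on.exit(dev.off(), add = TRUE)

  # Two stacked panels drawn on the same device (placeholder data)
  par(mfrow = c(2, 1))
  plot(1:10, cumsum(rnorm(10)), type = "l", main = "Panel 1 (placeholder)")
  plot(1:10, rnorm(10), type = "h", main = "Panel 2 (placeholder)")

  invisible(path)
}

out_file <- save_two_panel_png(tempfile(fileext = ".png"))
file.exists(out_file)
```

If the plotting call returns a ggplot or grid object rather than drawing directly, it may also need an explicit print() inside the function before the device is closed.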

@sajjad6al
Author

Thank you so much for your fast response, I will definitely change up the codes accordingly and will update you on the progress.

@aamarnani

Hello GuangChuangYu and Sajjad6al,

Thank you so much for putting together ReactomePA (GCY) and for making it into a very useful ShinyApp that is immensely user friendly (Saj).

I have been using Kallisto and Sleuth for RNA Sequencing analysis and it has been useful to move from the Sleuth output to your tools for pathway enrichment analysis. One note about doing so in case it is helpful for anyone:

When going from Sleuth analysis to the ReactomePA ShinyApp, one needs to use gene symbols. However, when annotating transcript results directly from biomaRt, the UniProt gene ID and other annotations don't work. Instead, the solution was to annotate the transcript IDs with the "external_gene_name" attribute from biomaRt and then, in Excel, use the "UPPER()" function to turn the gene IDs from ext_gene_name into all caps.

The gene IDs need to be in all CAPS for them to work in the ReactomePA Shiny App.
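
The same conversion can be done in R instead of Excel. This is a minimal sketch, assuming the symbols are held in a character vector; the example symbols are made up for illustration.

```r
# Hypothetical mouse-style gene symbols, e.g. taken from an ext_gene column
symbols <- c("Trp53", "Brca1", "Gapdh")

# toupper() is the base R counterpart of Excel's UPPER()
symbols_upper <- toupper(symbols)
symbols_upper
```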

Hopefully this is helpful for others trying to do something similar!

Kind Regards,

Abhi
MD/PhD Student
SUNY Downstate Medical Center
PS: Here is the code that I use when annotating the files while using the library("sleuth") package in R that I found most useful for downstream pathway analysis and other analyses.

# On current Bioconductor releases, install with BiocManager::install("biomaRt") instead
source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
library(biomaRt)

# Connect to the Ensembl mouse gene dataset
mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                         dataset = "mmusculus_gene_ensembl",
                         host = "ensembl.org")
listAttributes(mart)  # list the annotation fields that can be requested

# Build the transcript-to-gene annotation table
t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id",
                                     "external_gene_name", "chromosome_name",
                                     "entrezgene", "ucsc"),
                      mart = mart)

# Rename columns; entrezgene and ucsc keep their names
t2g <- dplyr::rename(t2g, target_id = ensembl_transcript_id,
                     ens_gene = ensembl_gene_id, ext_gene = external_gene_name,
                     Chrom_name = chromosome_name)

@Diango700

Hello,
I tried to create my geneList with this R code:

setwd("C:/cygwin64/home/DIANGO/EXCELL/")
d = read.csv("PA_down_id.csv",sep = " ", header = F)
head(d)

Output:

             ID       FLC
1 PF3D7_0936800 -7.897314
2 PF3D7_1478900 -1.709372
3 PF3D7_1009700 -1.255239
4 PF3D7_0508500 -1.137078
5 PF3D7_1458700 -1.368088
6 PF3D7_1124600 -1.259540

geneList =d[,2]
names(geneList) = as.character(d[,1])
geneList = sort(geneList, decreasing = TRUE)
head(geneList)


library(ReactomePA)
data(geneList)
de <- names(geneList)[abs(geneList) > 1.5]
head(de)

Output:

geneList dataset not found
[1] "PF3D7_0500600" "PF3D7_0221400" "PF3D7_0937300" "PF3D7_1478900" "PF3D7_0413300" "PF3D7_0421500"

x <- enrichPathway(gene=de,pvalueCutoff=0.05, readable=T)
head(as.data.frame(x))

output:

--> No gene can be mapped....
--> Expected input gene ID: PF3D7_1405400,PF3D7_1123800,PF3D7_0725200,PF3D7_1439000,PF3D7_0716100,PF3D7_1421900
--> return NULL...

Please help me create my dataset. I'm working on Plasmodium falciparum, and my gene IDs come from the plasmodb.org database.


4 participants