enrichPathway make readable does not work for yeast genome #12

Open
snystrom opened this issue Sep 5, 2017 · 7 comments

Comments

@snystrom

snystrom commented Sep 5, 2017

The issue stems from enrichPathway looking for a column named SYMBOL, which does not exist in the org.Sc.sgd.db object (see columns(org.Sc.sgd.db)). For yeast it should use the "GENENAME" column instead. Alternatively, it would be nice to be able to specify which column the readable option converts to instead of just T/F (i.e., so I could say convert IDs to GENENAME or ENTREZID or whatever), and also to pass a specific org.db instead of just "yeast"/"fly"/"human", so I know explicitly which database is being used.

EXAMPLE:

library(ReactomePA)
library(org.Sc.sgd.db)

lookup <- select(org.Sc.sgd.db, keys = c("AAD10","AAD15","ALD3"), keytype = "GENENAME", columns = c("ENTREZID", "GENENAME"))

# works
enrichPathway(lookup$ENTREZID, organism = "yeast")

# does not work
enrichPathway(lookup$ENTREZID, organism = "yeast", readable = TRUE)

Fails with:

Error in .testForValidCols(x, cols) : 
  Invalid columns: SYMBOL. Please use the columns method to see a listing of valid arguments.
@GuangchuangYu
Member

This is not an issue with clusterProfiler.

It is an issue with org.Sc.sgd.db, which doesn't contain a SYMBOL column, while its GENENAME column (which one would expect to hold a full, descriptive, verbose name) actually holds the symbol.

I think you need to contact the maintainer of org.Sc.sgd.db to correct this.

@GuangchuangYu
Member

It's a good idea to enable conversion to other ID types instead of only SYMBOL.

I will consider this in a future release.

@GuangchuangYu GuangchuangYu reopened this Sep 5, 2017
@malcook

malcook commented Mar 5, 2019

+1 - I would greatly appreciate this too. In the meantime I am looking for another workaround:
https://support.bioconductor.org/p/118647/
@snystrom did you find a good workaround?

@snystrom
Author

snystrom commented Mar 7, 2019

I use the following solution for GO terms. I take the gene symbol ids but tell clusterProfiler to use "GENENAME" in keytype. For OrgDb I go ahead and pass the actual db object.

gene_symbols <- c("YFG1", "YFG2", "YFG3")
res_GO <- clusterProfiler::enrichGO(gene_symbols, OrgDb = org.Sc.sgd.db::org.Sc.sgd.db,
                                                keytype = "GENENAME", ont = "BP")

I don't have a workaround for pathway enrichment. Seems the easiest solution is to just allow id conversion or just let the user say what column the values are from.
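
One possible manual route for pathway results, sketched here in base R only: run enrichPathway without readable, then relabel the gene IDs yourself from a lookup table like the one returned by select() above. This assumes the result's geneID column holds "/"-separated IDs once converted with as.data.frame(). The helper name relabel_gene_ids and the toy IDs are made up for illustration and are not part of ReactomePA or clusterProfiler.

```r
# Hypothetical helper (not part of ReactomePA or clusterProfiler).
# gene_id_strings: character vector of "/"-separated IDs.
# lookup: data frame with one column of source IDs and one of labels.
relabel_gene_ids <- function(gene_id_strings, lookup,
                             from = "ENTREZID", to = "GENENAME") {
  # Named vector: names are source IDs, values are target labels.
  id_map <- setNames(as.character(lookup[[to]]), as.character(lookup[[from]]))
  vapply(strsplit(gene_id_strings, "/", fixed = TRUE), function(ids) {
    labels <- unname(id_map[ids])
    # Keep the original ID when the lookup has no label for it.
    missing <- is.na(labels)
    labels[missing] <- ids[missing]
    paste(labels, collapse = "/")
  }, character(1))
}

# Toy data with made-up IDs, standing in for the output of select().
lookup <- data.frame(ENTREZID = c("1001", "1002", "1003"),
                     GENENAME = c("GENEA", "GENEB", NA),
                     stringsAsFactors = FALSE)
relabeled <- relabel_gene_ids(c("1001/1002", "1003/1004"), lookup)
```

With a real result this would be applied to a data frame copy, e.g. df <- as.data.frame(res); df$geneID <- relabel_gene_ids(df$geneID, lookup). IDs without a label are left unchanged rather than dropped, so no genes silently disappear from the table.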

@malcook

malcook commented Mar 7, 2019

Hi @snystrom - this helps a lot. I had already tried something like this, but with your encouragement that it SHOULD work, I have further characterized the underlying issue.

First, I should note that I am using a later version of clusterProfiler than you are, one in which keytype (all lowercase) is deprecated in favor of keyType.

Still, with that change, your example fails in my hands as follows:

> gene_symbols <- c("YFG1", "YFG2", "YFG3")
> res_GO <- clusterProfiler::enrichGO(gene_symbols, OrgDb = org.Sc.sgd.db::org.Sc.sgd.db,
+                                                 keyType = "GENENAME", ont = "BP")
--> No gene can be mapped....
--> Expected input gene ID: MHF1,RMD1,MMS4,IMI1,SPS100,CST9
--> return NULL...

(edit: I now realize the "YFG" in your example stands for "Your favorite gene". Doh!)

However, I tried a different set of gene symbols taken from genes associated with GO term Observable: RNA modification and this approach does work:

> gene_symbols <- c("ABP140","ATS1","BUD32","CBF5")
> res_GO <- clusterProfiler::enrichGO(gene_symbols, OrgDb = org.Sc.sgd.db::org.Sc.sgd.db,  keyType = "GENENAME", ont = "BP")
> res_GO
#
# over-representation test
etc...

However, in my case, I did not have GENENAME but rather ORF.

So, let me try your approach using keyType="ORF" (which is what I did try in the first place)...

First, I find that I can use bitr to translate those gene_symbols to ORF identifiers:

> orf<-bitr(gene_symbols,'GENENAME','ORF',"org.Sc.sgd.db")$ORF
'select()' returned 1:1 mapping between keys and columns
> orf
[1] "YOR239W" "YAL020C" "YGR262C" "YLR175W"

But using them with enrichGO fails:

> res_GO <- clusterProfiler::enrichGO(orf, OrgDb = org.Sc.sgd.db::org.Sc.sgd.db,  keyType = "ORF", ont = "BP")

No gene set have size > 10 ...
--> return NULL...

So... I do have a workaround, which is to translate my ORFs to GENENAME (using bitr) and use them.

However, I would have expected the last example to work, and think this is still a BUG.

@GuangchuangYu - do you see my point?

Thanks!

@snystrom
Author

snystrom commented Mar 8, 2019

Just tracked down the root of this issue: DOSE::enricher_internal requires EntrezID. It looks like the real fix will have to come either from modifying DOSE, or from modifying the internal codebase of this package to allow keytype conversion to ENTREZID so it's compatible with DOSE::enricher_internal. I'll leave that design decision to @GuangchuangYu.

@sagarutturkar

Hello, I am facing a similar issue. Has anyone found a workaround for this? Any pointers are appreciated.
