gseaAfterBoot enriched genes #154

Open
EBosi opened this issue Apr 13, 2021 · 5 comments

@EBosi

EBosi commented Apr 13, 2021

Dear all,
thanks for the terrific tool. I've been analysing my dataset following the excellent MAIT tutorial (https://www.bioconductor.org/packages/release/bioc/vignettes/MAST/inst/doc/MAITAnalysis.html). Regarding the gene set enrichment analysis, methods from the GSEA family (GSEA, fGSEA, etc.) report, in addition to the enrichment score and its significance, the list of genes contributing to the enrichment (leadingEdge). I was wondering whether this kind of information can be derived using gseaAfterBoot.

I've been trying to run the source of gseaAfterBoot line by line, but I ran into a number of errors about missing internal functions of the package (e.g. functions from GSEA-by-boot.R cannot be found). Is there a better way to tinker with the MAST functions and objects?

I hope I was clear, I'm looking forward to your reply.
Emanuele

@amcdavid
Member

The function name is probably a misnomer, because it performs a competitive test like camera in the edgeR package. So there's no direct way to perform the leading edge analysis as done in the GSEA of Subramanian et al. (2005). But I imagine you could do something reasonable that replicates this by intersecting the genes in a set with the ranks from the bootstrap (signed log10 p-values or Z scores). Sounds like an interesting and useful addition.
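A minimal sketch of what "intersecting genes in a set with the ranks" could look like, written in plain Python for brevity rather than R. The function name `rank_set_members`, the gene labels, and the Z scores are all made up for illustration; nothing here uses or reflects MAST's API.

```python
def rank_set_members(stats, gene_set):
    """Return (gene, rank, statistic) for each set member, best rank first.

    stats    : dict mapping gene -> signed statistic (e.g. a Z score)
    gene_set : iterable of gene names; genes absent from `stats` are dropped
    """
    # Rank every tested gene from most positive to most negative statistic.
    ordered = sorted(stats, key=stats.get, reverse=True)
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}

    # Keep only the set members that were actually tested, ordered by rank.
    members = sorted((g for g in gene_set if g in rank), key=rank.get)
    return [(g, rank[g], stats[g]) for g in members]


# Hypothetical Z scores, for illustration only.
z_scores = {"A": 3.1, "B": -0.4, "C": 2.2, "D": 0.9, "E": -2.7}
hits = rank_set_members(z_scores, {"A", "D", "E", "X"})
for gene, position, z in hits:
    print(gene, position, z)
```

The table of set members with their positions in the full ranking is the raw material a leading-edge style summary would start from.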

To step through gseaAfterBoot you might clone this repo from github, then devtools::load_all() to import its internal functions.

@EBosi
Author

EBosi commented Nov 9, 2021

Hi Andrew,
sorry for coming back to the discussion after such a long time.
Thank you very much for your reply. I wanted to work a bit on this issue; could you please clarify what you mean by intersecting the genes in a set with the ranks from the bootstrap?
Thanks again,
Emanuele

@amcdavid
Member

amcdavid commented Nov 10, 2021

As a proof of concept, you might use the signed hurdle p-values, e.g. from the summary method, multiplying the logFC by -log10(p.value), before worrying about bootstrapping, which is only important if you want to deal with gene-gene correlations. The typical leading edge analysis of Subramanian et al. (2005) can be applied to this quantity without complication. I know clusterProfiler has this analysis implemented.
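A rough sketch of that proof of concept, in plain Python rather than R: score each gene as logFC times -log10(p), walk the ranked list with a weighted running sum in the style of Subramanian et al., and report the set members up to the peak (or after the trough, for a negative score). The function names, gene labels, and numbers are hypothetical, and this is a simplified illustration, not the implementation used by GSEA, fgsea, clusterProfiler, or MAST.

```python
import math


def signed_score(log_fc, p_value):
    """Ranking statistic suggested in the thread: logFC * -log10(p)."""
    return log_fc * -math.log10(p_value)


def leading_edge(scores, gene_set, weight=1.0):
    """Return (enrichment score, leading-edge genes) for one gene set."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    in_set = [g in gene_set for g in ordered]
    n_miss = len(ordered) - sum(in_set)
    hit_total = sum(abs(scores[g]) ** weight
                    for g, hit in zip(ordered, in_set) if hit)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need scored set members and non-members")

    # Running sum: step up at set members (weighted), down at non-members.
    running, path = 0.0, []
    for g, hit in zip(ordered, in_set):
        running += abs(scores[g]) ** weight / hit_total if hit else -1.0 / n_miss
        path.append(running)

    # The enrichment score is the largest deviation from zero.
    peak = max(range(len(path)), key=lambda i: abs(path[i]))
    es = path[peak]
    if es >= 0:   # set members ranked at or before the peak
        edge = [g for g, hit in zip(ordered[:peak + 1], in_set[:peak + 1]) if hit]
    else:         # set members ranked after the trough
        edge = [g for g, hit in zip(ordered[peak:], in_set[peak:]) if hit]
    return es, edge


# Made-up (logFC, p-value) pairs, for illustration only.
results = {"g1": (2.0, 0.001), "g2": (1.0, 0.01), "g3": (0.5, 0.1),
           "g4": (-0.2, 0.5), "g5": (-1.0, 0.01), "g6": (-1.5, 0.001)}
scores = {g: signed_score(fc, p) for g, (fc, p) in results.items()}
es, edge = leading_edge(scores, {"g1", "g2", "g5"})
print(es, edge)
```

Significance is deliberately left out here; a permutation or bootstrap step would be needed on top of this, and that is where gene-gene correlations start to matter.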

Working with the bootstraps, which would only be necessary if you are worried about gene-gene correlations, would be much more complicated, and is definitely beyond the scope of random thoughts on a GitHub issues page. You would need to modify the code to work with bootstat, the matrix of bootstrapped coefficients. Most of the rest of the code in this function is specialized to the case where we sum across the genes in the set rather than look at individual genes. Deriving the impact of gene-gene correlations on the variance of the rank order of the genes in the set sounds pretty complicated!

@EBosi
Author

EBosi commented Nov 10, 2021

Hi Andrew,
thanks for the clarification, I will do as you suggest.
I tried to tackle the bootstrap array object, but with little knowledge of the embedded items I was really struggling. It would be nice to have this addition though, if it's something that can be done with (relative) ease, since without it the set enrichment method of MAST is less advantageous than the others available (GSEA, clusterProfiler, etc.).
Thank you so much for your support!
Best,
Emanuele

@amcdavid
Member

Unfortunately, this would not be easy -- the competitive gene set test is really a much different beast than the GSEA of Subramanian. More like a project for a PhD student than a couple hours' work.
