Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling multiple classes (and FDR) #154

Open
Tomnl opened this issue Jul 22, 2022 · 1 comment
Open

Handling multiple classes (and FDR) #154

Tomnl opened this issue Jul 22, 2022 · 1 comment

Comments

@Tomnl
Copy link

Tomnl commented Jul 22, 2022

I was wondering if there was a recommended way of using GSEApy when the analysis involves comparing more than 2 sample classes. e.g. control Vs class1, control Vs class2 ..... etc

It would be quite straightforward to perform repeated analysis using either prerank or with the gsea function. But I believe the FDR result might not be appropriate...

Potentially I could just combine the resulting dataframes and recalculate the FDR value using the Benjamini-Hochberg approach - see below. But I just wanted to check if this is appropriate and/or if it could be done another way within GSEApy, potentially using the gsea_fdr function.

res1 = gp.prerank(rnk=rnk1, gene_sets=gene_sets)
res2 = gp.prerank(rnk=rnk2, gene_sets=gene_sets)
res_all = pd.concat([res1, res2])

qvals = fdr_bh(res_all['pval'])

def fdr_bh(p):
    """Benjamini-Hochberg p-value correction for multiple hypothesis testing.
    https://stackoverflow.com/questions/7450957/how-to-implement-rs-p-adjust-in-python/33532498#33532498
    """
    p = np.asfarray(p)
    by_descend = p.argsort()[::-1]
    by_orig = by_descend.argsort()
    steps = float(len(p)) / np.arange(len(p), 0, -1)
    q = np.minimum(1, np.minimum.accumulate(steps * p[by_descend]))
    return q[by_orig]

Setup

I am using GSEApy version 0.9.16, Python version 3.9.7, and operating system Ubuntu.

import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import gseapy; print(gseapy.__version__)
3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) 
[GCC 9.4.0]
CPython
Linux-5.13.0-44-generic-x86_64-with-glibc2.31
0.9.16
@zqfang
Copy link
Owner

zqfang commented Jul 25, 2022

GSEA use permutation (gene label or class label) to generate background distribution and calculate p values.

From my perspective, I don't recommend you to combine differenct background distribution and perform BH correction ( just like you did (concat res1 and res2 and do BH correction), given rnk1 and rnk2 is not the same.

I have never try to compare the null distritbution using different rankings. if the null distribution are all similar, I think BH correction works perfect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants