High TF activity with few significant genes #116

Closed
laurie-tonon opened this issue Feb 22, 2024 · 3 comments
Labels
question Further information is requested

Comments

@laurie-tonon

Hi,

We use decoupleR with CollecTRI to quantify transcription factor activity in our bulk RNA-seq data. We observe very strange results: regulons show strong activity with high statistical significance, but when we look at the volcano plot of their genes, only a handful are significant.
We use the DESeq2 stat value as input, and the consensus function.

Here is an example for a differential analysis:

image

We see that the MYC regulon is highly repressed in our analysis.
But if we look at the volcano plot of its genes:

image

Only 4 genes are significant.

Same thing if we look at SP1:

image

As a result we are not very confident in the results obtained.

Could you explain how such an enrichment score can be reached with so few significant genes?

Thanks a lot

@PauBadiaM
Collaborator

Hi @laurie-tonon,

To assess this correctly, you should plot the stat values instead of the Log2FC, since the stat values are what is ultimately used to compute the enrichment score. Also, even genes that are not significantly changing are still used in the score calculation; the score just means that the genes of the regulon are positively/negatively coordinated. If significance is important in your application, you could filter the genes by significance, but I would advise against it, since it shrinks the background distribution of genes and the results can become noisy.
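To illustrate the point about coordination, here is a minimal simulated sketch. It is not decoupleR's consensus method: it uses a simplified mean-shift z-test as a stand-in, and the gene counts, the shift of 0.5, and the |stat| > 3 cutoff are all assumptions chosen for illustration. It shows how a regulon whose targets all move slightly in the same direction can get a strongly negative score while almost none of its genes pass an individual threshold.

```python
import numpy as np

# Simulated data only; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n_genes, n_targets = 10_000, 200

# Gene-level statistics (e.g. a DESeq2-like stat column) under the null.
stat = rng.normal(loc=0.0, scale=1.0, size=n_genes)

# Hypothetical regulon: the first 200 genes, all shifted slightly downward.
is_target = np.zeros(n_genes, dtype=bool)
is_target[:n_targets] = True
stat[is_target] -= 0.5

# How many target genes would look "significant" on their own?
n_individually_significant = int(np.sum(np.abs(stat[is_target]) > 3.0))

# Simplified regulon score: z-test of the target mean against all genes.
background_mean = stat.mean()
background_sd = stat.std(ddof=1)
z_regulon = (stat[is_target].mean() - background_mean) / (
    background_sd / np.sqrt(n_targets)
)

print(f"targets with |stat| > 3: {n_individually_significant} of {n_targets}")
print(f"regulon z-score: {z_regulon:.2f}")
```

Because the standard error of the mean shrinks with the square root of the number of targets, a small but consistent shift across many genes adds up to a large regulon-level score.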
Hope this is helpful!

@PauBadiaM PauBadiaM self-assigned this Feb 23, 2024
@PauBadiaM PauBadiaM added the question Further information is requested label Feb 23, 2024
@laurie-tonon
Author

Hi @PauBadiaM,

Thanks for your help. You are right, we should plot the stat values, as these are the ones used by decoupleR. But that won't change our conclusion that we don't trust the results, since many regulons are found to be significantly altered while very few genes are.
We tried using another metric, -log10(p-value), but the results are also inconsistent.
Our conclusion is that we can't perform an analysis of transcription factor activity like this if we have too few differentially expressed genes. We can only calculate a score per sample and compare distributions between our conditions.
Do you agree with this conclusion, or is there something else we haven't tried?

Thanks a lot

@PauBadiaM
Collaborator

Hi @laurie-tonon,

Yes, computing scores per sample and then comparing distributions is also a valid strategy. However, I would also expect few hits if the results contain so many non-significant genes.
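The per-sample strategy could be sketched roughly as follows, on simulated data. The score used here (the mean of gene-wise z-scored expression over a hypothetical regulon) and the hand-written Welch t-statistic are assumptions made to keep the example self-contained; in practice the per-sample scores would come from decoupleR itself.

```python
import numpy as np

# Simulated data only; sizes and effect size are illustrative assumptions.
rng = np.random.default_rng(0)
n_genes, n_targets, n_per_group = 2_000, 100, 6

# Log-expression-like matrix, genes x samples (first 6 control, last 6 treated).
expr = rng.normal(loc=0.0, scale=1.0, size=(n_genes, 2 * n_per_group))
condition = np.array(["ctrl"] * n_per_group + ["treat"] * n_per_group)

# Hypothetical regulon: the first 100 genes, slightly lower in treated samples.
expr[:n_targets, n_per_group:] -= 0.5

# Z-score each gene across samples, then average the targets within each sample.
z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, ddof=1, keepdims=True)
scores = z[:n_targets].mean(axis=0)  # one activity-like score per sample


def welch_t(a, b):
    """Welch's t-statistic for two independent groups with unequal variances."""
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


t_stat = welch_t(scores[condition == "treat"], scores[condition == "ctrl"])
print(f"per-sample scores: {np.round(scores, 2)}")
print(f"Welch t (treat vs ctrl): {t_stat:.2f}")
```

Comparing per-sample scores this way tests the regulon directly between conditions, with the sample-to-sample variability of the score as the noise term.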
