High TF activity with few significant genes #116

laurie-tonon · 2024-02-22T14:06:12Z

Hi,

We use decoupleR with CollecTRI to quantify transcription factor activity in our bulk RNA-seq data. We observe very strange results: some regulons show strong activity with high statistical significance, but when we look at the volcano plot of their genes, only a handful are significant.
We use the DESeq2 stat value as input, and the consensus function.

Here is an example from one differential analysis:

We see that the MYC regulon is strongly repressed in our analysis.
But if we look at the volcano plot of its genes:

Only 4 genes are significant.

We see the same thing for SP1:

As a result, we are not very confident in these results.

Could you explain how we can reach such an enrichment score with so few significant genes?

Thanks a lot

PauBadiaM · 2024-02-23T17:47:20Z

Hi @laurie-tonon,

To assess this correctly, you should plot the stat values instead of the log2FC, since the stat is what is used to compute the enrichment score. Also, genes that are not significantly changing are still used in the score calculation: the score reflects whether the regulon's targets are coordinately shifted up or down, not how many of them pass a significance cutoff. If significance matters for your application, you could filter genes by it first, but I would advise against it, since it shrinks the background distribution of genes and the results can become noisy.
Hope this is helpful!
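To illustrate the point above, here is a minimal sketch of a univariate linear model (ULM) score, the idea behind decoupleR's `ulm` method: the gene-level stats are regressed on the regulon weights (0 for genes outside the regulon), and the t-value of the slope is the activity score. This is plain Python on simulated, hypothetical stats, not decoupleR's implementation; it assumes all targets have weight +1 and uses a normal approximation for the p-value. The consensus score also combines other methods, which are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist, fmean


def ulm_score(stats, weights):
    """Regress gene-level stats on regulon weights (0 = not a target) and
    return the slope's t-value plus an approximate two-sided p-value
    (normal approximation, reasonable when there are thousands of genes)."""
    n = len(stats)
    x_mean, y_mean = fmean(weights), fmean(stats)
    sxx = sum((x - x_mean) ** 2 for x in weights)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(weights, stats))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(weights, stats))
    se_slope = sqrt(sse / (n - 2) / sxx)
    t = slope / se_slope
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


def quantile_sample(mu, n):
    """Deterministic stand-in for n draws from N(mu, 1): evenly spaced quantiles."""
    dist = NormalDist(mu, 1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Hypothetical data: 2000 background genes centred on 0 and 100 regulon
# targets whose stats are only modestly shifted downwards (mean -0.5).
background = quantile_sample(0.0, 2000)
targets = quantile_sample(-0.5, 100)

stats = background + targets
weights = [0.0] * len(background) + [1.0] * len(targets)

# Targets that would pass a nominal |stat| > 1.96 cutoff on their own
n_sig = sum(abs(s) > 1.96 for s in targets)
t, p = ulm_score(stats, weights)
print(f"targets with |stat| > 1.96: {n_sig} of {len(targets)}")
print(f"ULM t = {t:.2f}, approx. p = {p:.1e}")
```

With these made-up numbers, fewer than ten of the 100 targets pass the cutoff individually, yet the slope's t-value is well below -4: a small shift shared by all targets, measured against a large background, is strong evidence on its own.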

laurie-tonon · 2024-03-05T08:50:37Z

Hi @PauBadiaM,

Thanks for your help. You are right, we should plot the stat values, since these are what decoupleR uses. But that doesn't change our conclusion that we don't trust the results, as many regulons come out as significantly altered while very few of their genes are.
We tried using another metric, -log10(p-value), but the results are also inconsistent.
Our conclusion is that we can't run a transcription factor activity analysis like this when we have too few differentially expressed genes. We can only calculate a score per sample and compare the distributions between our conditions.
Do you agree with this conclusion, or is there something else we haven't tried?
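Regarding the -log10(p-value) metric mentioned above: on its own it discards the direction of change, which the enrichment scores depend on. A common way to keep it is to attach the sign of the log2FC. A minimal sketch, where the function name and the clipping floor for p-values of exactly 0 are illustrative choices, not part of decoupleR or DESeq2:

```python
import math


def signed_neg_log10_p(log2fc, pvalue, min_p=1e-300):
    """Gene-level score sign(log2FC) * -log10(p).

    p-values of exactly 0 are clipped to min_p (an arbitrary floor chosen
    here) so that log10 does not fail."""
    magnitude = -math.log10(max(pvalue, min_p))
    return math.copysign(magnitude, log2fc)


print(signed_neg_log10_p(-1.2, 0.01))  # down-regulated gene: negative score
print(signed_neg_log10_p(0.8, 0.01))   # up-regulated gene: positive score
```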

Thanks a lot

PauBadiaM · 2024-03-05T10:04:58Z

Hi @laurie-tonon,

Yes, computing scores per sample and then comparing the distributions is also a valid strategy. However, I would also expect few hits if the results contain so many non-significant genes.
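A minimal sketch of the per-sample strategy, assuming you already have one activity score per sample for a given TF (how those scores are produced is not shown here). It uses an exact permutation test on the difference in means, one reasonable choice among several (a t-test or a Wilcoxon rank-sum test would also work); the scores below are hypothetical, not real data.

```python
from itertools import combinations
from statistics import fmean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference in mean scores.

    Enumerates every relabelling of the pooled samples into groups of the
    original sizes, so it is only practical for small sample numbers."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = fmean(group_a) - fmean(group_b)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = fmean(a) - fmean(b)
        # Small tolerance so ties are not lost to floating-point rounding
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


# Hypothetical per-sample activity scores for one TF
control = [0.1, -0.3, 0.2, 0.0]
treated = [-1.2, -0.9, -1.5, -1.1]
diff, p = permutation_test(treated, control)
print(f"mean difference = {diff:.3f}, exact p = {p:.3f}")
```

Note that with 4 samples per condition there are only 70 possible relabellings, so the smallest two-sided p-value this test can reach is 2/70 (about 0.029); the number of replicates limits how much significance a per-sample comparison can show.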

PauBadiaM self-assigned this Feb 23, 2024

PauBadiaM added the "question" label (Further information is requested) Feb 23, 2024

PauBadiaM closed this as completed May 9, 2024
