differential expression analysis and GSEA #119

Sophon-0 · 2024-03-19T13:44:28Z

Hello,

I was using decoupleR for pathway analysis in single cell.

I tried a few lines:

msigdb <- decoupleR::get_resource("MSigDB")
msigdb = msigdb[msigdb$collection =='hallmark', ]


# Enrichment with Over Representation Analysis (ORA)
mat <- as.matrix(seurat_object@assays$RNA@data)
result <- run_ora(
  mat=mat,
  network=msigdb,
  .source='geneset',
  .target='genesymbol'
)

Where should I specify the case group and the control group ? Where do we specify the statistics chosen ?
Looking at the output, it seems to be considering every single cell as a condition.

PauBadiaM · 2024-03-19T17:47:00Z

Hi @AgentScientist,

Pathway analysis can be performed at two levels: at the observation (cell) or contrast. In your case, you seem interested in the difference between conditions so first you will have to generate contrast level gene statistics using your favorite differential expression framework (limma, deseq2, edger, etc.). Because you are working with single cell data, it is a good practice to perform DEA at the pseudobulk level to reduce the overinflation of p-values (check this ref if you want to read more about it). Once you have performed DEA on your pseudobulk profiles you can perform any enrichment analysis that you want. In this vignette, we show how to perform pseudobulking, DEA with DESeq2 and different enrichment analyses you can perform. Unfortunately, it is only available in the python version of decoupler, but should be easy to follow/adapt in R too. Hope this is helpful!

Sophon-0 · 2024-03-20T20:06:34Z

Hello,
So I have in total 6 samples from 3 donors. 2 conditions: normal versus treatment.
So for each donor, there are 2 samples, one normal, one treated.
I want to compare cell type X in treatment versus control.

"We know that single cells within a sample are not independent of each other, since they were isolated from the same environment. If we treat cells as samples, we are not testing the variation across a population of samples, rather the variation inside an individual one. "

Would this really be a problem ? I mean, that cell type X in treated group is in a totally different state than in control group, and it's a 3 vs 3. I got biological more sound result using Seurat::FindMarkers + PIANO than using the pseudobulk method. Maybe I'm missing something.

At the step of pseudo bulk generation. At the sample column, is it ok to choose sample, knowing that some samples come from the same donor?

pdata = dc.get_pseudobulk( adata, sample_col='sample', groups_col='cell_type', layer='counts', mode='sum', min_cells=10, min_counts=1000 )

PROGENy works perfectly fine. I'm just having some issues with the MsigDB analysis, which missed some key pathways that I got using Seurat::FindMarkers + PIANO

PauBadiaM · 2024-03-22T09:09:26Z

Hi @AgentScientist,

I would strongly advice against performing DEA at the single-cell level followed by enrichment analysis. Even if you have a balanced experimental design, still samples might contribute different number of cells which will bias the test, using single-cells as observations breaks the assumption of any DEA test that samples are independent from each other, and also it overinflates the obtained p-values (so you will get many false positives).

Regarding your pseudobulk results, one thing you might try is to be less strict with the gene filtering with dc.plot_filter_by_expr. Are you keeping enough genes after filtering for DEA testing?

Regarding your patient vs sample metadata, you can do two things: i) use the patient id as your sample col (you summarize per patient), ii) you include the patient id as a covariate to the DESeq2 model to correct for that.

Hope this is helpful!

Sophon-0 · 2024-03-22T14:04:54Z

Hello, it was a geneset issue ! I used the standard MSigDB 1329 pathways, and I got same result as before.

Sophon-0 · 2024-03-22T16:44:52Z

"Regarding your patient vs sample metadata, you can do two things: i) use the patient id as your sample col (you summarize per patient), ii) you include the patient id as a covariate to the DESeq2 model to correct for that."

So I have these 2 columns in my adata: "Donor" and "sample" (for each donor, there are 2 samples, one normal, one treated)

pdata = dc.get_pseudobulk(
adata,
sample_col='Donor',
groups_col='cell_type',
layer='counts',
mode='sum',
min_cells=10,
min_counts=1000
)

If I do this, in the resulting pdata, for some reason I lose a few columns: the conditions (treated/normal) columns for ex. I keep all the columns only when I do sample_col='sample'

I actually think using "sample" is fine, since we don't want to mix the cell type within donors.

Do you also mean to include "Donor" in the DeseqDataSet function ? I don't see any options for this.

I did this:

dds = DeseqDataSet(
adata=DCs,
design_factors=['conditions',"Donor"],
ref_level=['conditions','Normal'],
refit_cooks=True,
inference=inference
)

For the ANOVA: "In order to have a better overview of the association of PCs with sample metadata, let’s perform ANOVA on each PC and see whether they are significantly associated with any technical or biological annotations of our samples"

I see in the example that there is no association between any PC and the condition column ("disease"). Even in the PCA plot, we don't distinguish between COVID-19 and Normal. Shouldn't we see some differences ?

PauBadiaM self-assigned this Mar 19, 2024

PauBadiaM added the question Further information is requested label Mar 22, 2024

deeenes added a commit to saezlab/pypath-common that referenced this issue Mar 22, 2024

msigdb version 2023.1 → 2023.2 (related to saezlab/decoupleR#119)

26fd74f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

differential expression analysis and GSEA #119

differential expression analysis and GSEA #119

Sophon-0 commented Mar 19, 2024 •

edited by PauBadiaM

PauBadiaM commented Mar 19, 2024

Sophon-0 commented Mar 20, 2024 •

edited

PauBadiaM commented Mar 22, 2024

Sophon-0 commented Mar 22, 2024 •

edited

Sophon-0 commented Mar 22, 2024 •

edited

differential expression analysis and GSEA #119

differential expression analysis and GSEA #119

Comments

Sophon-0 commented Mar 19, 2024 • edited by PauBadiaM

PauBadiaM commented Mar 19, 2024

Sophon-0 commented Mar 20, 2024 • edited

PauBadiaM commented Mar 22, 2024

Sophon-0 commented Mar 22, 2024 • edited

Sophon-0 commented Mar 22, 2024 • edited

Sophon-0 commented Mar 19, 2024 •

edited by PauBadiaM

Sophon-0 commented Mar 20, 2024 •

edited

Sophon-0 commented Mar 22, 2024 •

edited

Sophon-0 commented Mar 22, 2024 •

edited