differential expression analysis and GSEA #119
Comments
Hi @AgentScientist, Pathway analysis can be performed at two levels: per observation (cell) or per contrast. In your case, you seem interested in the difference between conditions, so first you will have to generate contrast-level gene statistics using your favorite differential expression framework (limma, DESeq2, edgeR, etc.). Because you are working with single-cell data, it is good practice to perform DEA at the pseudobulk level to reduce the inflation of p-values (check this ref if you want to read more about it). Once you have performed DEA on your pseudobulk profiles, you can run any enrichment analysis you want. In this vignette, we show how to perform pseudobulking, DEA with DESeq2, and the different enrichment analyses available. Unfortunately, it is only available in the Python version of decoupler, but it should be easy to follow and adapt in R too. Hope this is helpful!
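For readers unfamiliar with pseudobulking, here is a minimal sketch of the idea: raw counts from all cells that share a sample and cell type are summed into one profile, and those profiles (not the individual cells) become the observations for DEA. This is an illustration only, not decoupler's implementation; the function name `pseudobulk`, the field names, and the toy counts are all made up for the example.

```python
from collections import defaultdict

def pseudobulk(cells, genes):
    """Sum raw counts over all cells sharing the same (sample, cell_type)."""
    profiles = defaultdict(lambda: [0] * len(genes))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for i, count in enumerate(cell["counts"]):
            profiles[key][i] += count
    return dict(profiles)

# Hypothetical toy data: four cells, two genes.
genes = ["GeneA", "GeneB"]
cells = [
    {"sample": "S1", "cell_type": "T cell", "counts": [1, 0]},
    {"sample": "S1", "cell_type": "T cell", "counts": [2, 3]},
    {"sample": "S1", "cell_type": "B cell", "counts": [0, 5]},
    {"sample": "S2", "cell_type": "T cell", "counts": [4, 1]},
]

profiles = pseudobulk(cells, genes)
for key, summed in sorted(profiles.items()):
    # One row per (sample, cell type) instead of one row per cell.
    print(key, dict(zip(genes, summed)))
```

Four cells collapse into three profiles here; a DEA framework would then compare profiles of the same cell type across conditions.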
Hello, "We know that single cells within a sample are not independent of each other, since they were isolated from the same environment. If we treat cells as samples, we are not testing the variation across a population of samples, rather the variation inside an individual one." Would this really be a problem? I mean, cell type X in the treated group is in a totally different state than in the control group, and it's a 3 vs 3 design. I got more biologically sound results using Seurat::FindMarkers + PIANO than using the pseudobulk method. Maybe I'm missing something. At the pseudobulk generation step, for the sample column, is it OK to choose sample, knowing that some samples come from the same donor?
PROGENy works perfectly fine. I'm just having some issues with the MSigDB analysis, which missed some key pathways that I got using Seurat::FindMarkers + PIANO.
Hi @AgentScientist, I would strongly advise against performing DEA at the single-cell level followed by enrichment analysis. Even with a balanced experimental design, samples may still contribute different numbers of cells, which biases the test. Using single cells as observations also breaks the assumption, made by any DEA test, that samples are independent of each other, and it inflates the significance of the obtained p-values (so you will get many false positives). Regarding your pseudobulk results, one thing you might try is to be less strict with the gene filtering. Regarding your patient vs sample metadata, you can do two things: i) use the patient ID as your sample column (so you summarize per patient), or ii) include the patient ID as a covariate in the DESeq2 model to correct for it. Hope this is helpful!
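The false-positive argument can be checked with a small simulation. The sketch below assumes a 3 vs 3 design with no true difference between groups, where each sample has its own random offset shared by all of its cells. It compares a t-test that treats every cell as a replicate with one that uses a single mean per sample. All parameter values are arbitrary choices for illustration, and a plain pooled-variance t-test stands in for a real DEA framework.

```python
import random
import statistics

def t_statistic(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5

def simulate(n_sims=200, n_samples=3, n_cells=200, seed=0):
    """Return (cell-level, sample-level) false positive rates under the null."""
    rng = random.Random(seed)
    cell_hits = sample_hits = 0
    for _ in range(n_sims):
        groups = []
        for _group in range(2):
            samples = []
            for _sample in range(n_samples):
                offset = rng.gauss(0, 1)  # sample effect shared by all its cells
                samples.append([offset + rng.gauss(0, 1) for _ in range(n_cells)])
            groups.append(samples)
        # Cells as replicates: 600 vs 600, critical value ~1.96 at alpha = 0.05.
        cells_a = [x for s in groups[0] for x in s]
        cells_b = [x for s in groups[1] for x in s]
        if abs(t_statistic(cells_a, cells_b)) > 1.96:
            cell_hits += 1
        # One value per sample: 3 vs 3, critical value 2.776 (4 degrees of freedom).
        means_a = [statistics.fmean(s) for s in groups[0]]
        means_b = [statistics.fmean(s) for s in groups[1]]
        if abs(t_statistic(means_a, means_b)) > 2.776:
            sample_hits += 1
    return cell_hits / n_sims, sample_hits / n_sims

cell_fpr, sample_fpr = simulate()
print(f"false positive rate, cells as replicates: {cell_fpr:.2f}")
print(f"false positive rate, one value per sample: {sample_fpr:.2f}")
```

Under these assumptions the cell-level test rejects the null far more often than the nominal 5%, while the sample-level test stays close to it. This is the p-value inflation described above.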
Hello, it was a gene set issue! I used the standard MSigDB 1329 pathways, and I got the same result as before.
"Regarding your patient vs sample metadata, you can do two things: i) use the patient id as your sample col (you summarize per patient), ii) you include the patient id as a covariate to the DESeq2 model to correct for that."
pdata = dc.get_pseudobulk(
If I do this, for some reason I lose a few columns in the resulting pdata, for example the condition (treated/normal) column. I keep all the columns only when I use sample_col='sample'. I actually think using "sample" is fine, since we don't want to mix the cell types within donors.
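One plausible explanation for the missing columns, offered as an assumption and not as confirmed behavior of `dc.get_pseudobulk`: a metadata column can only be carried into a one-row-per-group table if it has a single value within each group. If a donor contributes both a treated and a normal sample, condition is ambiguous per patient but still well defined per sample. The sketch below illustrates that rule with a hypothetical helper and toy metadata.

```python
from collections import defaultdict

def group_level_columns(rows, group_col):
    """Return the columns that take exactly one value within every group."""
    values = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for col, val in row.items():
            if col != group_col:
                values[col][row[group_col]].add(val)
    return sorted(
        col for col, per_group in values.items()
        if all(len(seen) == 1 for seen in per_group.values())
    )

# Hypothetical per-cell metadata: patient P1 contributes two samples.
cells = [
    {"patient_id": "P1", "sample": "P1_ctrl", "condition": "normal", "sex": "F"},
    {"patient_id": "P1", "sample": "P1_trt", "condition": "treated", "sex": "F"},
    {"patient_id": "P2", "sample": "P2_trt", "condition": "treated", "sex": "M"},
]

print(group_level_columns(cells, "patient_id"))  # ['sex']
print(group_level_columns(cells, "sample"))      # ['condition', 'patient_id', 'sex']
```

If this is what is happening, keeping `sample` as the sample column and adding the patient ID as a covariate (option ii in the quote above) preserves the condition column.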
I did this: dds = DeseqDataSet(
I see in the example that there is no association between any PC and the condition column ("disease"). Even in the PCA plot, we can't distinguish between COVID-19 and Normal. Shouldn't we see some differences?
Hello,
I was using decoupleR for pathway analysis on single-cell data.
I tried a few lines:
Where should I specify the case group and the control group? Where do we specify the statistic to use?
Looking at the output, it seems to be considering every single cell as a condition.
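To make the contrast-level alternative concrete: when enrichment is run on a contrast, the case and control groups are fixed earlier, in the differential expression step, and the enrichment method only receives one statistic per gene. The sketch below uses a simple over-representation test (hypergeometric tail) on hypothetical treated-vs-control log fold changes. The gene names, values, and threshold are invented for the example, and this is not the method decoupler applies by default.

```python
from math import comb

def ora_pvalue(de_genes, gene_set, background):
    """P(overlap >= observed) under the hypergeometric distribution."""
    background = set(background)
    de = set(de_genes) & background
    gs = set(gene_set) & background
    n_bg, n_set, n_de = len(background), len(gs), len(de)
    overlap = len(de & gs)
    upper = min(n_set, n_de)
    tail = sum(
        comb(n_set, i) * comb(n_bg - n_set, n_de - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(n_bg, n_de)

# Hypothetical contrast-level statistics: log2 fold change, treated vs control.
background = [f"g{i}" for i in range(20)]
log2fc = {gene: 0.0 for gene in background}
log2fc.update({"g0": 2.1, "g1": 1.8, "g2": 1.5, "g10": 1.7})

# The contrast direction is already encoded in the sign of each statistic.
de_genes = [gene for gene, lfc in log2fc.items() if lfc > 1.0]
pathway = ["g0", "g1", "g2", "g3", "g4"]

p = ora_pvalue(de_genes, pathway, background)
print(f"overlap p-value: {p:.4f}")
```

Three of the four upregulated genes fall in the five-gene pathway, so the test returns one p-value for the pathway as a whole instead of one score per cell.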