Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too many DEGs? #35

Open
KunFang93 opened this issue Jun 21, 2022 · 1 comment
Open

Too many DEGs? #35

KunFang93 opened this issue Jun 21, 2022 · 1 comment

Comments

@KunFang93
Copy link

Hi,

I am a fresh user of glmGamPoi and thanks for providing this awesome tool! I tried to identify DEGs in scRNA and successfully run the following code:

load('/Volumes/kfsd/GSE200903_new_combined_cluster_FibroSamples_2022-01-14.RData')
combine_cluster <- as.SingleCellExperiment(combined_cluster)

set.seed(1)
colData(combine_cluster)
# Take highly expressed genes and proper cells:
combine_cluster_subset <- combine_cluster[rowSums(counts(combine_cluster)) > 100, 
                                          which( ! is.na(combine_cluster$CellType) & combine_cluster$CellType 
                                                 %in% c("fibroblasts"))]

# add a stim2 based on stim
colData(combine_cluster_subset)$stim2 <- colsplit(colData(combine_cluster_subset)$stim, 
                                                  split = "_", names = c('cell', 'trt','time'))$trt
# Convert counts to dense matrix
counts(combine_cluster_subset) <- as.matrix(counts(combine_cluster_subset))
# Remove empty levels because glm_gp() will complain otherwise
colData(combine_cluster_subset)$stim2 <- as.factor(colData(combine_cluster_subset)$stim2)
colData(combine_cluster_subset)$stim2 <- relevel(colData(combine_cluster_subset)$stim2, ref = "normal")

fit <- glm_gp(combine_cluster_subset, design = ~ stim2 ,
              reference_level = "normal")
summary(fit)
colnames(fit$Beta)

# The contrast argument specifies what we want to compare
# We test the expression difference of stimulated and control T-cells
#
# There is no sample label in the colData, so we create it on the fly
# from `stim` and `ind` columns in colData(fit$data).
de_res <- test_de(fit, contrast = `stim2PDAC`, 
                  pseudobulk_by = paste0(stim2, "-", rownames(colData(combine_cluster_subset)))) 


# The large `lfc` values come from groups were nearly all counts are 0
# Setting them to Inf makes the plots look nicer
de_res$lfc <- ifelse(abs(de_res$lfc) > 20, sign(de_res$lfc) * Inf, de_res$lfc)

# Most different genes
de_res<-de_res[order(de_res$pval), ]

However, I identified 11752 DEGs out of 14537 genes and I found many genes has df1=1.

head(de_res)
             name pval adj_pval f_statistic df1      df2        lfc
35           Rpl7    0        0    3244.666   1 20005.31 -0.5195890
44           Pi15    0        0    2180.603   1 20005.31 -1.6835958
124         Il1r1    0        0    3883.918   1 20005.31  2.2249821
132          Fhl2    0        0    5068.944   1 20005.31  4.8176423
158           Gls    0        0    1635.089   1 20005.31  0.9511179
164 1700019D03Rik    0        0    3380.809   1 20005.31 -2.7193859

I wondered how could I address this problem? Thanks in advance!

Best,
Kun

@const-ae
Copy link
Owner

Hi KunFang,

thank you for reaching out and for the detailed description of your problem. From a brief look over your code, it seems fine, but that is difficult to say for certain without access to the input data.
If you are concerned that the fitting is wrong, you could fit the model with DESeq2 or edgeR and compare if the results are similar.

Best,
Constantin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants