Problem in using sctransform in Seurat #8816

twb2540 · 2024-04-24T04:47:18Z

I've recently started merging 36 datasets using SCTransform and removing batch effects using harmony.

Here's what I have been up to:

# Creating a list of samples
sample_list <- list(sample1, sample2, sample3, ...)

# Applying SCTransform using lapply
sample_list <- lapply(sample_list, function(seurat_obj) {
  seurat_obj <- SCTransform(seurat_obj, vars.to.regress = "percent.mt", verbose = FALSE)
  return(seurat_obj)
})

gastric_merge <- merge(sample1, y = c(sample2, sample3, sample4, sample5, 
sample6, sample7, sample8, sample9, sample10, 
sample11, sample12, sample13, sample14, sample15,	
sample16, sample17, sample18, sample21, sample22, 
sample23, sample24, sample25, sample26, sample27, 
sample28, sample29, sample30, sample31, sample32, 
sample33, sample34, sample35, sample36, sample39, sample40), merge.data = TRUE)

# Selecting integration features
gc.features <- SelectIntegrationFeatures(object.list = sample.list, nfeatures = 5000)
VariableFeatures(gastric_merge) <- gc.features

# Running PCA, Harmony, UMAP, and clustering
gastric_merge <- RunPCA(object = gastric_merge, assay = "SCT", npcs = 50)
gastric_merge <- RunHarmony(object = gastric_merge, assay.use = "SCT", reduction = "pca", dims.use = 1:30, group.by.vars = "patient_id", plot_convergence = TRUE)
gastric_merge <- RunUMAP(object = gastric_merge, assay = "SCT", reduction = "harmony", dims = 1:30)
gastric_merge <- FindNeighbors(object = gastric_merge, assay = "SCT", reduction = "harmony", dims = 1:30)
gastric_merge <- FindClusters(object = gastric_merge, resolution = 0.4)

However, a few points remain unclear to me:

(1) Regarding visualization with the DoHeatmap function, I've noticed that it uses data from the scale.data slot, which only contains variable genes. However, not all of the genes on my list are included in the variable genes. As a result, these genes are missing from my heatmap.

Would it be appropriate to rescale the SCT assay data (using the ScaleData function) before using the DoHeatmap function to include these non-variable genes in the heatmap?

(2) Although I specified 5000 genes as integration features,

gc.features <- SelectIntegrationFeatures(object.list = sample.list, nfeatures = 5000)

the integration object finally contains scale.data with 5135 genes, which does not match the var.feature count that I had specified.

Could someone please clarify if there is an error on my part or if there's a misunderstanding?

(3) For additional analysis, such as running the AverageExpression function and conducting differential expression analysis with the FindMarkers function, should I specify the assay as RNA or SCT? Additionally, which slot data should I use for these functions?

Thank you,
BJ

The text was updated successfully, but these errors were encountered:

igrabski · 2024-05-03T20:03:39Z

Hi BJ, to answer your questions:

(1) To address this, when running SCTransform, you can set return.only.var.genes = F, so that you can also compute Pearson residuals for genes beyond the variable features.

(2) We will investigate this behavior!

(3) To clarify, if you are wanting to run pseudobulk-based DE via AggregateExpression followed by FindMarkers, you should be using the counts slot and set the test in FindMarkers to DESeq2.

igrabski closed this as completed May 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem in using sctransform in Seurat #8816

Problem in using sctransform in Seurat #8816

twb2540 commented Apr 24, 2024 •

edited

igrabski commented May 3, 2024

Problem in using sctransform in Seurat #8816

Problem in using sctransform in Seurat #8816

Comments

twb2540 commented Apr 24, 2024 • edited

igrabski commented May 3, 2024

twb2540 commented Apr 24, 2024 •

edited