Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem in using sctransform in Seurat #8816

Closed
twb2540 opened this issue Apr 24, 2024 · 1 comment
Closed

Problem in using sctransform in Seurat #8816

twb2540 opened this issue Apr 24, 2024 · 1 comment

Comments

@twb2540
Copy link

twb2540 commented Apr 24, 2024

I've recently started merging 36 datasets using SCTransform and removing batch effects using harmony.

Here's what I have been up to:

# Creating a list of samples
sample_list <- list(sample1, sample2, sample3, ...)

# Applying SCTransform using lapply
sample_list <- lapply(sample_list, function(seurat_obj) {
  seurat_obj <- SCTransform(seurat_obj, vars.to.regress = "percent.mt", verbose = FALSE)
  return(seurat_obj)
})

gastric_merge <- merge(sample1, y = c(sample2, sample3, sample4, sample5, 
sample6, sample7, sample8, sample9, sample10, 
sample11, sample12, sample13, sample14, sample15,	
sample16, sample17, sample18, sample21, sample22, 
sample23, sample24, sample25, sample26, sample27, 
sample28, sample29, sample30, sample31, sample32, 
sample33, sample34, sample35, sample36, sample39, sample40), merge.data = TRUE)

# Selecting integration features
gc.features <- SelectIntegrationFeatures(object.list = sample.list, nfeatures = 5000)
VariableFeatures(gastric_merge) <- gc.features

# Running PCA, Harmony, UMAP, and clustering
gastric_merge <- RunPCA(object = gastric_merge, assay = "SCT", npcs = 50)
gastric_merge <- RunHarmony(object = gastric_merge, assay.use = "SCT", reduction = "pca", dims.use = 1:30, group.by.vars = "patient_id", plot_convergence = TRUE)
gastric_merge <- RunUMAP(object = gastric_merge, assay = "SCT", reduction = "harmony", dims = 1:30)
gastric_merge <- FindNeighbors(object = gastric_merge, assay = "SCT", reduction = "harmony", dims = 1:30)
gastric_merge <- FindClusters(object = gastric_merge, resolution = 0.4)

However, a few points remain unclear to me:

(1) Regarding visualization with the DoHeatmap function, I've noticed that it uses data from the scale.data slot, which only contains variable genes. However, not all of the genes on my list are included in the variable genes. As a result, these genes are missing from my heatmap.

Would it be appropriate to rescale the SCT assay data (using the ScaleData function) before using the DoHeatmap function to include these non-variable genes in the heatmap?

(2) Although I specified 5000 genes as integration features,

gc.features <- SelectIntegrationFeatures(object.list = sample.list, nfeatures = 5000)

the integration object finally contains scale.data with 5135 genes, which does not match the var.feature count that I had specified.

1

Could someone please clarify if there is an error on my part or if there's a misunderstanding?

(3) For additional analysis, such as running the AverageExpression function and conducting differential expression analysis with the FindMarkers function, should I specify the assay as RNA or SCT? Additionally, which slot data should I use for these functions?

Thank you,
BJ

@igrabski
Copy link
Contributor

igrabski commented May 3, 2024

Hi BJ, to answer your questions:

(1) To address this, when running SCTransform, you can set return.only.var.genes = F, so that you can also compute Pearson residuals for genes beyond the variable features.

(2) We will investigate this behavior!

(3) To clarify, if you are wanting to run pseudobulk-based DE via AggregateExpression followed by FindMarkers, you should be using the counts slot and set the test in FindMarkers to DESeq2.

@igrabski igrabski closed this as completed May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants