Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When Sctransform should not be used? #104

Open
ElyasMo opened this issue May 7, 2021 · 3 comments
Open

When Sctransform should not be used? #104

ElyasMo opened this issue May 7, 2021 · 3 comments

Comments

@ElyasMo
Copy link

ElyasMo commented May 7, 2021

Dear ChristophH,
Do you have a tip when we should not proceed with SCTransform. I have several spatial transcriptomics dataset and it has been a few weeks that I am trying to normalize my data through SCTransform. I just figured out that this method does not lead to a correct clustering of the spots while simple normalization works much better after regressing out the ratio of mitochondrial genes. So, I was wondering why SCTransform does not work well for my data while it is a powerful and highly recommended method of normalization.

One more thing is that, I have followed exactly the same workflow and codes in windows and mac and, surprisingly, I received slightly different clusters. How is it possible?

I am just adding my codes which is based on the standard tutorials and I have not added any new things.

Thank you very much in advance.

library(Seurat) 
library(sctransform)


#####Reading in the 10xVisium objects

setwd("D:/Poland/PHD/spatial/Second_set/HE_sample/")
Directory="D:/Poland/PHD/spatial/Second_set/HE_sample/"
HE_sample_2<-Load10X_Spatial(Directory,filename = "filtered_feature_bc_matrix.h5",
                             assay = "Spatial",
                             slice = "HEsmpl2",
                             filter.matrix = TRUE,
                             to.upper = FALSE)

setwd("D:/Poland/PHD/spatial/Second_set/HE_control/")
Directory="D:/Poland/PHD/spatial/Second_set/HE_control/"
HE_control_2<-Load10X_Spatial(Directory,filename = "filtered_feature_bc_matrix.h5",
                              assay = "Spatial",
                              slice = "HEctr2",
                              filter.matrix = TRUE,
                              to.upper = FALSE)

setwd("D:/Poland/PHD/spatial/project/HE/HE_sample_2248/")
Directory="D:/Poland/PHD/spatial/project/HE/HE_sample_2248/"
HE_sample_1<-Load10X_Spatial(Directory,filename = "filtered_feature_bc_matrix.h5",
                             assay = "Spatial",
                             slice = "HEsmpl1",
                             filter.matrix = TRUE,
                             to.upper = FALSE)

setwd("D:/Poland/PHD/spatial/project/HE/HE_control_2339/")
Directory="D:/Poland/PHD/spatial/project/HE/HE_control_2339/"
HE_control_1<-Load10X_Spatial(Directory,filename = "filtered_feature_bc_matrix.h5",
                              assay = "Spatial",
                              slice = "HEctr1",
                              filter.matrix = TRUE,
                              to.upper = FALSE)
#####integration
setwd("D:/Poland/PHD/spatial/project/CR/CR_sample_2248/")
Directory="D:/Poland/PHD/spatial/project/CR/CR_sample_2248/"
CR_sample_1<-Load10X_Spatial(Directory,filename = "filtered_feature_bc_matrix.h5",
                             assay = "Spatial",
                             slice = "CRsmpl1",
                             filter.matrix = TRUE,
                             to.upper = FALSE)

setwd("D:/Poland/PHD/spatial/project/CR/CR_control_2339/")
Directory="D:/Poland/PHD/spatial/project/CR/CR_control_2339/"
CR_control_1<-Load10X_Spatial(Directory,filename = "filtered_feature_bc_matrix.h5",
                              assay = "Spatial",
                              slice = "CRctr1",
                              filter.matrix = TRUE,
                              to.upper = FALSE)
setwd("D:/Poland/PHD/spatial/Second_set/CR_sample/")
Directory="D:/Poland/PHD/spatial/Second_set/CR_sample/"
CR_sample_2<-Load10X_Spatial(Directory,filename = "filtered_feature_bc_matrix.h5",
                             assay = "Spatial",
                             slice = "CRsmpl2",
                             filter.matrix = TRUE,
                             to.upper = FALSE)
setwd("D:/Poland/PHD/spatial/Second_set/CR_control/")
Directory="D:/Poland/PHD/spatial/Second_set/CR_control/"
CR_control_2<-Load10X_Spatial(Directory,filename = "filtered_feature_bc_matrix.h5",
                              assay = "Spatial",
                              slice = "CRctr2",
                              filter.matrix = TRUE,
                              to.upper = FALSE)



##adding the percentage of MT genes to metadata
CR_control_1[["percent.mt"]] <- PercentageFeatureSet(CR_control_1, pattern = "^MT-")
CR_sample_1[["percent.mt"]] <- PercentageFeatureSet(CR_sample_1, pattern = "^MT-")
CR_control_2[["percent.mt"]] <- PercentageFeatureSet(CR_control_2, pattern = "^MT-")
CR_sample_2[["percent.mt"]] <- PercentageFeatureSet(CR_sample_2, pattern = "^MT-")
HE_control_1[["percent.mt"]] <- PercentageFeatureSet(HE_control_1, pattern = "^MT-")
HE_sample_1[["percent.mt"]] <- PercentageFeatureSet(HE_sample_1, pattern = "^MT-")
HE_control_2[["percent.mt"]] <- PercentageFeatureSet(HE_control_2, pattern = "^MT-")
HE_sample_2[["percent.mt"]] <- PercentageFeatureSet(HE_sample_2, pattern = "^MT-")


##Filtering the metadata basd on various criteria
CR_control_1 <- subset(CR_control_1, subset = nFeature_Spatial > 200 & nFeature_Spatial < 7000 & percent.mt < 15)
CR_sample_1 <- subset(CR_sample_1, subset = nFeature_Spatial > 200 & nFeature_Spatial < 7000 & percent.mt < 15 )
CR_control_2 <- subset(CR_control_2, subset = nFeature_Spatial > 200 & nFeature_Spatial < 7000 & percent.mt < 15)
CR_sample_2 <- subset(CR_sample_2, subset = nFeature_Spatial > 200 & nFeature_Spatial < 7000 & percent.mt < 15 )
HE_control_1 <- subset(HE_control_1, subset = nFeature_Spatial > 200 & nFeature_Spatial < 7000 & percent.mt < 15)
HE_sample_1 <- subset(HE_sample_1, subset = nFeature_Spatial > 200 & nFeature_Spatial < 7000 & percent.mt < 15)
HE_control_2 <- subset(HE_control_2, subset = nFeature_Spatial > 200 & nFeature_Spatial < 7000 & percent.mt < 15)
HE_sample_2 <- subset(HE_sample_2, subset = nFeature_Spatial > 200 & nFeature_Spatial < 7000 & percent.mt < 15)
HE_sample_1<-SCTransform(HE_sample_1, assay = "Spatial", verbose = FALSE)
HE_sample_1<- FindVariableFeatures(HE_sample_1, selection.method = "vst", 
                                    nfeatures = 3000, verbose = FALSE)

HE_control_1<-SCTransform(HE_control_1, assay = "Spatial", verbose = FALSE)
HE_control_1<- FindVariableFeatures(HE_control_1, selection.method = "vst", 
                                     nfeatures = 3000, verbose = FALSE)
HE_sample_2<-SCTransform(HE_sample_2, assay = "Spatial", verbose = FALSE)
HE_sample_2<- FindVariableFeatures(HE_sample_2, selection.method = "vst", 
                                    nfeatures = 3000, verbose = FALSE)

HE_control_2<-SCTransform(HE_control_2, assay = "Spatial", verbose = FALSE)
HE_control_2<- FindVariableFeatures(HE_control_2, selection.method = "vst", 
                                     nfeatures = 3000, verbose = FALSE)

CR_sample_1 <- SCTransform(CR_sample_1, assay = "Spatial", verbose = FALSE)
CR_sample_1<- FindVariableFeatures(CR_sample_1, selection.method = "vst", 
                                    nfeatures = 3000, verbose = FALSE)

CR_control_1<-SCTransform(CR_control_1, assay = "Spatial", verbose = FALSE)
CR_control_1<- FindVariableFeatures(CR_control_1, selection.method = "vst", 
                                     nfeatures = 3000, verbose = FALSE)

CR_sample_2 <- SCTransform(CR_sample_2, assay = "Spatial", verbose = FALSE)
CR_sample_2<- FindVariableFeatures(CR_sample_2, selection.method = "vst", 
                                    nfeatures = 3000, verbose = FALSE)

CR_control_2<-SCTransform(CR_control_2, assay = "Spatial", verbose = FALSE)
CR_control_2<- FindVariableFeatures(CR_control_2, selection.method = "vst", 
                                     nfeatures = 3000, verbose = FALSE)
HE_sample_1 <- RunPCA(HE_sample_1, verbose = FALSE)
HE_sample_1 <- RunUMAP(HE_sample_1, dims = 1:30, verbose = FALSE)
HE_sample_1 <- FindNeighbors(HE_sample_1, dims = 1:30, verbose = FALSE)
HE_sample_1 <- FindClusters(HE_sample_1, verbose = FALSE,resolution = 0.3) 

HE_control_1 <- RunPCA(HE_control_1, verbose = FALSE)
HE_control_1 <- RunUMAP(HE_control_1, dims = 1:30, verbose = FALSE)
HE_control_1 <- FindNeighbors(HE_control_1, dims = 1:30, verbose = FALSE)
HE_control_1 <- FindClusters(HE_control_1, verbose = FALSE, resolution = 0.3)

HE_sample_2 <- RunPCA(HE_sample_2, verbose = FALSE)
HE_sample_2 <- RunUMAP(HE_sample_2, dims = 1:30, verbose = FALSE)
HE_sample_2 <- FindNeighbors(HE_sample_2, dims = 1:30, verbose = FALSE)
HE_sample_2 <- FindClusters(HE_sample_2, verbose = FALSE,resolution = 0.3) 

HE_control_2 <- RunPCA(HE_control_2, verbose = FALSE)
HE_control_2 <- RunUMAP(HE_control_2, dims = 1:30, verbose = FALSE)
HE_control_2 <- FindNeighbors(HE_control_2, dims = 1:30, verbose = FALSE)
HE_control_2 <- FindClusters(HE_control_2, verbose = FALSE, resolution = 0.3)

CR_sample_1 <- RunPCA(CR_sample_1, verbose = FALSE)
CR_sample_1 <- RunUMAP(CR_sample_1, dims = 1:30, verbose = FALSE)
CR_sample_1 <- FindNeighbors(CR_sample_1, dims = 1:30, verbose = FALSE)
CR_sample_1 <- FindClusters(CR_sample_1, verbose = FALSE, resolution = 0.3)

CR_control_1 <- RunPCA(CR_control_1, verbose = FALSE)
CR_control_1 <- RunUMAP(CR_control_1, dims = 1:30, verbose = FALSE)
CR_control_1 <- FindNeighbors(CR_control_1, dims = 1:30, verbose = FALSE)
CR_control_1 <- FindClusters(CR_control_1, verbose = FALSE, resolution = 0.3) 

CR_sample_2 <- RunPCA(CR_sample_2, verbose = FALSE)
CR_sample_2 <- RunUMAP(CR_sample_2, dims = 1:30, verbose = FALSE)
CR_sample_2 <- FindNeighbors(CR_sample_2, dims = 1:30, verbose = FALSE)
CR_sample_2 <- FindClusters(CR_sample_2, verbose = FALSE, resolution = 0.3)

CR_control_2 <- RunPCA(CR_control_2, verbose = FALSE)
CR_control_2 <- RunUMAP(CR_control_2, dims = 1:30, verbose = FALSE)
CR_control_2 <- FindNeighbors(CR_control_2, dims = 1:30, verbose = FALSE)
CR_control_2 <- FindClusters(CR_control_2, verbose = FALSE, resolution = 0.3) 
DimPlot(HE_sample_1, reduction = "umap", label = TRUE, pt.size = 1.3) + NoLegend()
DimPlot(HE_control_1, reduction = "umap", label = TRUE, pt.size = 1.3) + NoLegend()
DimPlot(HE_sample_2, reduction = "umap", label = TRUE, pt.size = 1.3) + NoLegend()
DimPlot(HE_control_2, reduction = "umap", label = TRUE, pt.size = 1.3) + NoLegend()
DimPlot(CR_sample_1, reduction = "umap", label = TRUE, pt.size = 1.3) + NoLegend()
DimPlot(CR_control_1, reduction = "umap", label = TRUE, pt.size = 1.3) + NoLegend()
DimPlot(CR_sample_2, reduction = "umap", label = TRUE, pt.size = 1.3) + NoLegend()
DimPlot(CR_control_2, reduction = "umap", label = TRUE, pt.size = 1.3) + NoLegend()

SpatialDimPlot(HE_sample_1, label = FALSE, label.size =2)
SpatialDimPlot(HE_control_1, label = FALSE, label.size =2)
SpatialDimPlot(HE_sample_2, label = FALSE, label.size =2)
SpatialDimPlot(HE_control_2, label = FALSE, label.size =2)
SpatialDimPlot(CR_sample_1, label = FALSE, label.size =2)
SpatialDimPlot(CR_control_1, label = FALSE, label.size =2)
SpatialDimPlot(CR_sample_2, label = FALSE, label.size =2)
SpatialDimPlot(CR_control_2, label = FALSE, label.size =2)

Here is the information from my datasets after pre-processing/filtration:

An object of class Seurat 
36601 features across 3091 samples within 1 assay 
Active assay: Spatial (36601 features, 0 variable features)

An object of class Seurat 
36601 features across 2470 samples within 1 assay 
Active assay: Spatial (36601 features, 0 variable features)

An object of class Seurat 
36601 features across 2328 samples within 1 assay 
Active assay: Spatial (36601 features, 0 variable features)

An object of class Seurat 
36601 features across 2707 samples within 1 assay 
Active assay: Spatial (36601 features, 0 variable features)

An object of class Seurat 
36601 features across 3328 samples within 1 assay 
Active assay: Spatial (36601 features, 0 variable features)

An object of class Seurat 
36601 features across 2439 samples within 1 assay 
Active assay: Spatial (36601 features, 0 variable features)

An object of class Seurat 
36601 features across 2266 samples within 1 assay 
Active assay: Spatial (36601 features, 0 variable features)

An object of class Seurat 
36601 features across 2754 samples within 1 assay 
Active assay: Spatial (36601 features, 0 variable features)

And here is my sessionInfo:

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.1      RCurl_1.98-1.2     forcats_0.5.1      stringr_1.4.0      dplyr_1.0.5        purrr_0.3.4       
 [7] readr_1.4.0        tidyr_1.1.3        tibble_3.1.0       ggplot2_3.3.3      tidyverse_1.3.0    sctransform_0.3.2 
[13] SeuratObject_4.0.0 Seurat_4.0.0      

loaded via a namespace (and not attached):
  [1] Rtsne_0.15           colorspace_2.0-0     deldir_0.2-10        ellipsis_0.3.1       ggridges_0.5.3       fs_1.5.0            
  [7] rstudioapi_0.13      spatstat.data_2.0-0  leiden_0.3.7         listenv_0.8.0        farver_2.1.0         ggrepel_0.9.1       
 [13] bit64_4.0.5          lubridate_1.7.10     RSpectra_0.16-0      fansi_0.4.2          xml2_1.3.2           codetools_0.2-16    
 [19] splines_4.0.2        knitr_1.31           polyclip_1.10-0      jsonlite_1.7.2       broom_0.7.5          ica_1.0-2           
 [25] dbplyr_2.1.0         cluster_2.1.1        png_0.1-7            uwot_0.1.10          shiny_1.6.0          compiler_4.0.2      
 [31] httr_1.4.2           backports_1.2.1      assertthat_0.2.1     Matrix_1.2-18        fastmap_1.1.0        lazyeval_0.2.2      
 [37] cli_2.3.1            limma_3.44.3         later_1.1.0.1        htmltools_0.5.1.1    tools_4.0.2          igraph_1.2.6        
 [43] gtable_0.3.0         glue_1.4.2           RANN_2.6.1           reshape2_1.4.4       Rcpp_1.0.6           spatstat_1.64-1     
 [49] scattermore_0.7      cellranger_1.1.0     vctrs_0.3.6          nlme_3.1-148         lmtest_0.9-38        xfun_0.21           
 [55] globals_0.14.0       rvest_1.0.0          mime_0.10            miniUI_0.1.1.1       lifecycle_1.0.0      irlba_2.3.3         
 [61] goftest_1.2-2        future_1.21.0        MASS_7.3-51.6        zoo_1.8-8            scales_1.1.1         hms_1.0.0           
 [67] promises_1.1.1       spatstat.utils_2.0-0 parallel_4.0.2       RColorBrewer_1.1-2   yaml_2.2.1           reticulate_1.18     
 [73] pbapply_1.4-3        gridExtra_2.3        rpart_4.1-15         stringi_1.5.3        bitops_1.0-6         rlang_0.4.10        
 [79] pkgconfig_2.0.3      matrixStats_0.58.0   evaluate_0.14        lattice_0.20-41      ROCR_1.0-11          tensor_1.5          
 [85] patchwork_1.1.1      htmlwidgets_1.5.3    labeling_0.4.2       bit_4.0.4            tidyselect_1.1.0     parallelly_1.23.0   
 [91] RcppAnnoy_0.0.18     plyr_1.8.6           magrittr_2.0.1       R6_2.5.0             generics_0.1.0       DBI_1.1.1           
 [97] withr_2.4.1          haven_2.3.1          pillar_1.5.1         mgcv_1.8-31          fitdistrplus_1.1-3   survival_3.1-12     
[103] abind_1.4-5          future.apply_1.7.0   modelr_0.1.8         crayon_1.4.1         hdf5r_1.3.3          KernSmooth_2.23-17  
[109] utf8_1.2.1           plotly_4.9.3         rmarkdown_2.7        readxl_1.3.1         grid_4.0.2           data.table_1.14.0   
[115] reprex_1.0.0         digest_0.6.27        xtable_1.8-4         httpuv_1.5.4         munsell_0.5.0        viridisLite_0.3.0
@ChristophH
Copy link
Collaborator

Hi,

I cannot comment on your specific results without actually seeing them and understanding why you think sctransform normalization does not lead "to a correct clustering of the spots".

But I notice that you run FindVariableFeatures after SCTransform. That step is not necessary (and might be a disadvantage) since SCTransform already finds and sets the variable features.

@jiangpuxuan
Copy link

jiangpuxuan commented Jan 13, 2022

I met the very simmilar problem with you. My dataset have many doublets, and the heterotype doublets formed a long 'bridge' between clusters.
The sctransform pipeline cannot tell 'bridge' from normal clusters, but the seurat intrinstic 'Normalize' worked well.

@oghzzang
Copy link

oghzzang commented Aug 7, 2023

Dear @ElyasMo

Why didn't you use SCT integration?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants