-
Notifications
You must be signed in to change notification settings - Fork 0
Home
STIE: Single-cell level deconvolution, convolution and clustering in spatial transcriptomics by aligning spatial transcriptome to the nuclear morphology
- STIE dependent packages
- STIE input
- Nucleus segmentation based on H&E image
- Single-cell level deconvolution in Spatial transcriptomics
- Single-cell level clustering in Spatial transcriptomics
- Spatially resolved cell-cell interaction
- Bona fide spot size
𝝲
- Grid search for
𝝺
and nuclear morphological features
The 10X Visium spot-based spatial transcriptome profiling, due to its higher throughput, less custom protocol, and steadily increasing spot resolution, has been the most predominant commercially available technique. However, by using both real and simulated spatial transcriptomic data, we found that regardless of resolution (up to 5 μm in spot diameter), a large proportion of spots still partially cover multiple cells simultaneously, suggesting the intrinsic non-single-cell level of the spot-based spatial transcriptomics. To this end, we present STIE, an EM algorithm that aligns the spatial transcriptome data to the matched histology image-based nucleus segmentation, mutually draws information from spatial transcriptome and nuclear morphology to optimize their models jointly, and recovers missing cells from up to ~70% gap area between spots via neighborhood information and morphological similarity, thereby enabling the real single-cell level and whole-slide scale deconvolution, convolution and clustering for both low- and high-resolution spots. As opposed to the spot/subspot level clustering, whose signature is a mixture of cell types and can even miss cell types, the signature by STIE clustering largely matches the single cell type derived from single-cell RNAseq data. Further, the resolved single cells with spatial information retained, enabled us to address the critical relevant questions, including bona fide spot-capturing area, additional contribution of cellular morphology to cell typing, spatial cell type colocalization, and spatial cell-cell interaction.
library(STIE)
#### for manipulating images
library(magick)
library(EBImage)
#### for manipulating ST gene expression
library(Seurat)
#### for the quadratic programming
library("quadprog")
#### for spatially resolved cell-cell interaction
library(CellChat)
library(NMF)
library(ggalluvial)
STIE has two hyperparameters, 𝝺 and 𝝲, representing the shrinkage penalty for nuclear morphology, and the area surrounding the center of the spot, respectively.
-
𝝺
this parameter penalizes the difference between gene expression- and nuclear morphology-predicted cell type proportion. STIE chooses 𝝺 to tweak the contribution of gene expression and cellular morphology to the final model fitting. Please see its selection from here. -
𝝲
by modulating this parameter, STIE aims to find the group of cells which are covered by the spot and can best fit the gene expression of the spot. Please see its selection from here.
STIE takes the follows as input:
-
nuclear coordinates and morphology
the spatial coordinates and morphological features of the nuclei -
spot-level gene expression
the gene expression on spots -
cell-type transcriptomic signature
the cell-type transcriptomic signature derived from scRNA-seq data
Before running cell segmentation and spatial transcriptome data preprocessing, if users only want to test the code but have not installed imageJ or spaceranger yet, we have provided the testing data, and users can directly jump to Section of deconvolution.
STIE takes as input (but is not limited to) the nucleus segmentation implemented in the DeepImageJ, called “Multi-Organ Nucleus Segmentation model.” Please see here for the installation.
The location of ImageJ/Fiji.app/ImageJ-linux64
in the above installation will be used in the run_imageJ_plugin()
function to run the imageJ macro. Moreover, the user should specify a folder split_dir
to save the intermediate results.
Two high-resolution images used as examples, CytAssist_FFPE_Mouse_Brain_Rep1_tissue_image.tif
and 'CytAssist_FFPE_Mouse_Brain_Rep2_tissue_image.tif' of two mouse brain consecutive sections are downloaded from the 10X Genomics public data: section 1 and for the mouse brain section 2.
library(magick)
library(EBImage)
## split image & run imageJ plugin
split_image(image=image_path, split_dir=split_dir,
w=3000, h=3000, margin=100, x_scale=1 )
## if spot_coordinates specified, then only split image covered by the spots
# split_image(image=image_path, split_dir=split_dir,
# spot_coordinates = spot_coordinates,
# w=3000, h=3000, margin=100, x_scale=1 )
run_imageJ_plugin(
imageJ = "/archive/SCCC/Hoshida_lab/s184554/Code/github/ImageJ/Fiji.app/ImageJ-linux64",
plugin_macro = NULL,
split_image_dir = split_dir,
pattern="jpg$" )
cell_info <- merge_feature( split_dir )
morphology_fts <- cell_info$cell_feature
cell_contour <- cell_info$cell_contour
morphology_fts$pixel_x <- morphology_fts$X
morphology_fts$pixel_y <- morphology_fts$Y
## merge_feature may take some time, so, it is recommended to save it,
## and load it once the users rerun the STIE later on
save(cell_info, file=paste0(split_dir,"/cell_info.RData"))
We take the mouse brain hippocampus as an example. The 10X CytAssist Spatial Gene Expression (V2 Chemistry) FFPE data is downloaded from the 10x Genomics public dataset: section 1 and section 2. The raw data 10X Visium FFPE has been preprocessed using spaceranger (>=2.0). The preprocessed data will be generated in the count/sample/outs directory. As a testing, we have saved the spot-level gene expression, nuclear morphological information, and the cell-type transcriptomic signature in the STIE package.
STIE.dir = system.file(package = "STIE")
nn = load( paste0(STIE.dir,"/data/MouseBrainHippocampus_10xV2ChemistryCytAssistFFPE_section1n2.RData") )
nn
#> nn
#> [1] "ST_expr" "spot_coordinates" "cell_contour" "morphology_fts" "cells_on_spot" # the data for the section1
#> [2] "ST_expr_s2" "spot_coordinates_s2" "cell_contour_s2" "morphology_fts_s2" "cells_on_spot_s2" # the data for the section2
#> [3] "Signature" # the transcriptomic signature
Alternatively, users can load the data from the spaceranger output using the following code:
library(Seurat)
### the location of spot coordinates and spot gene expression
spotfile <- "outs/spatial/tissue_positions_list.csv"
matrix_path <- "outs/filtered_feature_bc_matrix.h5"
spot_coordinates <- read.csv(file = spotfile, header = FALSE,
col.names=c("barcode","tissue","row","col","imagerow","imagecol"))
#### switch X and Y coordinates to match the direction of the image
spot_coordinates <- data.frame(spot_coordinates, pixel_x=spot_coordinates$imagecol, pixel_y=spot_coordinates$imagerow)
ST_expr <- as.data.frame(t(as.matrix(Read10X_h5(matrix_path))))
spot_coordinates <- subset(spot_coordinates, barcode%in%rownames(ST_expr) )
ST_expr <- ST_expr[match( as.character(spot_coordinates$barcode), rownames(ST_expr) ),]
head(spot_coordinates, 3)
#> barcode tissue row col imagerow imagecol pixel_x pixel_y
#> 1492 CTGGTGATCGCCGTAG-1 1 23 39 12468 21839 379.5887 216.7092
#> 1493 CACTCAATAAGCCGAA-1 1 23 41 12469 21474 373.2445 216.7265
#> 1494 GCTATGCGTGAACGGT-1 1 23 43 12469 21109 366.9004 216.7265
dim(ST_expr)
#>[1] 189 19465
Based on the cell spatial location and the spot coordinates, we can find the cells covered by each spot. Each row in the data frame cells_on_spot
is a single cell, where you can find its nuclear spatial location, the values of nuclear morphological features, and the spot id covering that cell.
#### calculate spot radius based on fct = spot_radius / spot_center_distance
fct <- (55/2)/100
spot_radius <- calculate_spot_radius(spot_coordinates, fct)
#### get cells within each spot based on cellular coordinates and spot coordinates
cells_on_spot <- get_cells_on_spot( cell_coordinates=morphology_fts, spot_coordinates, 2.5*spot_radius)
head(cells_on_spot, 3)
#> cell_id X.1 Area Mean StdDev Mode Min Max X Y XM
#> 24419 7 0.116 192.194 16.883 200 75 226 21760.58 12399.34 7.917
#> 24434 22 0.113 185.348 19.670 195 36 216 21869.53 12456.56 9.054
#> 24449 37 0.148 202.901 17.459 212 102 237 21737.52 12541.18 7.676
#> YM Perim. BX BY Width Height Major Minor Angle Circ. Feret
#> 4.156 1.221 7.719 3.958 0.396 0.385 0.410 0.360 36.679 0.977 0.418
#> 4.746 1.219 8.865 4.531 0.375 0.417 0.414 0.349 58.532 0.959 0.426
#> 5.629 1.383 7.469 5.385 0.427 0.469 0.459 0.410 70.572 0.973 0.471
#> IntDen Median Skew Kurt X.Area RawIntDen Ch FeretX FeretY FeretAngle
#> 22.252 196 -2.110 8.002 100 205071 1 745.14 412.86 45.000
#> 21.017 189 -2.136 8.560 100 193689 1 857.63 473.27 61.655
#> 30.030 205 -2.308 8.134 100 276757 1 724.64 559.74 61.770
#> MinFeret AR Round Solidity Eccentricity pixel_x pixel_y size
#> 0.364 1.139 0.878 0.988 0.4785711 378.2255 215.5158 -3.250686
#> 0.350 1.186 0.843 1.005 0.5379211 380.1194 216.5104 -3.265909
#> 0.407 1.119 0.893 1.005 0.4495678 377.8248 217.9812 -4.792136
#> shape angle spot dist
#> -1.1183285 1.3427841 CTGGTGATCGCCGTAG-1 1.8117090
#> -0.8075642 0.8167307 CTGGTGATCGCCGTAG-1 0.5667226
#> -1.1569367 0.6528751 CTGGTGATCGCCGTAG-1 2.1746697
Load mouse brain hippocampus scRNA-seq-derived cell type transcriptomic signatures. We run STIE deconvolution by setting known_signature=TRUE
and known_cell_types=FALSE
. We set lambda=0
and steps=30
for EM algorithm iteration.
#### Users could define the morphological features based on specific prior knowledge
#### In this example, we use "shape" as the morphological feature
features = c("shape")
#### run STIE
library("quadprog")
result_deconv <- STIE(ST_expr, Signature, cells_on_spot, features,
lambda=0, steps=30, known_signature=TRUE, known_cell_types=FALSE)
result_deconv2 <- STIE(ST_expr_s2, Signature, cells_on_spot_s2, features,
lambda=0, steps=30, known_signature=TRUE, known_cell_types=FALSE)
names(result_deconv)
#> [1] "lambda" "mu" "sigma" "PE_on_spot" "PM_on_cell"
#> [6] "PME_uni_cell" "cell_types" "uni_cell_types" "Signature" "cells_on_spot"
names(result_deconv2)
#> [1] "lambda" "mu" "sigma" "PE_on_spot" "PM_on_cell"
#> [6] "PME_uni_cell" "cell_types" "uni_cell_types" "Signature" "cells_on_spot"
Users can use the plot_sub_image()
function to overlay the single cells along with their cell types onto the image (downloaded from section 1 and section 2). Some of the useful visualization parameters are listed as following:
-
x_scale
a numeric value from 0 to 1, representing the scaling factor for resizing image -
w
a numeric value representing the width for image cropping -
h
a numeric value representing the height for image cropping -
xoff
a numeric value representing the x-axis offset for image cropping -
yoff
a numeric value representing the y-axis offset for image cropping -
plot_cell
a boolean value representing whether to plot the cell contour -
color_use
a vector of character values representing the colors to plot the cell contour -
plot_spot
a boolean value representing whether to plot the spot -
spot_cols
a vector of character values representing the colors of spots -
fill_spot
a boolean value representing whether to fill in the spot -
axis_tick
a numeric values representing the interval between two ticks -
axis_col
a character value representing the color of axis
library(EBImage)
library(magick)
#### read image
# setwd("/archive/SCCC/Hoshida_lab/shared/fastq/SpatialTranscriptome/10X_public_dataset/V2_MouseBrainCoronalSection1_FFPE/")
image_path = 'CytAssist_FFPE_Mouse_Brain_Rep1_tissue_image.tif'
im <- image_read(image_path)[1]
#### the STIE-obtained cell types
cell_types = result_deconv$cell_types
#### subset the cell contour
contour = cell_contour[ match(names(cell_types), names(cell_contour)) ]
#### plot the single cells along with their cell types onto the image
colors = c("magenta", "blue", "green", "black", "orange")
# plot spot coordinates
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500, x_scale=0.2, spot_coordinates=spot_coordinates,
contour=contour, cell_types=cell_types, color_use=colors, plot_spot=T, plot_cell=F)
# plot single cells
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500, x_scale=0.2, spot_coordinates=spot_coordinates,
contour=contour, cell_types=cell_types, color_use=colors, plot_spot=F, plot_cell=T)
Plot the mouse brain section2
#### read image
# setwd("/archive/SCCC/Hoshida_lab/shared/fastq/SpatialTranscriptome/10X_public_dataset/V2_MouseBrainCoronalSection2_FFPE/")
image_path_s2 = 'CytAssist_FFPE_Mouse_Brain_Rep2_tissue_image.tif'
im_s2 <- image_read(image_path_s2)[1]
#### the STIE-obtained cell types
cell_types_s2 = result_deconv2$cell_types
#### subset the cell contour
contour_s2 = cell_contour_s2[ match(names(cell_types_s2), names(cell_contour_s2)) ]
#### plot the single cells along with their cell types onto the image
colors = c("magenta", "blue", "green", "black", "orange")
# plot spot coordinates
plot_sub_image(im=im_s2, w=8200, h=4500, xoff=11000, yoff=7300, x_scale=0.2, spot_coordinates=spot_coordinates_s2,
contour=contour_s2, cell_types=cell_types_s2, color_use=colors, plot_spot=T, plot_cell=F)
# plot single cells
plot_sub_image(im=im_s2, w=8200, h=4500, xoff=11000, yoff=7300, x_scale=0.2, spot_coordinates=spot_coordinates,
contour=contour_s2, cell_types=cell_types_s2, color_use=colors, plot_spot=F, plot_cell=T)
The cells comprise two parts: cells inside and outside the spot. The cells outside the spot are recovered based on the neighborhood information by enlarging the bona file spot size.
#### plot cells inside spots
cells_inside <- get_cells_on_spot( cell_coordinates=morphology_fts, spot_coordinates, spot_radius)
cell_types_inside <- cell_types[ match(cells_inside$cell_id, names(cell_types)) ]
contour_inside = cell_contour[ match(names(cell_types_inside), names(cell_contour)) ]
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500,
x_scale=0.2, spot_coordinates=spot_coordinates,
contour=contour_inside, cell_types=cell_types_inside,
color_use=colors, plot_spot=T, plot_cell=T )
#### plot cells outside spots
cells_outside <- cells_on_spot[ !cells_on_spot$cell_id%in%cells_inside$cell_id, ]
cell_types_outside <- cell_types[ match(cells_outside$cell_id, names(cell_types)) ]
contour_outside = cell_contour[ match(names(cell_types_outside), names(cell_contour)) ]
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500,
x_scale=0.2, spot_coordinates=spot_coordinates,
contour=contour_outside, cell_types=cell_types_outside,
color_use=colors, plot_spot=T, plot_cell=T )
We can investigate the nuclear morphological feature distribution for each cell type.
library(vioplot)
fs = c("Area", "Major", "Minor", "Width", "Height", "Feret", "Perim.", "Round", "Circ.")
par( mfrow=c(3,3) )
par(mar = c(2, 2, 2, 2))
lapply( fs, function(f) {
x = split( cells_on_spot[,f], result_deconv$cell_types )
vioplot(cells_on_spot[,f]~result_deconv$cell_types, col=colors, main=f, xlab=f, rectCol="darkgrey", las=3)
})
Given no cell type transcriptomic signature, STIE can perform cell type clustering at the single-cell level, and meanwhile, estimate the gene expression signature for clusters.
The initial values of clusters are first given using the spot-level clustering, e.g., K-means, Louvain clustering, or SpaGCN, the cells within the spot are assigned the same initial cluster, and the initial value of cluster signature was set to be the average gene expression of spots belonging to the cluster. In each iteration, the cluster signature was re-estimated in the M-step, and the cluster of each single cell was re-assigned in the E-step.
In the following example, we take the spot-level cluster at k=5 as the initial value and run STIE with by setting known_signature
and known_cell_types
as FALSE
.
#### choose Kmeans (k=5) on spot-level gene expression
pc = prcomp(ST_expr)$x[,1:10]
set.seed(1234)
cluster = kmeans(pc,5)$cluster
cluster = data.frame( Barcode=names(cluster), Cluster=cluster )
cluster = cluster[ match( as.character(spot_coordinates$barcode), as.character(cluster$Barcode)), ]
head(cluster)
#> Barcode Cluster
#> CTGGTGATCGCCGTAG-1 3
#> CACTCAATAAGCCGAA-1 3
#> GCTATGCGTGAACGGT-1 3
#> CAGTTGCTCACGTGTC-1 3
#> TCAGTATGTAGGACAA-1 3
#> ACCAAGTGATGGTGAG-1 3
#### plot the Kmeans clustering on the spot
colors2 = c("green", "black", "magenta", "orange", "blue")
spot_cols = colors2[ cluster$Cluster ]
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500,
x_scale=0.2, spot_coordinates=spot_coordinates,
plot_spot=T, plot_cell=F, spot_cols=spot_cols, fill_spot=T )
#### take the cluster average gene expression as the initial value of the cluster signature
ST_expr_ini = ST_expr[ match(as.character(cluster[,1]),rownames(ST_expr)), ]
Signature_ini = t(apply(ST_expr_ini, 2, function(x) tapply(x,cluster[,2],mean) ))
#### run STIE using "Signature_ini" as an initial value and iteratively refine "Signature"
#### by setting both "known_signature=FALSE" and "known_cell_types=FALSE"
result_cluster = STIE(ST_expr, Signature_ini, cells_on_spot, features, lambda=0, steps=30,
known_signature=FALSE, known_cell_types=FALSE)
#### the STIE-obtained cell types
cell_types = result_cluster$cell_types
#### subset the cell contour
contour2 = cell_contour[ match(names(cell_types), names(cell_contour)) ]
#### plot the single cells along with their cell types onto the image
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500,
x_scale=0.2, spot_coordinates=spot_coordinates,
contour=contour2, cell_types=cell_types, color_use=colors2,
plot_spot=F, plot_cell=T)
Given the STIE-obtained single cells with spatial information retained, we further investigated the spatially resolved cell-cell interaction on the tissue. We adapted CellChat, a toolkit and a database of interactions among ligands, receptors, and their cofactors, to quantitatively infer the cell-cell interaction, and meanwhile, take into account the cell spatial information.
We first calculate the distance between cells. Here, since the cells were detected very sparsely, we use a relatively large cutoff, 20um to incorporate more potential cell-cell interactions. Users can specify other cutoffs depending on the specific biological mechanism.
#### get cells within each spot based on cellular coordinates and spot coordinates
ref_1um = spot_radius*2/55
#### calculate cell-cell distance
STIE_result = result_deconv
cell_dist = calculate_cell_dist(cells_on_spot = STIE_result$cells_on_spot,
dist_type="boundary",
dist_cutoff=20*ref_1um,
axis="Major" )
#### convert the distance into the unit of nm
cell_dist$d_nm = cell_dist$d/ref_1um
We adapted the cellchat to fit the spatial transcriptomics data to determine the spatially resolved cell type interaction. Here, we set database="mouse"
, to specify both the mouse STRINGdb protein-protein interaction database and the mouse CellChat ligand-receptor database.
library(CellChat)
#### run cellchat on the resolved single-cell level spatial transcriptome data
cellchat <- get_STcellchat(STIE_result, ST_expr, database="mouse", db_category=NULL, max_reps=NULL )
#### show the top sending and receiving pathways, through which the cell types communicate
ht1 <- netAnalysis_signalingRole_heatmap(cellchat, color.use=colors, pattern = "outgoing")
ht2 <- netAnalysis_signalingRole_heatmap(cellchat, color.use=colors, pattern = "incoming")
ht1[1:30,] + ht2[1:30,]
## draw the Circos plot of static cell-cell interaction weights/strength
## and spatially resolved cell-cell interaction weights/strength
netVisual_STcellchat(cellchat, cell_dist, STIE_result$cell_types, color_use=colors)
To simplify the interpretation of complex intercellular interactions, we used non-negative matrix factorization (NMF) implemented by CellChat to identify the global interaction patterns beyond only exploring individual pathways. Both Cophenetic and Silhouette metrics begin to drop suddenly when the number of sending and receiving patterns are 5, suggesting the suitable number of patterns.
library(NMF)
library(ggalluvial)
## plot Cophenetic and Silhouette metrics to select the suitable number of patterns
selectK(cellchat, pattern = "outgoing")
selectK(cellchat, pattern = "incoming")
The heatmap of the contribution of pathways (up) and cell types (bottom) to the pattern of sender and receiver.
## The sending patterns at k=5
plot.new()
cellchat <- identifyCommunicationPatterns(cellchat, pattern = "outgoing", k = 5, color.use=colors)
## The receiving patterns at k=5
plot.new()
cellchat <- identifyCommunicationPatterns(cellchat, pattern = "incoming", k = 5, color.use=colors)
We plot the global sending and receiving patterns.
## plot the sending patterns, where patterns c(5,1,4,3,2) correspond
## to the cell type, CA1, CA2, CA3, DG and Glia, respectively
for(i in 1:c(5,1,4,3,2))
{
plot_STcellchat(cellchat, cell_dist, w=9000, h=5000, xoff=15500, yoff=11500,
im, x_scale=0.1, lwd=20, plot_object=paste0("Pattern ",i),
direction="outgoing", color_use=colors )
}
## plot the receiving patterns, where patterns c(4,1,5,3,2) correspond
## to the cell type, CA1, CA2, CA3, DG and Glia, respectively
for(i in 1:c(4,1,5,3,2))
{
plot_STcellchat(cellchat, cell_dist, w=9000, h=5000, xoff=15500, yoff=11500,
im, x_scale=0.1, lwd=20, plot_object=paste0("Pattern ",i),
direction="incoming", color_use=colors )
}
## We plot the sending and receiving signals for specific pathways of interest.
pathway.show <- c("NRXN", "CADM", "PTN")
## sending pathway signal
for(i in 1:length(pathway.show))
{
plot_STcellchat(cellchat, cell_dist,
w=9000, h=5000, xoff=15500, yoff=11500,
im, x_scale=0.1, lwd=20,
plot_object=pathway.show[i],
direction="outgoing", color_use=colors)
## receiving pathway signal
for(i in 1:length(pathway.show))
{
plot_STcellchat(cellchat, cell_dist,
w=9000, h=5000, xoff=15500, yoff=11500,
im, x_scale=0.1, lwd=20,
plot_object=pathway.show[i],
direction="outgoing", color_use=colors)
}
The hyperparameter 𝝲
aims to model the actual area covered by the spot, which may differ from the reported spot size. The difference might result from the sample leaking from the spot area to the surroundings. We select 𝝲
as the area giving the best concordance between the prediction and true spatial gene expression, which is measured by the RMSE. We evaluated 𝝲
on the 10X Visium V2 Chemistry CytAssist mouse brain hippocampus spatial transcriptomics and observed similar patterns across different λ
, all of which reach the minimum around γ=2.5x reported spot size (55 μm in diameter).
ratios = seq(0.5,5,0.5) # range of 𝝲
lambdas=c(0,1e1,1e2,1e3,1e4,1e5,1e6) # range of λ
scores_las = list()
for(i in 1:length(lambdas))
{
results = list()
for(j in 1:length(ratios))
{
lai = lambdas[i]
spotj = ratios[j]*spot_radius
# based on the bona fide spot size, we recalculate the cells on each spot
cells_on_spotj <- get_cells_on_spot( cell_coordinates=morphology_fts, spot_coordinates=spot_coordinates, spot_radius=spotj)
# run STIE deconvolution based on the new cells_on_spot
results[[j]] <- STIE(ST_expr, Signature, cells_on_spotj, features=c('shape'),
lambda=lai, steps=30, known_signature=TRUE, known_cell_types=FALSE )
}
# recalculate the scores for each result
scores_las[[i]] = lapply( results, function(x) get_summary(x, ST_expr) )
names(scores_las[[i]]) = names(results) = ratios
}
mses = lapply(scores_las, function(scores) lapply(scores, function(s) sqrt(s$mse) ))
############################################################
##### Visualization
############################################################
par(mfrow=c(1,1))
par(mar=c(3,3,3,3))
cols = get_my_colors(length(scores_las), mode=2)
barx = as.numeric(names(scores_las[[1]]))*2
errplot(mses[[1]],barx=barx, col=cols[1], xlab='bona file spot size (x55um)', ylab="RMSE")
lapply( 2:length(mses), function(i) errplot(mses[[i]],barx=barx+(i-1)*0.1, line=F, col=cols[i]) )
legend( "topright", legend=lambdas, col=cols, pch=16 )
The hyperparameter λ balances the information from gene expression and morphological features.
- Given the predefined morphological features, we select λ by evaluating the balance between two criteria: RMSE of the spatial gene expression fitting and log-likelihood of the morphological feature fitting.
- To select the morphological features, under a predefined λ, we rank the nuclear morphological features based on the RMSE by running STIE on each feature individually. Next, based on their ranking, we used a greedy strategy to gradually add more features to the model. Like the selection of λ, the best morphological features are selected by evaluating the gene expression and morphological fittings simultaneously.
To select the best combination of morphological features and λ, we investigated the morphological features over different λ.
In the real datasets, we extracted multiple morphological features for the nuclei. These features are found to be highly correlated, which is consistent with their definition and mathematical calculation. We focused on two large categories: size (Area, Major, Minor, Width, Height, Feret, and Perimeter) and shape (Round and Circular). To reduce the redundancy and improve efficiency, we performed PCA for each category and took the 1st PC as the surrogate of each category.
We select the best combination of the morphological feature and λ simultaneously via the grid search, which is implemented using the R function STIE_search()
. In the mouse brain hippocampus deconvolution and clustering, the feature ‘shape’ was ranked before ‘size’, and ‘shape’ gives the lower RMSE and higher morphological likelihood at small λ, so we used ‘shape’ as the morphological feature and λ=0 (red triangle).
# searching range
lambdas=c(0,1e1,1e2,1e3,1e4,1e5,1e6)
# set RMSE as the criterion to rank the morphological features
# grid search for STIE deconvolution
paths = STIE_search(ST_expr, Signature, cells_on_spot,
steps=30, known_signature=TRUE, known_cell_types=FALSE,
lambdas=lambdas, criterion = "rmse" )
names(paths)
#> [1] "0" "10" "100" "1000" "10000" "1e+05" "1e+06"
names(paths[["0"]])
#> [1] "ordered_features" "results" "scores" "results_ord1" "scores_ord1" "values" "logLik" "logLikMorph" "criterion"
paths[["0"]]$ordered_features
#> [1] "shape" "size"