Skip to content
Shijia Zhu edited this page Feb 26, 2023 · 114 revisions

STIE: Single-cell level deconvolution, convolution and clustering in spatial transcriptomics by aligning spatial transcriptome to the nuclear morphology

Contents

  1. STIE dependent packages
  2. STIE input
  3. Nucleus segmentation based on H&E image
  4. Single-cell level deconvolution in Spatial transcriptomics
  5. Single-cell level clustering in Spatial transcriptomics
  6. Spatially resolved cell-cell interaction
  7. Bona fide spot size 𝝲
  8. Grid search for 𝝺 and nuclear morphological features

Description

The 10X Visium spot-based spatial transcriptome profiling, due to its higher throughput, less custom protocol, and steadily increasing spot resolution, has been the most predominant commercially available technique. However, by using both real and simulated spatial transcriptomic data, we found that regardless of resolution (up to 5 μm in spot diameter), a large proportion of spots still partially cover multiple cells simultaneously, suggesting the intrinsic non-single-cell level of the spot-based spatial transcriptomics. To this end, we present STIE, an EM algorithm that aligns the spatial transcriptome data to the matched histology image-based nucleus segmentation, mutually draws information from spatial transcriptome and nuclear morphology to optimize their models jointly, and recovers missing cells from up to ~70% gap area between spots via neighborhood information and morphological similarity, thereby enabling the real single-cell level and whole-slide scale deconvolution, convolution and clustering for both low- and high-resolution spots. As opposed to the spot/subspot level clustering, whose signature is a mixture of cell types and can even miss cell types, the signature by STIE clustering largely matches the single cell type derived from single-cell RNAseq data. Further, the resolved single cells with spatial information retained, enabled us to address the critical relevant questions, including bona fide spot-capturing area, additional contribution of cellular morphology to cell typing, spatial cell type colocalization, and spatial cell-cell interaction.

1. Load STIE dependent tools and R packages

library(STIE)

#### for manipulating images
library(magick)
library(EBImage)

#### for manipulating ST gene expression 
library(Seurat)

#### for the quadratic programming 
library("quadprog")

#### for spatially resolved cell-cell interaction
library(CellChat)
library(NMF)
library(ggalluvial)

Back to above

2. STIE input

STIE has two hyperparameters, 𝝺 and 𝝲, representing the shrinkage penalty for nuclear morphology, and the area surrounding the center of the spot, respectively.

  1. 𝝺 this parameter penalizes the difference between gene expression- and nuclear morphology-predicted cell type proportion. STIE chooses 𝝺 to tweak the contribution of gene expression and cellular morphology to the final model fitting. Please see its selection from here.
  2. 𝝲 by modulating this parameter, STIE aims to find the group of cells which are covered by the spot and can best fit the gene expression of the spot. Please see its selection from here.

STIE takes the follows as input:

  • nuclear coordinates and morphology the spatial coordinates and morphological features of the nuclei
  • spot-level gene expression the gene expression on spots
  • cell-type transcriptomic signature the cell-type transcriptomic signature derived from scRNA-seq data

Back to above


Before running cell segmentation and spatial transcriptome data preprocessing, if users only want to test the code but have not installed imageJ or spaceranger yet, we have provided the testing data, and users can directly jump to Section of deconvolution.

3. Nucleus segmentation based on H&E image

STIE takes as input (but is not limited to) the nucleus segmentation implemented in the DeepImageJ, called “Multi-Organ Nucleus Segmentation model.” Please see here for the installation.

The location of ImageJ/Fiji.app/ImageJ-linux64 in the above installation will be used in the run_imageJ_plugin() function to run the imageJ macro. Moreover, the user should specify a folder split_dir to save the intermediate results.

Two high-resolution images used as examples, CytAssist_FFPE_Mouse_Brain_Rep1_tissue_image.tif and 'CytAssist_FFPE_Mouse_Brain_Rep2_tissue_image.tif' of two mouse brain consecutive sections are downloaded from the 10X Genomics public data: section 1 and for the mouse brain section 2.

library(magick)
library(EBImage)

## split image & run imageJ plugin
split_image(image=image_path, split_dir=split_dir, 
            w=3000, h=3000, margin=100, x_scale=1 )

## if spot_coordinates specified, then only split image covered by the spots
# split_image(image=image_path, split_dir=split_dir, 
#             spot_coordinates = spot_coordinates,
#             w=3000, h=3000, margin=100, x_scale=1 )

run_imageJ_plugin(
        imageJ = "/archive/SCCC/Hoshida_lab/s184554/Code/github/ImageJ/Fiji.app/ImageJ-linux64",
        plugin_macro = NULL,
        split_image_dir = split_dir, 
        pattern="jpg$" )

cell_info <- merge_feature( split_dir )
morphology_fts <- cell_info$cell_feature
cell_contour <- cell_info$cell_contour
morphology_fts$pixel_x <- morphology_fts$X
morphology_fts$pixel_y <- morphology_fts$Y

## merge_feature may take some time, so, it is recommended to save it,
## and load it once the users rerun the STIE later on
save(cell_info, file=paste0(split_dir,"/cell_info.RData"))

Back to above

4. Single-cell level deconvolution in spatial transcriptomics

We take the mouse brain hippocampus as an example. The 10X CytAssist Spatial Gene Expression (V2 Chemistry) FFPE data is downloaded from the 10x Genomics public dataset: section 1 and section 2. The raw data 10X Visium FFPE has been preprocessed using spaceranger (>=2.0). The preprocessed data will be generated in the count/sample/outs directory. As a testing, we have saved the spot-level gene expression, nuclear morphological information, and the cell-type transcriptomic signature in the STIE package.

STIE.dir = system.file(package = "STIE")
nn = load( paste0(STIE.dir,"/data/MouseBrainHippocampus_10xV2ChemistryCytAssistFFPE_section1n2.RData") )
nn
#>  nn
#>  [1] "ST_expr" "spot_coordinates" "cell_contour" "morphology_fts" "cells_on_spot"                        # the data for the section1
#>  [2] "ST_expr_s2" "spot_coordinates_s2" "cell_contour_s2"     "morphology_fts_s2" "cells_on_spot_s2"     # the data for the section2
#>  [3] "Signature"                                                                                         # the transcriptomic signature

Alternatively, users can load the data from the spaceranger output using the following code:

library(Seurat)

### the location of spot coordinates and spot gene expression
spotfile <- "outs/spatial/tissue_positions_list.csv"
matrix_path <- "outs/filtered_feature_bc_matrix.h5"

spot_coordinates <- read.csv(file = spotfile, header = FALSE,
                             col.names=c("barcode","tissue","row","col","imagerow","imagecol"))
#### switch X and Y coordinates to match the direction of the image
spot_coordinates <- data.frame(spot_coordinates, pixel_x=spot_coordinates$imagecol, pixel_y=spot_coordinates$imagerow)

ST_expr <- as.data.frame(t(as.matrix(Read10X_h5(matrix_path))))

spot_coordinates <- subset(spot_coordinates, barcode%in%rownames(ST_expr) )
ST_expr <- ST_expr[match( as.character(spot_coordinates$barcode), rownames(ST_expr) ),]

head(spot_coordinates, 3)
#>                  barcode tissue row col imagerow imagecol  pixel_x  pixel_y
#>  1492 CTGGTGATCGCCGTAG-1      1  23  39    12468    21839 379.5887 216.7092
#>  1493 CACTCAATAAGCCGAA-1      1  23  41    12469    21474 373.2445 216.7265
#>  1494 GCTATGCGTGAACGGT-1      1  23  43    12469    21109 366.9004 216.7265

dim(ST_expr)
#>[1]   189 19465

Based on the cell spatial location and the spot coordinates, we can find the cells covered by each spot. Each row in the data frame cells_on_spot is a single cell, where you can find its nuclear spatial location, the values of nuclear morphological features, and the spot id covering that cell.

#### calculate spot radius based on fct = spot_radius / spot_center_distance
fct <- (55/2)/100 
spot_radius <- calculate_spot_radius(spot_coordinates, fct)

#### get cells within each spot based on cellular coordinates and spot coordinates
cells_on_spot <- get_cells_on_spot( cell_coordinates=morphology_fts, spot_coordinates, 2.5*spot_radius)
head(cells_on_spot, 3)

#> cell_id X.1  Area    Mean StdDev Mode Min Max        X        Y    XM
#>   24419   7 0.116 192.194 16.883  200  75 226 21760.58 12399.34 7.917
#>   24434  22 0.113 185.348 19.670  195  36 216 21869.53 12456.56 9.054
#>   24449  37 0.148 202.901 17.459  212 102 237 21737.52 12541.18 7.676
#>    YM Perim.    BX    BY Width Height Major Minor  Angle Circ. Feret
#> 4.156  1.221 7.719 3.958 0.396  0.385 0.410 0.360 36.679 0.977 0.418
#> 4.746  1.219 8.865 4.531 0.375  0.417 0.414 0.349 58.532 0.959 0.426
#> 5.629  1.383 7.469 5.385 0.427  0.469 0.459 0.410 70.572 0.973 0.471
#> IntDen Median   Skew  Kurt X.Area RawIntDen Ch FeretX FeretY FeretAngle
#> 22.252    196 -2.110 8.002    100    205071  1 745.14 412.86     45.000
#> 21.017    189 -2.136 8.560    100    193689  1 857.63 473.27     61.655
#> 30.030    205 -2.308 8.134    100    276757  1 724.64 559.74     61.770
#> MinFeret    AR Round Solidity Eccentricity  pixel_x  pixel_y      size
#>    0.364 1.139 0.878    0.988    0.4785711 378.2255 215.5158 -3.250686
#>    0.350 1.186 0.843    1.005    0.5379211 380.1194 216.5104 -3.265909
#>    0.407 1.119 0.893    1.005    0.4495678 377.8248 217.9812 -4.792136
#>      shape     angle               spot      dist
#> -1.1183285 1.3427841 CTGGTGATCGCCGTAG-1 1.8117090
#> -0.8075642 0.8167307 CTGGTGATCGCCGTAG-1 0.5667226
#> -1.1569367 0.6528751 CTGGTGATCGCCGTAG-1 2.1746697

Load mouse brain hippocampus scRNA-seq-derived cell type transcriptomic signatures. We run STIE deconvolution by setting known_signature=TRUE and known_cell_types=FALSE. We set lambda=0 and steps=30 for EM algorithm iteration.

#### Users could define the morphological features based on specific prior knowledge
#### In this example, we use "shape" as the morphological feature
features = c("shape")

#### run STIE
library("quadprog")
result_deconv <- STIE(ST_expr, Signature, cells_on_spot, features, 
                       lambda=0, steps=30, known_signature=TRUE, known_cell_types=FALSE)

result_deconv2 <- STIE(ST_expr_s2, Signature, cells_on_spot_s2, features, 
                       lambda=0, steps=30, known_signature=TRUE, known_cell_types=FALSE)

names(result_deconv)
#> [1] "lambda"         "mu"             "sigma"          "PE_on_spot"     "PM_on_cell"    
#> [6] "PME_uni_cell"   "cell_types"     "uni_cell_types" "Signature"      "cells_on_spot" 

names(result_deconv2)
#> [1] "lambda"         "mu"             "sigma"          "PE_on_spot"     "PM_on_cell"    
#> [6] "PME_uni_cell"   "cell_types"     "uni_cell_types" "Signature"      "cells_on_spot" 

Users can use the plot_sub_image() function to overlay the single cells along with their cell types onto the image (downloaded from section 1 and section 2). Some of the useful visualization parameters are listed as following:

  • x_scale a numeric value from 0 to 1, representing the scaling factor for resizing image
  • w a numeric value representing the width for image cropping
  • h a numeric value representing the height for image cropping
  • xoff a numeric value representing the x-axis offset for image cropping
  • yoff a numeric value representing the y-axis offset for image cropping
  • plot_cell a boolean value representing whether to plot the cell contour
  • color_use a vector of character values representing the colors to plot the cell contour
  • plot_spot a boolean value representing whether to plot the spot
  • spot_cols a vector of character values representing the colors of spots
  • fill_spot a boolean value representing whether to fill in the spot
  • axis_tick a numeric values representing the interval between two ticks
  • axis_col a character value representing the color of axis
library(EBImage)
library(magick)

#### read image
# setwd("/archive/SCCC/Hoshida_lab/shared/fastq/SpatialTranscriptome/10X_public_dataset/V2_MouseBrainCoronalSection1_FFPE/")
image_path = 'CytAssist_FFPE_Mouse_Brain_Rep1_tissue_image.tif'
im <- image_read(image_path)[1]

#### the STIE-obtained cell types
cell_types = result_deconv$cell_types

#### subset the cell contour 
contour = cell_contour[ match(names(cell_types), names(cell_contour)) ]

#### plot the single cells along with their cell types onto the image
colors = c("magenta", "blue", "green", "black", "orange")

# plot spot coordinates
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500, x_scale=0.2, spot_coordinates=spot_coordinates,
               contour=contour, cell_types=cell_types, color_use=colors, plot_spot=T, plot_cell=F)
# plot single cells
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500, x_scale=0.2, spot_coordinates=spot_coordinates,
               contour=contour, cell_types=cell_types, color_use=colors, plot_spot=F, plot_cell=T)

Plot the mouse brain section2

#### read image
# setwd("/archive/SCCC/Hoshida_lab/shared/fastq/SpatialTranscriptome/10X_public_dataset/V2_MouseBrainCoronalSection2_FFPE/")
image_path_s2 = 'CytAssist_FFPE_Mouse_Brain_Rep2_tissue_image.tif'
im_s2 <- image_read(image_path_s2)[1]

#### the STIE-obtained cell types
cell_types_s2 = result_deconv2$cell_types

#### subset the cell contour 
contour_s2 = cell_contour_s2[ match(names(cell_types_s2), names(cell_contour_s2)) ]

#### plot the single cells along with their cell types onto the image
colors = c("magenta", "blue", "green", "black", "orange")
# plot spot coordinates
plot_sub_image(im=im_s2, w=8200, h=4500, xoff=11000, yoff=7300, x_scale=0.2, spot_coordinates=spot_coordinates_s2, 
               contour=contour_s2, cell_types=cell_types_s2, color_use=colors, plot_spot=T, plot_cell=F)
# plot single cells
plot_sub_image(im=im_s2, w=8200, h=4500, xoff=11000, yoff=7300, x_scale=0.2, spot_coordinates=spot_coordinates, 
               contour=contour_s2, cell_types=cell_types_s2, color_use=colors, plot_spot=F, plot_cell=T)
Screen Shot 2023-02-18 at 11 48 13 PM

The cells comprise two parts: cells inside and outside the spot. The cells outside the spot are recovered based on the neighborhood information by enlarging the bona file spot size.

#### plot cells inside spots
cells_inside <- get_cells_on_spot( cell_coordinates=morphology_fts, spot_coordinates, spot_radius)
cell_types_inside <- cell_types[ match(cells_inside$cell_id, names(cell_types)) ]
contour_inside = cell_contour[ match(names(cell_types_inside), names(cell_contour)) ]

plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500, 
               x_scale=0.2, spot_coordinates=spot_coordinates, 
               contour=contour_inside, cell_types=cell_types_inside, 
               color_use=colors, plot_spot=T, plot_cell=T )

#### plot cells outside spots
cells_outside <- cells_on_spot[ !cells_on_spot$cell_id%in%cells_inside$cell_id,  ]
cell_types_outside <- cell_types[ match(cells_outside$cell_id, names(cell_types)) ]
contour_outside = cell_contour[ match(names(cell_types_outside), names(cell_contour)) ]

plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500, 
               x_scale=0.2, spot_coordinates=spot_coordinates, 
               contour=contour_outside, cell_types=cell_types_outside, 
               color_use=colors, plot_spot=T, plot_cell=T )
Screen Shot 2023-02-18 at 11 57 48 PM

We can investigate the nuclear morphological feature distribution for each cell type.

library(vioplot)
fs = c("Area", "Major", "Minor", "Width", "Height", "Feret", "Perim.", "Round", "Circ.")
par( mfrow=c(3,3) )
par(mar = c(2, 2, 2, 2))
lapply( fs, function(f) {
    x = split( cells_on_spot[,f], result_deconv$cell_types )
    vioplot(cells_on_spot[,f]~result_deconv$cell_types, col=colors, main=f, xlab=f, rectCol="darkgrey", las=3)
})
Screen Shot 2023-02-20 at 8 39 35 PM

Back to above

5. Single-cell level clustering in spatial transcriptomics

Given no cell type transcriptomic signature, STIE can perform cell type clustering at the single-cell level, and meanwhile, estimate the gene expression signature for clusters. The initial values of clusters are first given using the spot-level clustering, e.g., K-means, Louvain clustering, or SpaGCN, the cells within the spot are assigned the same initial cluster, and the initial value of cluster signature was set to be the average gene expression of spots belonging to the cluster. In each iteration, the cluster signature was re-estimated in the M-step, and the cluster of each single cell was re-assigned in the E-step. In the following example, we take the spot-level cluster at k=5 as the initial value and run STIE with by setting known_signature and known_cell_types as FALSE.

#### choose Kmeans (k=5) on spot-level gene expression
pc = prcomp(ST_expr)$x[,1:10]
set.seed(1234)
cluster = kmeans(pc,5)$cluster
cluster = data.frame( Barcode=names(cluster), Cluster=cluster )
cluster = cluster[ match( as.character(spot_coordinates$barcode), as.character(cluster$Barcode)), ]

head(cluster)
#>            Barcode Cluster
#> CTGGTGATCGCCGTAG-1       3
#> CACTCAATAAGCCGAA-1       3
#> GCTATGCGTGAACGGT-1       3
#> CAGTTGCTCACGTGTC-1       3
#> TCAGTATGTAGGACAA-1       3
#> ACCAAGTGATGGTGAG-1       3
#### plot the Kmeans clustering on the spot
colors2 = c("green", "black", "magenta", "orange", "blue")
spot_cols = colors2[ cluster$Cluster ]
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500, 
               x_scale=0.2, spot_coordinates=spot_coordinates, 
               plot_spot=T, plot_cell=F, spot_cols=spot_cols, fill_spot=T  )
#### take the cluster average gene expression as the initial value of the cluster signature
ST_expr_ini = ST_expr[ match(as.character(cluster[,1]),rownames(ST_expr)), ]
Signature_ini = t(apply(ST_expr_ini, 2, function(x) tapply(x,cluster[,2],mean) ))

#### run STIE using "Signature_ini" as an initial value and iteratively refine "Signature" 
#### by setting both "known_signature=FALSE" and "known_cell_types=FALSE"
result_cluster = STIE(ST_expr, Signature_ini, cells_on_spot, features, lambda=0, steps=30, 
              known_signature=FALSE, known_cell_types=FALSE)
#### the STIE-obtained cell types
cell_types = result_cluster$cell_types

#### subset the cell contour 
contour2 = cell_contour[ match(names(cell_types), names(cell_contour)) ]

#### plot the single cells along with their cell types onto the image
plot_sub_image(im=im, w=9000, h=5000, xoff=15500, yoff=11500, 
               x_scale=0.2, spot_coordinates=spot_coordinates, 
               contour=contour2, cell_types=cell_types, color_use=colors2, 
               plot_spot=F, plot_cell=T)
Screen Shot 2023-02-19 at 12 06 49 AM

Back to above

6. Spatially resolved cell-cell interaction

Given the STIE-obtained single cells with spatial information retained, we further investigated the spatially resolved cell-cell interaction on the tissue. We adapted CellChat, a toolkit and a database of interactions among ligands, receptors, and their cofactors, to quantitatively infer the cell-cell interaction, and meanwhile, take into account the cell spatial information.

We first calculate the distance between cells. Here, since the cells were detected very sparsely, we use a relatively large cutoff, 20um to incorporate more potential cell-cell interactions. Users can specify other cutoffs depending on the specific biological mechanism.

#### get cells within each spot based on cellular coordinates and spot coordinates
ref_1um = spot_radius*2/55

#### calculate cell-cell distance
STIE_result = result_deconv
cell_dist = calculate_cell_dist(cells_on_spot = STIE_result$cells_on_spot, 
                                dist_type="boundary", 
                                dist_cutoff=20*ref_1um,
                                axis="Major" )

#### convert the distance into the unit of nm
cell_dist$d_nm = cell_dist$d/ref_1um

We adapted the cellchat to fit the spatial transcriptomics data to determine the spatially resolved cell type interaction. Here, we set database="mouse", to specify both the mouse STRINGdb protein-protein interaction database and the mouse CellChat ligand-receptor database.

library(CellChat)

#### run cellchat on the resolved single-cell level spatial transcriptome data
cellchat <- get_STcellchat(STIE_result, ST_expr, database="mouse", db_category=NULL, max_reps=NULL )

#### show the top sending and receiving pathways, through which the cell types communicate
ht1 <- netAnalysis_signalingRole_heatmap(cellchat, color.use=colors, pattern = "outgoing")
ht2 <- netAnalysis_signalingRole_heatmap(cellchat, color.use=colors, pattern = "incoming")
ht1[1:30,] + ht2[1:30,]
Screen Shot 2023-02-20 at 7 14 29 PM
## draw the Circos plot of static cell-cell interaction weights/strength 
## and spatially resolved cell-cell interaction weights/strength
netVisual_STcellchat(cellchat, cell_dist, STIE_result$cell_types, color_use=colors)
Screen Shot 2023-02-20 at 7 14 28 PM

To simplify the interpretation of complex intercellular interactions, we used non-negative matrix factorization (NMF) implemented by CellChat to identify the global interaction patterns beyond only exploring individual pathways. Both Cophenetic and Silhouette metrics begin to drop suddenly when the number of sending and receiving patterns are 5, suggesting the suitable number of patterns.

library(NMF)
library(ggalluvial)
## plot Cophenetic and Silhouette metrics to select the suitable number of patterns
selectK(cellchat, pattern = "outgoing")
selectK(cellchat, pattern = "incoming")
Screen Shot 2023-02-20 at 7 18 59 PM

The heatmap of the contribution of pathways (up) and cell types (bottom) to the pattern of sender and receiver.

## The sending patterns at k=5
plot.new()
cellchat <- identifyCommunicationPatterns(cellchat, pattern = "outgoing", k = 5, color.use=colors)
Screen Shot 2023-02-20 at 7 19 01 PM
## The receiving patterns at k=5
plot.new()
cellchat <- identifyCommunicationPatterns(cellchat, pattern = "incoming", k = 5, color.use=colors)
Screen Shot 2023-02-20 at 7 19 01 PM 1

We plot the global sending and receiving patterns.

## plot the sending patterns, where patterns c(5,1,4,3,2) correspond 
## to the cell type, CA1, CA2, CA3, DG and Glia, respectively
for(i in 1:c(5,1,4,3,2))
{
    plot_STcellchat(cellchat, cell_dist, w=9000, h=5000, xoff=15500, yoff=11500, 
                    im, x_scale=0.1, lwd=20, plot_object=paste0("Pattern ",i), 
                    direction="outgoing", color_use=colors )
}

## plot the receiving patterns, where patterns c(4,1,5,3,2) correspond 
## to the cell type, CA1, CA2, CA3, DG and Glia, respectively
for(i in 1:c(4,1,5,3,2))
{
    plot_STcellchat(cellchat, cell_dist, w=9000, h=5000, xoff=15500, yoff=11500, 
                    im, x_scale=0.1, lwd=20, plot_object=paste0("Pattern ",i), 
                    direction="incoming", color_use=colors )
}
Screen Shot 2023-02-20 at 7 04 49 PM
## We plot the sending and receiving signals for specific pathways of interest. 
pathway.show <- c("NRXN", "CADM", "PTN")

## sending pathway signal
for(i in 1:length(pathway.show))
{
    plot_STcellchat(cellchat, cell_dist, 
                    w=9000, h=5000, xoff=15500, yoff=11500,
                    im, x_scale=0.1, lwd=20,
                    plot_object=pathway.show[i], 
                    direction="outgoing", color_use=colors)

## receiving pathway signal
for(i in 1:length(pathway.show))
{    
    plot_STcellchat(cellchat, cell_dist, 
                    w=9000, h=5000, xoff=15500, yoff=11500,
                    im, x_scale=0.1, lwd=20,
                    plot_object=pathway.show[i], 
                    direction="outgoing", color_use=colors)
}
Screen Shot 2023-02-20 at 7 04 51 PM

7 Bona fide spot size 𝝲

The hyperparameter 𝝲 aims to model the actual area covered by the spot, which may differ from the reported spot size. The difference might result from the sample leaking from the spot area to the surroundings. We select 𝝲 as the area giving the best concordance between the prediction and true spatial gene expression, which is measured by the RMSE. We evaluated 𝝲 on the 10X Visium V2 Chemistry CytAssist mouse brain hippocampus spatial transcriptomics and observed similar patterns across different λ, all of which reach the minimum around γ=2.5x reported spot size (55 μm in diameter).

ratios = seq(0.5,5,0.5)               # range of 𝝲
lambdas=c(0,1e1,1e2,1e3,1e4,1e5,1e6)  # range of λ

scores_las = list()
for(i in 1:length(lambdas)) 
{
    results = list()
    for(j in 1:length(ratios))
    {
        lai = lambdas[i]
        spotj = ratios[j]*spot_radius
        # based on the bona fide spot size, we recalculate the cells on each spot
        cells_on_spotj <- get_cells_on_spot( cell_coordinates=morphology_fts, spot_coordinates=spot_coordinates, spot_radius=spotj)
        # run STIE deconvolution based on the new cells_on_spot
        results[[j]]  <- STIE(ST_expr, Signature, cells_on_spotj, features=c('shape'), 
                              lambda=lai, steps=30, known_signature=TRUE, known_cell_types=FALSE )
    }
    # recalculate the scores for each result
    scores_las[[i]] = lapply( results, function(x) get_summary(x, ST_expr) )
    names(scores_las[[i]]) = names(results) = ratios
}
mses = lapply(scores_las, function(scores) lapply(scores, function(s) sqrt(s$mse) ))

############################################################
##### Visualization
############################################################
par(mfrow=c(1,1))
par(mar=c(3,3,3,3))
cols = get_my_colors(length(scores_las), mode=2)
barx = as.numeric(names(scores_las[[1]]))*2
errplot(mses[[1]],barx=barx, col=cols[1], xlab='bona file spot size (x55um)', ylab="RMSE")
lapply( 2:length(mses), function(i) errplot(mses[[i]],barx=barx+(i-1)*0.1, line=F, col=cols[i]) )
legend( "topright", legend=lambdas, col=cols, pch=16 )

Screen Shot 2023-02-21 at 12 04 56 AM

Back to above

8 Grid search for 𝝺 and nuclear morphological features

The hyperparameter λ balances the information from gene expression and morphological features.

  • Given the predefined morphological features, we select λ by evaluating the balance between two criteria: RMSE of the spatial gene expression fitting and log-likelihood of the morphological feature fitting.
  • To select the morphological features, under a predefined λ, we rank the nuclear morphological features based on the RMSE by running STIE on each feature individually. Next, based on their ranking, we used a greedy strategy to gradually add more features to the model. Like the selection of λ, the best morphological features are selected by evaluating the gene expression and morphological fittings simultaneously.

To select the best combination of morphological features and λ, we investigated the morphological features over different λ.

In the real datasets, we extracted multiple morphological features for the nuclei. These features are found to be highly correlated, which is consistent with their definition and mathematical calculation. We focused on two large categories: size (Area, Major, Minor, Width, Height, Feret, and Perimeter) and shape (Round and Circular). To reduce the redundancy and improve efficiency, we performed PCA for each category and took the 1st PC as the surrogate of each category.

We select the best combination of the morphological feature and λ simultaneously via the grid search, which is implemented using the R function STIE_search(). In the mouse brain hippocampus deconvolution and clustering, the feature ‘shape’ was ranked before ‘size’, and ‘shape’ gives the lower RMSE and higher morphological likelihood at small λ, so we used ‘shape’ as the morphological feature and λ=0 (red triangle).

# searching range 
lambdas=c(0,1e1,1e2,1e3,1e4,1e5,1e6)

# set RMSE as the criterion to rank the morphological features
# grid search for STIE deconvolution
paths = STIE_search(ST_expr, Signature, cells_on_spot, 
                    steps=30, known_signature=TRUE, known_cell_types=FALSE, 
                    lambdas=lambdas, criterion = "rmse" )    

names(paths)
#> [1] "0" "10" "100" "1000" "10000" "1e+05" "1e+06"
names(paths[["0"]])
#> [1] "ordered_features" "results" "scores" "results_ord1" "scores_ord1" "values" "logLik" "logLikMorph" "criterion"          
paths[["0"]]$ordered_features
#> [1] "shape" "size"
Screen Shot 2023-02-21 at 12 12 44 AM

Back to above