---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
dpi = 70
)
```
## Overview
```{r, eval=T, include=F}
start.time <- Sys.time()
```
BANKSY is a method for clustering spatial omics data by augmenting the
features of each cell with both an average of the features of its spatial
neighbors and neighborhood feature gradients. By incorporating
neighborhood information for clustering, BANKSY is able to:
- improve cell-type assignment in noisy data
- distinguish subtly different cell-types stratified by microenvironment
- identify spatial domains sharing the same microenvironment
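
To make the idea of feature augmentation concrete, here is a minimal base-R sketch on made-up data. It appends an unweighted mean over each cell's `k` nearest spatial neighbors to that cell's own features. The toy matrices, the choice `k = 2`, and the plain averaging are assumptions for illustration; the sketch omits the gradient term and does not reproduce the package's actual weighting.

```{r, eval=F}
set.seed(1)
n_genes <- 3
n_cells <- 6
# Toy genes-by-cells expression matrix and 2D cell coordinates (made up)
expr <- matrix(rpois(n_genes * n_cells, 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)
coords <- cbind(x = runif(n_cells), y = runif(n_cells))
k <- 2
# Pairwise Euclidean distances; a cell is never its own neighbor
d <- as.matrix(dist(coords))
diag(d) <- Inf
# For each cell, average the expression of its k nearest neighbors
nbr_mean <- sapply(seq_len(n_cells), function(i) {
  nbrs <- order(d[i, ])[seq_len(k)]
  rowMeans(expr[, nbrs, drop = FALSE])
})
dimnames(nbr_mean) <- list(paste0(rownames(expr), ".nbr"), colnames(expr))
# Augmented matrix: own features stacked on top of neighborhood features
augmented <- rbind(expr, nbr_mean)
dim(augmented)
```

Each cell now carries twice as many features, so cells with similar expression but different surroundings are no longer identical to a clustering algorithm.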
BANKSY is applicable to a wide array of spatial technologies (e.g. 10x Visium,
Slide-seq, MERFISH, CosMx, CODEX) and scales well to large datasets. For more
details, check out:
- the [paper](https://www.nature.com/articles/s41588-024-01664-3),
- the [peer review file](https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-024-01664-3/MediaObjects/41588_2024_1664_MOESM3_ESM.pdf),
- a [tweetorial](https://x.com/shyam_lab/status/1762648072360792479?s=20) on BANKSY,
- a set of [vignettes](https://prabhakarlab.github.io/Banksy) showing basic
usage,
- usage compatibility with Seurat ([here](https://github.com/satijalab/seurat-wrappers/blob/master/docs/banksy.md) and [here](https://satijalab.org/seurat/articles/visiumhd_analysis_vignette#identifying-spatially-defined-tissue-domains)),
- a [Python version](https://github.com/prabhakarlab/Banksy_py) of this package,
- a [Zenodo archive](https://zenodo.org/records/10258795) containing scripts to
reproduce the analyses in the paper, and the corresponding
[GitHub Pages](https://github.com/jleechung/banksy-zenodo)
(and [here](https://github.com/prabhakarlab/Banksy_py/tree/Banksy_manuscript) for analyses done in Python).
## Installation
The *Banksy* package can be installed via Bioconductor. This currently requires
R `>= 4.4.0`.
```{r, eval=F}
BiocManager::install('Banksy')
```
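
A quick check before installing can confirm that the R version meets the stated minimum and that *BiocManager* is available. This is a small convenience sketch, not part of the package; the variable names are arbitrary.

```{r, eval=F}
# Compare the running R version against the stated minimum
r_ok <- getRversion() >= "4.4.0"
# BiocManager itself is installed from CRAN if it is missing
has_biocmanager <- requireNamespace("BiocManager", quietly = TRUE)
if (!r_ok) {
  message("R ", getRversion(), " is older than 4.4.0; upgrade R first.")
}
if (!has_biocmanager) {
  message('Run install.packages("BiocManager") first.')
}
```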
To install directly from GitHub instead, use
```{r, eval=F}
remotes::install_github("prabhakarlab/Banksy")
```
To use the legacy version of *Banksy* utilising the `BanksyObject` class, use
```{r, eval=F}
remotes::install_github("prabhakarlab/Banksy@legacy")
```
*Banksy* is also interoperable with [*Seurat*](https://satijalab.org/seurat/)
via [*SeuratWrappers*](https://github.com/satijalab/seurat-wrappers).
Documentation on how to run BANKSY on Seurat objects can be found [here](https://github.com/satijalab/seurat-wrappers/blob/master/docs/banksy.md).
For installation of *SeuratWrappers* with BANKSY version `>= 0.1.6`, run
```{r, eval=F}
remotes::install_github('satijalab/seurat-wrappers')
```
## Quick start
Load *Banksy*. We'll also load *SpatialExperiment* and *SummarizedExperiment*
to store and manipulate the data, *scuttle* for normalization
and quality control, and *scater*, *ggplot2* and *cowplot* for visualisation.
```{r, eval=T, warning=F, message=F}
library(Banksy)
library(SummarizedExperiment)
library(SpatialExperiment)
library(scuttle)
library(scater)
library(cowplot)
library(ggplot2)
```
Here, we'll run *BANKSY* on mouse hippocampus data.
```{r, eval=T}
data(hippocampus)
gcm <- hippocampus$expression
locs <- as.matrix(hippocampus$locations)
```
Initialize a SpatialExperiment object and perform basic quality control and
normalization.
```{r, eval=T, message=F}
se <- SpatialExperiment(assays = list(counts = gcm), spatialCoords = locs)
# QC based on total counts
qcstats <- perCellQCMetrics(se)
thres <- quantile(qcstats$total, c(0.05, 0.98))
keep <- (qcstats$total > thres[1]) & (qcstats$total < thres[2])
se <- se[, keep]
# Normalization to mean library size
se <- computeLibraryFactors(se)
aname <- "normcounts"
assay(se, aname) <- normalizeCounts(se, log = FALSE)
```
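
For intuition, the filtering and normalization steps can be mimicked in base R on simulated counts. The sketch assumes that normalizing to the mean library size amounts to dividing each cell by its library size relative to the mean; the toy data and variable names are made up, and the *scuttle* functions handle details that are not reproduced here.

```{r, eval=F}
set.seed(1)
n_genes <- 20
n_cells <- 50
# Simulated genes-by-cells counts; later cells have higher expected depth
cell_rate <- rep(seq(2, 10, length.out = n_cells), each = n_genes)
counts <- matrix(rpois(n_genes * n_cells, cell_rate), nrow = n_genes)
# Keep cells whose total count lies strictly between the 5th and 98th percentiles
total <- colSums(counts)
thres <- quantile(total, c(0.05, 0.98))
keep <- (total > thres[1]) & (total < thres[2])
counts <- counts[, keep, drop = FALSE]
# Size factor = library size relative to the mean library size
lib <- colSums(counts)
size_factor <- lib / mean(lib)
# Divide each cell (column) by its size factor
normcounts <- sweep(counts, 2, size_factor, "/")
range(colSums(normcounts))
```

After this scaling every retained cell has the same total, equal to the mean library size of the retained cells.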
Compute the neighborhood matrices for *BANKSY*. Setting `compute_agf=TRUE`
computes both the weighted neighborhood mean ($\mathcal{M}$) and the azimuthal
Gabor filter ($\mathcal{G}$). The numbers of spatial neighbors used to compute
$\mathcal{M}$ and $\mathcal{G}$ are `k_geom[1]=15` and `k_geom[2]=30`,
respectively. We run *BANKSY* at `lambda=0`, corresponding to non-spatial
clustering, and at `lambda=0.2`, corresponding to *BANKSY* for cell-typing.
```{r, eval=T}
lambda <- c(0, 0.2)
k_geom <- c(15, 30)
se <- Banksy::computeBanksy(se, assay_name = aname, compute_agf = TRUE, k_geom = k_geom)
```
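
One way to picture the role of `lambda` is as a mixing weight between each cell's own features and its neighborhood features. The sketch below uses made-up matrices and assumes a `sqrt(1 - lambda)` / `sqrt(lambda)` scaling of the two blocks, so that the squared weights sum to one. Treat that scaling, and the helper name `mix_features`, as illustrative assumptions and not as the package's internals; the gradient term is left out.

```{r, eval=F}
# Toy genes-by-cells matrices: own expression and neighborhood mean (made up)
own <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
nbr <- matrix(c(2, 2, 4, 4, 5, 5), nrow = 2)
# Hypothetical helper: stack the two blocks with lambda-dependent weights
mix_features <- function(own, nbr, lambda) {
  stopifnot(lambda >= 0, lambda <= 1)
  rbind(sqrt(1 - lambda) * own, sqrt(lambda) * nbr)
}
nonspatial <- mix_features(own, nbr, lambda = 0)
spatial <- mix_features(own, nbr, lambda = 0.2)
# At lambda = 0 the neighborhood rows contribute nothing
all(nonspatial[3:4, ] == 0)
```

At `lambda = 0` the neighborhood block vanishes, which is consistent with that setting corresponding to non-spatial clustering.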
Next, run PCA on the BANKSY matrix and perform clustering. Setting
`use_agf=TRUE` uses both $\mathcal{M}$ and $\mathcal{G}$ to construct the
BANKSY matrix.
```{r, eval=T}
set.seed(1000)
se <- Banksy::runBanksyPCA(se, use_agf = TRUE, lambda = lambda)
se <- Banksy::runBanksyUMAP(se, use_agf = TRUE, lambda = lambda)
se <- Banksy::clusterBanksy(se, use_agf = TRUE, lambda = lambda, resolution = 1.2)
```
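
The general pattern of reducing dimensionality and then clustering can be sketched in base R on simulated data. Here `prcomp` and `kmeans` are swapped in purely for illustration: k-means is not what the `resolution` argument above controls, and the toy matrix stands in for a real feature matrix.

```{r, eval=F}
set.seed(1000)
# Toy features-by-cells matrix with two well-separated groups of 10 cells
mat <- cbind(
  matrix(rnorm(5 * 10, mean = 0), nrow = 5),
  matrix(rnorm(5 * 10, mean = 10), nrow = 5)
)
# PCA expects observations (cells) in rows, so transpose first
pcs <- prcomp(t(mat), center = TRUE)$x[, 1:2]
# Cluster the cells in the reduced space
cl <- kmeans(pcs, centers = 2, nstart = 10)
table(cl$cluster)
```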
Cluster labels are assigned independently in each run, so use
`connectClusters` to relabel the runs and minimise the differences between them:
```{r, eval=T}
se <- Banksy::connectClusters(se)
```
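
The relabeling idea can be illustrated with two toy label vectors. The sketch below maps each label of one run to the label of a reference run with which it shares the most cells. This greedy matching on a contingency table is a stand-in chosen for brevity, not necessarily the procedure `connectClusters` uses, and it does not guard against two labels mapping to the same target.

```{r, eval=F}
# Two toy clusterings of the same 8 cells; run_b uses permuted label names
run_a <- c(1, 1, 1, 2, 2, 3, 3, 3)
run_b <- c(2, 2, 2, 3, 3, 1, 1, 3)
mismatch_before <- sum(run_b != run_a)
# Contingency table: rows are run_b labels, columns are run_a labels
tab <- table(run_b, run_a)
# Greedy mapping: each run_b label goes to the run_a label it overlaps most
mapping <- apply(tab, 1, function(r) colnames(tab)[which.max(r)])
relabeled <- as.integer(mapping[as.character(run_b)])
mismatch_after <- sum(relabeled != run_a)
c(before = mismatch_before, after = mismatch_after)
```

After relabeling, only cells that genuinely change cluster membership between runs differ, which makes side-by-side plots easier to compare.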
Visualise the clustering output for non-spatial clustering (`lambda=0`) and
BANKSY clustering (`lambda=0.2`).
```{r, eval=T, fig.height=5, fig.width=14}
cnames <- colnames(colData(se))
cnames <- cnames[grep("^clust", cnames)]
colData(se) <- cbind(colData(se), spatialCoords(se))
plot_nsp <- plotColData(se,
x = "sdimx", y = "sdimy",
point_size = 0.6, colour_by = cnames[1]
)
plot_bank <- plotColData(se,
x = "sdimx", y = "sdimy",
point_size = 0.6, colour_by = cnames[2]
)
plot_grid(plot_nsp + coord_equal(), plot_bank + coord_equal(), ncol = 2)
```
For clarity, we can visualise each of the clusters separately:
```{r, eval=T, fig.height=8, fig.width=18}
plot_grid(
plot_nsp + facet_wrap(~colour_by),
plot_bank + facet_wrap(~colour_by),
ncol = 2
)
```
Visualise UMAPs of the non-spatial and BANKSY embeddings:
```{r, eval=T, fig.height=5, fig.width=14}
rdnames <- reducedDimNames(se)
umap_nsp <- plotReducedDim(se,
dimred = grep("UMAP.*lam0$", rdnames, value = TRUE),
colour_by = cnames[1]
)
umap_bank <- plotReducedDim(se,
dimred = grep("UMAP.*lam0\\.2$", rdnames, value = TRUE),
colour_by = cnames[2]
)
plot_grid(
umap_nsp,
umap_bank,
ncol = 2
)
```
<details>
<summary>Runtime for analysis</summary>
```{r, eval=T, echo=FALSE}
Sys.time() - start.time
```
</details>
<details>
<summary>Session information</summary>
```{r, sess}
sessionInfo()
```
</details>