README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    out.width = "100%", 
    dpi = 70
)
```

## Overview

```{r, eval=T, include=F}
start.time <- Sys.time()
```

BANKSY is a method for clustering spatial omics data by augmenting the
features of each cell with both an average of the features of its spatial 
neighbors along with neighborhood feature gradients. By incorporating 
neighborhood information for clustering, BANKSY is able to

- improve cell-type assignment in noisy data
- distinguish subtly different cell-types stratified by microenvironment
- identify spatial domains sharing the same microenvironment

BANKSY is applicable to a wide array of spatial technologies (e.g. 10x Visium, 
Slide-seq, MERFISH, CosMX, CODEX) and scales well to large datasets. For more 
details, check out:

- the [paper](https://www.nature.com/articles/s41588-024-01664-3),
- the [peer review file](https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-024-01664-3/MediaObjects/41588_2024_1664_MOESM3_ESM.pdf),
- a [tweetorial](https://x.com/shyam_lab/status/1762648072360792479?s=20) on BANKSY,
- a set of [vignettes](https://prabhakarlab.github.io/Banksy) showing basic 
  usage,
- usage compatibility with Seurat ([here](https://github.com/satijalab/seurat-wrappers/blob/master/docs/banksy.md) and [here](https://satijalab.org/seurat/articles/visiumhd_analysis_vignette#identifying-spatially-defined-tissue-domains)),
- a [Python version](https://github.com/prabhakarlab/Banksy_py) of this package,
- a [Zenodo archive](https://zenodo.org/records/10258795) containing scripts to 
  reproduce the analyses in the paper, and the corresponding
  [GitHub Pages](https://github.com/jleechung/banksy-zenodo) 
  (and [here](https://github.com/prabhakarlab/Banksy_py/tree/Banksy_manuscript) for analyses done in Python). 

## Installation

The *Banksy* package can be installed via Bioconductor. This currently requires
R `>= 4.4.0`. 

```{r, eval=F}
BiocManager::install('Banksy')
```

To install directly from GitHub instead, use

```{r, eval=F}
remotes::install_github("prabhakarlab/Banksy")
```

To use the legacy version of *Banksy* utilising the `BanksyObject` class, use

```{r, eval=F}
remotes::install_github("prabhakarlab/Banksy@legacy")
```

*Banksy* is also interoperable with [*Seurat*](https://satijalab.org/seurat/) 
via [*SeuratWrappers*](https://github.com/satijalab/seurat-wrappers). 
Documentation on how to run BANKSY on Seurat objects can be found [here](https://github.com/satijalab/seurat-wrappers/blob/master/docs/banksy.md). 
For installation of *SeuratWrappers* with BANKSY version `>= 0.1.6`, run

```{r, eval=F}
remotes::install_github('satijalab/seurat-wrappers')
```

## Quick start

Load *BANKSY*. We'll also load *SpatialExperiment* and *SummarizedExperiment* 
for containing and manipulating the data, *scuttle* for normalization 
and quality control, and *scater*, *ggplot2* and *cowplot* for visualisation.

```{r, eval=T, warning=F, message=F}
library(Banksy)

library(SummarizedExperiment)
library(SpatialExperiment)
library(scuttle)

library(scater)
library(cowplot)
library(ggplot2)
```

Here, we'll run *BANKSY* on mouse hippocampus data. 

```{r, eval=T}
data(hippocampus)
gcm <- hippocampus$expression
locs <- as.matrix(hippocampus$locations)
```

Initialize a SpatialExperiment object and perform basic quality control and 
normalization. 

```{r, eval=T, message=F}
se <- SpatialExperiment(assay = list(counts = gcm), spatialCoords = locs)

# QC based on total counts
qcstats <- perCellQCMetrics(se)
thres <- quantile(qcstats$total, c(0.05, 0.98))
keep <- (qcstats$total > thres[1]) & (qcstats$total < thres[2])
se <- se[, keep]

# Normalization to mean library size
se <- computeLibraryFactors(se)
aname <- "normcounts"
assay(se, aname) <- normalizeCounts(se, log = FALSE)
```

Compute the neighborhood matrices for *BANKSY*. Setting `compute_agf=TRUE` 
computes both the weighted neighborhood mean ($\mathcal{M}$) and the azimuthal 
Gabor filter ($\mathcal{G}$). The number of spatial neighbors used to compute 
$\mathcal{M}$ and $\mathcal{G}$ are `k_geom[1]=15` and `k_geom[2]=30` 
respectively. We run *BANKSY* at `lambda=0` corresponding to non-spatial 
clustering, and `lambda=0.2` corresponding to  *BANKSY* for cell-typing.

```{r, eval=T}
lambda <- c(0, 0.2)
k_geom <- c(15, 30)

se <- Banksy::computeBanksy(se, assay_name = aname, compute_agf = TRUE, k_geom = k_geom)
```

Next, run PCA on the BANKSY matrix and perform clustering. Setting 
`use_agf=TRUE` uses both $\mathcal{M}$ and $\mathcal{G}$ to construct the 
BANKSY matrix.

```{r, eval=T}
set.seed(1000)
se <- Banksy::runBanksyPCA(se, use_agf = TRUE, lambda = lambda)
se <- Banksy::runBanksyUMAP(se, use_agf = TRUE, lambda = lambda)
se <- Banksy::clusterBanksy(se, use_agf = TRUE, lambda = lambda, resolution = 1.2)
```

Different clustering runs can be relabeled to minimise their differences with 
`connectClusters`:

```{r, eval=T}
se <- Banksy::connectClusters(se)
```

Visualise the clustering output for non-spatial clustering (`lambda=0`) and
BANKSY clustering (`lambda=0.2`).

```{r, eval=T, fig.height=5, fig.width=14}
cnames <- colnames(colData(se))
cnames <- cnames[grep("^clust", cnames)]
colData(se) <- cbind(colData(se), spatialCoords(se))

plot_nsp <- plotColData(se,
    x = "sdimx", y = "sdimy",
    point_size = 0.6, colour_by = cnames[1]
)
plot_bank <- plotColData(se,
    x = "sdimx", y = "sdimy",
    point_size = 0.6, colour_by = cnames[2]
)


plot_grid(plot_nsp + coord_equal(), plot_bank + coord_equal(), ncol = 2)
```

For clarity, we can visualise each of the clusters separately:

```{r, eval=T, fig.height=8, fig.width=18}
plot_grid(
    plot_nsp + facet_wrap(~colour_by),
    plot_bank + facet_wrap(~colour_by),
    ncol = 2
)
```

Visualize UMAPs of the non-spatial and BANKSY embedding:

```{r, eval=T, fig.height=5, fig.width=14}
rdnames <- reducedDimNames(se)

umap_nsp <- plotReducedDim(se,
    dimred = grep("UMAP.*lam0$", rdnames, value = TRUE),
    colour_by = cnames[1]
)
umap_bank <- plotReducedDim(se,
    dimred = grep("UMAP.*lam0.2$", rdnames, value = TRUE),
    colour_by = cnames[2]
)
plot_grid(
    umap_nsp,
    umap_bank,
    ncol = 2
)
```

<details>
    <summary>Runtime for analysis</summary>

```{r, eval=T, echo=FALSE}
Sys.time() - start.time
```

</details>

<details>
    <summary>Session information</summary>

```{r, sess}
sessionInfo()
```

</details>