SeuratData

SeuratData is a mechanism for distributing datasets in the form of Seurat objects using R's internal package and data management systems. It gives users easy access to the datasets used in the Seurat vignettes.

Installation

SeuratData can be installed from GitHub with devtools:

devtools::install_github('satijalab/seurat-data')

Getting Started

When SeuratData is loaded, it displays a list of all installed datasets along with the version of Seurat used to create each one (similar to the startup message of metapackages like tidyverse). This message can be suppressed by wrapping the call in suppressPackageStartupMessages, e.g. suppressPackageStartupMessages(library(SeuratData)).

> library(SeuratData)
── Installed datasets ───────────────────────────────────────────────────────────── SeuratData v0.1.0 ──
✔ cbmc   3.0.0                                           ✔ panc8  3.0.0
✔ ifnb   3.0.0                                           ✔ pbmc3k 3.0.0

───────────────────────────────────────────────── Key ──────────────────────────────────────────────────
✔ Dataset loaded successfully

To see a manifest of all available datasets, use AvailableData; this manifest will update as new datasets are uploaded to our data repository.

> AvailableData()
                     Dataset Version                                                        Summary species            system ncells                                                            tech         notes Installed InstalledVersion
cbmc.SeuratData         cbmc   3.0.0                   scRNAseq and 13-antibody sequencing of CBMCs   human CBMC (cord blood)   8617                                                        CITE-seq          <NA>      TRUE            3.0.0
hcabm40k.SeuratData hcabm40k   3.0.0 40,000 Cells From the Human Cell Atlas ICA Bone Marrow Dataset   human       bone marrow  40000                                                          10x v2          <NA>     FALSE            3.0.0
ifnb.SeuratData         ifnb   3.0.0                              IFNB-Stimulated and Control PBMCs   human              PBMC  13999                                                          10x v1          <NA>      TRUE            3.0.0
panc8.SeuratData       panc8   3.0.0               Eight Pancreas Datasets Across Five Technologies   human Pancreatic Islets  14892                SMARTSeq2, Fluidigm C1, CelSeq, CelSeq2, inDrops          <NA>      TRUE            3.0.0
pbmc3k.SeuratData     pbmc3k   3.0.0                                     3k PBMCs from 10X Genomics   human              PBMC   2700                                                          10x v1          <NA>      TRUE            3.0.0
pbmcsca.SeuratData   pbmcsca   3.0.0           Broad Institute PBMC Systematic Comparative Analysis   human              PBMC  31021 10x v2, 10x v3, SMARTSeq2, Seq-Well, inDrops, Drop-seq, CelSeq2 HCA benchmark     FALSE            3.0.0
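
Because the manifest prints like an ordinary data frame, it can presumably be subset with base R. The sketch below assumes that, and that the column names match the output above. `not_installed` is a hypothetical helper, not part of SeuratData, and the stand-in manifest copies three rows of that output so the example runs without the package or network access.

```r
# Return the rows of a manifest whose datasets are not yet installed.
# `manifest` is assumed to be a data frame with a logical `Installed` column,
# as in the AvailableData() output shown above.
not_installed <- function(manifest) {
  manifest[!manifest$Installed, , drop = FALSE]
}

# Stand-in manifest built from three rows of the output above.
manifest <- data.frame(
  Dataset = c("cbmc", "hcabm40k", "pbmcsca"),
  ncells = c(8617, 40000, 31021),
  Installed = c(TRUE, FALSE, FALSE),
  row.names = c("cbmc.SeuratData", "hcabm40k.SeuratData", "pbmcsca.SeuratData"),
  stringsAsFactors = FALSE
)

# Names of the datasets that still need to be installed
not_installed(manifest)$Dataset
```

With SeuratData attached, `not_installed(AvailableData())` would apply the same filter to the live manifest, provided its columns match the output above.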

Datasets are installed with InstallData; this function accepts either a dataset name (e.g. pbmc3k) or the corresponding package name (e.g. pbmc3k.SeuratData). InstallData automatically attaches the installed dataset package, so the dataset can be loaded and used immediately.

> InstallData("pbmc3k")
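
Either form of the name refers to the same dataset package. As an illustration only, the hypothetical helper below (`as_package_name`, which is not a SeuratData function and not necessarily how InstallData resolves names) maps both forms to the package name:

```r
# Hypothetical helper: map a dataset name or a package name to the package name.
# Per the text above, InstallData("pbmc3k") and InstallData("pbmc3k.SeuratData")
# should install the same package.
as_package_name <- function(x, suffix = ".SeuratData") {
  has_suffix <- endsWith(x, suffix)
  ifelse(has_suffix, x, paste0(x, suffix))
}

# Both inputs resolve to "pbmc3k.SeuratData"
as_package_name(c("pbmc3k", "pbmc3k.SeuratData"))
```

The same mapping is handy when a function, such as citation, expects the package name rather than the dataset name.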

Datasets are loaded with the data function:

> data("pbmc3k")
> pbmc3k
An object of class Seurat
13714 features across 2700 samples within 1 assay
Active assay: RNA (13714 features)

Dataset documentation and information

All datasets provided have help pages built for them. These pages are accessed using the standard help function:

> ?pbmc3k
> ?ifnb

A full command list for the steps taken to generate each dataset is present in the examples section of these help pages.

Dataset packages often bundle citation information as well. To access it, pass the package name, not the dataset name, to the citation function:

> citation('cbmc.SeuratData')

To cite the CBMC dataset, please use:

  Stoeckius et al. Simultaneous epitope and transcriptome measurement in
  single cells. Nature Methods (2017)

A BibTeX entry for LaTeX users is

  @Article{,
    author = {Marlon Stoeckius and Christoph Hafemeister and William Stephenson and Brian Houck-Loomis and Pratip K Chattopadhyay and Harold Swerdlow and Rahul Satija and Peter Smibert},
    title = {Simultaneous epitope and transcriptome measurement in single cells},
    journal = {Nature Methods},
    year = {2017},
    doi = {10.1038/nmeth.4380},
    url = {https://www.nature.com/articles/nmeth.4380},
  }
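
To print only the BibTeX entry, the citation object can be passed to toBibtex from the utils package. In the sketch below, `cite_bibtex` is a hypothetical convenience wrapper rather than anything SeuratData provides, and the call on cbmc.SeuratData is guarded so the code still runs when that package is not installed.

```r
# Hypothetical wrapper: return the BibTeX form of a package's citation.
# `pkg` must be a package name (e.g. "cbmc.SeuratData"), not a dataset name.
cite_bibtex <- function(pkg) {
  utils::toBibtex(utils::citation(pkg))
}

# Only query the dataset package if it is actually installed
if (requireNamespace("cbmc.SeuratData", quietly = TRUE)) {
  print(cite_bibtex("cbmc.SeuratData"))
}
```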

Rationale and Implementation

We created SeuratData in order to distribute datasets for Seurat vignettes in as painless and reproducible a way as possible. We also wanted to give users the flexibility to selectively install and load datasets of interest, to minimize disk storage and memory use.

To accomplish this, we opted to distribute datasets through individual R packages. Under the hood, SeuratData uses and extends standard R functions, such as install.packages for dataset installation, available.packages for dataset listing, and data for dataset loading.

SeuratData therefore serves as a more specific package manager (similar to a metapackage) for R. We provide wrappers around R's package management functions, extend them to provide relevant metadata about each dataset, and set default settings (for example, the repository where data is stored) to facilitate easy installation.
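
The following is a minimal sketch of what such wrappers around the standard R functions named above might look like. It is not SeuratData's actual code: the function names are hypothetical, the repository URL is left as a `repo` argument the caller must supply, and the assumption that dataset packages can be recognised by a ".SeuratData" suffix is taken from the package names shown earlier.

```r
# Hypothetical: keep only dataset packages from an available.packages() matrix,
# assuming they are identified by a ".SeuratData" suffix in the package name.
dataset_packages <- function(avail, suffix = ".SeuratData") {
  pkgs <- rownames(avail)
  pkgs[endsWith(pkgs, suffix)]
}

# Hypothetical: list dataset packages offered by a CRAN-like repository.
# `repo` is a repository URL supplied by the caller.
list_datasets <- function(repo) {
  avail <- utils::available.packages(repos = repo, type = "source")
  dataset_packages(avail)
}

# Hypothetical: install one dataset package from that repository.
install_dataset <- function(pkg, repo) {
  utils::install.packages(pkg, repos = repo, type = "source")
}
```

After installation, loading goes through the standard data function, as in the data("pbmc3k") call shown in Getting Started.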
