easy-data

Easy access to benchmark datasets. Add your benchmarking desiderata and your datasets below.

benchmark for what?

Problems for which benchmark datasets would enable experimentation:

cell type annotation and reannotation at various levels of ontological depth
building and validating cell type classifiers
manifold alignment and batch-effect-aware analyses
assessing the variability in gene expression of cell types present in many organs
measuring sex differences in gene expression
measuring the variability in biological claims (like which genes are differentially expressed between populations) to be expected between different studies of the same cell types
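
As one illustration of the last item, agreement between two studies' lists of differentially expressed genes can be summarized with a Jaccard index. This is a minimal sketch, not a method prescribed by this repository; the function name `jaccard` and the gene lists are made up for illustration.

```python
def jaccard(genes_a, genes_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of gene names."""
    a, b = set(genes_a), set(genes_b)
    if not a and not b:
        return 1.0  # two empty lists agree trivially
    return len(a & b) / len(a | b)


# Toy gene lists, invented for illustration only (not real results).
study_1 = ["Cd3e", "Cd4", "Il7r", "Ccr7"]
study_2 = ["Cd3e", "Il7r", "Ccr7", "Sell", "Tcf7"]

overlap = jaccard(study_1, study_2)
print(f"Jaccard overlap: {overlap:.2f}")
```

A value near 1 means the two studies report nearly the same genes; a value near 0 means they share almost none.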

datasets

To add a dataset, just create a section with a description and links to download it.

How easy can you make it for someone to get started?

`tabula muris`

Tabula Muris contains about 100,000 cells from 20 organs and tissues in mouse. The study is sex-balanced, with four male and four female mice. The organs include skin, fat, mammary gland, heart, bladder, brain, thymus, spleen, kidney, limb muscle, tongue, marrow, trachea, pancreas, lung, large intestine, and liver. Many of these organs were processed using two methods: SMART-seq2 on FACS-sorted cells, and microfluidic droplets from 10X Genomics.

Below are instructions for getting four files: metadata (including annotations) and count data for each of the two datasets (droplet and FACS).

metadata

Version-controlled metadata are available on GitHub.

TM_droplet_metadata.csv

TM_facs_metadata.csv

count files for R

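You can download complete count files as sparse matrices in .rds format for easy loading into R. Unzip TabulaMuris.zip into a data directory in your project, then load the files:

library(here)   # builds paths relative to the project root
library(readr)  # provides read_csv

tm.droplet.matrix = readRDS(here("data", "TM_droplet_mat.rds"))
tm.droplet.metadata = read_csv(here("data", "TM_droplet_metadata.csv"))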

count files for Python

You can download complete count files as sparse matrices in AnnData-formatted h5ad files for use in Python here. You can load them using the Scanpy library:

import pandas as pd
import scanpy as sc

tm_facs_metadata = pd.read_csv('data/TM_facs_metadata.csv')
tm_facs_data = sc.read_h5ad('data/TM_facs_mat.h5ad')
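
The metadata table and the count matrix are loaded separately, so it can help to check that they describe the same cells before any analysis. The sketch below is one way to do that with pandas alone. It assumes the metadata has a column of cell identifiers (called `cell` here, a hypothetical name to check against the actual CSV) and that those identifiers match the AnnData `obs_names`. The helper `align_metadata` is illustrative and not part of Scanpy.

```python
import pandas as pd


def align_metadata(obs_names, metadata, cell_column="cell"):
    """Reorder metadata rows to match obs_names; fail on missing cells.

    `cell_column` is an assumed column name; adjust it to the real CSV.
    """
    indexed = metadata.set_index(cell_column)
    if not indexed.index.is_unique:
        raise ValueError("duplicate cell identifiers in metadata")
    missing = pd.Index(obs_names).difference(indexed.index)
    if len(missing) > 0:
        raise KeyError(f"{len(missing)} cells have no metadata row")
    # .loc with a list of labels returns rows in the requested order.
    return indexed.loc[list(obs_names)]


# Toy example with invented identifiers and labels.
toy_metadata = pd.DataFrame(
    {"cell": ["c3", "c1", "c2"], "tissue": ["Lung", "Heart", "Liver"]}
)
toy_obs_names = ["c1", "c2", "c3"]
aligned = align_metadata(toy_obs_names, toy_metadata)
print(aligned)
```

With the real files, this might be called as `align_metadata(tm_facs_data.obs_names, tm_facs_metadata)`, provided the identifier column name is adjusted to match the CSV.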

CSV and MTX files

The original data release is on FigShare.
