sansomlab/tenx

Please note that these pipelines are no longer developed or used by our group. We recommend using the pipelines in our "cellhub" repository instead. Please see https://cellhub.readthedocs.io/en/latest/ for more information.

tenx

A collection of Python3 pipelines that call R and Python3 scripts for the analysis of data generated with the 10x Genomics platform. The pipelines are based on 10x's Cell Ranger pipeline and DropEst for mapping and quantitation.

Downstream analysis currently relies on both the R Seurat library and the Python Scanpy package, and makes use of many excellent tools from the community, including Scran, DropletUtils, SingleR, Clustree, Destiny (for diffusion maps), PHATE, PAGA and Scvelo (for velocity analysis). Automatic export of UCSC cell browser instances is also supported.

For geneset over-representation analysis, the pipelines use a bespoke R package called gsfisher, which can also be used interactively to analyse single-cell data.
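As a rough illustration of what a geneset over-representation test computes, the sketch below implements a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) in plain Python. It is not gsfisher's code or API. That gsfisher works this way is an assumption based only on its name, and the function names `overrepresentation_pvalue` and `score_geneset` and the toy gene identifiers are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(hits, list_size, set_size, universe_size):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= hits), where X is the number of geneset members found
    when drawing `list_size` genes at random from a universe of
    `universe_size` genes that contains `set_size` geneset members.
    """
    max_hits = min(list_size, set_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def score_geneset(marker_genes, geneset, universe):
    """Count the overlap between markers and a geneset and test it."""
    universe = set(universe)
    # Restrict both lists to the universe so the counts are comparable.
    markers = set(marker_genes) & universe
    members = set(geneset) & universe
    hits = len(markers & members)
    pvalue = overrepresentation_pvalue(
        hits, len(markers), len(members), len(universe)
    )
    return hits, pvalue


# Toy data (hypothetical gene names): 20 genes in the universe,
# a 5-gene geneset, and 5 cluster markers of which 4 are in the geneset.
universe = [f"g{i}" for i in range(20)]
geneset = ["g0", "g1", "g2", "g3", "g4"]
markers = ["g0", "g1", "g2", "g3", "g10"]

hits, pvalue = score_geneset(markers, geneset, universe)
print(hits, pvalue)
```

In practice the universe would usually be the set of genes detected in the dataset, and p-values across many genesets would need multiple-testing correction; both choices are left out of this sketch.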

update 5/11/2020

Please note that "pipeline_seurat.py" has been renamed to "pipeline_scxl.py" to better reflect the diverse toolchain that it has evolved to use over the last few years, and also its recent refactoring to support analysis of large-scale datasets. By default, pipeline_scxl.py uses the HNSW algorithm to compute nearest neighbors and performs clustering with scanpy before marker analysis with Seurat. The pipeline has been used to process datasets with more than 800,000 cells.

Examples

Interferon beta stimulated PBMCs

This example shows how the Seurat stimulated and control vignette can be reproduced by the pipeline.

Pancreatic embryogenesis

This example uses the Bastidas-Ponce et al. dataset from scvelo.

Microwell-seq Mouse Atlas (240k cells)

Here pipeline_scxl.py was run using the Seurat object provided by the Seurat authors in their Guided Clustering of the Microwell-seq Mouse Cell Atlas vignette.

installation and dependencies

  1. Installation
  2. Dependencies

typical workflow

  1. Perform mapping, quantification, aggregation and down-sampling using: pipelines/pipeline_cellranger.py

    • Can be run either from cellranger mkfastq or cellranger aggr outputs.
    • Samples are mapped and quantitated with cellranger count.
    • Aggregation of sample matrices is performed with cellranger aggr.
    • Cells with barcodes shared between samples can be removed (within sequencing batch) to mitigate index hopping.
    • Random down-sampling of the UMI-count matrix is supported.
    • Arbitrary subsets of the aggregated dataset can be generated.
  2. Perform downstream analysis using: pipelines/pipeline_scxl.py

    • This can be run either from count matrices (e.g. pipeline_cellranger.py output) or from saved Seurat object(s).
    • Analysis of multiple samples with different parameter combinations can be executed in parallel.
    • Supports testing for genes differentially expressed between conditions.
    • Supports finding conserved markers (both between clusters and between conditions).
    • Supports basic geneset over-representation analysis (including of arbitrary "gmt" genesets, e.g. from MSigDB) using gsfisher.
    • Support for visualising expression of arbitrary lists of genes on violin and UMAP plots.
    • The pipeline includes Clustree, PAGA, ScVelo, Diffusion maps, PHATE maps and SingleR.
    • The pipeline can automatically generate UCSC cell browser instances.
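Two of the pipeline_cellranger.py steps listed above, removing barcodes shared within a sequencing batch and random down-sampling of UMI counts, can be sketched in plain Python as follows. This is a minimal illustration of the ideas, not the pipeline's implementation: the function names, the dictionary-based data layout and the toy barcodes are all hypothetical, and the down-sampling scheme shown (sampling UMIs without replacement from a single cell) is an assumption.

```python
import random
from collections import Counter, defaultdict


def drop_shared_barcodes(barcodes_by_sample, batch_by_sample):
    """Drop barcodes seen in more than one sample of the same batch."""
    # For each sequencing batch, count how many samples contain each barcode.
    seen = defaultdict(Counter)
    for sample, barcodes in barcodes_by_sample.items():
        seen[batch_by_sample[sample]].update(set(barcodes))

    # Keep only barcodes that are unique to one sample within their batch.
    kept = {}
    for sample, barcodes in barcodes_by_sample.items():
        counts = seen[batch_by_sample[sample]]
        kept[sample] = {bc for bc in barcodes if counts[bc] == 1}
    return kept


def downsample_counts(gene_counts, target_total, seed=0):
    """Randomly down-sample one cell's UMI counts to `target_total`."""
    total = sum(gene_counts.values())
    if total <= target_total:
        return dict(gene_counts)
    rng = random.Random(seed)
    # Expand to one entry per UMI, then sample without replacement.
    umis = [gene for gene, n in gene_counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(umis, target_total)))


# Toy input (hypothetical): samples "a" and "b" share a sequencing batch.
barcodes = {
    "a": {"AAAC", "AAAG"},
    "b": {"AAAC", "TTTG"},
    "c": {"AAAC"},
}
batches = {"a": "batch1", "b": "batch1", "c": "batch2"}

filtered = drop_shared_barcodes(barcodes, batches)
downsampled = downsample_counts({"gene1": 5, "gene2": 5}, target_total=4)
print(filtered)
print(downsampled)
```

Barcode "AAAC" is dropped from samples "a" and "b" because they were sequenced together, but kept in sample "c", which belongs to a different batch.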
