simslab/scHPF

Single-cell Hierarchical Poisson Factorization

About

scHPF is a tool for de novo discovery of both discrete and continuous expression patterns in single-cell RNA-sequencing (scRNA-seq). We find that scHPF’s sparse low-dimensional representations, non-negativity, and explicit modeling of variable sparsity across genes and cells produce highly interpretable factors.

Installation

Environment & Dependencies

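scHPF requires Python >= 3.6 and the packages scikit-learn, numba, and pandas. loompy is optional and only needed for using loom files as input to preprocessing.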

The easiest way to set up an environment for scHPF is with the Anaconda Python distribution, using either Miniconda or Anaconda:

conda create -n schpf_p37 python=3.7 scikit-learn numba=0.50 pandas

# for newer anaconda versions
conda activate schpf_p37
# or, for older anaconda versions
source activate schpf_p37

# Optional, for using loom files as input to preprocessing
pip install -U loompy

Installing from source

Once you have set up the environment, clone this repository and install.

git clone git@github.com:simslab/scHPF.git
cd scHPF
pip install .

Testing your installation

This step is important because not all micro-versions of numba play nicely with all micro-versions of Python or numpy, and issues sometimes vary across machines. Testing will catch some, but not all, such issues. From the scHPF base directory, run:

conda install pytest
pytest

Please get in touch if tests fail, or if you get segmentation faults or very long training times and no automatic parallelization. I'm happy to try to help.
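
Since these problems depend on exact versions, it can help to include them when reporting a failure. The following is a minimal sketch, not part of scHPF: `package_versions` and `environment_report` are hypothetical helper names, and the default module list is an assumption based on the dependencies named above (`sklearn` is the import name of scikit-learn).

```python
import importlib
import platform


def package_versions(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in this environment
            continue
        # Most packages expose __version__; fall back to "unknown" otherwise.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


def environment_report(names=("numba", "numpy", "sklearn", "pandas")):
    """Build a short plain-text report suitable for pasting into an issue."""
    lines = ["python: " + platform.python_version()]
    for name, version in package_versions(names).items():
        lines.append("{}: {}".format(name, version or "NOT INSTALLED"))
    return "\n".join(lines)
```

Running `print(environment_report())` inside the activated conda environment prints one line per package, which can be pasted alongside the pytest output.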

Quick Start: Command Line Interface

  1. Prepare your data.

  2. Train a model.

  3. Get gene and cell scores.
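
The three steps above can be sketched as a small Python driver that builds one command per step. This is an illustration under stated assumptions: the subcommand names and flags (`prep -i/-o/-m`, `train -i/-o/-p/-k/-t`, `score -m/-o/-p/-g`) are not given in this section and should be checked against `scHPF prep -h`, `scHPF train -h`, and `scHPF score -h`. The helper names, file paths, and default values are hypothetical.

```python
import shlex


def build_commands(counts, outdir, prefix, train_file, model_file, gene_file,
                   nfactors=7, trials=5, min_cells=10):
    """Return the prep, train, and score invocations as argv lists (nothing is run).

    ASSUMPTION: the subcommands and flags below are not confirmed by this
    document; verify each one with `scHPF <subcommand> -h` before use.
    """
    prep = ["scHPF", "prep", "-i", counts, "-o", outdir, "-m", str(min_cells)]
    train = ["scHPF", "train", "-i", train_file, "-o", outdir,
             "-p", prefix, "-k", str(nfactors), "-t", str(trials)]
    score = ["scHPF", "score", "-m", model_file, "-o", outdir,
             "-p", prefix, "-g", gene_file]
    return [prep, train, score]


def format_commands(commands):
    """Render argv lists as shell-quoted lines for review before running."""
    return "\n".join(" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands)
```

Printing `format_commands(build_commands(...))` shows the commands for inspection. Once the flags are confirmed, each list can be passed to `subprocess.run(cmd, check=True)`, which stops the pipeline at the first failing step.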

API

scHPF has a scikit-learn-like API. Trained models are stored in a serialized joblib format.
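
Because trained models are joblib files, they can be reloaded in Python for inspection. The sketch below assumes joblib is installed (it is a scikit-learn dependency) and that the scHPF package is importable in the same environment, since unpickling needs the model's class. `load_model` and `public_methods` are hypothetical helpers, not part of scHPF, and no scHPF method names are assumed.

```python
def load_model(path):
    """Load a trained model from a joblib file written by scHPF."""
    # Imported lazily so public_methods() below is usable without joblib.
    import joblib
    # joblib.load unpickles arbitrary objects: only load files you trust.
    return joblib.load(path)


def public_methods(obj):
    """List the public callable attributes of an object, sorted by name."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name, None))
    )
```

For example, `public_methods(load_model("path/to/model.joblib"))` (the path is a placeholder) lists the methods the loaded model exposes, which is a quick way to explore the API without guessing names.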

Help and support

If you have any questions, errors, or issues, please open an issue and I will be happy to provide whatever help and guidance I can.

Contributing

Contributions to scHPF are welcome. Please get in touch if you would like to discuss a change or check whether it's something I've already done but haven't pushed to master yet. To contribute, please fork scHPF, make your changes, and submit a pull request.

References

Hanna Mendes Levitin, Jinzhou Yuan, Yim Ling Cheng, Francisco JR Ruiz, Erin C Bush, Jeffrey N Bruce, Peter Canoll, Antonio Iavarone, Anna Lasorella, David M Blei, Peter A Sims. "De novo gene signature identification from single‐cell RNA‐seq with hierarchical Poisson factorization." Molecular Systems Biology, 2019. [Open access article]

Peter A. Szabo*, Hanna Mendes Levitin*, Michelle Miron, Mark E. Snyder, Takashi Senda, Jinzhou Yuan, Yim Ling Cheng, Erin C. Bush, Pranay Dogra, Puspa Thapa, Donna L. Farber, Peter A. Sims. "Single-cell transcriptomics of human T cells reveals tissue and activation signatures in health and disease." Nature Communications, 2019. [Open access article] * Co-first authors