SuperCellCyto-analysis

This repository contains the code to reproduce all the analysis done for our paper introducing the SuperCellCyto R package: https://github.com/phipsonlab/SuperCellCyto.

SuperCellCyto is an adaptation of the SuperCell R package. Initially developed for scRNAseq data, the SuperCell package aggregates cells with similar transcriptomic profiles into "supercells" (also known as “metacells” in the scRNAseq literature).

The preprint of the paper is available on bioRxiv:

Putri, G. H., Howitt, G., Marsh-Wakefield, F., Ashhurst, T. M., & Phipson, B. (2023). SuperCellCyto: enabling efficient analysis of large scale cytometry datasets. bioRxiv; DOI: https://doi.org/10.1101/2023.08.14.553168

Contents

To reproduce all the figures in the paper, refer to the Rmd files in the analysis folder:

  • explore_supercell_purity_clustering for Supercells Preserve Biological Heterogeneity and Facilitate Efficient Cell Type Identification
  • b_cells_identification for Identifying Rare B Cells Subsets by Clustering Supercells
  • batch_correction for Mitigating Batch Effects in the Integration of Multi-Batch Cytometry Data at the Supercell Level
  • de_test for Recovery of Differentially Expressed Cell State Markers Across Stimulated and Unstimulated Human Peripheral Blood Cells
  • da_test for Identification of Differentially Abundant Rare Monocyte Subsets in Melanoma Patients
  • label_transfer for Efficient Cell Type Label Transfer Between CITEseq and Cytometry Data
  • run_time for measuring the run time of SuperCellCyto and the clustering steps used in the first three items above.

The code folder contains the scripts used to generate the results that are processed in the Rmd files in the analysis folder. Please note that some of these scripts take a long time to run; they are kept as separate R scripts so that each rebuild of the workflowr website does not take hours.

The data and output folders are meant for storing the raw data and the processed data generated by the scripts in the code folder, respectively. The contents of these folders are deliberately not committed to the repository because they are very large (over 40 GB in total). If you would like to reproduce our analysis, please download the contents of the data and output folders from Zenodo: DOI.

Instructions after downloading the files (an R sketch of these steps follows the list):

  1. Uncompress data_20232308.tar.gz (using tar -zxvf <filename>.tar.gz). You should get a single data folder; this is the data folder for the workflowr website.
  2. Uncompress each of the tar.gz files starting with the word output. Each file should uncompress into one folder.
  3. Create a new folder called output and place all the folders uncompressed in step 2 into it.
  4. Run wflow_build().
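
The steps above can also be scripted. Below is a minimal sketch in R, assuming the downloaded tarballs sit in the repository root and that every output tarball's file name starts with "output" (both are assumptions about your download, not guarantees from the Zenodo record):

    # Step 1: uncompress the data tarball; this should produce the data/ folder.
    untar("data_20232308.tar.gz")

    # Steps 2-3: uncompress each output*.tar.gz directly into a new output/
    # folder, so each tarball's folder ends up inside output/.
    dir.create("output", showWarnings = FALSE)
    for (f in list.files(pattern = "^output.*\\.tar\\.gz$")) {
      untar(f, exdir = "output")
    }

    # Step 4: rebuild the workflowr website.
    workflowr::wflow_build()
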