samuel-marsh/Marsh_et-al_2022_scRNAseq_Dissociation_Artifacts

Marsh_et-al_2022_scRNAseq_Dissociation_Artifacts

Code to reproduce analysis objects for the data contained in:

Samuel E. Marsh¹,*, Alec J. Walker, Tushar Kamath¹, Lasse Dissing-Olesen, Timothy R. Hammond², T. Yvanka de Soysa, Adam M.H. Young, Sarah Murphy, Abdulraouf Abdulraouf, Naeem Nadaf, Connor Dufort, Alicia C. Walker, Liliana E. Lucca, Velina Kozareva², Charles Vanderburg, Soyon Hong, Harry Bulstrode, Peter J. Hutchinson, Daniel J. Gaffney, David A. Hafler, Robin J.M. Franklin, Evan Z. Macosko, & Beth Stevens.

¹ Performed analysis
² Assisted analysis
* Analysis lead (contact: samuel.marsh@childrens.harvard.edu)

*NOTE* If you do not have institutional access to the above article, please use the request link in the bibliography here to request a copy from the corresponding authors.

bioRxiv Preprint
An earlier version of this work appeared in preprint form on bioRxiv. Links to the preprint and to a zip archive of the GitHub repository as it stood for the preprint can be found below:
Link to the earlier preprint version of this manuscript here.
A copy of the prior code repository, which contained the analyses from the preprint, can be downloaded in zip form here.

Code

Included is the code necessary to replicate the Seurat or LIGER (or both) objects used for analysis and plotting.

  • Each R file specifies the version of Seurat/LIGER used for analysis/object creation.

    • Some analyses were performed across multiple versions of Seurat (V2 to V3). In these cases, objects were updated to V3 using UpdateSeuratObject.
    • Scripts specify point of upgrade to V3 in regard to analysis or object modification.
    • The Seurat V2.3.4 source package can be downloaded here from the CRAN Archive and installed from local source.
    • To maintain consistency, Seurat V3.1.5 was downloaded from CRAN Archive and installed from local source when switching between V2 and V3 was necessary.
  • Where possible, the date before which each analysis was performed is specified. To replicate analyses as performed on a specific date, the following actions are recommended or described in code:

    • Use a contained environment via the packrat or renv packages, followed by date-specific installation of CRAN packages using the versions package.
    • Archived source versions of specific packages may also be needed depending on the version of R; these can be downloaded from the CRAN archives and installed from local source.
  • LIGER analyses were performed using the in-development "online" branch, which was updated throughout the analysis to pick up bug fixes.

    • LIGER analyses also utilize multiple versions of Seurat as specified in code for some of the following situations:
      • Seurat V3 was used for data import, QC filtering (genes, UMIs, % mito), and the majority of plotting.
      • Seurat V2 was used during the LIGER analysis workflow to accommodate the now-deprecated clusterLouvainJaccard function, which relied on the Seurat V2 object structure.
      • Conversion between Seurat and LIGER objects was performed using the built-in LIGER functions seuratToLiger and ligerToSeurat.
  • scCustomize R package was used in a pre-release development form during analysis.

    • Some of the function names may be different in this repo compared to their public release form.
    • A list of functions (and tutorials) for scCustomize can be found at the website here.
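
The pinned Seurat releases mentioned above (V2.3.4 and V3.1.5) are installed from archived source tarballs. As a rough sketch, the snippet below builds download URLs for them. It assumes the usual CRAN Archive layout (`src/contrib/Archive/<package>/<package>_<version>.tar.gz`); the helper name `cran_archive_url` is hypothetical, and the resulting URLs may not be the exact links this README points to.

```python
# Sketch only: the URL pattern is an assumption based on the usual
# CRAN Archive layout, not something specified by this repository.
CRAN_ARCHIVE = "https://cran.r-project.org/src/contrib/Archive"


def cran_archive_url(package, version):
    """Return the assumed CRAN Archive URL for one source tarball."""
    return f"{CRAN_ARCHIVE}/{package}/{package}_{version}.tar.gz"


# Versions named in the list above.
PINNED = {"Seurat V2": ("Seurat", "2.3.4"), "Seurat V3": ("Seurat", "3.1.5")}

for label, (package, version) in PINNED.items():
    print(f"{label}: {cran_archive_url(package, version)}")

# Once downloaded, a tarball can be installed from within R, e.g.:
#   install.packages("Seurat_2.3.4.tar.gz", repos = NULL, type = "source")
```

Installing each version into its own packrat or renv project library keeps the V2 and V3 installations from overwriting each other.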

Data

Original Data

The data in this project can be broadly divided into 2 categories (7 sub-projects). Please see SI Tables 1 & 2 (SI Table 1: Mouse Experiments 1-4; SI Table 2: Human Experiments 5-7 & Human Literature Reanalysis) for a breakdown by sample, metadata, and more information.

For a brief overview with links to the raw data (fastqs) and processed data (Cell Ranger count Gene Expression Matrices), see the table below.

| Experiment | Species | Seq Used | Description | Raw/Count Data |
|---|---|---|---|---|
| Exp. 1 | Mouse | scRNA-seq (10X 3' V2) | scRNA-seq of microglia with 4 different dissociation protocols | GSE152183 |
| Exp. 2 | Mouse | scRNA-seq (10X 3' V2) | scRNA-seq of all CNS cells with or without inhibitors | GSE152182 |
| Exp. 3 | Mouse | scRNA-seq (10X 3' V2) | scRNA-seq of microglia (tail vein PBS injection) | GSE152210 |
| Exp. 4 | Mouse | scRNA-seq (10X 3' V3.0 & V3.1) | scRNA-seq of microglia with or without inhibitors (10X Version Analysis) | GSE188441 |
| Exp. 5 | Human | snRNA-seq (10X 3' V3.0) | snRNA-seq of post-mortem brain tissue | GSE157760 |
| Exp. 6 | Human | snRNA-seq (10X 3' V3.0) | snRNA-seq of surgically resected brain tissue with or without freezing time delay | EGAD00001008541 |
| Exp. 7 | Human | scRNA-seq (10X 5' V1) | scRNA-seq | phs002222.v2.p1 |

Processed Data

All processed data files represent the output from Cell Ranger count. Files provided are the "filtered_feature_bc_matrix" (i.e. only containing the barcodes that Cell Ranger called as cells during preprocessing). Information on the Cell Ranger version and Genome/Annotation for each experiment can be found in SI Tables 1 & 2 as well as in the individual repository meta data.

Experiments 1-4, 5 (NCBI GEO)
There are 3 processed data files per library:

  1. GSM*_Sample-Name_barcodes.tsv.gz: corresponds to the cell barcodes (i.e. column names).
  2. GSM*_Sample-Name_features.tsv.gz: corresponds to the gene identifiers (i.e. row names).
  3. GSM*_Sample-Name_matrix.mtx.gz: expression matrix in sparse format.
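
Because the three files per library arrive with a `GSM*_Sample-Name_` prefix, tools that expect a Cell Ranger style folder (plain `barcodes.tsv.gz`, `features.tsv.gz`, `matrix.mtx.gz`) usually need them regrouped first. The following is a minimal sketch of that step, assuming every file follows the naming pattern listed above; the function name `organize_geo_files` and the directory arguments are hypothetical and not part of this repository's code.

```python
import re
import shutil
from pathlib import Path

# Assumed naming scheme, taken from the list above:
#   GSM<digits>_<Sample-Name>_<barcodes.tsv.gz | features.tsv.gz | matrix.mtx.gz>
GEO_FILE = re.compile(
    r"^(?P<gsm>GSM\d+)_(?P<sample>.+)_"
    r"(?P<kind>barcodes\.tsv\.gz|features\.tsv\.gz|matrix\.mtx\.gz)$"
)
EXPECTED = {"barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz"}


def organize_geo_files(download_dir, output_dir):
    """Copy GSM-prefixed files into one folder per library with generic names.

    Returns a dict mapping each library folder name to the sorted list of
    expected files that were not found (an empty list means complete).
    """
    download_dir, output_dir = Path(download_dir), Path(output_dir)
    found = {}
    for path in sorted(download_dir.iterdir()):
        match = GEO_FILE.match(path.name)
        if match is None or not path.is_file():
            continue  # not a processed-data file; leave it alone
        library = f"{match['gsm']}_{match['sample']}"
        target_dir = output_dir / library
        target_dir.mkdir(parents=True, exist_ok=True)
        # Strip the GSM/sample prefix so the folder looks like Cell Ranger output.
        shutil.copy2(path, target_dir / match["kind"])
        found.setdefault(library, set()).add(match["kind"])
    return {lib: sorted(EXPECTED - kinds) for lib, kinds in found.items()}
```

Each resulting folder should then be readable as a standard filtered matrix directory, for example with Seurat's `Read10X(data.dir = ...)`. Any library reported with a non-empty list is missing one or more of its three files.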

Raw fastq Files

All raw data fastq/BAM files can be downloaded from SRA linked from NCBI GEO records, or from EGA/dbGaP records.

Literature Reanalysis

Reanalyzed data from the literature are summarized in the table below.

| Dataset | Species | Seq Used | Raw/Count Data | Publication |
|---|---|---|---|---|
| Mathys | Mouse | scRNAseq (Smart-seq2) | GEO103334 & Authorsᵃ | Mathys et al., 2017 (Cell Reports) |
| Plemel | Mouse | scRNAseq (10X 3' V2) | GSE115803 | Plemel et al., 2020 (Science Advances) |
| Zywitza | Mouse | scRNAseq (Drop-Seq) | GSE111527 | Zywitza et al., 2018 (Cell Reports) |
| Mizrak | Mouse | scRNAseq (Microwell Seq) | GSE109447 | Mizrak et al., 2019 (Cell Reports) |
| Zeisel | Mouse | scRNAseq (10X 3' V1 & V2) | mousebrain.org | Zeisel et al., 2018 (Cell) |
| Hammond | Mouse | scRNAseq (10X 3' V1 & V2) | GSE121654 | Hammond et al., 2019 (Immunity) |
| Keren-Shaulᵇ | Mouse | MARS-Seq | GSE98969 | Keren-Shaul et al., 2017 (Cell) |
| Pasciuto | Mouse | scRNAseq (10X 3' V2) | GSE144038 & Mendeley Data | Pasciuto et al., 2020 (Cell) |
| Crinier | Mouse | scRNAseq (10X 3' V2) | GSE119562 | Crinier et al., 2018 (Cell) |
| Pasciuto | Human | scRNAseq (10X 3' V2) | GSE146165 & Mendeley Data | Pasciuto et al., 2020 (Cell) |
| Zhou | Human | snRNAseq (10X 5' V1) | syn21670836 | Zhou et al., 2020 (Nature Medicine) |
| Morabitoⁱ | Human | snRNAseq (10X 3' V3.0) | syn18915937 | Morabito et al., 2020 (Human Molecular Genetics) |
| Leng & Li | Human | snRNAseq (10X 3' V2) | syn21788402ᶜ & GSE147528 | Leng & Li et al., 2021 (Nature Neuroscience) |

ᵃ FPKM data and raw fastq files are available via GEO. The raw count matrix was obtained via personal communication with the authors.
ᵇ Only a specific subset of samples was used in the reanalysis. See the reanalysis code for more information.
ᶜ Data on Synapse are post-QC and were used for reanalysis. GEO records contain the all-barcodes (unfiltered) HDF5 Cell Ranger output files and fastqs.
ⁱ The reanalysis of Morabito et al. was also used for calculation of cell type proportions in Liddelow, Marsh, & Stevens et al., 2020 (Trends in Immunology).

Human Data Reanalysis Meta Data

Meta data for human data were assembled from published SI Tables, public data on Synapse, or restricted-access data on Synapse.

  • Compiled publicly available meta data variables for each human dataset can be found in SI Table 2.
  • "DUC" in the table indicates data available from Synapse following submission and approval of a Data Use Certificate.

Acknowledgements:

This study was supported by funding from Cure Alzheimer's Fund (B.S.). Special thanks to authors Tushar Kamath, Tim Hammond, Alec Walker, Lasse Dissing-Olesen, Velina Kozareva, and Evan Macosko, as well as other members of the Stevens and Macosko labs, for helpful discussions and assistance during the analysis phase of this project.

Data Acknowledgements:
The analysis and results published here from Zhou et al., 2020 in whole or in part are based on data obtained from the AMP-AD Knowledge Portal. Samples for this study were provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, the Illinois Department of Public Health, and the Translational Genomics Research Institute. Raw data used in analysis here are available from AMP-AD/Synapse database through links provided in table above. Additional ROSMAP data can be requested at https://www.radc.rush.edu.

The analysis and results published here for Morabito et al., 2020 are based on reanalysis of study data downloaded from Synapse as provided by Dr. Vivek Swarup, Institute for Memory Impairments and Neurological Disorders, University of California, Irvine. Data collection was supported through funding from UCI Startup funds and the American Federation for Aging Research. Raw data used in the analysis here are available from the Synapse database through the link provided in the table above.