scHLApers

Code to run the scHLApers pipeline for quantifying single-cell HLA expression using personalized reference genomes (Kang et al., Nat Genetics 2023).

Requirements

The R scripts require the following (listed version or higher):

R=4.0.5
Biostrings=2.58.0
purrr=0.3.4
readr=2.1.2
stringi=1.7.8
stringr=1.4.0
tidyverse=1.3.1
rtracklayer=1.50.0

Other software:

STAR=2.7.10a https://github.com/alexdobin/STAR
samtools=1.4.1 http://www.htslib.org/download/
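
To confirm that the command-line tools meet these minimum versions, something like the sketch below may help. The helpers `version_ge` and `check_tool` are illustrative names, not part of scHLApers. The comparison reads only the leading digits of each dot-separated field, so the letter suffix in STAR versions such as 2.7.10a is ignored.

```shell
# Sketch: verify minimum versions of the command-line tools listed above.
# version_ge and check_tool are hypothetical helpers, not part of scHLApers.

# Succeeds when version $1 >= version $2. Only the leading digits of each
# dot-separated field are compared, so "10a" is treated as 10.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = x[i] + 0; yi = y[i] + 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

check_tool() {
  # $1 = tool name, $2 = minimum version, $3 = version found
  if version_ge "$3" "$2"; then
    echo "$1 $3: OK (>= $2)"
  else
    echo "$1 $3: older than required $2"
  fi
}

# Each check runs only if the tool is on the PATH.
if command -v STAR >/dev/null 2>&1; then
  check_tool STAR 2.7.10a "$(STAR --version)"
fi
if command -v samtools >/dev/null 2>&1; then
  check_tool samtools 1.4.1 "$(samtools --version | awk 'NR == 1 { print $2 }')"
fi
```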

Data:

Reference genome (e.g. GRCh38.primary_assembly.genome.fa): available here
Gene annotation file (e.g. gencode.v38.annotation.gtf): available here
Cell barcode whitelist: more info here

Pipeline and example data

Each step has its own directory containing the necessary scripts and a tutorial that walks through the step. The example_data and example_outputs directories contain example input and output files for 2 samples. The raw scRNA-seq data for the example were obtained from the Yazar et al. (Science 2022) study and are publicly available on GEO (GSE196830).

Input

The inputs to scHLApers are:

Raw scRNA-seq data (either FASTQ or BAM format)
HLA allele calls (in CSV format, labeled as "SampleX_alleles.csv", see example_data/inputs/alleles for format)

See the HLA analyses tutorial from Sakaue et al. for a protocol for imputing HLA alleles from genotype array data.

Step 1: Prepare HLA allelic sequence database

We provide a pre-prepared database generated from IPD-IMGT/HLA version 3.47 that can be used directly in Step 2. Alternatively, you can prepare your own database from the latest IPD-IMGT/HLA version by following the tutorial.

Step 2: Make personalized reference and annotation files

The tutorial demonstrates how to mask the reference and how to generate the personalized contig (FASTA) and annotation (GTF) files that will be combined with the masked reference.
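
As a rough illustration of how the masked reference and personalized contigs could be combined downstream, the sketch below builds a STAR index from both FASTA files. This is one possible approach, not necessarily the one used in the tutorial: all file names, the `build_index` and `run` helpers, and the assumption that a single GTF already merges the reference and personalized annotations are hypothetical. The command is only printed unless `DRY_RUN=0` is set.

```shell
# Sketch: build a STAR index from the masked reference plus personalized contigs.
# All paths and helper names are hypothetical; see the Step 2 tutorial for the
# files the pipeline actually produces.

# Print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_index() {
  # $1 = masked reference FASTA, $2 = personalized contigs FASTA,
  # $3 = combined GTF annotation (assumed to exist), $4 = output index directory
  run STAR --runMode genomeGenerate \
    --genomeDir "$4" \
    --genomeFastaFiles "$1" "$2" \
    --sjdbGTFfile "$3" \
    --runThreadN "${THREADS:-4}"
}

build_index masked_ref.fa Sample1_personalized.fa Sample1_combined.gtf Sample1_index
```

`--genomeFastaFiles` accepts several FASTA files, so the two sequence files do not need to be concatenated first; `--sjdbGTFfile` takes a single annotation file.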

Step 3: Quantify single-cell expression with STARsolo

This step provides example scripts for running STARsolo for read alignment and expression quantification in single-cell data. The scripts will need to be modified based on the specifics of your dataset (e.g. UMI length, input format, barcode whitelist path, STAR executable). Please see the STAR manual for all options.
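
For orientation, a minimal STARsolo call for droplet-based FASTQ input might look like the sketch below. It is not a copy of the repository's scripts. The `GeneFull_Ex50pAS` feature type and `EM` multimapper mode are inferred from the output path shown in the Outputs section; the sample name, file paths, the `star_solo` and `run` helpers, and the UMI length of 12 (which assumes 10x Genomics 3' v3 chemistry) are assumptions. The command is only printed unless `DRY_RUN=0` is set.

```shell
# Sketch of a STARsolo call for droplet-based data.
# Paths, sample names, and helper names are hypothetical; the UMI length of 12
# assumes 10x Genomics 3' v3 chemistry.

# Print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

star_solo() {
  # $1 = sample name, $2 = cDNA reads FASTQ, $3 = barcode+UMI reads FASTQ
  # STARsolo expects the cDNA read first and the barcode read second.
  # BAM Unsorted replaces STAR's default plain-SAM output.
  run STAR \
    --genomeDir "${GENOME_DIR:-Sample1_index}" \
    --readFilesIn "$2" "$3" \
    --readFilesCommand zcat \
    --soloType CB_UMI_Simple \
    --soloCBwhitelist "${WHITELIST:-whitelist.txt}" \
    --soloUMIlen "${UMI_LEN:-12}" \
    --soloFeatures GeneFull_Ex50pAS \
    --soloMultiMappers EM \
    --outSAMtype BAM Unsorted \
    --outFileNamePrefix "${1}_scHLApers_" \
    --runThreadN "${THREADS:-4}"
}

star_solo Sample_1006_1007 reads_R2.fastq.gz reads_R1.fastq.gz
```

Settings that differ between datasets are read from environment variables (`GENOME_DIR`, `WHITELIST`, `UMI_LEN`, `THREADS`) so the same function can be reused across samples.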

Outputs

The output of scHLApers is a genes-by-cells expression matrix with improved expression estimates for the classical HLA genes. In the example output, we have filtered the raw STARsolo counts matrix to remove empty droplets, using a provided list of cell barcodes (see example_data/cell_meta_example.csv).

The raw counts matrix output by the pipeline for example Sample_1006_1007 can be found here: ../example_outputs/STARsolo_results/Sample_1006_1007_scHLApers/Sample_1006_1007_scHLApers_Solo.out/GeneFull_Ex50pAS/raw/UniqueAndMult-EM.mtx

A filtered version is located here (read into R using readRDS): ../example_outputs/STARsolo_results/Sample_1006_1007_scHLApers/exp_EM.rds

Note: The classical HLA genes are named IMGT_A, IMGT_C, IMGT_B, IMGT_DRB1, IMGT_DQA1, IMGT_DQB1, IMGT_DPA1, IMGT_DPB1.
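
For a quick look at the raw matrix without loading it into R, the sketch below sums the counts for each `IMGT_` gene across all barcodes. It assumes the usual STARsolo layout: a features.tsv file next to the matrix listing one gene per line (ID, then name) in matrix row order, and a Matrix Market coordinate file with "row column value" entries. `hla_totals` is a hypothetical helper, not part of scHLApers, and because the raw matrix includes empty droplets the totals are only a sanity check.

```shell
# Sketch: total counts per classical HLA gene in a STARsolo raw matrix.
# Assumes features.tsv lists genes (ID, then name) in matrix row order and the
# .mtx file is in Matrix Market coordinate format.
# hla_totals is a hypothetical helper, not part of scHLApers.

hla_totals() {
  # $1 = features.tsv, $2 = matrix file (e.g. UniqueAndMult-EM.mtx)
  awk '
    # First file: remember the row index of every gene labeled IMGT_*
    NR == FNR {
      if ($2 ~ /^IMGT_/) name[FNR] = $2
      else if ($1 ~ /^IMGT_/) name[FNR] = $1
      next
    }
    /^%/ { next }                        # skip Matrix Market comment lines
    !dims_seen { dims_seen = 1; next }   # skip the "rows cols entries" line
    ($1 in name) { total[$1] += $3 }     # accumulate counts for HLA rows
    END { for (i in name) printf "%s\t%.2f\n", name[i], total[i] }
  ' "$1" "$2" | sort
}

# Run on the example output if it is present.
RAW_DIR=../example_outputs/STARsolo_results/Sample_1006_1007_scHLApers/Sample_1006_1007_scHLApers_Solo.out/GeneFull_Ex50pAS/raw
if [ -f "$RAW_DIR/features.tsv" ]; then
  hla_totals "$RAW_DIR/features.tsv" "$RAW_DIR/UniqueAndMult-EM.mtx"
fi
```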

Support

For questions and assistance not answered in tutorials, you can contact Joyce Kang (joyce_kang AT hms.harvard DOT edu).

Reproducing results from the manuscript

Code to reproduce the figures and analyses from Kang et al. will become available at https://github.com/immunogenomics/hla2023.
