GitHub - GWW/sctx: scRNA-seq utilities for deconvoluting transplant samples

scTx

scTx is a method developed to simulate transplant scRNA-seq samples and to identify donor and recipient cells in transplant scRNA-seq samples. At the moment scTx relies on the annotated SNV output from scSNV, and on the index file generated by scSNV for annotation data.

## Demultiplexing transplant samples

Build the Python module for demultiplexing:

git clone --recurse-submodules https://github.com/GWW/sctx.git
cd sctx
pip install .

The only input required is the annotated pileup file produced by scSNV. The example folder contains an example file and a notebook that demultiplexes a kidney transplant sample.

Simulating transplant samples with ambient RNA and doublets

The code for mixing is now part of scSNV to simplify compilation. The instructions below assume the scsnv binary is on your PATH.

scTx requires the HDF5 C and C++ Libraries to compile the simulation framework.

Simulating Transplant Samples

To simulate a mixture of two scRNA-seq samples we require three files: a barcode mapping file and the two collapsed BAM files to mix.

A tab-separated barcode mapping file listing which singlets and doublets are to be included in the pileup output.

The first column, name, is the barcode that will be assigned to the simulated cell. The second column, barcodes, lists the file index and source barcode (as index:barcode) used to generate the cell. The file index is zero-based and follows the order of the BAM files specified on the command line. In the example table below, the scRNA-seq lung sample would be the first BAM file (index 0) and the PBMC sample would be the second (index 1). Doublets are generated by specifying two comma-separated barcodes instead of one.

name    barcodes
LUNG_1842       0:CGATGGCAGTTATCGC
LUNG_0334       0:GGACATTTCTGTTTGT
DBL_40          0:GTTAAGCCACACCGCA,1:GACCAATAGTGACTCT
LUNG_0517       0:ACGAGGACACTGCCAG
LUNG_1724       0:ACTTGTTAGGTGGGTT
LUNG_0025       0:GCGCGATGTGTTCGAT
LUNG_0900       0:TACCTTAAGAGATGAG
LUNG_1775       0:TGGCGCATCAGGCGAA
LUNG_1081       0:ACACCGGAGCTGCAAG
PBMC_1023       1:CTCTGGTAGGAGTAGA
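The mapping format described above can be produced and checked with a short script. The following is a minimal sketch, not part of scTx or scSNV: the helper names write_barcode_map and read_barcode_map are hypothetical, and it assumes the file carries the same header row (name, barcodes) as the example table.

```python
import io


def write_barcode_map(cells, handle):
    """Write a tab-separated barcode mapping table.

    cells: iterable of (name, [(file_index, barcode), ...]) tuples.
    One source barcode describes a singlet, two describe a doublet.
    """
    # Assumption: the header row matches the example table.
    handle.write("name\tbarcodes\n")
    for name, sources in cells:
        if len(sources) not in (1, 2):
            raise ValueError(f"{name}: expected 1 or 2 source barcodes")
        field = ",".join(f"{idx}:{barcode}" for idx, barcode in sources)
        handle.write(f"{name}\t{field}\n")


def read_barcode_map(handle, n_bams=2):
    """Parse a mapping table into {name: [(file_index, barcode), ...]}."""
    cells = {}
    next(handle)  # skip the header row
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, field = line.split("\t")
        sources = []
        for item in field.split(","):
            idx, barcode = item.split(":")
            # File indices refer to the BAM order on the command line.
            if not 0 <= int(idx) < n_bams:
                raise ValueError(f"{name}: file index {idx} out of range")
            sources.append((int(idx), barcode))
        cells[name] = sources
    return cells


buf = io.StringIO()
write_barcode_map(
    [
        ("LUNG_1842", [(0, "CGATGGCAGTTATCGC")]),
        ("DBL_40", [(0, "GTTAAGCCACACCGCA"), (1, "GACCAATAGTGACTCT")]),
        ("PBMC_1023", [(1, "CTCTGGTAGGAGTAGA")]),
    ],
    buf,
)
parsed = read_barcode_map(io.StringIO(buf.getvalue()))
doublets = [name for name, sources in parsed.items() if len(sources) == 2]
print(buf.getvalue(), end="")
print("doublets:", doublets)
```

Reading the table back before running the mixture is a cheap way to catch a file index that does not correspond to one of the two BAM files.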

The two collapsed BAM files from scSNV to mix. For the table above, these would be the lung and PBMC BAM files, specified in that order.

The mixture command can be run as follows:

scsnv mixture -a 0 -i scsnv_index_path_prefix -r genome.fa -o mixture/pileup -m barcode_map.txt -t 4 lung_collapsed.bam pbmc_collapsed.bam

Important Arguments:

Option	Argument	Function	Required
-a, --ambient	float	Ambient RNA contamination level to simulate; for example, 0.10 for 10%, default 0.0	No
-i, --txidx	path	scSNV transcriptome index	Yes
-r, --reference	path	Genome reference fasta	Yes
-o, --out	path	Output prefix	Yes
-m, --mixture	path	Barcode mapping file	Yes
-l, --library	str	Library type (see below)
-h, --help	None	Print other command line options	No

There are also some optional arguments to control SNV filtering, which can be viewed with the scsnv mixture -h command. The default values were used for all of the manuscript work.

Library types:

10X V2 3-prime:   -l V2
10X V3 3-prime:   -l V3
10X V2/V1 5-prime:   -l V2_5P
10X V3 5-prime:   -l V3_5P
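For scripted or repeated simulations, the documented options can be assembled into an argument list before being handed to the shell. The sketch below is illustrative only: build_mixture_cmd is a hypothetical helper, it uses only the flags listed in the table above, and it builds the command without running scsnv.

```python
import shlex

# Library type values as listed in this README.
LIBRARY_TYPES = {"V2", "V3", "V2_5P", "V3_5P"}


def build_mixture_cmd(index_prefix, reference, out_prefix, barcode_map,
                      bams, ambient=0.0, library=None):
    """Return the argument list for an `scsnv mixture` run."""
    if len(bams) != 2:
        raise ValueError("exactly two collapsed BAM files are expected")
    # The README gives 0.10 as the value for 10% contamination.
    if not 0.0 <= ambient <= 1.0:
        raise ValueError("ambient must be a fraction between 0 and 1")
    cmd = [
        "scsnv", "mixture",
        "-a", str(ambient),
        "-i", index_prefix,
        "-r", reference,
        "-o", out_prefix,
        "-m", barcode_map,
    ]
    if library is not None:
        if library not in LIBRARY_TYPES:
            raise ValueError(f"unknown library type: {library}")
        cmd += ["-l", library]
    # BAM order must match the file indices in the barcode mapping file.
    cmd += list(bams)
    return cmd


cmd = build_mixture_cmd(
    "scsnv_index_path_prefix", "genome.fa", "mixture/pileup",
    "barcode_map.txt", ["lung_collapsed.bam", "pbmc_collapsed.bam"],
    ambient=0.1, library="V3",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the scsnv binary is on your PATH.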

Output files:

File	Contents
mixture/pileup_barcode_matrices.h5	Pileup data matrices (same as scSNV pileup)
mixture/pileup.txt.gz	Summary data for each SNV that passed the filtering criteria (same as scSNV pileup)
mixture/pileup_mix_counts.txt	The number of molecules used from each of the bam files
mixture/pileup_barcodes.txt.gz	Barcode-specific molecule counts (i.e., those lost and gained as ambient RNA molecules) and the total molecules from each genotype

The pileup output files can then be annotated using the scSNV annotate command to identify potential RNA edits etc.

The annotated pileup output can be converted to files suitable for Vireo and Souporcell using the sctxmisc script included with the sctx Python package:

sctxmisc snv2vcfmtx -r ref_lengths.txt -f genome.fa -o mixture/vireo -e -m mixture/pileup_annotated.h5

This will write the output files necessary for vireo and souporcell.

The -e option removes edits. Additional filtering options can be viewed with sctxmisc snv2vcfmtx -h.

The ref_lengths file is a tab-delimited file with the reference name, length, and comment:

1       248956422       dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF
10      133797422       dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF
11      135086622       dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF
12      133275309       dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF
13      114364328       dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF
14      107043718       dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF
15      101991189       dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF
16      90338345        dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF
17      83257441        dna:chromosome chromosome:GRCh38:17:1:83257441:1 REF
18      80373285        dna:chromosome chromosome:GRCh38:18:1:80373285:1 REF
19      58617616        dna:chromosome chromosome:GRCh38:19:1:58617616:1 REF
2       242193529       dna:chromosome chromosome:GRCh38:2:1:242193529:1 REF

This file is automatically generated in the scSNV index folder with the suffix _lengths.txt
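To sanity-check a lengths file before passing it to sctxmisc, the three-column layout can be read with a few lines of Python. This is a minimal sketch under the assumption that the columns are separated by tabs, as described above; parse_ref_lengths is a hypothetical helper and not part of the sctx package.

```python
def parse_ref_lengths(lines):
    """Map each reference name to its length from tab-delimited lines.

    Each line holds: reference name, length, comment.
    The comment column is ignored.
    """
    lengths = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split into at most three fields so the comment stays intact.
        name, length, *_comment = line.split("\t", 2)
        lengths[name] = int(length)
    return lengths


example = [
    "1\t248956422\tdna:chromosome chromosome:GRCh38:1:1:248956422:1 REF\n",
    "10\t133797422\tdna:chromosome chromosome:GRCh38:10:1:133797422:1 REF\n",
]
lengths = parse_ref_lengths(example)
print(lengths)
```

For a file on disk, pass an open file handle instead of the list, for example parse_ref_lengths(open(path)) where path points at the _lengths.txt file in the scSNV index folder.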
