scTx is a method developed to simulate transplant scRNA-seq samples and to identify donor and recipient cells in transplant scRNA-seq samples. At the moment, scTx relies on the annotated SNV output from scSNV and on the index file generated by scSNV for annotation data.
## Demultiplexing transplant samples
#Build the python module for demultiplexing
git clone --recurse-submodules https://github.com/GWW/sctx.git
cd sctx
pip install .
The only required output is the annotated pileup file from scSNV. There is an example file and notebook to demultiplex a kidney transplant sample in the example folder.
The code for mixing is now part of scSNV to simplify compiling. The instructions below assume the scsnv binary is on your PATH.
scTx requires the HDF5 C and C++ libraries to compile the simulation framework.
To simulate a mixture of two scRNA-seq samples, three files are required:
- A tab-separated barcode mapping file listing which singlets and doublets are to be included in the pileup output.
The first column (name) is the barcode that will be assigned to the cell. The second column (barcodes) lists the file indices and barcodes to be used when generating the cell.
The file index is based on the order of the BAM files specified on the command line. In the example table below, the scRNA-seq lung sample would be the first BAM file and the PBMC sample would be the second.
Doublets can be generated by specifying two barcodes instead of one.
name barcodes
LUNG_1842 0:CGATGGCAGTTATCGC
LUNG_0334 0:GGACATTTCTGTTTGT
DBL_40 0:GTTAAGCCACACCGCA,1:GACCAATAGTGACTCT
LUNG_0517 0:ACGAGGACACTGCCAG
LUNG_1724 0:ACTTGTTAGGTGGGTT
LUNG_0025 0:GCGCGATGTGTTCGAT
LUNG_0900 0:TACCTTAAGAGATGAG
LUNG_1775 0:TGGCGCATCAGGCGAA
LUNG_1081 0:ACACCGGAGCTGCAAG
PBMC_1023 1:CTCTGGTAGGAGTAGA
- The two collapsed BAM files from scSNV to mix. For the table above, these would be the lung and PBMC BAM files, specified in that order.
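A barcode mapping file like the example table could be generated with a short script. The following is a minimal sketch and not part of scTx or scSNV: the helper name `format_barcode_map` and its input structures are hypothetical, and the layout (a `name`/`barcodes` header row separated by a tab, `index:barcode` entries, comma-separated entries for doublets) is inferred from the example table.

```python
def format_barcode_map(singlets, doublets):
    """Return the text of a tab-separated barcode mapping file.

    singlets: iterable of (name, file_index, barcode)
    doublets: iterable of (name, [(file_index, barcode), ...])
    Hypothetical helper; the layout is inferred from the example table.
    """
    lines = ["name\tbarcodes"]
    for name, index, barcode in singlets:
        # One source cell: "<file index>:<barcode>"
        lines.append(f"{name}\t{index}:{barcode}")
    for name, members in doublets:
        # Several source cells merged into one simulated cell, comma-separated
        joined = ",".join(f"{index}:{barcode}" for index, barcode in members)
        lines.append(f"{name}\t{joined}")
    return "\n".join(lines) + "\n"


# File index 0 is the first BAM on the command line (lung), 1 the second (PBMC).
example = format_barcode_map(
    singlets=[
        ("LUNG_1842", 0, "CGATGGCAGTTATCGC"),
        ("PBMC_1023", 1, "CTCTGGTAGGAGTAGA"),
    ],
    doublets=[
        ("DBL_40", [(0, "GTTAAGCCACACCGCA"), (1, "GACCAATAGTGACTCT")]),
    ],
)
print(example, end="")
```

The returned text can be written to a file such as barcode_map.txt and passed to the mixture command with -m.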
The mixture command can be run as follows:
scsnv mixture -a 0 -i scsnv_index_path_prefix -r genome.fa -o mixture/pileup -m barcode_map.txt -t 4 lung_collapsed.bam pbmc_collapsed.bam
Option | Argument | Function | Required |
---|---|---|---|
-a, --ambient | float | Ambient RNA contamination level to simulate; for example, 0.10 for 10%, default 0.0 | No |
-i, --txidx | path | scSNV transcriptome index | Yes |
-r, --reference | path | Genome reference fasta | Yes |
-o, --out | path | Output prefix | Yes |
-m, --mixture | path | Barcode mapping file | Yes |
-l, --library | str | Library type (see below) | |
-h, --help | None | Print other command line options | No |
There are also some optional arguments to control SNV filtering, which can be viewed with the scsnv mixture -h command. The default values were used for all of the manuscript work.
Library types:
10X V2 3-prime: -l V2
10X V3 3-prime: -l V3
10X V2/V1 5-prime: -l V2_5P
10X V3 5-prime: -l V3_5P
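When the mixture step is scripted, it may help to assemble the command from the documented options before running it. This is a minimal sketch under stated assumptions: the function `build_mixture_command` is hypothetical, the range check on the ambient level (0 to 1) is an assumption based on the "0.10 for 10%" example, and `-t` is taken from the example command above (it is not described in the options table).

```python
import shlex

# Library types listed in this README
LIBRARY_TYPES = {"V2", "V3", "V2_5P", "V3_5P"}


def build_mixture_command(index_prefix, reference, out_prefix, barcode_map,
                          bams, ambient=0.0, library=None, threads=None):
    """Build the argument list for `scsnv mixture` (hypothetical helper)."""
    # Assumption: the ambient level is a fraction, e.g. 0.10 for 10%
    if not 0.0 <= ambient <= 1.0:
        raise ValueError("ambient must be a fraction between 0 and 1")
    if library is not None and library not in LIBRARY_TYPES:
        raise ValueError(f"unknown library type: {library}")
    cmd = ["scsnv", "mixture",
           "-a", str(ambient),
           "-i", index_prefix,
           "-r", reference,
           "-o", out_prefix,
           "-m", barcode_map]
    if library is not None:
        cmd += ["-l", library]
    if threads is not None:
        # -t appears in the example command; its meaning is assumed here
        cmd += ["-t", str(threads)]
    # BAM order must match the file indices used in the barcode mapping file
    cmd += list(bams)
    return cmd


cmd = build_mixture_command(
    "scsnv_index_path_prefix", "genome.fa", "mixture/pileup", "barcode_map.txt",
    ["lung_collapsed.bam", "pbmc_collapsed.bam"], ambient=0.1, library="V3", threads=4,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the scsnv binary is on your PATH.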
File | Contents |
---|---|
mixture/pileup_barcode_matrices.h5 | Pileup data matrices (same as scSNV pileup) |
mixture/pileup.txt.gz | Summary data for each SNV that passed the filtering criteria (same as scSNV pileup) |
mixture/pileup_mix_counts.txt | The number of molecules used from each of the bam files |
mixture/pileup_barcodes.txt.gz | Barcode-specific molecule counts (i.e. those lost and gained as ambient RNA molecules) and the total molecules from each genotype |
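
The gzipped text outputs can be inspected with standard tools before further processing. As a minimal sketch, assuming mixture/pileup.txt.gz is a gzip-compressed, tab-delimited table with a header row (an assumption; the column layout is not described here), the hypothetical helper `peek_table` returns the header and the first few rows:

```python
import csv
import gzip
from itertools import islice


def peek_table(path, n_rows=5):
    """Return (header, first n_rows rows) of a gzipped, tab-delimited file.

    Hypothetical helper; assumes the first line of the file is a header.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows
```

For example, `header, rows = peek_table("mixture/pileup.txt.gz")` would load the column names and the first five SNV records, if the file follows the assumed layout.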
The pileup output files can then be annotated using the scSNV annotate command to identify potential RNA edits etc.
The annotated pileup output can be converted to files suitable for Vireo and Souporcell using the sctxmisc script included with the sctx Python package:
sctxmisc snv2vcfmtx -r ref_lengths.txt -f genome.fa -o mixture/vireo -e -m mixture/pileup_annotated.h5
This will write the output files necessary for Vireo and Souporcell.
The -e option removes edits. There are some additional filtering options that can be viewed with sctxmisc snv2vcfmtx -h.
The ref_lengths file is a tab-delimited file with the reference name, length and comment:
1 248956422 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF
10 133797422 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF
11 135086622 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF
12 133275309 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF
13 114364328 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF
14 107043718 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF
15 101991189 dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF
16 90338345 dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF
17 83257441 dna:chromosome chromosome:GRCh38:17:1:83257441:1 REF
18 80373285 dna:chromosome chromosome:GRCh38:18:1:80373285:1 REF
19 58617616 dna:chromosome chromosome:GRCh38:19:1:58617616:1 REF
2 242193529 dna:chromosome chromosome:GRCh38:2:1:242193529:1 REF
This file is automatically generated in the scSNV index folder with the suffix _lengths.txt.
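To sanity-check a ref_lengths file before passing it to sctxmisc, it can be read into a dictionary. This is a minimal sketch, assuming the three tab-delimited columns described above (reference name, length, comment); the function name `parse_ref_lengths` is hypothetical and not part of the sctx package.

```python
def parse_ref_lengths(lines):
    """Map reference name -> length from tab-delimited ref_lengths lines.

    Hypothetical helper; expects name, length and comment columns.
    """
    lengths = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected at least two columns: {line!r}")
        # The comment column (fields[2], if present) is ignored here
        lengths[fields[0]] = int(fields[1])
    return lengths


example_lines = [
    "1\t248956422\tdna:chromosome chromosome:GRCh38:1:1:248956422:1 REF\n",
    "10\t133797422\tdna:chromosome chromosome:GRCh38:10:1:133797422:1 REF\n",
]
print(parse_ref_lengths(example_lines))
```

An open file handle can be passed directly, for example `parse_ref_lengths(open(path))`, where `path` points to the _lengths.txt file in your scSNV index folder.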