scTx

scTx is a method developed to simulate transplant scRNA-seq samples and to identify donor and recipient cells in them. At the moment scTx relies on the annotated SNV output from scSNV, and on the index file generated by scSNV for annotation data.

Demultiplexing transplant samples

Build the Python module for demultiplexing

git clone --recurse-submodules https://github.com/GWW/sctx.git
cd sctx
pip install .

The only required output is the annotated pileup file from scSNV. There is an example file and notebook to demultiplex a kidney transplant sample in the example folder.

Simulating transplant samples with ambient RNA and doublets

The code for mixing is now part of scSNV to simplify compiling. The instructions below assume the scsnv binary is on your PATH.

scTx requires the HDF5 C and C++ Libraries to compile the simulation framework.

Simulating Transplant Samples

To simulate a mixture of two scRNA-seq samples we require three files: a barcode mapping file and the two collapsed bam files to mix.

  1. A tab-separated barcode mapping file listing which singlets and doublets are to be included in the pileup output.

The first column (name) is the barcode that will be assigned to the simulated cell. The second column (barcodes) lists the file index and barcode of each source cell, written as index:barcode. The file index is zero-based and follows the order of the bam files on the command line. In the example table below, the scRNA-seq lung sample would be the first bam file (index 0) and the PBMC sample would be the second (index 1). Doublets can be generated by specifying two comma-separated barcodes instead of one.

name    barcodes
LUNG_1842       0:CGATGGCAGTTATCGC
LUNG_0334       0:GGACATTTCTGTTTGT
DBL_40          0:GTTAAGCCACACCGCA,1:GACCAATAGTGACTCT
LUNG_0517       0:ACGAGGACACTGCCAG
LUNG_1724       0:ACTTGTTAGGTGGGTT
LUNG_0025       0:GCGCGATGTGTTCGAT
LUNG_0900       0:TACCTTAAGAGATGAG
LUNG_1775       0:TGGCGCATCAGGCGAA
LUNG_1081       0:ACACCGGAGCTGCAAG
PBMC_1023       1:CTCTGGTAGGAGTAGA
  2. The two collapsed bam files from scSNV to mix. For the table above, we would need the lung and PBMC bam files, specified in that order.
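The barcode mapping file described above can also be written programmatically. The following is a minimal sketch, not part of scTx or scSNV: `write_barcode_map` is a hypothetical helper, and the limit of one or two source barcodes per cell is an assumption based on the singlet and doublet rows in the example table.

```python
import io


def write_barcode_map(handle, cells):
    """Write a tab-separated barcode mapping file (hypothetical helper).

    cells: iterable of (name, [(file_index, barcode), ...]) tuples.
    One (file_index, barcode) pair gives a singlet, two give a doublet.
    """
    handle.write("name\tbarcodes\n")
    for name, sources in cells:
        # Assumption: only singlets and doublets are meaningful here.
        if not 1 <= len(sources) <= 2:
            raise ValueError(f"{name}: expected one or two barcodes, got {len(sources)}")
        field = ",".join(f"{index}:{barcode}" for index, barcode in sources)
        handle.write(f"{name}\t{field}\n")


# File index 0 is the first bam file on the command line (lung), 1 the second (PBMC).
cells = [
    ("LUNG_1842", [(0, "CGATGGCAGTTATCGC")]),
    ("DBL_40", [(0, "GTTAAGCCACACCGCA"), (1, "GACCAATAGTGACTCT")]),
    ("PBMC_1023", [(1, "CTCTGGTAGGAGTAGA")]),
]

buffer = io.StringIO()
write_barcode_map(buffer, cells)
barcode_map_text = buffer.getvalue()
```

To produce a file on disk, pass an open file handle (for example `open("barcode_map.txt", "w")`) instead of the in-memory buffer.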

The mixture command can be run as follows:

scsnv mixture -a 0 -i scsnv_index_path_prefix -r genome.fa -o mixture/pileup -m barcode_map.txt -t 4 lung_collapsed.bam pbmc_collapsed.bam
Important arguments:

| Option | Argument | Function | Required |
| --- | --- | --- | --- |
| -a, --ambient | float | Ambient RNA contamination level to simulate, for example 0.10 for 10% (default 0.0) | No |
| -i, --txidx | path | scSNV transcriptome index | Yes |
| -r, --reference | path | Genome reference fasta | Yes |
| -o, --out | path | Output prefix | Yes |
| -m, --mixture | path | Barcode mapping file | Yes |
| -l, --library | str | Library type (see below) | |
| -h, --help | None | Print other command line options | No |

There are also some optional arguments to control SNV filtering, which can be viewed with the scsnv mixture -h command. The default values were used for all of the manuscript work.

Library types:

10X V2 3-prime:   -l V2
10X V3 3-prime:   -l V3
10X V2/V1 5-prime:   -l V2_5P
10X V3 5-prime:   -l V3_5P
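When scripting several simulations (for example, a range of ambient RNA levels), it can help to assemble the command as an argument list. The sketch below is illustrative only: `mixture_command` is a hypothetical helper, it uses only the options shown in this README, and it treats `-t` (which appears in the example command but not in the table) as a thread count, which is an assumption.

```python
import shlex

# Library type codes listed in this README.
LIBRARY_TYPES = {"V2", "V3", "V2_5P", "V3_5P"}


def mixture_command(index_prefix, reference, out_prefix, barcode_map, bams,
                    ambient=0.0, library=None, threads=None):
    """Build the argument list for `scsnv mixture` (hypothetical helper)."""
    if not 0.0 <= ambient <= 1.0:
        raise ValueError("ambient must be a fraction between 0 and 1")
    if library is not None and library not in LIBRARY_TYPES:
        raise ValueError(f"unknown library type: {library}")
    argv = [
        "scsnv", "mixture",
        "-a", str(ambient),
        "-i", index_prefix,
        "-r", reference,
        "-o", out_prefix,
        "-m", barcode_map,
    ]
    if library is not None:
        argv += ["-l", library]
    if threads is not None:
        # Assumption: -t is the thread count, as in the example command.
        argv += ["-t", str(threads)]
    # bam order must match the file indices used in the barcode mapping file.
    argv += list(bams)
    return argv


argv = mixture_command(
    "scsnv_index_path_prefix", "genome.fa", "mixture/pileup", "barcode_map.txt",
    ["lung_collapsed.bam", "pbmc_collapsed.bam"],
    ambient=0.1, library="V3", threads=4,
)
command_line = shlex.join(argv)
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with paths that contain spaces.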
Output files:

| File | Contents |
| --- | --- |
| mixture/pileup_barcode_matrices.h5 | Pileup data matrices (same as scSNV pileup) |
| mixture/pileup.txt.gz | Summary data for each SNV that passed the filtering criteria (same as scSNV pileup) |
| mixture/pileup_mix_counts.txt | The number of molecules used from each of the bam files |
| mixture/pileup_barcodes.txt.gz | Barcode-specific molecule counts (i.e. those lost and gained as ambient RNA molecules) and the total molecules from each genotype |

The pileup output files can then be annotated using the scSNV annotate command to identify potential RNA edits etc.

The annotated pileup output can be converted to files suitable for Vireo and Souporcell using the sctxmisc script included with the sctx Python package:

sctxmisc snv2vcfmtx -r ref_lengths.txt -f genome.fa -o mixture/vireo -e -m mixture/pileup_annotated.h5

This will write the output files necessary for vireo and souporcell.

The -e option removes edits. There are some additional filtering options that can be viewed with sctxmisc snv2vcfmtx -h.

The ref_lengths file is a tab-delimited file with the reference name, length and comment:

1       248956422       dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF
10      133797422       dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF
11      135086622       dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF
12      133275309       dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF
13      114364328       dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF
14      107043718       dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF
15      101991189       dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF
16      90338345        dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF
17      83257441        dna:chromosome chromosome:GRCh38:17:1:83257441:1 REF
18      80373285        dna:chromosome chromosome:GRCh38:18:1:80373285:1 REF
19      58617616        dna:chromosome chromosome:GRCh38:19:1:58617616:1 REF
2       242193529       dna:chromosome chromosome:GRCh38:2:1:242193529:1 REF

This file is automatically generated in the scSNV index folder with the suffix _lengths.txt.
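To sanity-check a lengths file before passing it to sctxmisc, the three-column layout shown above can be parsed with a few lines of Python. This is a minimal sketch with a hypothetical `parse_ref_lengths` helper. It splits on any whitespace for the first two columns, which assumes reference names contain no spaces, and it keeps the rest of the line as the comment.

```python
def parse_ref_lengths(lines):
    """Parse reference name, length and comment from a lengths file (hypothetical helper)."""
    refs = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 2)
        if len(parts) < 2:
            raise ValueError(f"line {number}: expected at least a name and a length")
        name = parts[0]
        length = int(parts[1])  # raises ValueError if the length is not an integer
        comment = parts[2] if len(parts) > 2 else ""
        refs.append((name, length, comment))
    return refs


example = [
    "1\t248956422\tdna:chromosome chromosome:GRCh38:1:1:248956422:1 REF\n",
    "10\t133797422\tdna:chromosome chromosome:GRCh38:10:1:133797422:1 REF\n",
]
refs = parse_ref_lengths(example)
lengths = {name: length for name, length, _ in refs}
```

For a real index, the same function can be applied to an open file handle, since iterating over a file yields its lines.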
