Targeted Iso Seq QC Wiki

Last Updated: 01/23/2020

This wiki shows scripts used for targeted Iso-Seq analysis.

Python Requirement

How to set up the scripts

No installation is required. You just download the Cupcake repository from GitHub and add the appropriate paths.

$ git clone https://github.com/Magdoll/cDNA_Cupcake.git
$ export PYTHONPATH=$PYTHONPATH:<path_to>/cDNA_Cupcake/sequence

To see that you have successfully gotten BioPython, bx-python, and the sequence/ subdir working, try the following in Python interpreter:

$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
>>> import Bio  # testing BioPython
>>> import bx   # testing bx-python
>>> import BED  # testing Cupcake sequence/

All of the above should work without error messages.

Calculate on-target rate from SAM alignment + probe BED file

You will need:

The input FASTA or FASTQ file
The aligned SAM file or GTF
A tab-delimited probe BED file with at least the first three columns and an optional 4th column of <chrom> <start> <end> <gene_name>. No headers allowed!

You can generate the aligned SAM file using minimap2:

minimap2 -ax splice -t 30 -uf --secondary=no \
   hg38.fa input.fasta > input.fasta.sam

If your input is GTF instead of SAM, you must use the --gtf option.

The script usage is:

python calc_probe_hit_from_sam.py [BED] [FASTA/FASTQ] [SAM/GTF] 
  --start_base {0 or 1}
  --end_base   {0 or 1}
  -o [OUTPUT]
  [--gtf]

Where --start_base, --end_base indicates whether the start/end index is 0-based or 1-based.

For example:

python <path_to>/cDNA_Cupcake/sequence/calc_probe_hit_from_sam.py \
         --gtf \
         my-probes.bed input.fasta input.gtf \
         --start_base 0 --end_base 0 \
         -o input.gtf.probe_hit.txt

The output file will have the following fields:

read_id         : the read ID
read_len        : length of read
num_probe       : number of distinct probes overlapped
num_base_overlap: total base of the read that overlaps with a probe
loci            : mapped start-end of the read on the genome
genes           : the 4th column of the probe BED file, if given

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Targeted Iso Seq QC Wiki

Python Requirement

How to set up the scripts

Calculate on-target rate from SAM alignment + probe BED file

Clone this wiki locally