Skip to content

byee4/junction-counter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

junction-counter

Basically a few steps upstream of any splicing algorithm. It doesn't do anything fancy like detect novel splice junctions, but it doesn't take that long to run. If you just want to count reads supporting splice junctions given an RNA-SEQ bam file and a spliced region, you can use this tool to do it in a somewhat timely manner (unless you have tons of introns :/):

Alt text

Requirements:

Refer to the 'environments.yaml' file for the conda env I tested with, or create your own:

conda create -n jxc python=2.7 samtools=1.4.1 pysam=0.11.2 tqdm pandas
source activate jxc
conda install -c conda-forge bzip2

Then run:

source activate jxc # if you haven't already done this
git clone https://github.com/byee4/junction-counter
python setup.py build
python setup.py install

Usage:

Refer to the bash script run_junction_counter.sh

count-junctions \
--bed data/intron_jxnlist.bed \     # bedfile containing introns (NOTE: make sure these are 0-based BED files of introns)
--bam ENCFF756RDZ.bam \             # alignment bam file
--outfile data/output.txt \         # output file
--min_overlap 1 \                   # requires each read to cover this many bases to be counted as supporting either intron or exon.
--jxc_only \                        # count only reads spanning junctions
--report_names                      # report the names of all reads (WARNING: this might make the file BIG)

More stuff

You can download the BAM file here (make sure to index this as well!)

More more stuff

Methods:
  • (splice = supports this junction, not-spliced = doesn't support it)
  • gather every read that falls between the spliced region (if chr1:10-20 was specified as the intron, our area of interest is 9 to 21)
  • filter out any read that is either: not properly paired, failed qc, isn't a primary alignment
  • use the read's CIGAR string to check whether or not a split "stops" at the left exon (eg read goes from 4-9)
  • use the read's CIGAR string to check whether or not a split "stops" at the right exon (eg read goes from 21-25)
  • use the read's CIGAR string to determine whether or not the read lies somewhere between (eg 13-19)
  • if a read supports the junction site from both left and right sides, +1 'splice'
  • if a read supports a junction site from just one side but not the other, +1 'not-spliced' (although this read probably supports a different splice junction, not just this one)
  • if a read lies totally within the junction, +1 'notspliced' as well.

About

counts reads that support excluded exon-intron junctions

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages