Skip to content

UnkindPartition/bamcc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

bamcc

Description

A set of sequence alignments (represented as a BAM/SAM/CRAM file) defines a bipartite graph in which the nodes are reference and template sequences and the edges are alignments between a template and a reference.

bamcc splits this graph into connected components and reports the mapping from reference sequences to their components.

This is useful e.g. when analyzing multireads in an RNA-Seq experiment, where connected components can be analyzed independently.

Usage

bamcc input.bam output.tsv

The output.tsv file will look like this:

seqid  seqname      component
0      FBtr0005088  0
1      FBtr0006151  1
2      FBtr0070000  2
3      FBtr0070002  0
4      FBtr0070003  2
5      FBtr0070006  3

where:

  1. seqid is the 0-based number of the reference sequence. Sequence numbers are defined by the input file and are stable.
  2. seqname is the reference sequence name.
  3. component is the 0-based component number to which the reference has been assigned.

To extract, say, the 17th component into a separate bam file, run

samtools view -bh -o example.17.bam example.sorted.bam \
  $(awk 'BEGIN{ORS=" "} NR>1 && $3==17 {print $2}' rsem_orig.tsv)

Building

Dependencies

Compilation

CXXFLAGS=-O2 make

This will create an executable bamcc in the current directory.

Testing

First, run make.

Then, run ./test. This will update all files test_files/example*.tsv. Failures may appear either as messages from bamcc or differences in output files reported by git diff.

Security

Do not run this program on untrusted or potentially malformed input files.

About

bamcc — split SAM/BAM files into connected components

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published