Zhanglab-IOZ/lowFrequencyInsertion

lowFrequencyInsertion

ALU, one of the most successful transposable elements, remains actively mobile in the human genome, with a copy number well in excess of 1 million. Detecting ALU insertions is challenging, however, because both PCR and single-cell genome amplification generate chimeric artifacts, which often show up as false-positive insertions (Fig1, Fig2).
lowFrequencyInsertion is a tool designed for the sensitive detection of low-frequency ALU insertions. It rules out chimeric artifacts by exploiting the fact that the ALU length (~300 bp) is close to the typical fragment length (~400 bp) of next-generation sequencing libraries (Fig2).

Dependencies

  1. novoalign version: 3.09.04
  2. parallel[1] version: 20220722
  3. pysam version: 0.19.1
  4. python version: 3.10.5
  5. samtools version: 1.15.1
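
Before running the tool, it can help to confirm that the dependencies listed above are installed. The following is a minimal sketch, not part of lowFrequencyInsertion itself. It assumes the executables are named `novoalign`, `parallel`, and `samtools` on your `PATH`, and it only checks for presence, not for the exact versions listed.

```python
import importlib.util
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_EXECUTABLES = ["novoalign", "parallel", "samtools"]
REQUIRED_MODULES = ["pysam"]


def find_missing(executables, modules):
    """Return names of executables not on PATH and modules that cannot be found."""
    missing = [exe for exe in executables if shutil.which(exe) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_EXECUTABLES, REQUIRED_MODULES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

Version checks are left out on purpose, because each tool reports its version in a different format.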

Run lowFI

lowFI  
Detect ALU insertions supported by specific soft-clipped read pairs.
  
Usage: lowFI [options]  
[-i <input file, the absolute path is necessary, bam/sam, mandatory>]  
[-o <output file name, suffix will be added automatically, mandatory>]  
[-u <upper limit of soft-clipped part length, limit itself is included, optional, default: 130>]  
[-l <lower limit of soft-clipped part length, limit itself is included, optional, default: 20>]  
[-p <number of jobs to be run in parallel, optional, default: 2>]  
[-m <memory per thread used for samtools sort, optional, default: 2G>]  
[-T <ALU consensus sequences novoalign index file, mandatory>]  
[-G <Genome novoalign index file, mandatory>]  
[-R <ALU annotation file, bed, mandatory>]  
[-X <nonreference insertion detection result, bed, optional>]  
[-h <help>]  
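
The options above can be illustrated with a short Python sketch. It shows two things: how the inclusive `-l`/`-u` limits might apply to the soft-clipped part of a read, judged from its CIGAR string, and how a `lowFI` argument list could be assembled with the input path made absolute, as `-i` requires. The function names and file paths are hypothetical, and whether lowFI evaluates the limits exactly this way internally is an assumption.

```python
import os
import re

# Matches one CIGAR operation, e.g. "35S" or "115M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def softclip_lengths(cigar):
    """Return (leading, trailing) soft-clip lengths of a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips, so drop them first.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return 0, 0
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


def passes_clip_limits(cigar, lower=20, upper=130):
    """True if either soft-clipped end has a length in [lower, upper], inclusive."""
    return any(lower <= n <= upper for n in softclip_lengths(cigar))


def build_lowfi_command(bam, out, alu_index, genome_index, alu_bed,
                        lower=20, upper=130, jobs=2, sort_mem="2G"):
    """Assemble a lowFI argument list; -i is converted to an absolute path."""
    return [
        "lowFI",
        "-i", os.path.abspath(bam),
        "-o", out,
        "-l", str(lower),
        "-u", str(upper),
        "-p", str(jobs),
        "-m", sort_mem,
        "-T", alu_index,
        "-G", genome_index,
        "-R", alu_bed,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_lowfi_command("sample.bam", "sample_out",
                              "alu.nix", "genome.nix", "alu.bed")
    print(" ".join(cmd))
    print(passes_clip_limits("35S115M"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `lowFI` is on your `PATH`. The optional `-X` argument is omitted here and can be appended in the same way.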

References

[1] Tange, Ole. (2018). GNU Parallel 2018. In GNU Parallel 2018 (p. 112). Ole Tange. https://doi.org/10.5281/zenodo.1146014