
LIONS is a bioinformatic analysis pipeline which brings together a few pieces of software and some home-brewed scripts to annotate a paired-end RNAseq library to detect TE-initiated transcripts.


ababaian/LIONS


LIONS

Detecting TE-initiated transcripts from paired-end RNAseq

LIONS is a bioinformatic analysis pipeline which brings together a few pieces of software and some home-brewed scripts to annotate a paired-end RNAseq library against a reference TE annotation set (such as RepeatMasker).

The East Lion scripts process BAM file input, re-align the reads to a genome with Tophat2, and build an ab initio transcript assembly from those alignments. This assembly is then processed, and local read searches are done at the 5' ends to find additional transcript start sites and to quality-control the 5' ends of the assembly. The output is a .lions file, which annotates the intersection between the assembly, a reference gene set, and a repeat set.
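The core idea behind that intersection, a transcript whose 5' end falls inside a repeat, can be illustrated with a small self-contained sketch. This is not the LIONS code: the function name te_initiated, the 6-column BED inputs, and the toy coordinates are assumptions made for illustration only. A genome-scale run would use an indexed interval tool such as bedtools rather than this per-chromosome linear scan.

```bash
#!/usr/bin/env bash
# Toy sketch, not the LIONS implementation: report transcripts whose 5' end
# (transcription start site, TSS) lies inside a repeat interval.
# Inputs are assumed to be 6-column BED files (chrom, start, end, name,
# score, strand) with 0-based, half-open coordinates.

# te_initiated TRANSCRIPTS.bed REPEATS.bed  ->  "transcript<TAB>repeat" lines
te_initiated() {
  awk 'BEGIN { FS = OFS = "\t" }
       # First file (repeats): store intervals per chromosome.
       FILENAME == ARGV[1] {
         k = ++n[$1]
         rs[$1, k] = $2 + 0; re[$1, k] = $3 + 0; rn[$1, k] = $4
         next
       }
       # Second file (transcripts): the TSS is the start on the + strand
       # and the last base (end - 1) on the - strand.
       {
         tss = ($6 == "-") ? $3 - 1 : $2 + 0
         for (i = 1; i <= n[$1] + 0; i++)
           if (tss >= rs[$1, i] && tss < re[$1, i])
             print $4, rn[$1, i]
       }' "$2" "$1"
}

# Demo on made-up coordinates in a temporary directory.
demo=$(mktemp -d)
printf 'chr1\t100\t500\tL1_frag\t0\t+\n' > "$demo/repeats.bed"
printf 'chr1\t150\t2000\ttx1\t0\t+\nchr1\t600\t900\ttx2\t0\t+\nchr1\t50\t400\ttx3\t0\t-\n' \
  > "$demo/transcripts.bed"
te_initiated "$demo/transcripts.bed" "$demo/repeats.bed"
# prints "tx1  L1_frag" and "tx3  L1_frag" (tab-separated);
# tx3 is on the - strand, so its TSS is base 399, inside the repeat.
```

Strand matters here: reading the TSS from the wrong end of a minus-strand transcript would misplace its 5' end entirely.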

The West Lion scripts compile multiple .lions files, group them into biological categories (e.g. Cancer vs. Normal or Treatment vs. Control), and compare and analyze the data to produce graphs and an interpretation of the results.
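As a rough illustration of the grouping step, the sketch below counts how many samples in each of two groups show a given TE-initiated transcript. The flat sample/group/transcript table and the function name recurrence_by_group are hypothetical. The actual .lions format and the West Lion statistics are not reproduced here.

```bash
#!/usr/bin/env bash
# Toy sketch, not the West Lion code: for each TE-initiated transcript, count
# in how many samples of two groups it was detected.
# Input (hypothetical format): tab-separated "sample<TAB>group<TAB>transcript_id".

# recurrence_by_group CALLS.tsv GROUP_A GROUP_B  ->  "transcript<TAB>nA<TAB>nB"
recurrence_by_group() {
  awk -v a="$2" -v b="$3" 'BEGIN { FS = OFS = "\t" }
       # Count each (sample, transcript) pair once, even if it is listed twice.
       !seen[$1, $3]++ { n[$3, $2]++; ids[$3] = 1 }
       END { for (t in ids) print t, n[t, a] + 0, n[t, b] + 0 }' "$1" | sort
}

# Demo with made-up calls: printf reuses its format for every 3 arguments.
demo=$(mktemp -d)
printf '%s\t%s\t%s\n' \
  s1 Cancer tx1  s1 Cancer tx3  s2 Cancer tx1 \
  s3 Normal tx3  s4 Normal tx3  s4 Normal tx3 > "$demo/calls.tsv"
recurrence_by_group "$demo/calls.tsv" Cancer Normal
# prints (tab-separated):
# tx1  2  0
# tx3  1  2
```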

Installation

  1. Download the LIONS repo

  2. Install the dependencies for LIONS

  3. Initialize the LIONS 'Parameter Files' for your system:

    1. $LIONS_PATH/controls/<system>.sysctrl: System-specific variables
    2. $LIONS_PATH/controls/<project>.ctrl: Project-specific variables
    3. $LIONS_PATH/controls/<input>.ctrl: List of RNA-seq input files for the project
  4. Add Reference / Annotation files for LIONS

  5. Populate the resource files. NOTE: UCSC files are downloaded from the UCSC Genome Browser. There is an example folder showing what the files should look like.

    1. In $LIONS_PATH/resources/<genomeName>/genome/ add a .fa genome sequence file
    2. In $LIONS_PATH/resources/<genomeName>/repeat/ add the UCSC RepeatMasker annotation for that genome
    3. (Optional) In $LIONS_PATH/resources/<genomeName>/annotation/ add a UCSC annotation of protein-coding genes
  6. Run the master script, lions.sh, with bash, passing it your parameter control file (parameter.ctrl in this example):

    bash "$LIONS_PATH/lions.sh" "$LIONS_PATH/controls/parameter.ctrl"
    
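A minimal sketch of the folder layout from step 5, assuming LIONS_PATH points at the cloned repository (the script falls back to the current directory) and using hg19 only as an example genome name. The commented cp lines show where each file goes; their source paths and file names are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Create the step 5 resource folders for one genome.
# Assumptions: LIONS_PATH is the repo root (falls back to the current
# directory here); "hg19" is only an example genome name.
LIONS_PATH="${LIONS_PATH:-.}"
GENOME_NAME="${GENOME_NAME:-hg19}"
res="$LIONS_PATH/resources/$GENOME_NAME"

# -p creates missing parents and is a no-op if the folders already exist.
mkdir -p "$res/genome" "$res/repeat" "$res/annotation"

# Then copy the files in (hypothetical source paths; uncomment and edit):
# cp /path/to/genome.fa          "$res/genome/"      # genome sequence (.fa)
# cp /path/to/repeatmasker_file  "$res/repeat/"      # UCSC RepeatMasker annotation
# cp /path/to/gene_file          "$res/annotation/"  # optional: protein-coding genes
```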

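Before launching lions.sh, a quick check that the control files from step 3 and the genome resources from step 5 are in place can save a failed run. The function lions_preflight below is a hypothetical helper, not part of LIONS; its arguments stand in for your own <system>, <project>, <input> and <genomeName> names.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of LIONS.
# Usage: lions_preflight LIONS_PATH SYSTEM PROJECT INPUT GENOME_NAME
# Prints one ok/MISSING line per required path; returns non-zero if any is missing.
lions_preflight() {
  root=$1
  missing=0
  # Step 3 control files and the step 5 repeat folder.
  for f in "$root/controls/$2.sysctrl" "$root/controls/$3.ctrl" \
           "$root/controls/$4.ctrl" "$root/resources/$5/repeat"; do
    if [ -e "$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=$((missing + 1))
    fi
  done
  # Step 5 genome sequence: at least one .fa file. An unmatched glob stays
  # literal, so the -e test fails when there is none.
  fa_found=0
  for f in "$root/resources/$5/genome/"*.fa; do
    if [ -e "$f" ]; then fa_found=1; fi
  done
  if [ "$fa_found" -eq 1 ]; then
    echo "ok       $root/resources/$5/genome/*.fa"
  else
    echo "MISSING  $root/resources/$5/genome/*.fa"
    missing=$((missing + 1))
  fi
  [ "$missing" -eq 0 ]
}
```

Because it returns a non-zero status when anything is missing, it can gate the run, for example by chaining it with && in front of the lions.sh command.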
If you have any questions, please email me (Artem Babaian). I'll do my best to respond and help get this working!
