This repository has been archived by the owner on May 10, 2018. It is now read-only.

tonig-evo/3D_gaps


Scripts for the alignment pipeline used in "Human long intrinsically disordered protein regions are frequent targets of positive selection" (Afanasyeva et al., 2018, Genome Research)

Initial Scripts

The main script for file preparation is prep.sh. The procedure relies on several external programs, such as pal2nal and MSAProbs (v0.9.7). Before running the script, specify the correct paths in prep.sh and align_seq.py. Run the pipeline from the parent directory as follows:

bash scripts/prep.sh
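Because prep.sh and align_seq.py call external programs through paths you set by hand, it can help to confirm those paths before a long run. The sketch below is not part of the repository: the `TOOLS` dictionary and the executable names in it (`msaprobs`, `pal2nal.pl`, `nw_prune`) are assumptions to be replaced with the paths configured in the two scripts.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical tool locations: replace with the paths you configured in
# scripts/prep.sh and scripts/align_seq.py (these names are assumptions).
TOOLS = {
    "MSAProbs": "msaprobs",
    "pal2nal": "pal2nal.pl",
    "nw_prune": "nw_prune",
}


def missing_tools(tools):
    """Return the names of tools that are neither an existing file nor on PATH."""
    missing = []
    for name, cmd in tools.items():
        if Path(cmd).is_file() or shutil.which(cmd) is not None:
            continue
        missing.append(name)
    return missing


if __name__ == "__main__":
    # The pipeline is meant to be launched from the parent directory of scripts/.
    if not Path("scripts/prep.sh").is_file():
        sys.exit("Run this from the directory that contains scripts/prep.sh")
    missing = missing_tools(TOOLS)
    if missing:
        sys.exit("Not found: " + ", ".join(missing))
    print("All external tools found")
```

Checking both a literal file path and `PATH` lets the same dictionary hold either absolute paths or bare command names.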

prep.sh invokes the following scripts:

  1. align_seq.py - the main logic of the pipeline (alignment, annotation, sequence filtering); this script invokes:
  • MSAProbs
  • annot.py
  • final_annot.py
  2. sort_seq.py - sorts the sequences in both the protein and cDNA files ('Homo_Sapiens' first) for the pal2nal step
  3. pal2nal - builds the codon alignment from the protein alignment and the cDNA sequences
  4. four scripts that write the input files for PAML:
  • write_simple.py
  • write_sites.py
  • write_dis.py
  • write_ord.py
  5. nw_prune - prunes the tree ('Mammal_tree_Toni_names_noroot.tree' file) according to the list of filtered sequences
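The reordering done by sort_seq.py can be illustrated with a minimal sketch. It assumes plain FASTA input whose headers are exactly the species names (e.g. `>Homo_Sapiens`) and that pal2nal pairs protein and cDNA records by their order in the two files; the function names are hypothetical and not taken from the repository.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples, keeping file order."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def human_first(records, first="Homo_Sapiens"):
    """Move the record named `first` to the front; others keep their input order."""
    # sorted() is stable and False < True, so only the `first` record moves.
    return sorted(records, key=lambda rec: rec[0] != first)


def sort_pair(protein_fasta, cdna_fasta, first="Homo_Sapiens"):
    """Order protein records with `first` on top and the cDNA records to match."""
    prot = human_first(read_fasta(protein_fasta), first)
    cdna_by_name = dict(read_fasta(cdna_fasta))
    # Assumed: pal2nal matches records by position, so cDNA follows protein order.
    cdna = [(name, cdna_by_name[name]) for name, _ in prot]
    return prot, cdna


def to_fasta(records):
    """Serialise (name, sequence) tuples back to FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Ordering the cDNA file by the protein file (rather than sorting each file independently) guarantees that the two files agree record by record, which a missing or misspelled name would otherwise break silently; here it raises a `KeyError` instead.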

Additional filtering steps

Alignments can be optimised further; Python scripts are provided for:

  • realignment (e.g. with MUSCLE or PRANK) and comparison with the initial alignment (read in both alignments and compare them species by species)
  • pairwise_paml (script in Pairwise_filter)
  • Gblocks (script in Site_filter)
  • Zorro (script in Site_filter)
  • applying the filters (example script in join_filters; needs to be customised)
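A species-by-species comparison of a realignment with the initial alignment can be scored in several ways; the sketch below uses one simple choice, which is an assumption rather than the repository's own method. For each species it collects the residue pairings with a reference sequence (here 'Homo_Sapiens') in both alignments and reports the fraction of pairings from the first alignment that the second alignment reproduces. Alignments are passed as hypothetical dictionaries mapping species name to gapped sequence.

```python
GAPS = frozenset("-.")


def homologous_pairs(seq, ref):
    """Set of (i, j) pairs of residues sharing a column; i indexes seq, j indexes ref."""
    pairs = set()
    i = j = 0
    for a, b in zip(seq, ref):
        if a not in GAPS and b not in GAPS:
            pairs.add((i, j))
        if a not in GAPS:
            i += 1
        if b not in GAPS:
            j += 1
    return pairs


def compare_alignments(aln_a, aln_b, ref="Homo_Sapiens"):
    """Per species: share of its pairings with `ref` in aln_a also present in aln_b.

    Species absent from aln_b are skipped; a species with no pairings gets None.
    """
    scores = {}
    for name, seq_a in aln_a.items():
        if name == ref or name not in aln_b:
            continue
        pa = homologous_pairs(seq_a, aln_a[ref])
        pb = homologous_pairs(aln_b[name], aln_b[ref])
        scores[name] = len(pa & pb) / len(pa) if pa else None
    return scores
```

Comparing residue pairings instead of raw gapped strings makes the score insensitive to columns that merely shift position, so a low value means the realignment genuinely paired that species' residues differently; such species could be candidates for removal.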

About

Manuscript for paper
