The accessions used for the analysis are listed here: accession
- downloadrecords.sh: Run this to download the sequence records from the ENA at the EBI. Alternatively, run the code below to generate the direct API links for the download.
*Code for generating the direct API links for the Arabidopsis ENA records*
for i in $(cat arabidopsisaccessionlinks.md | grep GCA | cut -f 2 -d "|");
do
echo "curl https://www.ebi.ac.uk/ena/browser/api/fasta/$i.1\?download\=true\&gzip\=true -o $i.gz";
done
The direct API links for the Arabidopsis genomes are listed in directapis.txt.
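The loop above depends on the accession sitting in the second `|`-separated column. As a minimal sketch under different assumptions, the function below pulls every `GCA_` accession from its input regardless of column and prints one curl command per accession. The function name `ena_curl_commands` and the output file name in the usage comment are hypothetical, and the `.1` version suffix is taken from the loop above.

```shell
# Sketch: print one curl command per GCA accession read from stdin.
# Assumptions: accessions look like "GCA_" followed by digits, and every
# assembly is version 1 (the ".1" suffix used in the original loop).
ena_curl_commands() {
    grep -o 'GCA_[0-9]\{1,\}' | sort -u | while read -r acc; do
        # The URL is double-quoted so "?" and "&" need no backslash escapes.
        printf 'curl "https://www.ebi.ac.uk/ena/browser/api/fasta/%s.1?download=true&gzip=true" -o %s.gz\n' "$acc" "$acc"
    done
}

# Usage (not run here; download_commands.sh is a hypothetical file name):
#   ena_curl_commands < arabidopsisaccessionlinks.md > download_commands.sh
#   sh download_commands.sh
```

Printing the commands first, as the original loop does, lets you inspect them before any download starts.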
*Normalize your FASTA headers by running this before running the analysis*
cat fastafile | cut -f 1 -d " " | cut -f 1 -d "." > output.fasta
# The output FASTA is used for all subsequent analyses.
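The `cut` pipeline above is applied to every line of the file, sequence lines included. A minimal sketch of a variant that rewrites only the header lines is given below. The function name `normalize_headers` is hypothetical, and it splits on any whitespace rather than on a single space.

```shell
# Sketch: keep only the part of each FASTA header before the first
# whitespace and before the first ".", leaving sequence lines untouched.
# Reads the files given as arguments, or stdin when none are given.
normalize_headers() {
    awk '
        /^>/ { id = $1; sub(/\..*$/, "", id); print id; next }  # header line
             { print }                                           # sequence line
    ' "$@"
}

# Usage (not run here):
#   normalize_headers fastafile > output.fasta
```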
- alignmentrecords.sh: Run this to make the corresponding alignments. It follows the lift-off approach of transferring the annotations.
- phylogeny.R: Run this to make the phylogeny.
- visualfreq.R: Run this to make the alignment visualization.
- mapalignment-phylogeny.R: Run this to make the alignment and its visualization.
- generatemRNAs.py: Run this to extract the corresponding mRNAs.
- genome-annotation-visualizer.R: Run this to visualize the genomic features. The code is also available in evoseq and in genome-annotation.
How to read this GitHub repository
- allassembly.md: all accessions that were studied
- arabidopsisaccessionlinks.md: links to the accessions and the corresponding ENA archives
- arabidopsis_paper.pdf: Arabidopsis paper
- directapis.txt: direct API links for the ENA
- cap_alignments: folder containing the cap alignments, with a README on how to generate them
- cap_final_joined_fasta: folder containing the final fasta, alignments, ancestral tree, phylogenetic tree, ancestral sequence, and alignment visualization
- cap_genes: folder containing the cap genes
- maf_alignments: folder containing the maf alignments, with a README on how to generate them
- maf_final_joined_fasta: folder containing the final fasta alignments, ancestral tree, phylogenetic tree, ancestral sequence, and alignment visualization
- maf_genes: folder containing the maf genes
- python_scripts: Python scripts for the analysis
- r_scripts: R scripts for the analysis
- shell_scripts: shell scripts for the analysis
- README.md: README for the complete analysis
Folder listings for the analysis
cap_final_joined_fasta: File listing cap_final_joined_fasta
Alignments can be run with the following:
for i in *.fasta; do echo prank -d=${i} -o=${i%.*}.aligned.fasta -showanc -showtree; done
├── all.cap.gff.clipped.gff: All aligned mRNA positions.
├── capgenes.aligned.fasta.best.anc.dnd: best phylogenetic ancestral tree
├── capgenes.aligned.fasta.best.anc.fas: best ancestral sequence
├── capgenes.aligned.fasta.best.dnd: best phylogenetic tree
├── capgenes.aligned.fasta.best.fas: alignment
└── capview.html: visualization of alignment
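The PRANK loop above only prints the commands. A minimal sketch of a rerun-friendly variant is given below, assuming the output naming shown in the listing above (`<name>.aligned.fasta.best.fas` for the alignment). The function name `prank_commands` is hypothetical. It quotes the file names and skips inputs whose alignment already exists.

```shell
# Sketch: print one PRANK command per *.fasta file in a directory
# (default: the current directory), skipping inputs that already have
# a <name>.aligned.fasta.best.fas alignment next to them.
# The body runs in a subshell so f and out do not leak into the caller.
prank_commands() (
    dir="${1:-.}"
    for f in "$dir"/*.fasta; do
        # When the glob matches nothing it stays literal; skip it.
        if [ ! -e "$f" ]; then continue; fi
        out="${f%.*}.aligned.fasta"
        if [ -e "$out.best.fas" ]; then continue; fi
        # Double quotes protect spaces in paths (not embedded quotes or "$").
        printf 'prank -d="%s" -o="%s" -showanc -showtree\n' "$f" "$out"
    done
)

# Usage (not run here):
#   prank_commands cap_final_joined_fasta | sh
```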
maf_final_joined_fasta: File listing maf_final_joined_fasta
├── AT5G65050.all.out.fasta : mRNA regions for the AT5G65050
├── AT5G65050.gff.clipped.gff : aligned position information for the AT5G65050
├── AT5G65060.all.out.fasta : mRNA regions for the AT5G65060
├── AT5G65060.gff.clipped.gff : aligned position information for the AT5G65060
├── AT5G65070.all.out.fasta : mRNA regions for the AT5G65070
├── AT5G65070.gff.clipped.gff : aligned position information for the AT5G65070
├── AT5G65080.all.out.fasta : mRNA regions for the AT5G65080
├── AT5G65080.gff.clipped.gff : aligned position information for the AT5G65080
├── final.all.linear.tar.bz : all Arabidopsis accessions
├── maf_aligned_ancestral_tree : ancestral tree for each of the individuals
├── maf_aligned_best : aligned regions for each of the individuals along with the visualization
└── maf_ancestral_sequence : ancestral sequences for each of them
Uncompress the genome annotation archive with tar -xJf TAIR10_GFF3_genes.tar.xz
If you have any questions, I can be contacted at gaurav.sablok@uni-potsdam.de or sablokg@gmail.com
Gaurav
Academic Staff Member
Bioinformatics
Institute for Biochemistry and Biology
University of Potsdam
Potsdam, Germany