typora-copy-images-to | sequence-fold | command-fold | last-update |
---|---|---|---|
./ |
80 xclip -o | tr -d "\n" | fold -w80 |
75 xclip -o | tr -d "\n" | fold -sw75 | sed -e 's/$/\\/' |
April 2nd, 2018 |
by alper yilmaz for GTU Bioinformatics Program Course
2019-03-14 (PDF version of this document is accessible at goo.gl/bhrkqQ
and html version is available at https://goo.gl/siUoax
)
Please go over Guide to BLAST home and search pages and BLAST Report description guides before proceeding with exercises.
If we align A and B (sequences are provided below) what would be the layout of the alignment?
Sequence for A
>A
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
GTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGAC
AGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGG
TAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCG
ATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG
GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAA
AACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAA
ATGTCGATCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAACGTTACTGTTATCGATCCGGTCGAAAAACTGCT
GGCAGTGGGGCATTACCTCGAATCTACCGTCGATATTGCTGAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTG
ATCACATGGTGCTGATGGCAGGTTTCACCGCCGGTAATGAAAAAGGCGAACTGGTGGTGCTTGGACGCAACGGTTCCGAC
TACTCTGCTGCGGTGCTGGCTGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGACGTTGACGGGGTCTATACCTG
CGACCCGCGTCAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTACCAGGAAGCGATGGAGCTTTCCTACTTCGGCG
CTAAAGTTCTTCACCCCCGCACCATTACCCCCATCGCCCAGTTCCAGATCCCTTGCCTGATTAAAAATACCGGAAATCCT
CAAGCACCAGGTACGCTCATTGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGCATTTCCAATCTGAATAACAT
GGCAATGTTCAGCGTTTCTGGTCCGGGGATGAAAGGGATGGTCGGCATGGCGGCGCGCGTCTTTGCAGCGATGTCACGCG
CCCGTATTTCCGTGGTGCTGATTACGCAATCATCTTCCGAATACAGCATCAGTTTCTGCGTTCCACAAAGCGACTGTGTG
CGAGCTGAACGGGCAATGCAGGAAGAGTTCTACCTGGAACTGAAAGAAGGCTTACTGGAG
Sequence for B
>B
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
GTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGAC
AGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGG
TAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCG
ATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG
GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAA
AACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAA
ATGTCGATCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAACGTTACTGTTATCTACTCTGCTGCGGTGCTGGC
TGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGACGTTGACGGGGTCTATACCTGCGACCCGCGTCAGGTGCCCG
ATGCGAGGTTGTTGAAGTCGATGTCCTACCAGGAAGCGATGGAGCTTTCCTACTTCGGCGCTAAAGTTCTTCACCCCCGC
ACCATTACCCCCATCGCCCAGTTCCAGATCCCTTGCCTGATTAAAAATACCGGAAATCCTCAAGCACCAGGTACGCTCAT
TGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGCATTTCCAATCTGAATAACATGGCAATGTTCAGCGTTTCTG
GTCCGGGGATGAAAGGGATGGTCGGCATGGCGGCGCGCGTCTTTGCAGCGATGTCACGCGCCCGTATTTCCGTGGTGCTG
ATTACGCAATCATCTTCCGAATACAGCATCAGTTTCTGCGTTCCACAAAGCGACTGTGTGCGAGCTGAACGGGCAATGCA
GGAAGAGTTCTACCTGGAAC
In which cases query coverage is 100% (more than one correct answer is accepted)
The sequence below is used in the book "Jurassic Park THE LOST WORLD". Please use blastx to find the hidden message. (Hint: unaligned sequences)
>DinoDNA from THE LOST WORLD p. 135
GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACGGACGTGTGGCAGCTCCCGCA
GAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCCATGGAGTTCGTGGCGCTGGGGGGGCCGGATGCGGGCTCCC
CCACTCCGTTCCCTGATGAAGCCGGAGCCTTCCTGGGGCTGGGGGGGGGCGAGAGGACGGAGGCGGGGGGGCTGCTGGCC
TCCTACCCCCCCTCAGGCCGCGTGTCCCTGGTGCCGTGGGCAGACACGGGTACTTTGGGGACCCCCCAGTGGGTGCCGCC
CGCCACCCAAATGGAGCCCCCCCACTACCTGGAGCTGCTGCAACCCCCCCGGGGCAGCCCCCCCCATCCCTCCTCCGGGC
CCCTACTGCCACTCAGCAGCGGGCCCCCACCCTGCGAGGCCCGTGAGTGCGTCATGGCCAGGAAGAACTGCGGAGCGACG
GCAACGCCGCTGTGGCGCCGGGACGGCACCGGGCATTACCTGTGCAACTGGGCCTCAGCCTGCGGGCTCTACCACCGCCT
CAACGGCCAGAACCGCCCGCTCATCCGCCCCAAAAAGCGCCTGCTGGTGAGTAAGCGCGCAGGCACAGTGTGCAGCCACG
AGCGTGAAAACTGCCAGACATCCACCACCACTCTGTGGCGTCGCAGCCCCATGGGGGACCCCGTCTGCAACAACATTCAC
GCCTGCGGCCTCTACTACAAACTGCACCAAGTGAACCGCCCCCTCACGATGCGCAAAGACGGAATCCAAACCCGAAACCG
CAAAGTTTCCTCCAAGGGTAAAAAGCGGCGCCCCCCGGGGGGGGGAAACCCCTCCGCCACCGCGGGAGGGGGCGCTCCTA
TGGGGGGAGGGGGGGACCCCTCTATGCCCCCCCCGCCGCCCCCCCCGGCCGCCGCCCCCCCTCAAAGCGACGCTCTGTAC
GCTCTCGGCCCCGTGGTCCTTTCGGGCCATTTTCTGCCCTTTGGAAACTCCGGAGGGTTTTTTGGGGGGGGGGCGGGGGG
TTACACGGCCCCCCCGGGGCTGAGCCCGCAGATTTAAATAATAACTCTGACGTGGGCAAGTGGGCCTTGCTGAGAAGACA
GTGTAACATAATAATTTGCACCTCGGCAATTGCAGAGGGTCGATCTCCACTTTGGACACAACAGGGCTACTCGGTAGGAC
CAGATAAGCACTTTGCTCCCTGGACTGAAAAAGAAAGGATTTATCTGTTTGCTTCTTGCTGACAAATCCCTGTGAAAGGT
AAAAGTCGGACACAGCAATCGATTATTTCTCGCCTGTGTGAAATTACTGTGAATATTGTAAATATATATATATATATATA
TATATCTGTATAGAACAGCCTCGGAGGCGGCATGGACCCAGCGTAGATCATGCTGGATTTGTACTGCCGGAATTC
The peptide sequence below is translation of 4th exon of a Drosophilia gene. Use blastp and swissprot index to find out gene name.
MSLTVEIVATKPYEGQKPGTSGLRKKVKVFTQPNYTENFVQAILEANGAALAGSTLVVGGDGRFYCKEAAELIVRLSAAN
GVSKLLVGQNGILSTPAVSSLIRHNKALGGIVLTASHNPGGPENDFGIKFNCENGGPAPDAFTNHIYKITTEIKEYKLVR
NLQIDISKVGVTSFDIAGKPFTVEVIDSVANYVRHMEEIFDFAKLKDFVSGKATGKPLKMRIDAMNGVTGSYVREIFLNR
LGATESSVVHTTPLPDFGGLHPDPNLTYAKDLVDTVAQGDYDIGAAFDGDGDRNMIIGSKAFFVTPSDSLAVIAHYLEAI
PYFQKNGVQGFARSMPTASAVDLVGRKLGKEVFEVPTGWKYFGNLMDAGRLCLCGEESFGTGSNHIREKDGIWAVLAWIS
VMQHTGKGIEDILKQHWSVYGRNYFTRYDYEECASDPCNEMVATMEKTITAPEFVGKSYSSGGKTYKVKEADNFSYTDPV
DKSVATKQGLRIVFEDGSRIVVRLSGTGSSGATVRLYIDSYEKENVLGQASVMLKPLIDIALEISQLPKFTGRNAPTVIT
Below is the sequence of a HIV protein, please find out at which codon (nucleotide coordinate) the frameshift had taken place.
ATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGCACCATGCTCCTTGGGTTGTTGATGATCCGTAGTGCTGC
AGACCAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCCATTTTGTGCATCAGATG
CTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACACGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTA
GTATTGGCAAATGTGGCAGAAAATTTTAACATGTGGGATAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTT
ATGGGATCAAAGCCTAAAGCCATGTGTTAAATTAACCCCACTCTGTGTTACTTTAAACTGCACTGATAAGATTAATAATA
CCAAAACTACTCCTAATAATACCAGTACTACTCCCCCTACCACTGTTACTCCTACTAGTAATAGCAGCATGACAGGAGAA
GGAGAAATAAAAAACTGCTCTTTCAATATCACCACAGCCATAAGAGATAAGGTGCAGAGAGGATATGCATTGTTTAGTAA
ACTTTGATATAGTACCAATAGATAATGATAGAAATGATAGTACCAGCTATAGGTTGCTAAGTTGTAACACCTCAGTCATT
ACACAAGCCTGTCCAAAGGTATCCTTTGAACCAATTCCCATACATTATTGTGCCCCAGCTGGTTTTGCGATTCTAAAGTG
TAACAATAAGACGTTCAGTGGAACAGGACCCTGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAATAG
TATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGGGATAGTAATTAGATATGAAAATATCACAGACAATGCT
AAAAGCATAATAATACAGCTGAATGAAACTGTACAAATTAATTGTACAAGACCCAACAATAATACAAGGAAAAGTATACC
TATAGGACCAGGAAGAGCATTTTATGCAACAGGAGATATAATAGGAGATATAAGAAAAGCATATTGTAACATTAGTGGAG
CAAAATGGAATAACACTTTAAAAAGGATAGCTTACAAATTAAAAGAACAATTTCCTAATAAAACAATAGTCTTTAAGCCC
TCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTAGAGGGGAATTTTTCTACTGTAATACAACAAAACT
GTTTGATAGTAGTTGGGATAATACTAATTTGAATAAAACTTGGAATAATACCTGGAATAAAAATAACTCTATCATACTTC
CATGCAGAATAAAACAAATCATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCGAAGGACCACTT
TACTGTTTATCAAATATTACAGGGCTAATTTTAACAAGAGATGGTGGGAACGAAACTGATGGGAACAACACTGATGGGAA
TGAGACCTTCAGACCTGGAGGAGGGAATATGAGGGACAATTGGA
Following sequence (source) belongs to a C.elegans gene. Please find its homologs in vertebrates. (Hint: use taxonomy report in results)
>C.elegans protein
MFHPGMTSQPSTSNQMYYDPLYGAEQIVQCNPMDYHQANILCGMQYFNNSHNRYPLLPQMPPQFTNDHPYDFPNVPTIST
LDEASSFNGFLIPSQPSSYNNNNISCVFTPTPCTSSQASSQPPPTPTVNPTPIPPNAGAVLTTAMDSCQQISHVLQCYQQ
GGEDSDFVRKAIESLVKKLKDKRIELDALITAVTSNGKQPTGCVTIQRSLDGRLQVAGRKGVPHVVYARIWRWPKVSKNE
LVKLVQCQTSSDHPDNICINPYHYERVVSNRITSADQSLHVENSPMKSEYLGDAGVIDSCSDWPNTPPDNNFNGGFAPDQ
PQLVTPIISDIPIDLNQIYVPTPPQLLDNWCSIIYYELDTPIGETFKVSARDHGKVIVDGGMDPHGENEGRLCLGALSNV
HRTEASEKARIHIGRGVELTAHADGNISITSNCKIFVRSGYLDYTHGSEYSSKAHRFTPNESSFTVFDIRWAYMQMLRRS
RSSNEAVRAQAAAVAGYAPMSVMPAIMPDSGVDRMRRDFCTIAISFVKAWGDVYQRKTIKETPCWIEVTLHRPLQILDQL
LKNSSQFGSS