A protein-coding region annotator that takes an alignment file in PAF/GFF format, extracts the complete coding regions, and prepares them for deep learning.

gauravcodepro/miniprot-protein-annotator


miniprot-protein-annotator

  • A protein-coding region annotator that takes an alignment file in PAF/GFF format and writes the aligned regions to a FASTA file, using the corresponding input FASTA file.
  • Parsing is implemented to be fast, so you can process as many aligned regions as you need.
  • The extracted regions can also be used to build a protein tokenizer for machine learning.
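To illustrate the tokenizer idea from the last point, here is a minimal sketch of a character-level protein tokenizer. It is an assumption about what such a tokenizer could look like, not the code in this repository: the object name `ProteinTokenizerSketch`, the 20-letter alphabet, and the choice of id 0 for unknown residues are all hypothetical.

```scala
// Hypothetical sketch; not the tokenizer shipped in this repository.
object ProteinTokenizerSketch {

  // The 20 standard amino acids in one-letter code (assumed vocabulary).
  val alphabet: String = "ACDEFGHIKLMNPQRSTVWY"

  // Id 0 is reserved for anything outside the alphabet (for example X or *).
  val unknownId: Int = 0

  // Map each residue to a positive integer id, starting at 1.
  val vocab: Map[Char, Int] =
    alphabet.zipWithIndex.map { case (aa, i) => aa -> (i + 1) }.toMap

  // Encode a protein sequence as a vector of integer ids.
  def encode(protein: String): Vector[Int] =
    protein.toUpperCase.map(aa => vocab.getOrElse(aa, unknownId)).toVector
}
```

Calling `ProteinTokenizerSketch.encode` on a protein string returns one integer per residue, which can then be padded or batched by whatever machine learning framework you use.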

scala-sbt-class-mapper.

3a648218ef3040fdf13c0d06d3ac1c53683b7474

 # align your genome against the given proteins using miniprot, for example:
   miniprot --gff genome.fasta protein.fasta > sample.gff
  • then run the protein annotator to extract all the complete coding regions:
generatingAlignments("/home/gaurav/Desktop/final_code_push/multi.gff",
                     "/home/gaurav/Desktop/final_code_push/multi.fasta",
                     "/home/gaurav/Desktop/final_code_push/multiout.fasta")
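
The call above appears to take a GFF file, a genome FASTA file, and an output FASTA path, in that order; this is inferred from the file names only. As a rough sketch of what CDS extraction from such inputs can look like, the following Scala code parses CDS rows from GFF text and cuts the matching region out of the genome sequence. All names here (`Cds`, `CdsExtractionSketch`, and its methods) are hypothetical and do not come from this repository. The sketch assumes standard GFF3 columns with 1-based inclusive coordinates, and it treats each CDS row on its own rather than joining the segments of one transcript.

```scala
// Hypothetical sketch; names and logic are illustrative, not the repository's code.
final case class Cds(seqId: String, start: Int, end: Int, strand: Char)

object CdsExtractionSketch {

  // Keep only CDS rows from GFF3 text; comment lines such as "##PAF" are skipped.
  def parseCds(gffLines: Seq[String]): Seq[Cds] =
    gffLines
      .filterNot(_.startsWith("#"))
      .map(_.split("\t"))
      .collect {
        case f if f.length >= 8 && f(2) == "CDS" =>
          Cds(f(0), f(3).toInt, f(4).toInt, f(6).headOption.getOrElse('.'))
      }

  // Read FASTA lines into a map from sequence id (first word of the header) to sequence.
  def parseFasta(lines: Seq[String]): Map[String, String] = {
    val records = scala.collection.mutable.LinkedHashMap.empty[String, StringBuilder]
    var current: Option[StringBuilder] = None
    for (line <- lines.map(_.trim) if line.nonEmpty) {
      if (line.startsWith(">")) {
        val sb = new StringBuilder
        records(line.drop(1).split("\\s+").head) = sb
        current = Some(sb)
      } else current.foreach(_.append(line))
    }
    records.map { case (id, sb) => id -> sb.toString }.toMap
  }

  private val complement: Map[Char, Char] =
    Map('A' -> 'T', 'C' -> 'G', 'G' -> 'C', 'T' -> 'A')

  // Reverse-complement a DNA string; bases other than A, C, G, T become N.
  def reverseComplement(dna: String): String =
    dna.toUpperCase.reverse.map(base => complement.getOrElse(base, 'N'))

  // Cut one CDS out of the genome (GFF coordinates are 1-based and inclusive).
  // Returns None if the sequence id is missing or the coordinates are out of range.
  def extract(cds: Cds, genome: Map[String, String]): Option[String] =
    genome.get(cds.seqId)
      .filter(seq => cds.start >= 1 && cds.start <= cds.end && cds.end <= seq.length)
      .map { seq =>
        val region = seq.substring(cds.start - 1, cds.end)
        if (cds.strand == '-') reverseComplement(region) else region
      }

  // Format one extracted region as a FASTA record.
  def toFasta(cds: Cds, sequence: String): String =
    s">${cds.seqId}:${cds.start}-${cds.end}(${cds.strand})\n$sequence"
}
```

To use it on real files, read the lines with `scala.io.Source.fromFile(path).getLines().toSeq`, pass them to `parseCds` and `parseFasta`, and write the `toFasta` records to the output path.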

Gaurav
Academic Staff Member
Bioinformatics
Institute for Biochemistry and Biology
University of Potsdam
Potsdam, Germany
