Project for the Machine Learning course of "Bioinformatics for Computational Genomics" MSc

ecianini/Taxonomic-and-genetic-prediction_ML

Taxonomic and genetic prediction using codon usage frequencies

Project for the Machine Learning course of "Bioinformatics for Computational Genomics" MSc.

The notebook can be viewed here.

Aim

This project uses codon usage frequencies, computed from the genomes of many species, to classify each sample into its kingdom. The main goal was to assess how well codon usage frequencies discriminate among 11 kingdom classes: archaea, bacteria, bacteriophage, plasmid, plant, invertebrate, vertebrate, mammal, rodent, primate, and virus. The analysis applies and evaluates the clustering, classification, and regression techniques covered in the course.
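To make the input features concrete, here is a minimal sketch of how codon usage frequencies could be computed from a single coding sequence. It is an illustration only, not the preprocessing used for the project's dataset: the function name `codon_frequencies`, the RNA alphabet (U instead of T), and the choice to skip triplets containing ambiguous bases are all assumptions.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, written with the RNA alphabet (assumption).
CODONS = ["".join(p) for p in product("UCAG", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(sequence):
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    Hypothetical helper: reads the sequence in frame starting at position 0.
    """
    # Normalize to upper-case RNA so DNA input (with T) is accepted too.
    seq = sequence.upper().replace("T", "U")
    # Split into triplets, dropping any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    triplets = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Skip triplets that contain ambiguous bases such as N.
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid codons found")
    return {codon: counts[codon] / total for codon in CODONS}


# Toy sequence with four codons: AUG GCC GCC UAA.
freqs = codon_frequencies("ATGGCCGCCTAA")
print(freqs["GCC"])  # 2 of 4 codons -> 0.5
```

Each sample then becomes a vector of 64 values that sum to 1, which is the kind of feature vector a classifier can take as input.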

Further information about the project can be found in the specification file. The data folder contains the training and test datasets.
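As a rough illustration of the fit-on-train, evaluate-on-test workflow, the sketch below implements a nearest-centroid classifier in plain Python. This is a stand-in technique chosen for brevity, not necessarily one of the models used in the notebook. The data are made-up toy vectors with 3 features instead of 64, and the helper names `fit_centroids`, `predict`, and `accuracy` are hypothetical.

```python
import math


def fit_centroids(X, y):
    """Average the feature vectors of each class to get one centroid per label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


def accuracy(centroids, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


# Made-up toy data (assumption): 3 codon frequencies per sample instead of 64.
X_train = [[0.50, 0.30, 0.20], [0.48, 0.32, 0.20],
           [0.10, 0.20, 0.70], [0.12, 0.18, 0.70]]
y_train = ["bacteria", "bacteria", "virus", "virus"]
X_test = [[0.49, 0.31, 0.20], [0.11, 0.19, 0.70]]
y_test = ["bacteria", "virus"]

# Fit on the training split only, then score on the held-out test split.
centroids = fit_centroids(X_train, y_train)
print(accuracy(centroids, X_test, y_test))
```

Keeping the test split out of the fitting step is what makes the reported accuracy an estimate of performance on unseen samples.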

Reference

Khomtchouk, Bohdan B. "Codon usage bias levels predict taxonomic identity and genetic composition." bioRxiv (2020).
