Skip to content

jcleydsonsilva/VORFome

Repository files navigation

VORFome

Chapter 2

ORFomes: artificial intelligence in the detection of geminiviral ORFs

Cite:

This project aims to develop an artificial intelligence (AI)-based system for the detection of Open Reading Frames (ORFs) in Geminiviruses, which are a group of plant-infecting viruses known to encode small ORFs and possess introns, a rare occurrence in viral genomes. The identification of these ORFs is crucial for understanding the pathogenicity of Geminiviruses and developing effective control measures.

Using machine learning algorithms and neural networks, the proposed system will analyze genomic data and accurately identify ORFs in Geminiviruses. The system will be trained on a large dataset of Geminivirus genomic sequences, taking into account the presence of introns and small ORFs.

The project will involve various stages, including data collection and preprocessing, feature selection, model training and validation, and performance evaluation. The system will be implemented using Python and will be made available as an open-source tool on Github.

The development of an AI-based system for ORF detection in Geminiviruses, which takes into account the presence of introns and small ORFs, has the potential to significantly improve our understanding of these viruses and their interactions with host plants. The tool will also be useful for researchers and plant pathologists working on Geminivirus control strategies.

Install VORFome

sudo apt install git

git clone https://github.com/jcleydsonsilva/VORFome.git

cd VORFome

pip install biopython

pip install numpy

The pipeline consists of three steps:

  1. Running the ORFome.py script with the input file Sequence.fasta. This script likely identifies and extracts Open Reading Frames (ORFs) from the given sequence file.

python ORFome.py Sequence.fasta

  1. Executing the Features.py script with the input files ORFs-cds.fasta and ORFs-to-aminoadics_sequence.fasta. This script is expected to generate features or additional information based on the identified ORFs and their corresponding coding sequences.

python Features.py ORFs-cds.fasta ORFs-to-amino-acids-sequence.fasta

  1. Running the Classifier.py script with the input file Features.csv. This script likely performs classification tasks using the generated features and a machine learning classifier.

python Classifier.py Features.csv

About

AI-based Detection of Open Reading Frames (ORFs) in Geminiviruses

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages