An AI project for genetic analysis, implemented in PyTorch (a machine learning framework) together with two colleagues and mentored by a PhD geneticist from Universidad de los Andes and a Master's student in Applied Mathematics at Universidad Nacional de Colombia. The project analyzes DNA sequences and classifies them according to recognizable sequence motifs.
- Python 3
- Jupyter Notebook (running the notebook in Google Colab is recommended)
- Clone the repository to your local machine:
```bash
git clone https://github.com/anjimenezp/AI-Genetics.git
cd AI-Genetics
```
This repository contains code for a genomics project that uses artificial intelligence to classify DNA sequences. The code includes the following components:
- Gene sequence data is loaded from a CSV file using pandas.
- A function for generating simulated DNA sequences is included, but it is not used in the main pipeline.
- Sequence labels are encoded using scikit-learn's LabelEncoder.
- DNA sequences are cleaned and converted to one-hot encoding using PyTorch.
- The data is split into training, validation, and test sets for model training and evaluation.
- PyTorch DataLoaders are prepared for efficient batch processing during training.
- A Convolutional Neural Network (CNN) is defined for classifying DNA sequences.
- Functions for training and validation loops are defined.
- The trained model is evaluated on the held-out test set, and performance metrics are displayed.
- Matplotlib is used to plot training and validation loss curves.
- An example DNA sequence is provided, and the trained model predicts its class.
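The list above mentions a helper for generating simulated DNA sequences, but the README does not describe how it works. The sketch below assumes one simple design: uniformly random background sequences, with a hypothetical motif (`TATAAA` in the usage line) implanted into roughly half of them to form a positive class. The names `random_sequence` and `simulate_dataset` are illustrative and not taken from the repository.

```python
import random

BASES = "ACGT"


def random_sequence(length: int, rng: random.Random) -> str:
    """Return a uniformly random DNA string of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def simulate_dataset(n: int, length: int, motif: str, seed: int = 0):
    """Return (sequences, labels); label-1 sequences contain an implanted motif.

    Hypothetical helper; assumes length >= len(motif).
    """
    rng = random.Random(seed)
    sequences, labels = [], []
    for _ in range(n):
        seq = random_sequence(length, rng)
        label = rng.randint(0, 1)  # roughly balanced classes
        if label == 1:
            # Overwrite a random window of the background with the motif.
            start = rng.randrange(length - len(motif) + 1)
            seq = seq[:start] + motif + seq[start + len(motif):]
        sequences.append(seq)
        labels.append(label)
    return sequences, labels


seqs, labels = simulate_dataset(n=10, length=30, motif="TATAAA")
for s, lab in zip(seqs[:3], labels[:3]):
    print(lab, s)
```

Because the background is random, a label-0 sequence can still contain the motif by chance; with short motifs and long sequences this label noise becomes more likely.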
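The label-encoding and one-hot steps in the list above can be sketched as follows. This assumes the four-letter alphabet `ACGT` and treats cleaning as upper-casing plus dropping every other character (for example `N` or whitespace); the repository's actual cleaning rules may differ. The example labels are hypothetical; in the project they come from the CSV file.

```python
import torch
import torch.nn.functional as F
from sklearn.preprocessing import LabelEncoder

# Assumed alphabet: the four canonical bases. Any other character is dropped.
BASES = "ACGT"
BASE_TO_INDEX = {base: i for i, base in enumerate(BASES)}


def clean_sequence(seq: str) -> str:
    """Upper-case the sequence and keep only A, C, G and T characters."""
    return "".join(ch for ch in seq.upper() if ch in BASE_TO_INDEX)


def one_hot_encode(seq: str) -> torch.Tensor:
    """Return a (4, length) float tensor with one channel per base."""
    indices = torch.tensor(
        [BASE_TO_INDEX[ch] for ch in clean_sequence(seq)], dtype=torch.long
    )
    # F.one_hot yields (length, 4); transpose to channels-first for Conv1d.
    return F.one_hot(indices, num_classes=len(BASES)).T.float()


# Hypothetical class labels; LabelEncoder sorts class names alphabetically.
labels = ["promoter", "enhancer", "promoter"]
encoder = LabelEncoder()
y = torch.tensor(encoder.fit_transform(labels), dtype=torch.long)

x = one_hot_encode("acgT n")  # cleaned to "ACGT"
print(x.shape)                   # torch.Size([4, 4])
print(encoder.classes_.tolist())  # ['enhancer', 'promoter']
print(y.tolist())                # [1, 0, 1]
```

The channels-first `(4, length)` layout is what `torch.nn.Conv1d` expects once a batch dimension is added, and `encoder.inverse_transform` maps predicted indices back to class names.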
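For the data split and DataLoader steps in the list above, a minimal sketch using `torch.utils.data.random_split` is shown below. The 70/15/15 proportions, the batch size of 16, and the random stand-in tensors are assumptions for illustration; the project may split its data differently (for example with scikit-learn).

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in data: 100 one-hot sequences of length 50, 3 classes.
torch.manual_seed(0)
num_samples, seq_len, num_classes = 100, 50, 3
indices = torch.randint(0, 4, (num_samples, seq_len))
X = F.one_hot(indices, num_classes=4).permute(0, 2, 1).float()  # (N, 4, L)
y = torch.randint(0, num_classes, (num_samples,))

dataset = TensorDataset(X, y)

# Assumed 70/15/15 split; a seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(dataset, [70, 15, 15], generator=generator)

# Shuffle only the training data; validation and test order does not matter.
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16)
test_loader = DataLoader(test_set, batch_size=16)

xb, yb = next(iter(train_loader))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
print(xb.shape, yb.shape)  # torch.Size([16, 4, 50]) torch.Size([16])
```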
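The list above says a CNN classifies the sequences but does not give its architecture. One common design for DNA, assumed here rather than taken from the repository, is a 1D convolution whose filters act as learned motif detectors, followed by global max pooling and a linear classifier. The class name `DNACNN`, the 32 filters, and the kernel width of 12 are hypothetical choices.

```python
import torch
from torch import nn


class DNACNN(nn.Module):
    """Hypothetical 1D CNN: convolution filters act as learned motif detectors."""

    def __init__(self, num_classes: int, num_filters: int = 32, kernel_size: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            # 4 input channels (one per base); each filter spans kernel_size bases.
            nn.Conv1d(4, num_filters, kernel_size=kernel_size),
            nn.ReLU(),
            # Global max pooling keeps each filter's strongest match anywhere.
            nn.AdaptiveMaxPool1d(1),
        )
        self.classifier = nn.Linear(num_filters, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 4, length) -> features: (batch, num_filters, 1)
        pooled = self.features(x).squeeze(-1)  # (batch, num_filters)
        return self.classifier(pooled)  # logits: (batch, num_classes)


model = DNACNN(num_classes=3)
dummy = torch.zeros(8, 4, 50)
print(model(dummy).shape)  # torch.Size([8, 3])
```

Global max pooling makes the output shape independent of sequence length, as long as each sequence is at least `kernel_size` bases long.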
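The training and validation loops, plus the per-epoch losses that would feed the matplotlib loss curves, might look like the sketch below. The function names `train_one_epoch` and `evaluate`, the Adam optimizer, cross-entropy loss, and the toy task (class 1 if and only if a sequence starts with `A`) are all assumptions; a small linear model stands in for the CNN to keep the example short.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, loss_fn, optimizer):
    """Run one pass over the training data; return the mean batch loss."""
    model.train()
    total = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)


@torch.no_grad()
def evaluate(model, loader, loss_fn):
    """Return (mean batch loss, accuracy) without updating weights."""
    model.eval()
    total, correct, count = 0.0, 0, 0
    for xb, yb in loader:
        logits = model(xb)
        total += loss_fn(logits, yb).item()
        correct += (logits.argmax(dim=1) == yb).sum().item()
        count += yb.numel()
    return total / len(loader), correct / count


# Toy data (hypothetical): class 1 iff the sequence starts with "A".
torch.manual_seed(0)
idx = torch.randint(0, 4, (200, 20))
X = nn.functional.one_hot(idx, num_classes=4).permute(0, 2, 1).float()
y = (idx[:, 0] == 0).long()
train_loader = DataLoader(TensorDataset(X[:160], y[:160]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X[160:], y[160:]), batch_size=32)

model = nn.Sequential(nn.Flatten(), nn.Linear(4 * 20, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

train_losses, val_losses = [], []
for epoch in range(20):
    train_losses.append(train_one_epoch(model, train_loader, loss_fn, optimizer))
    val_loss, val_acc = evaluate(model, val_loader, loss_fn)
    val_losses.append(val_loss)
print(f"final train loss {train_losses[-1]:.3f}, val accuracy {val_acc:.2f}")
```

`train_losses` and `val_losses` can be passed to `matplotlib.pyplot.plot` to draw the two curves, and calling `evaluate` on a test loader gives the test-set loss and accuracy.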
Feel free to explore the code and adapt it to your genomics classification tasks. If you have any questions or suggestions, please open an issue.
Note: The code requires the PyTorch, scikit-learn, pandas, and matplotlib libraries. Install these dependencies before running the code.