Once sequenced, a cancer tumor can have thousands of genetic mutations. The challenge is distinguishing the mutations that contribute to tumor growth (drivers) from the neutral ones (passengers). Currently, cancer genomics scientists have to classify these mutations and decide which of them are clinically actionable. That decision is usually made by reviewing the current scientific literature on each identified mutation and gene, which is very time-consuming. To avoid this, we develop a machine learning ensemble classifier that helps automate the classification task.
The dataset consists of:
- Gene
- Variation of the amino acid
- Text (the clinical evidence used to classify the genetic mutation)

The dataset can be downloaded from here.
In this project we use two families of techniques: classical ML algorithms (KNN, Random Forest, and Logistic Regression) and deep learning models (RNN, CNN, and LSTM).
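
As a rough illustration of how the three classical models could be combined into one ensemble over the Text field, here is a minimal sketch. It assumes scikit-learn, TF-IDF features, and soft voting; none of these choices, nor the hyperparameters, the `build_ensemble` helper, or the toy sentences and labels, come from the project itself.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline


def build_ensemble(n_neighbors=3, random_state=0):
    """Soft-voting ensemble over TF-IDF features (illustrative settings only)."""
    voters = VotingClassifier(
        estimators=[
            ("knn", KNeighborsClassifier(n_neighbors=n_neighbors)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",  # average the predicted class probabilities of the three models
    )
    # The vectorizer turns raw text into sparse TF-IDF vectors for the voters.
    return make_pipeline(TfidfVectorizer(), voters)


# Toy stand-in for the clinical-evidence text; NOT the real dataset.
# Labels are hypothetical: 1 = driver-like, 0 = passenger-like.
texts = [
    "mutation increases kinase activity and promotes tumor growth",
    "variant activates signaling and drives proliferation",
    "substitution enhances oncogenic activity in cell lines",
    "variant shows no functional effect in assays",
    "mutation is neutral and does not alter protein function",
    "substitution has no measurable impact on activity",
]
labels = [1, 1, 1, 0, 0, 0]

model = build_ensemble()
model.fit(texts, labels)

# One row per input document, one column per class.
probs = model.predict_proba(["mutation promotes tumor growth"])
print(probs)
```

Soft voting is used here because it yields class probabilities, which a probability-based metric such as log loss requires; hard voting would only return labels.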
Across a variety of experiments, Random Forest gave better results than the other models.
- Log loss
- Confusion Matrix
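
To make the two metrics above concrete, the following is a minimal pure-Python sketch of multi-class log loss and a confusion matrix. The function names, the clipping constant `eps`, and the toy labels and probabilities are illustrative assumptions, not values from the project.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a model is confidently wrong.
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[i][j] counts samples of true class i predicted as class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Toy example with three classes (hypothetical values).
y_true = [0, 1, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
]
# Predicted class = index of the highest probability in each row.
y_pred = [max(range(3), key=row.__getitem__) for row in probs]

loss = log_loss(y_true, probs)
matrix = confusion_matrix(y_true, y_pred, n_classes=3)
print(loss)
print(matrix)
```

Log loss penalizes confident wrong probabilities, while the confusion matrix shows which classes are mistaken for which, so the two complement each other.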