Code for the paper EnsembleSplice: Ensemble Deep Learning Model for Splice Site Prediction

EnsembleSplice: Ensemble Deep Learning Model for Splice Site Prediction

OluwadareLab, University of Colorado, Colorado Springs


Developers:
              Trevor Martin
              Department of Mathematics
              Oberlin College, Oberlin, OH
              Email: trevormartin4321@gmail.com

              Victor Akpokiro
              Department of Computer Science
              University of Colorado, Colorado Springs
              Email: vakpokir@uccs.edu

Contact:
              Oluwatosin Oluwadare, PhD
              Department of Computer Science
              University of Colorado, Colorado Springs
              Email: ooluwada@uccs.edu


1. Code Overview & Dependencies:

A requirements file listing the dependencies is included in the repository. To install the dependencies locally, for example inside a virtual environment, run pip install -r requirements.txt from the current working directory.

This project is compatible with Anaconda environments.

When EnsembleSplice is run for Validation, Training, or Testing, two things occur. First, a log file is created whose name includes the sub-networks, splice sites, dataset, and other relevant information. This is a text file containing dictionaries of output results. To have the results printed in the terminal or on Colab, move ENS_Temp_Run.py into ./Logs/ and run it. Alternatively, create a new folder, move the log files and ENS_Temp_Run.py into it, and run ENS_Temp_Run.py from there. Second, the trained sub-networks and their weights are added to ./Models/TrainedModels/.
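As a rough illustration of working with such a log file outside ENS_Temp_Run.py, the sketch below collects every line that parses as a Python dictionary literal. The exact layout of the log files is not described here, so the one-dictionary-per-line assumption, the function name read_result_dicts, and the example file name are all hypothetical; ENS_Temp_Run.py remains the repository's own way to print results.

```python
import ast
from pathlib import Path


def read_result_dicts(log_path):
    """Return every line of a text file that parses as a Python dict literal.

    Assumption: each result dictionary sits on its own line. Lines that are
    blank, are not valid literals, or are not dictionaries are skipped.
    """
    results = []
    for line in Path(log_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            value = ast.literal_eval(line)
        except (ValueError, SyntaxError, TypeError):
            continue  # not a plain Python literal
        if isinstance(value, dict):
            results.append(value)
    return results


if __name__ == "__main__":
    # "Logs/example_log.txt" is a placeholder path, not a real file name.
    example = Path("Logs/example_log.txt")
    if example.exists():
        for entry in read_result_dicts(example):
            print(entry)
```

ast.literal_eval is used rather than eval so that only literals are evaluated and no code in the file is executed.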

To run the actual ensemble, pass the --esplice argument. Without it, only the specified sub-networks are run, and a results output is produced for each sub-network.

2. Validation:

EnsembleSplice Validation: To perform validation.
Usage: type in the terminal
python3 exec.py [--train] [--donor, --acceptor] [--cnn1, --cnn2, --cnn3, --cnn4, --dnn1, --dnn2, --dnn3, --dnn4] [--hs3d_bal, --ar, --hs2] [--esplice]
For example:
python exec.py -validate --donor --dnn1 --dnn2 --dnn3 --dnn4 --cnn1 --cnn2 --cnn3 --cnn4 --hs3d_bal --esplice

  • Outputs:
    The outputs of validation include:
    • .h5: The trained model file.
    • .txt: The output files (.txt) containing the evaluation metric results are stored in the log directory.

3. Training:

EnsembleSplice Training: To perform training and save the trained models.
Usage: To train, type in the terminal
python3 exec.py [--train] [--donor, --acceptor] [--cnn1, --cnn2, --cnn3, --cnn4, --dnn1, --dnn2, --dnn3, --dnn4] [--hs3d_bal, --ar, --hs2] [--esplice]

For example:
python exec.py -train --donor --dnn1 --dnn2 --dnn3 --dnn4 --cnn1 --cnn2 --cnn3 --cnn4 --hs2 --esplice

See exec.py for more details.
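When launching many runs, it can help to assemble the command from validated choices instead of typing it by hand. The following is a minimal sketch, not part of the repository: build_command and its parameters are hypothetical names, the lists of options are copied from the usage line in this README, and the sketch assumes one splice site and one dataset per run. Because the mode flag is spelled differently in the usage line and in the example, it is passed through verbatim; check exec.py for the spelling it actually accepts.

```python
import shlex

# Option names as listed in the usage line of this README.
SPLICE_SITES = ("donor", "acceptor")
SUB_NETWORKS = ("cnn1", "cnn2", "cnn3", "cnn4", "dnn1", "dnn2", "dnn3", "dnn4")
DATASETS = ("hs3d_bal", "ar", "hs2")


def build_command(mode_flag, site, networks, dataset, ensemble=True):
    """Assemble an exec.py argument list from validated choices.

    mode_flag is used exactly as given (for example "--train").
    """
    if site not in SPLICE_SITES:
        raise ValueError(f"unknown splice site: {site}")
    unknown = [name for name in networks if name not in SUB_NETWORKS]
    if unknown:
        raise ValueError(f"unknown sub-networks: {unknown}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")

    args = ["python3", "exec.py", mode_flag, f"--{site}"]
    args += [f"--{name}" for name in networks]
    args.append(f"--{dataset}")
    if ensemble:
        args.append("--esplice")  # run the ensemble, not only the sub-networks
    return args


if __name__ == "__main__":
    command = build_command("--train", "donor", ["cnn1", "dnn1"], "hs2")
    print(shlex.join(command))
    # prints: python3 exec.py --train --donor --cnn1 --dnn1 --hs2 --esplice
```

The returned list can be handed to subprocess.run from the repository root if the run should be started from Python.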

  • Outputs:
    The outputs of training include:
    • .h5: The trained model file, saved in ./Models/TrainedModels/
    • .txt: The output files (.txt) containing the evaluation metric results are stored in the log directory.
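To confirm that a training run produced its model files, the output folder can be inspected from Python. This is a small sketch under the assumption that the .h5 files sit directly inside ./Models/TrainedModels/; the function name list_trained_models is hypothetical.

```python
from pathlib import Path


def list_trained_models(model_dir="./Models/TrainedModels"):
    """Return the .h5 files found directly in model_dir, in sorted order.

    An empty list is returned when the folder does not exist.
    """
    folder = Path(model_dir)
    if not folder.is_dir():
        return []
    return sorted(folder.glob("*.h5"))


if __name__ == "__main__":
    for model_path in list_trained_models():
        print(model_path.name)
```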

4. Testing:

EnsembleSplice Testing: To perform testing.
Usage: To test, type in the terminal
python3 exec.py [--test] [--donor, --acceptor] [--cnn1, --cnn2, --cnn3, --cnn4, --dnn1, --dnn2, --dnn3, --dnn4] [--hs3d_bal, --ar, --hs2] [--esplice]
For example:
python exec.py -test --donor --dnn1 --dnn2 --dnn3 --dnn4 --cnn1 --cnn2 --cnn3 --cnn4 --ar --esplice
Either balanced or imbalanced input dataset, i.e. ("balanced" or "imbalanced")

  • Outputs:
    The outputs of testing include:
    • .txt: The output files (.txt) containing the evaluation metric results are stored in the log directory.

5. Note:

  • Ensure that a log directory exists for storing the output text files before running EnsembleSplice.
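The log directory can be created ahead of a run with a few lines of Python. This is a minimal sketch that assumes the directory is the ./Logs/ folder mentioned in section 1; the helper name ensure_log_dir is hypothetical.

```python
from pathlib import Path


def ensure_log_dir(log_dir="./Logs"):
    """Create log_dir if it is missing and return it as a Path."""
    path = Path(log_dir)
    path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return path


if __name__ == "__main__":
    print(ensure_log_dir())
```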