DeepSplicer: An Improved Method of Splice Sites Prediction using Deep Learning

OluwadareLab, University of Colorado, Colorado Springs

Developers:
              Victor Akpokiro
              Department of Computer Science
              University of Colorado, Colorado Springs
              Email: vakpokir@uccs.edu

Contact:
              Oluwatosin Oluwadare, PhD
              Department of Computer Science
              University of Colorado, Colorado Springs
              Email: ooluwada@uccs.edu

1. Content of folders:

src: DeepSplicer source code. deepsplicer.py
src: Hyper-parameter tuning source code. hyperparameter_tuner.py
src: DeepSplicer cross-validation source code. deepsplicer_cross_val.py
models: Folder for saved DeepSplicer models
log: Folder for result logs
plots: Folder for result plots

2. Datasets:

In our research, we used five carefully selected datasets from the following organisms: Homo sapiens, Oryza sativa japonica, Arabidopsis thaliana, Drosophila melanogaster, and Caenorhabditis elegans. We downloaded the reference genomic sequence datasets (FASTA file format) from Albaradei, S. et al. and the corresponding annotation sequences (GTF file format) from Ensembl. Our data was constructed to permit a sequence length of 400.

3. One-Hot encoding:

We used one-hot encoding to transform our genomic sequence data and labels into vectors of 0s and 1s. In other words, every element of a vector is 0 except the one that corresponds to the nucleotide base at that position of the input sequence, which is 1. Adenine (A) is [1 0 0 0], Cytosine (C) is [0 1 0 0], Guanine (G) is [0 0 1 0], Thymine (T) is [0 0 0 1].
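
The mapping above can be sketched in a few lines of plain Python. This is only an illustration, not the encoding code used by DeepSplicer: the function name one_hot_encode is hypothetical, and rejecting any base other than A, C, G, or T is an assumption, since the README does not say how ambiguous bases are handled.

```python
# Illustrative sketch only; the names here are hypothetical, not taken from deepsplicer.py.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(sequence):
    """Return one 4-element 0/1 vector per nucleotide in `sequence`."""
    encoded = []
    for position, base in enumerate(sequence.upper()):
        if base not in ONE_HOT:
            # Assumption: bases outside A/C/G/T (for example N) are rejected, not guessed.
            raise ValueError(f"unsupported base {base!r} at position {position}")
        encoded.append(list(ONE_HOT[base]))
    return encoded


if __name__ == "__main__":
    for vector in one_hot_encode("ACGT"):
        print(vector)
```

With this encoding, a sequence of length 400 becomes 400 vectors of 4 elements each.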

4. Usage:

Usage: In the terminal, type python deepsplicer.py -n model_name -s sequence -o organism_name -e encoded_sequence_file -l encoded_label_file, where sequence is either acceptor or donor.

Arguments:
- model_name: A string giving the name of the model
- sequence: A string specifying the acceptor or donor input dataset
- organism_name: A string specifying the organism, one of ["hs", "at", "oriza", "d_mel", "c_elegans"]
- encoded_sequence_file: A file containing the encoded sequence data
- encoded_label_file: A file containing the encoded label data
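
As a rough illustration of how the five flags fit together, the sketch below parses them with Python's standard argparse module. It is not the parser in deepsplicer.py: the dest names, the help strings, the restriction of -s to the literal strings acceptor and donor, and the example file names are all assumptions made for this example.

```python
import argparse

# Organism codes as listed in the Arguments section.
ORGANISMS = ["hs", "at", "oriza", "d_mel", "c_elegans"]


def build_parser():
    """Build a parser for the -n, -s, -o, -e and -l flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="deepsplicer.py")
    parser.add_argument("-n", dest="model_name", required=True,
                        help="name of the model")
    # Assumption: the accepted values are exactly "acceptor" and "donor".
    parser.add_argument("-s", dest="sequence", required=True,
                        choices=["acceptor", "donor"],
                        help="acceptor or donor input dataset")
    parser.add_argument("-o", dest="organism", required=True,
                        choices=ORGANISMS, help="organism name")
    parser.add_argument("-e", dest="encoded_sequence_file", required=True,
                        help="file containing the encoded sequence data")
    parser.add_argument("-l", dest="encoded_label_file", required=True,
                        help="file containing the encoded label data")
    return parser


if __name__ == "__main__":
    # Hypothetical example values; the file names are placeholders.
    example = ["-n", "my_model", "-s", "acceptor", "-o", "hs",
               "-e", "encoded_seq.txt", "-l", "encoded_label.txt"]
    print(build_parser().parse_args(example))
```

Restricting -s and -o to fixed choices makes a mistyped organism code fail at start-up instead of partway through a run.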

6. Output:

DeepSplicer outputs three files:

.h5: The DeepSplicer model and weight file.
.txt: A log file that contains the accuracy and evaluation metric results.
.png: A plot of the prediction accuracy.

7. Note:

The dataset sequence length is 400.
The DeepSplicer folders [log, models, plots] are essential for the code to function.
Genomic sequence input data should be transformed using one-hot encoding.
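
The three notes above can be turned into a small pre-flight check before running the tool. This is a sketch under stated assumptions and is not part of DeepSplicer: the helper names ensure_folders and has_expected_shape are hypothetical, the folder names are copied from the note and may need adjusting if your checkout names them differently, and the encoded sequence is assumed to be available in memory as a list of 4-element vectors.

```python
import os
import tempfile

REQUIRED_FOLDERS = ("log", "models", "plots")  # names as given in the note
SEQUENCE_LENGTH = 400                          # sequence length stated in the note


def ensure_folders(base_dir="."):
    """Create any missing required folder under base_dir and return all paths."""
    paths = []
    for name in REQUIRED_FOLDERS:
        path = os.path.join(base_dir, name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        paths.append(path)
    return paths


def has_expected_shape(encoded_sequence):
    """True if there are 400 positions and each one is a valid one-hot vector."""
    return len(encoded_sequence) == SEQUENCE_LENGTH and all(
        len(vector) == 4 and sorted(vector) == [0, 0, 0, 1]
        for vector in encoded_sequence
    )


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing is created in the working tree.
    with tempfile.TemporaryDirectory() as workdir:
        print(ensure_folders(workdir))
    print(has_expected_shape([[1, 0, 0, 0]] * SEQUENCE_LENGTH))
```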

