HarveyYan/RNATracker

RNATracker

RNATracker is a deep learning approach that learns mRNA subcellular localization patterns and predicts localization outcomes. It operates on the cDNA of the longest protein-coding isoform of each gene, with or without the corresponding secondary structure annotations. The learning targets are the fractions (percentages) of transcripts localized to a fixed set of subcellular compartments of interest.
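To make the input and target formats concrete, here is a minimal sketch of how a cDNA sequence and its localization values might be turned into arrays. It is an illustration under assumptions, not the repository's own preprocessing: the function names `one_hot_encode` and `to_fractions`, the `ACGT` channel order, and the choice to encode unknown bases as all-zero rows are all hypothetical.

```python
import numpy as np

# Assumed channel order for the one-hot encoding (hypothetical).
ALPHABET = "ACGT"


def one_hot_encode(seq):
    """Encode a cDNA string as a (length, 4) array.

    Bases outside ALPHABET (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def to_fractions(abundances):
    """Normalize per-compartment abundances so that they sum to 1."""
    values = np.asarray(abundances, dtype=np.float64)
    total = values.sum()
    if total <= 0:
        raise ValueError("abundances must have a positive sum")
    return values / total


# Toy example: a 5-base sequence and made-up abundances for four compartments.
x = one_hot_encode("ACGTN")
y = to_fractions([30.0, 10.0, 40.0, 20.0])
```

Here `x` has one row per base and `y` is a distribution over compartments, which is the kind of target a softmax output layer could be trained against.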

Our method provides computational insights into the mRNA trafficking mechanism by identifying cis-acting zipcode elements in the transcript sequences.

For an explanation of the RNA trafficking mechanism and its role in the broader gene regulatory network, I find this survey extremely helpful:

RNA localization: Making its way to the center stage

Dataset

  • CeFra-seq, which provides localization targets for the cytoplasm, insoluble, membrane and nucleus fractions.

  • APEX-RIP on KDEL (endoplasmic reticulum), Mito (mitochondrial), NES (cytosol) and NCL (nucleus).

Other emerging read-mapping technologies that investigate subcellular zipcode proximity might provide additional datasets.

Software dependency

Keras version 2.0.9 is recommended. The idea can be easily adapted to other deep learning frameworks such as TensorFlow and PyTorch.

RNAplfold and forgi libraries from the ViennaRNA package, and their Python wrapper Eden, for acquiring RNA secondary structure annotations.

TOMTOM for comparing similarity between motifs.

WebLogo and its Python wrapper Basset for visualizing learned motifs.

Running the code

  • Scripts/RNATracker.py

    • Main experiment entry

    • Use python3 Scripts/RNATracker.py -h to get a comprehensive list of experiment parameters

    • For model definitions refer to Models/cnn_bilstm_attention.py
  • Scripts/SGDModel.py

    • Experiment without padding or truncation
  • Scripts/mask_test.py

    • Mask test to identify zipcodes with a sufficiently trained RNATracker model
  • Transcript_Coordinates_Mapping/get_conservation_scores.py

    • A script to prepare conservation scores for the downstream mask test
    • We highly recommend downloading Homo_sapiens.GRCh38.cdna.all.fa from the Ensembl website and saving it under the Data directory
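
The mask test mentioned above can be pictured as an occlusion experiment: zero out one window of the encoded sequence at a time and record how far the predicted localization moves. The sketch below assumes that reading and is not the code in Scripts/mask_test.py. The names `mask_test` and `toy_predict`, and the `window` and `stride` parameters, are hypothetical, and `toy_predict` is a stand-in for a trained RNATracker model.

```python
import numpy as np


def mask_test(predict, encoded, window=10, stride=5):
    """Zero out each window in turn and score the change in prediction.

    `predict` maps a (length, 4) array to a vector of compartment fractions.
    Returns the window start positions and one score per window.
    """
    baseline = predict(encoded)
    starts = list(range(0, max(len(encoded) - window, 0) + 1, stride))
    scores = np.zeros(len(starts))
    for i, start in enumerate(starts):
        masked = encoded.copy()
        masked[start:start + window] = 0.0
        # L1 distance between masked and unmasked predictions.
        scores[i] = np.abs(predict(masked) - baseline).sum()
    return starts, scores


def toy_predict(encoded):
    """Hypothetical model: responds only to G content at positions 20-29."""
    signal = encoded[20:30, 2].mean()
    return np.array([signal, 1.0 - signal])


# Toy one-hot sequence (channel order A, C, G, T): 20 A, then 10 G, then 20 A.
encoded = np.zeros((50, 4))
encoded[:, 0] = 1.0
encoded[20:30] = [0.0, 0.0, 1.0, 0.0]

starts, scores = mask_test(toy_predict, encoded)
best_start = starts[int(np.argmax(scores))]
```

In this toy setup the highest-scoring window is the one covering the planted G-run, which is how a candidate zipcode would stand out. With a real model, `predict` would wrap the model's prediction call for a single sequence.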

Notes

For secondary structures refer to this customized annotator

About

Prediction of mRNA subcellular localization using deep recurrent neural networks
