Challenge #2: PDB Component Stabilization
RNA tectonics is an approach that uses known RNA structural components to build complex RNA devices. We have extracted over 24,000 RNA structural components from known X-ray and NMR structures, and we challenge you to design loops that stabilize each component's known secondary structure.
If you haven’t already downloaded the RSIM source code, use the following command to clone it into a new directory for the challenge project.
git clone http://www.github.com/jpbida/RSIM
In the RSIM directory you are given the following datasets to work with.
/challenges/challenge1/data/components/
/comps
comps_[id]_[component_num].ent
FORMAT
PDB file format for the RNA component
/seqs
comps_[id]_[component_num].seq
FORMAT
(file_path) (sequence with breaks)
/ss
ss_comps.txt (text file containing the component id and a secondary structure)
comps_[id]_[component_num] ((((...))))
Your goal is to identify sequences that replace each [BREAK] marker in the sequence files such that the target secondary structure is still maintained. The following is a manual example.
Step 1: Get the Inputs
Secondary structures are located in
/ss/ss_comps.txt
Sequences are located in
/seqs/comps_[id]_[num].seq
/seqs
comps_2aar_54.seq
2aar_comps.txt,AAGCCGAAGUGGC[BREAK]GCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA
/ss
ss_comps.txt
2aar_54 ..(((...((((()))))(((...).(((.....)))))...))).. 35.955
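A minimal Python sketch of reading these two inputs, assuming the comma-separated .seq layout and the whitespace-separated ss_comps.txt layout shown in the example lines above. The function names are hypothetical, and the third column of ss_comps.txt is kept as raw text because its meaning is not stated here.

```python
BREAK = "[BREAK]"

def parse_seq_line(line):
    """Split a .seq line into (source name, list of sequence fragments)."""
    source, sequence = line.strip().split(",", 1)
    return source, sequence.split(BREAK)

def parse_ss_line(line):
    """Split an ss_comps.txt line into (component id, dot-bracket, extra columns)."""
    fields = line.split()
    return fields[0], fields[1], fields[2:]

source, fragments = parse_seq_line(
    "2aar_comps.txt,AAGCCGAAGUGGC[BREAK]GCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA"
)
comp_id, structure, extra = parse_ss_line(
    "2aar_54 ..(((...((((()))))(((...).(((.....)))))...))).. 35.955"
)

# The structure has one character per nucleotide outside the breaks.
assert len(structure) == sum(len(f) for f in fragments)
```

The final check is a cheap way to catch a sequence file paired with the wrong structure line.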
Step 2: Align the target structure with the sequence
..(((...((((( )))))(((...).(((.....)))))...)))..
AAGCCGAAGUGGC[BREAK]GCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA
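The alignment above can be reproduced with a short Python sketch. It assumes the structure string has exactly one character per nucleotide outside the breaks; the function name align_structure and the choice of a space as the gap character are illustrative, not part of RSIM.

```python
BREAK = "[BREAK]"

def align_structure(sequence, structure, gap=" "):
    """Insert `gap` into the dot-bracket string wherever the sequence has a break."""
    fragments = sequence.split(BREAK)
    if sum(len(f) for f in fragments) != len(structure):
        raise ValueError("structure length does not match sequence length")
    pieces, start = [], 0
    for fragment in fragments:
        # Take the slice of the structure that covers this fragment.
        pieces.append(structure[start:start + len(fragment)])
        start += len(fragment)
    return gap.join(pieces)

aligned = align_structure(
    "AAGCCGAAGUGGC[BREAK]GCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA",
    "..(((...((((()))))(((...).(((.....)))))...)))..",
)
print(aligned)
```

Because the sequence is split on every [BREAK], the same function handles components with more than one break.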
Step 3: Replace [BREAK] with a sequence such that, when the full sequence is folded with any secondary structure prediction program, the minimum free energy (MFE) fold contains the target secondary structure.
..(((...(((((....)))))(((...).(((.....)))))...)))..
AAGCCGAAGUGGCGAACGCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA
RNAfold
AAGCCGAAGUGGCGAACGCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA
..(((..((.(((.....((.(((((....))))))).)))))...)))..
minimum free energy = -16.60 kcal/mol
FAIL! My guess didn’t produce the correct secondary structure.
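The check in this step can be automated once a folding program has produced a dot-bracket string. The Python sketch below assumes that "contains the target secondary structure" means every target base pair also appears in the predicted fold, and that the designed loop is left unpaired in the target, as in the example. The helper names are hypothetical, and the predicted structure is pasted in from the RNAfold result above rather than computed.

```python
def fill_breaks(text, inserts, marker):
    """Replace each occurrence of `marker` in `text` with the matching insert."""
    parts = text.split(marker)
    if len(parts) - 1 != len(inserts):
        raise ValueError("number of inserts does not match number of breaks")
    filled = parts[0]
    for insert, part in zip(inserts, parts[1:]):
        filled += insert + part
    return filled

def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs

def contains_target(target, predicted):
    """True if every base pair of the target is present in the prediction."""
    return len(target) == len(predicted) and base_pairs(target) <= base_pairs(predicted)

loop = "GAAC"
designed = fill_breaks(
    "AAGCCGAAGUGGC[BREAK]GCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA", [loop], "[BREAK]"
)
# Assumption: the designed loop is unpaired, so it is written as dots.
target = fill_breaks(
    "..(((...((((( )))))(((...).(((.....)))))...)))..", ["." * len(loop)], " "
)
predicted = "..(((..((.(((.....((.(((((....))))))).)))))...))).."
print(designed)
print(contains_target(target, predicted))
```

For the GAAC guess the check fails, in agreement with the manual comparison: the target pairs position 12 with position 17 (0-based), while the predicted fold leaves position 17 unpaired.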
Step 4: Save data for each design in the following format
2aar_54,[GAAC],ensemble defect,MFE defect,...
The first column contains the component_id, and the second column contains the sequence that replaces [BREAK]. If a component has multiple [BREAK] markers, use the format [SEQ1,SEQ2,SEQ3...]. The remaining columns are metrics, such as ensemble, MFE, or probability defect, that measure how far the component with the designed sequence is from the target secondary structure.
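A minimal Python sketch of writing one such row. It assumes one common definition of an MFE-style defect, namely the number of positions whose pairing partner differs between the target and the predicted MFE structure; the challenge text does not define the metric, so treat this as an illustration only. An ensemble defect would need base-pair probabilities from a folding package and is not attempted here. All function names are hypothetical.

```python
def pair_table(dot_bracket):
    """Map each position to its pairing partner, or None if unpaired."""
    table, stack = [None] * len(dot_bracket), []
    for i, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            table[i], table[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return table

def mfe_defect(target, predicted):
    """Count positions whose partner differs between target and predicted fold."""
    if len(target) != len(predicted):
        raise ValueError("structures must have the same length")
    return sum(a != b for a, b in zip(pair_table(target), pair_table(predicted)))

def format_design_row(component_id, inserts, metrics):
    """Build one output line: id, [SEQ1,SEQ2,...], then the metric columns."""
    seq_field = "[" + ",".join(inserts) + "]"
    return ",".join([component_id, seq_field] + [str(m) for m in metrics])

target = "..(((...(((((....)))))(((...).(((.....)))))...))).."
predicted = "..(((..((.(((.....((.(((((....))))))).)))))...))).."
row = format_design_row("2aar_54", ["GAAC"], [mfe_defect(target, predicted)])
print(row)
```

The metrics list should follow whatever column order you report, and a defect of 0 under this definition means the predicted fold matches the target exactly.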