Challenge 2

jpbida edited this page Apr 26, 2012 · 6 revisions

Challenge #2: PDB Component Stabilization


RNA tectonics is an approach that uses known RNA structural components to build complex RNA devices. We have extracted over 24,000 RNA structural components from known X-ray and NMR structures and are challenging you to design loops that stabilize each component's known secondary structure.

If you haven’t already downloaded the RSIM source code, use the following command to clone it into a new directory for the challenge project.

git clone http://www.github.com/jpbida/RSIM

In the RSIM directory you are given the following datasets to work with.

/challenges/challenge1/data/components/
		/comps
			comps_[id]_[component_num].ent
				FORMAT
					PDB file format for the RNA component
		/seqs
			comps_[id]_[component_num].seq
				FORMAT
					(file_path) (sequence with breaks)
		/ss
			ss_comps.txt (text file containing the component id and a secondary structure)
				comps_[id]_[component_num]   ((((...))))
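
The two text formats above can be read with a few lines of Python. This is only a sketch: the function names are hypothetical, the separator between the file path and the sequence is assumed to be a comma or whitespace, and any extra columns after the structure in ss_comps.txt are ignored because their meaning is not described here.

```python
import re

BREAK = "[BREAK]"

def parse_seq_line(line):
    """Split one line of a .seq file into (label, list of segments).

    Assumption: the label and the sequence are separated by a comma
    or by whitespace. The sequence is cut at every [BREAK] marker.
    """
    label, sequence = re.split(r"[,\s]+", line.strip(), maxsplit=1)
    return label, sequence.split(BREAK)

def parse_ss_line(line):
    """Split one line of ss_comps.txt into (component_id, structure).

    Assumption: columns are whitespace separated; anything after the
    dot-bracket structure is ignored.
    """
    fields = line.split()
    return fields[0], fields[1]
```

For a sequence with one [BREAK], parse_seq_line returns two segments: the bases before the break and the bases after it.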

Your goal is to identify sequences that replace each [BREAK] in the sequence files such that the target secondary structure is maintained. The following is a manual example.

Step 1: Get the Inputs


Secondary structures are located in

/ss/ss_comps.txt

Sequences are located in

/seqs/comps_[id]_[num].seq

For example, the inputs for component 2aar_54 are:

/seqs
comps_2aar_54.seq
2aar_comps.txt,AAGCCGAAGUGGC[BREAK]GCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA
/ss
ss_comps.txt
2aar_54 ..(((...((((()))))(((...).(((.....)))))...)))..   35.955

Step 2: Align the target structure with the sequence


..(((...(((((       )))))(((...).(((.....)))))...)))..
AAGCCGAAGUGGC[BREAK]GCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA
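
The alignment can also be produced programmatically. The sketch below (the function name is hypothetical) assumes the structure string has exactly one character per base outside the [BREAK] markers, and pads the structure with spaces so it lines up under each marker.

```python
BREAK = "[BREAK]"

def align_structure(structure, sequence):
    """Return the structure padded with spaces under each [BREAK]."""
    segments = sequence.split(BREAK)
    # Assumption: the structure covers only the bases outside the breaks.
    if sum(len(seg) for seg in segments) != len(structure):
        raise ValueError("structure length does not match the sequence")
    pieces, pos = [], 0
    for seg in segments:
        pieces.append(structure[pos:pos + len(seg)])
        pos += len(seg)
    # One space per character of the marker keeps the two lines aligned.
    return (" " * len(BREAK)).join(pieces)

sequence = "AAGCCGAAGUGGC[BREAK]GCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA"
structure = "..(((...((((()))))(((...).(((.....)))))...))).."
print(align_structure(structure, sequence))
print(sequence)
```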

Step 3: Replace [BREAK] with a sequence such that, when the full sequence is folded with any secondary structure prediction program, the minimum free energy (MFE) fold contains the target secondary structure.


..(((...(((((....)))))(((...).(((.....)))))...)))..
AAGCCGAAGUGGCGAACGCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA
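
A minimal sketch of this substitution is shown below. The function names are hypothetical, and the code assumes the inserted loop bases are left unpaired in the target, as in the example above where GAAC maps to four dots.

```python
BREAK = "[BREAK]"

def fill_breaks(sequence, loops):
    """Replace each [BREAK] in order with the matching loop sequence."""
    segments = sequence.split(BREAK)
    if len(loops) != len(segments) - 1:
        raise ValueError("need exactly one loop sequence per [BREAK]")
    out = [segments[0]]
    for loop, seg in zip(loops, segments[1:]):
        out.append(loop)
        out.append(seg)
    return "".join(out)

def fill_structure(structure, sequence, loops):
    """Insert one unpaired dot per loop base at each break position."""
    segments = sequence.split(BREAK)
    if len(loops) != len(segments) - 1:
        raise ValueError("need exactly one loop sequence per [BREAK]")
    out, pos = [], 0
    for i, seg in enumerate(segments):
        out.append(structure[pos:pos + len(seg)])
        pos += len(seg)
        if i < len(loops):
            # Assumption: designed loop bases are unpaired in the target.
            out.append("." * len(loops[i]))
    return "".join(out)

sequence = "AAGCCGAAGUGGC[BREAK]GCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA"
structure = "..(((...((((()))))(((...).(((.....)))))...))).."
print(fill_structure(structure, sequence, ["GAAC"]))
print(fill_breaks(sequence, ["GAAC"]))
```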

RNAfold
AAGCCGAAGUGGCGAACGCUACACCUCAGAAGGUGAGAGUCCUGUAGGCGA
..(((..((.(((.....((.(((((....))))))).)))))...)))..
minimum free energy = -16.60 kcal/mol

FAIL! My guess didn’t produce the correct secondary structure.
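
The comparison can be automated by converting both dot-bracket strings to base pairs. The sketch below uses hypothetical function names and does not call any folding program: the predicted structure is pasted in from the RNAfold output above. It treats a design as passing when every target base pair also appears in the predicted fold, which is one reading of "contains the target secondary structure".

```python
def base_pairs(structure):
    """Return the set of (i, j) index pairs in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs

def missing_pairs(target, predicted):
    """Target base pairs that the predicted structure does not contain."""
    return base_pairs(target) - base_pairs(predicted)

target = "..(((...(((((....)))))(((...).(((.....)))))...))).."
predicted = "..(((..((.(((.....((.(((((....))))))).)))))...))).."
missing = missing_pairs(target, predicted)
print("PASS" if not missing else "FAIL: %d target pairs missing" % len(missing))
```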

Step 4: Save data for each design in the following format


2aar_54,[GAAC],ensemble defect,MFE defect,... 

where the first column contains the component_id and the second column contains the sequences that replace [BREAK]. If a component has multiple [BREAK] markers, use the format [SEQ1,SEQ2,SEQ3...]. The remaining columns are metrics that measure how far the component, with the designed sequence inserted, is from the target secondary structure, such as ensemble defect, MFE defect, or probability defect.
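
One way to write a line in this format is sketched below. The function name is hypothetical, and the metric values in the usage line are placeholders for illustration, not computed results.

```python
def format_design(component_id, break_seqs, metrics):
    """Build one output line: id, bracketed loop sequences, then metrics."""
    # All designed sequences for the component go inside one pair of brackets.
    seqs = "[" + ",".join(break_seqs) + "]"
    columns = [component_id, seqs] + [str(m) for m in metrics]
    return ",".join(columns)

# Placeholder metric values, in the order: ensemble defect, MFE defect.
print(format_design("2aar_54", ["GAAC"], [0.25, 11]))
```

Because the bracketed column can itself contain commas when a component has several breaks, a reader of these files should parse the bracketed field before splitting the rest of the line on commas.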