OneAAatATime

Simulate single amino acid (AA) mutations across the entire amino acid chain of your .pdb file. Before running main.py, make sure that you have installed PyRosetta and Python 3.

How to do single amino acid mutation on a new .pdb file

Clone this repo and add your .pdb file to the pdb_files directory. Then add the lines below to main.py to simulate single amino acid mutations across the entire chain.

new_protein = pyrosetta.pose_from_pdb("pdb_files/new_protein.pdb")

single_mutation_analysis(new_protein, "data.csv")

single_mutation_analysis() takes a pose and an output filename. It replaces each amino acid in the chain with each of the 19 other amino acids in turn, then calculates the energy values and the number of hydrogen bonds for every resulting mutant.
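The enumeration described above can be sketched in plain Python without PyRosetta. This only illustrates which mutants get generated and is not the code in main.py: the function name enumerate_single_mutations and the AMINO_ACIDS constant are hypothetical, and the scoring step is left out entirely. The label uses the format this README gives for the conversion column.

```python
# The 20 standard amino acids as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_single_mutations(sequence):
    """Yield (residue_number, previous_aa, new_aa_1l, conversion, new_seq)
    for every single-residue substitution of `sequence`.

    Hypothetical helper for illustration; main.py may be organised differently.
    """
    for index, previous_aa in enumerate(sequence):
        residue_number = index + 1  # one-based, as in the CSV
        for new_aa_1l in AMINO_ACIDS:
            if new_aa_1l == previous_aa:
                continue  # skip the wild-type residue itself
            conversion = f"{residue_number}{previous_aa}to{new_aa_1l}"
            new_seq = sequence[:index] + new_aa_1l + sequence[index + 1:]
            yield residue_number, previous_aa, new_aa_1l, conversion, new_seq


mutations = list(enumerate_single_mutations("ARV"))
print(len(mutations))  # 3 residues x 19 substitutions each
print(mutations[0])
```

For a chain of N residues this yields 19 × N mutants, each of which has to be scored, which is why a full run can take a while.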

After adding the two lines above, go to the directory where this repository is located and run the line below in a terminal to begin the simulation. Make sure that PyRosetta is installed in that environment.

python3 main.py

Note: Make sure that only the relevant single_mutation_analysis() call is running and comment out the others. The simulation can take some time. While it runs, PyRosetta prints output to your terminal, which you can ignore. A progress bar is also printed after each iteration completes; the estimated time to completion is shown on its right-hand side.

The generated .csv file, which contains the results of the simulation, has the following headers:

type: wild-type or mutant (only the first row is wild-type)
residue_number: the number of the residue that is being mutated. In the sequence "ARVB", if R is being mutated then the value is 2 (one-based indexing)
previous_aa: the amino acid present in the wild-type at position residue_number before the mutation
new_aa_1l: the 1 letter code for the new amino acid that will be in the mutant
new_aa_3l: the 3 letter code for the new amino acid that will be in the mutant
conversion: the location of the mutation together with the identity of both the original and the replacement amino acid. A string built with the following formula:

conversion = f"{residue_number}{previous_aa}to{new_aa_1l}"

new_seq: for a mutant, the sequence after the mutation; otherwise the wild-type sequence
fa_score: the full-atom energy score calculated using the ref2015 score function in PyRosetta
ddg_score: the ΔΔG (delta delta G) value (also calculated using ref2015)
hbond_score: the number of hydrogen bonds
sasa_score: The solvent accessible surface area (SASA) is a measure of the protein's exposure to the solvent. Comparing SASA values between wild-type and mutant proteins can help you determine if the mutation affects the exposure of hydrophobic or hydrophilic residues.
secondary_structure: a string describing the protein's secondary structure
diff_hbonds: hbond_score of mutant - hbond_score of wild-type
diff_sasa: sasa_score of mutant - sasa_score of wild-type
diff_secondary_structure: the number of elements that differ between the secondary_structure of the mutant and the secondary_structure of the wild-type
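
As a rough illustration of how the three diff_ columns relate to the other columns, the sketch below recomputes them for one mutant row. The numbers and secondary-structure strings are made up, the helper names are hypothetical, and diff_secondary_structure is assumed here to be a position-by-position count of differing characters between two equal-length strings.

```python
def count_differences(mutant_ss, wild_type_ss):
    """Count positions where two equal-length secondary-structure strings differ."""
    if len(mutant_ss) != len(wild_type_ss):
        raise ValueError("secondary-structure strings must have equal length")
    return sum(1 for m, w in zip(mutant_ss, wild_type_ss) if m != w)


def diff_columns(mutant, wild_type):
    """Return the diff_* values for one mutant row relative to the wild-type row."""
    return {
        "diff_hbonds": mutant["hbond_score"] - wild_type["hbond_score"],
        "diff_sasa": mutant["sasa_score"] - wild_type["sasa_score"],
        "diff_secondary_structure": count_differences(
            mutant["secondary_structure"], wild_type["secondary_structure"]
        ),
    }


# Made-up example rows (not real simulation output).
wild_type = {"hbond_score": 12, "sasa_score": 950.0, "secondary_structure": "LHHHHL"}
mutant = {"hbond_score": 10, "sasa_score": 975.5, "secondary_structure": "LHHLLL"}

diffs = diff_columns(mutant, wild_type)
print(diffs)
```

A negative diff_hbonds means the mutant has fewer hydrogen bonds than the wild-type, and a positive diff_sasa means more surface is exposed to the solvent.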
