Protein folding

Molecular dynamics simulations model the interactions of atoms and molecules over time. In protein structure prediction they are a common approach: the chemical processes involved in protein folding are simulated at the scale of individual atoms (or pseudo-atoms/amino acids) until the protein reaches its equilibrium state, its tertiary structure. Unfortunately, these simulations are computationally very expensive: simulating the folding process of a single protein can take a large number of computers a very long time.
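
To make the idea of stepping a system through time concrete, here is a minimal sketch of a velocity Verlet integrator for two particles joined by a harmonic bond in one dimension. It is not code from this repository; the spring constant, rest length, masses and function names are all illustrative assumptions.

```python
def harmonic_force(x1, x2, k=100.0, r0=1.0):
    """Forces on both particles of a 1D harmonic bond (toy parameters)."""
    stretch = (x2 - x1) - r0
    f1 = k * stretch          # a stretched bond pulls particle 1 towards particle 2
    return f1, -f1            # Newton's third law


def total_energy(x, v, m, k=100.0, r0=1.0):
    """Kinetic plus potential energy of the two-particle system."""
    kinetic = 0.5 * m * (v[0] ** 2 + v[1] ** 2)
    potential = 0.5 * k * ((x[1] - x[0]) - r0) ** 2
    return kinetic + potential


def velocity_verlet(x, v, m=1.0, dt=0.001, steps=1000):
    """Advance positions and velocities by `steps` time steps of size `dt`."""
    x, v = list(x), list(v)
    f = harmonic_force(x[0], x[1])
    for _ in range(steps):
        v = [v[i] + 0.5 * dt * f[i] / m for i in range(2)]  # half kick
        x = [x[i] + dt * v[i] for i in range(2)]            # drift
        f = harmonic_force(x[0], x[1])                      # forces at new positions
        v = [v[i] + 0.5 * dt * f[i] / m for i in range(2)]  # second half kick
    return x, v
```

Starting from a stretched bond, for example `velocity_verlet([0.0, 1.2], [0.0, 0.0])`, the bond oscillates while the total energy stays nearly constant. A real protein simulation applies the same loop to thousands of atoms and many force terms, which is where the cost comes from.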

In this repository I rapidly prototype ideas I have about protein structure prediction. Some of them build on molecular dynamics simulations, others start from the ground up. It may very well be that none of these ideas ever manages to fold even a simple protein, but iterating over different ideas and implementing their basic concepts greatly improves my learning experience.

What I have tried so far

Molecular dynamics simulation with OpenMM

In theory, this should be able to fold a protein given only the protein's primary structure. In practice, it takes too long and requires too many computational resources to be feasible. This is the naive, brute-force way of predicting protein structure, without any kind of algorithmic optimization. (open notebook)
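
A rough back-of-the-envelope sketch shows why the brute-force route is so slow. All numbers here are illustrative assumptions rather than measurements from this repository: a 2 fs integration time step, folding times between a microsecond and a millisecond, and a hypothetical throughput of 100 ns of simulated time per day.

```python
TIMESTEP_S = 2e-15  # assumed integration time step of 2 femtoseconds


def md_cost(folding_time_s, ns_per_day=100.0):
    """Return (number of integration steps, wall-clock days) for one folding event.

    `ns_per_day` is a hypothetical simulation throughput, not a benchmark.
    """
    steps = folding_time_s / TIMESTEP_S
    days = (folding_time_s * 1e9) / ns_per_day
    return steps, days


if __name__ == "__main__":
    for folding_time in (1e-6, 1e-3):  # assumed fast and slow folding times
        steps, days = md_cost(folding_time)
        print(f"{folding_time:.0e} s of folding: {steps:.1e} steps, {days:.0f} days")
```

Under these assumptions, a folding event that takes a microsecond needs hundreds of millions of steps, and one that takes a millisecond is a thousand times more expensive again.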

Genetic algorithm using chemical potential energy

A genetic algorithm is a non-gradient-based method for optimizing a given fitness function. Here, I chose the fitness function to be the potential energy of the current three-dimensional configuration of amino acids. The genome consists of the angles between neighbouring amino acids, and each candidate is evaluated inside the OpenMM simulation framework. The approach currently lacks two things: enough parameters to completely define the relative positioning of amino acids in 3D space, and a way to reliably enforce this configuration inside the molecular dynamics simulation (the current approach sometimes leads to numerical problems). (open notebook)
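
The following sketch illustrates the general shape of such a genetic algorithm on a toy problem. It is not the notebook's implementation: the chain lives in 2D, the genome is a list of turning angles, and a simple Lennard-Jones-like function stands in for the OpenMM energy evaluation. All names and parameters are assumptions made for illustration.

```python
import math
import random


def chain_coords(angles):
    """Build a 2D bead chain with unit bond length from turning angles (the genome)."""
    coords = [(0.0, 0.0), (1.0, 0.0)]
    heading = 0.0
    for angle in angles:
        heading += angle
        x, y = coords[-1]
        coords.append((x + math.cos(heading), y + math.sin(heading)))
    return coords


def toy_energy(angles):
    """Toy stand-in for the potential energy: lower is better."""
    coords = chain_coords(angles)
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip directly bonded neighbours
            r = max(math.dist(coords[i], coords[j]), 0.5)  # clamp to avoid blow-up
            energy += r ** -12 - 2.0 * r ** -6
    return energy


def evolve(n_angles=8, pop_size=40, generations=60, sigma=0.3, seed=0):
    """Return the best genome and the best energy seen in each generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=toy_energy)
        history.append(toy_energy(pop[0]))
        parents = pop[: pop_size // 4]  # truncation selection, elites survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_angles)                  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([g + rng.gauss(0.0, sigma) for g in child])  # mutation
        pop = parents + children
    pop.sort(key=toy_energy)
    history.append(toy_energy(pop[0]))
    return pop[0], history
```

Because the best genomes are carried over unchanged, the best energy never gets worse from one generation to the next. Calling `best, history = evolve()` returns the lowest-energy set of angles found and the energy trace.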

Using gradient descent to optimize the positions of atoms

This approach uses gradient descent to minimize the spatial error of covalent atomic bonds and non-covalent atomic forces. Bonds and forces are parsed from an AMBER force field. The gradient is computed with respect to the positions of individual atoms in three-dimensional space using TensorFlow. The optimization does not yet leverage all computational resources of the system, so performance improvements should be possible. Additionally, not all chemical forces are modelled with full precision. (open notebook)
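
As a minimal sketch of the idea, the code below runs gradient descent on atom positions to minimize the squared error of bond lengths only. It deliberately differs from the notebook: the gradient is derived by hand instead of using TensorFlow, no force field is parsed, and the bond list format `(i, j, rest_length)` is an assumption made for this example.

```python
import math


def bond_loss(pos, bonds):
    """Sum of squared differences between actual and target bond lengths."""
    return sum((math.dist(pos[i], pos[j]) - r0) ** 2 for i, j, r0 in bonds)


def bond_gradient(pos, bonds):
    """Analytic gradient of bond_loss with respect to every atom position."""
    grad = [[0.0, 0.0, 0.0] for _ in pos]
    for i, j, r0 in bonds:
        d = math.dist(pos[i], pos[j])
        if d == 0.0:
            continue  # direction undefined for coincident atoms, skip this bond
        coeff = 2.0 * (d - r0) / d
        for k in range(3):
            g = coeff * (pos[i][k] - pos[j][k])
            grad[i][k] += g
            grad[j][k] -= g
    return grad


def minimize(pos, bonds, lr=0.1, steps=500):
    """Plain gradient descent on the atom coordinates."""
    pos = [list(p) for p in pos]
    for _ in range(steps):
        grad = bond_gradient(pos, bonds)
        for p, g in zip(pos, grad):
            for k in range(3):
                p[k] -= lr * g[k]
    return pos
```

For example, `minimize([(0, 0, 0), (2, 0, 0), (2, 3, 0)], [(0, 1, 1.0), (1, 2, 1.5)])` pulls both bonds towards their target lengths. Angle terms and non-covalent forces would enter as further terms in the loss.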

Optimized gradient descent using graphs

A direct approach that moves atoms based on the error of individual atomic bonds. The idea is somewhat similar to a physics-based molecular dynamics simulation, but instead of realistically modelling the velocities and forces in the system, atom positions are updated directly from the current bond error. Preprocessing the atomic bonds in the amino acid chain as a graph greatly improves efficiency: for each bond, it finds all atoms that should be moved in a certain direction to reduce the error of that bond while keeping the error of the other bonds constant. This preprocessing step is applied to all harmonic bonds and significantly reduces the number of optimization steps needed to minimize the error. Since both the angles and the distances of atomic bonds are optimized, multiple steps are still necessary to reach a low error, because adjusting a bond angle might change the distance between two atoms in a different bond and vice versa. The approach can also take advantage of multithreading, since the adjustments of atom positions do not depend on one another. It currently lacks Coulomb forces. Lennard-Jones forces are implemented but are not physically accurate and cause some stability problems. (open notebook)
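
One way to picture the graph preprocessing is sketched below. It is an illustration under simplifying assumptions, not the notebook's code: the bond graph is assumed to be acyclic (no rings), only bond lengths are corrected, and the function names and the `(i, j, rest_length)` bond format are made up for this example. Cutting a bond splits the graph in two, and translating one whole side along the bond axis fixes that bond's length without changing any other bond length.

```python
import math
from collections import deque


def side_of_bond(n_atoms, bonds, i, j):
    """Atoms reachable from j once the bond i-j is cut (assumes an acyclic bond graph)."""
    adj = {a: set() for a in range(n_atoms)}
    for a, b, _ in bonds:
        adj[a].add(b)
        adj[b].add(a)
    adj[i].discard(j)
    adj[j].discard(i)
    seen, queue = {j}, deque([j])
    while queue:  # breadth-first search from j
        a = queue.popleft()
        for b in adj[a]:
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def fix_bond_lengths(pos, bonds):
    """Set every bond to its target length by rigidly shifting one side of the bond."""
    pos = [list(p) for p in pos]
    # Preprocessing, done once: which atoms move together for each bond.
    groups = {(i, j): side_of_bond(len(pos), bonds, i, j) for i, j, _ in bonds}
    for i, j, r0 in bonds:
        d = math.dist(pos[i], pos[j])
        shift = [(r0 - d) * (pos[j][k] - pos[i][k]) / d for k in range(3)]
        for a in groups[(i, j)]:
            for k in range(3):
                pos[a][k] += shift[k]
    return pos
```

In this simplified setting a single pass makes all bond lengths exact. Once angle corrections and ring structures come into play, corrections start to interfere with each other, which matches the need for multiple steps described above.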

Further ideas

Approximating molecular dynamics using deep learning

Neural networks can be used to predict future physical states by learning a representation of physical laws from data (example). This concept could be transferred to the domain of chemical simulations, where a neural network would take on the role of a molecular dynamics simulator and iteratively predict the next frame of a chemical simulation. Existing molecular dynamics simulators can be used to generate the training data.
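
The workflow can be sketched on a toy system: generate training pairs with a reference simulator, fit a model that maps the current state to the next one, then roll the model out frame by frame. To keep the example dependency-free, a linear model fitted by least squares stands in for the neural network, and a one-dimensional harmonic oscillator stands in for the molecular dynamics simulator. Both substitutions, and all names below, are assumptions made for illustration.

```python
import random


def simulate_step(x, v, k=1.0, dt=0.1):
    """Reference simulator: one semi-implicit Euler step of a unit-mass oscillator."""
    v = v - dt * k * x
    x = x + dt * v
    return x, v


def make_dataset(n=200, seed=0):
    """Random (state, next_state) pairs produced by the reference simulator."""
    rng = random.Random(seed)
    states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return states, [simulate_step(x, v) for x, v in states]


def fit_linear(states, next_states):
    """Least-squares fit of a 2x2 matrix W with next_state ~ W @ state."""
    sxx = sum(x * x for x, v in states)
    sxv = sum(x * v for x, v in states)
    svv = sum(v * v for x, v in states)
    det = sxx * svv - sxv * sxv
    weights = []
    for out in range(2):  # solve the 2x2 normal equations for each output
        bx = sum(s[0] * n[out] for s, n in zip(states, next_states))
        bv = sum(s[1] * n[out] for s, n in zip(states, next_states))
        weights.append(((svv * bx - sxv * bv) / det, (sxx * bv - sxv * bx) / det))
    return weights


def rollout(weights, state, steps):
    """Iteratively predict the next frame, feeding each prediction back in."""
    traj = [state]
    for _ in range(steps):
        x, v = traj[-1]
        traj.append((weights[0][0] * x + weights[0][1] * v,
                     weights[1][0] * x + weights[1][1] * v))
    return traj
```

The toy system is linear, so the fitted model can reproduce the simulator almost exactly. Molecular force fields are strongly non-linear, which is why a neural network would be needed there and why prediction errors would accumulate over a long rollout.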

PhilippThoelke/protein-folding