
RMeli/gsoc19


CNN Scoring for Flexible Docking

DOI

Powered by MDAnalysis Powered by RDKit

Abstract

Molecular docking (the prediction of the binding mode and binding affinity of a molecule to a target of known structure) is a widely used computational tool for structure-based drug design. However, docking scoring functions are mostly empirical or knowledge-based, and the flexibility of the receptor is completely neglected in most docking studies. Recent work has shown that scoring functions can be learnt effectively by convolutional neural networks (CNNs). Here we build on these findings to develop a CNN scoring function for flexible docking, by extending the capabilities of gnina, a state-of-the-art deep learning framework for molecular docking, and by building a high-quality training dataset for flexible docking.

Project Description

This Google Summer of Code 2019 project aims to extend the capabilities of gnina, the deep learning framework for molecular docking developed in David Koes's group, in order to build a CNN-based scoring function for docking with flexible side chains.

The main stages of the project are the following:

  • Build a high-quality training dataset of docking with flexible side chains
  • Enable optimisation of flexible side chains (see PR #73)
    • Split movable ligand and receptor atoms into the correct channels
    • Combine ligand and receptor gradients for geometry optimisation
  • Train a new CNN-based scoring function for docking with flexible side chains (see mltraining/README.md)
    • Evaluate the performance of pose prediction
    • Evaluate the performance of pose optimisation
  • Iterate training on datasets augmented with CNN-optimised poses
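The two sub-steps of flexible side-chain optimisation listed above (splitting movable atoms into channels and combining ligand and receptor gradients) can be illustrated with a toy Python sketch. This is not gnina's code: the `Atom` class, `split_channels`, `toy_score_and_grads`, `optimisation_step`, and the harmonic contact "score" standing in for the CNN are all hypothetical, chosen only to show the idea that flexible side-chain atoms stay in receptor channels while still moving together with the ligand during optimisation.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    xyz: list         # [x, y, z] coordinates (Angstrom)
    is_ligand: bool   # ligand atoms go into ligand channels
    is_movable: bool  # ligand atoms and flexible side-chain atoms move


def split_channels(atoms):
    """Return (ligand_indices, receptor_indices).

    Flexible side-chain atoms are movable, but they are still receptor
    atoms, so they are placed in the receptor group (hypothetical helper).
    """
    ligand = [i for i, a in enumerate(atoms) if a.is_ligand]
    receptor = [i for i, a in enumerate(atoms) if not a.is_ligand]
    return ligand, receptor


def toy_score_and_grads(atoms, d0=4.0, k=1.0):
    """Toy stand-in for a CNN score: harmonic ligand-receptor contacts.

    E = sum_{i in ligand, j in receptor} k * (|x_i - x_j| - d0)^2
    Returns the energy and one [gx, gy, gz] gradient per atom.
    """
    ligand, receptor = split_channels(atoms)
    grads = [[0.0, 0.0, 0.0] for _ in atoms]
    energy = 0.0
    for i in ligand:
        for j in receptor:
            diff = [a - b for a, b in zip(atoms[i].xyz, atoms[j].xyz)]
            d = math.sqrt(sum(c * c for c in diff))
            energy += k * (d - d0) ** 2
            # dE/dx_i = 2k(d - d0) * (x_i - x_j) / d; dE/dx_j is its negative
            scale = 2.0 * k * (d - d0) / d
            for c in range(3):
                grads[i][c] += scale * diff[c]  # ligand-side gradient
                grads[j][c] -= scale * diff[c]  # receptor-side gradient
    return energy, grads


def optimisation_step(atoms, lr=0.05):
    """One gradient-descent step over all movable atoms.

    Ligand and flexible side-chain gradients are combined into a single
    update; rigid receptor atoms are left untouched. Returns the energy
    before the step.
    """
    energy, grads = toy_score_and_grads(atoms)
    for atom, g in zip(atoms, grads):
        if atom.is_movable:
            atom.xyz = [x - lr * gx for x, gx in zip(atom.xyz, g)]
    return energy


if __name__ == "__main__":
    atoms = [
        Atom([0.0, 0.0, 0.0], is_ligand=True, is_movable=True),    # ligand
        Atom([2.0, 0.0, 0.0], is_ligand=False, is_movable=True),   # flexible side chain
        Atom([0.0, 6.0, 0.0], is_ligand=False, is_movable=False),  # rigid receptor
    ]
    e_before = optimisation_step(atoms)
    e_after, _ = toy_score_and_grads(atoms)
    print(e_before, e_after)
```

The design choice being illustrated is that channel assignment (ligand vs receptor) and mobility (movable vs rigid) are independent properties: a flexible side-chain atom is typed as receptor but still receives a gradient update alongside the ligand.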

This repository collects the different pipelines built to achieve the project goals. A list of contributions and fixes to openbabel, smina and gnina (OpenChemistry organisation) and MDAnalysis (NumFocus organisation) is given below.

The datasets related to this project will be released on Zenodo in due time.

Poster

Contributions

GNINA

List of contributions to gnina and gnina-scripts:

  • Optimisation of flexible side chains (PR #73)
  • Added option to pymol_arrows.py (PR #31)
  • Low-memory and faster substitute combine_rows.py (PR #30)
  • Attempt to decrease memory usage of combine_rows.py (PR #29)
  • Added serialization of struct residue (PR #74)
  • Small fixes to gninavis for gradients (PR #72)
  • Fixed Python3 pickle in clustering pipeline (PR #26)
  • Added insertion code support to makeflex.py (PR #65)
  • Improved makeflex.py script to deal with PDB files without atom types (PR #64)
  • Added test support for newer versions of Boost (PR #62)
  • Provided documentation and PDB standardization for makeflex.py script (PR #61)
  • Provided fixes for the makeflex.py script (PR #60)
  • Raised an issue about gnina parallel compilation without libmolgrid installed (Issue #57)
  • Updated PDBQTUtilities.cpp to latest OpenBabel version (PR #59)

LibMolGrid

List of contributions to libmolgrid:

  • Fixed issue with unsupported CUDA architecture (PR #5)

SMINA

List of contributions to smina:

  • Fixed a problem with proline residues being broken by flexible docking (MR #3)

OpenBabel

List of contributions to openbabel:

  • Fixed various problems with PDB and PDBQT insertion codes (PR #1998)
  • Fixed CMake when compiling without RapidJSON (PR #1988)

MDAnalysis

List of contributions to MDAnalysis:

  • Improved mass guess (PR #2331)
  • Fixed issues with PDB HEADER field in PDBReader and PDBWriter (PR #2325)
  • Allowed MOL2 parser to ignore status bit strings (PR #2319)

Mentors

  • Dr. David Ryan Koes, Assistant Professor, Department of Computational and Systems Biology, University of Pittsburgh
  • Jocelyn Sunseri, Computational Biology Doctoral Candidate, Carnegie Mellon and University of Pittsburgh