Calculating PLEC by bypassing completely OpenBabel #155

tevang · 2021-08-04T22:24:58Z

Hi Maciek,

I want to create PLEC vectors without using OpenBabel at all. The first step is to load the receptor and ligand files, which by convenience I have them in a single PDB file. So according to your explanation on RDKit's mailing list, I extract the ligand's Mol like this:

In [11]: mol = list(oddt.toolkit.readfile('pdb', "complex_forscoring_frm.1.min.pdb", sanitize=False))[0]
In [15]: Chem.rdmolops.SplitMolByPDBResidues(mol.Mol)
Out[15]: 
{' CL': <rdkit.Chem.rdchem.Mol at 0x14a64b65a0f0>,
 'ALA': <rdkit.Chem.rdchem.Mol at 0x14a64b66c0f0>,
 'ARG': <rdkit.Chem.rdchem.Mol at 0x14a64b66cd50>,
 'ASN': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1570>,
 'ASP': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1d50>,
 'CYS': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1b10>,
 'GLN': <rdkit.Chem.rdchem.Mol at 0x14a64b6a14b0>,
 'GLU': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1f30>,
 'GLY': <rdkit.Chem.rdchem.Mol at 0x14a64b6a13f0>,
 'HID': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1870>,
 'HIE': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1330>,
 'ILE': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1c30>,
 'LEU': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1db0>,
 'LIG': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1b70>,
 'LYS': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1ab0>,
 'MET': <rdkit.Chem.rdchem.Mol at 0x14a6840914b0>,
 'Na+': <rdkit.Chem.rdchem.Mol at 0x14a684091210>,
 'PHE': <rdkit.Chem.rdchem.Mol at 0x14a684091c30>,
 'PRO': <rdkit.Chem.rdchem.Mol at 0x14a6840913f0>,
 'SER': <rdkit.Chem.rdchem.Mol at 0x14a685941810>,
 'THR': <rdkit.Chem.rdchem.Mol at 0x14a64b662090>,
 'TRP': <rdkit.Chem.rdchem.Mol at 0x14a64b662930>,
 'TYR': <rdkit.Chem.rdchem.Mol at 0x14a64b6623f0>,
 'VAL': <rdkit.Chem.rdchem.Mol at 0x14a64b662db0>}
In [19]: ligMol = Chem.rdmolops.SplitMolByPDBResidues(mol.Mol)['LIG']

How can I extract the receptor's Mol object (everything but 'LIG' residue)? Is there a command to fuse all these individual RDKit Mol objects and reconstruct an intact receptor Mol object?

Thanks in advance.
Thomas

The text was updated successfully, but these errors were encountered:

mwojcikowski · 2021-08-05T18:52:34Z

Hi Thomas,

In oddt we have created additional module for such operations in RDKit in oddt.toolkits.extras.rdkit.fixer. Note it requires Chem.Mol, not oddt.Molecule.

from oddt.toolkits.extras.rdkit.fixer import ExtractPocketAndLigand
mol = Chem.MolFromPDBFile("complex_forscoring_frm.1.min.pdb", sanitize=False)
pocket, ligand = ExtractPocketAndLigand(mol, cutoff=12., ligand_residue='LIG')

Pocket and ligand are still Chem.Mol objects, which you can convert back to ODDT:

oddt_pocket = oddt.toolkit.Molecule(pocket)

If you wish to get wole protein you can set high cutoff.

oddt/oddt/toolkits/extras/rdkit/fixer.py

Lines 166 to 196 in 8cf5558

    
           def ExtractPocketAndLigand(mol, cutoff=12., expandResidues=True, 
        
                                      ligand_residue=None, ligand_residue_blacklist=None, 
        
                                      append_residues=None): 
        
               """Function extracting a ligand (the largest HETATM residue) and the protein 
        
               pocket within certain cutoff. The selection of pocket atoms can be expanded 
        
               to contain whole residues. The single atom HETATM residues are attributed 
        
               to pocket (metals and waters) 
        
               Parameters 
        
               ---------- 
        
                   mol: rdkit.Chem.rdchem.Mol 
        
                       Molecule with a protein ligand complex 
        
                   cutoff: float (default=12.) 
        
                       Distance cutoff for the pocket atoms 
        
                   expandResidues: bool (default=True) 
        
                       Expand selection to whole residues within cutoff. 
        
                   ligand_residue: string (default None) 
        
                       Residue name which explicitly pint to a ligand(s). 
        
                   ligand_residue_blacklist: array-like, optional (default None) 
        
                       List of residues to ignore during ligand lookup. 
        
                   append_residues: array-like, optional (default None) 
        
                       List of residues to append to pocket, even if they are HETATM, such 
        
                       as MSE, ATP, AMP, ADP, etc. 
        
               Returns 
        
               ------- 
        
                   pocket: rdkit.Chem.rdchem.RWMol 
        
                       Pocket constructed of protein residues/atoms around ligand 
        
                   ligand: rdkit.Chem.rdchem.RWMol 
        
                       Largest HETATM residue contained in input molecule 
        
               """

tevang · 2021-08-08T19:13:04Z

Hi Maciek,

I wrote the following function in line with your suggestion, but I get an error.

import oddt
from oddt.fingerprints import PLEC
from oddt.toolkits.extras.rdkit.fixer import ExtractPocketAndLigand
from rdkit import Chem


def calc_PLEC(complex_pdb, ligand_resname='LIG', cutoff=12.0, depth_ligand=2, depth_protein=4, distance_cutoff=4.5,
                        size=16384, count_bits=True, sparse=False, ignore_hoh=True):
    mol = Chem.MolFromPDBFile(complex_pdb, sanitize=False)
    pocket, ligand = ExtractPocketAndLigand(mol, cutoff=cutoff, ligand_residue=ligand_resname)
    oddt_protein = oddt.toolkit.Molecule(pocket)
    oddt_ligand = oddt.toolkit.Molecule(ligand)
    return PLEC(oddt_ligand, oddt_protein, depth_ligand=depth_ligand, depth_protein=depth_protein,
                    distance_cutoff=distance_cutoff,
                    size=size, count_bits=count_bits, sparse=sparse, ignore_hoh=ignore_hoh)

calc_PLEC('complex_forscoring_frm.1.min.pdb')

The input PDB file is this complex_forscoring_frm.1.min.pdb.gz and the error I get is the following:

RuntimeError: Pre-condition Violation
	RingInfo not initialized
	Violation occurred on line 66 in file Code/GraphMol/RingInfo.cpp
	Failed Expression: df_init
	RDKIT: 2019.03.4
	BOOST: 1_70

Do I need to upgrade RDKit or is it due to something else?

mwojcikowski · 2021-08-10T20:07:21Z

@tevang It looks like you need to sanitize the molecule first. Or at least generate rings with Chem.FastFindRings(...)

tevang · 2021-08-10T21:08:40Z

@mwojcikowski I use the following workaround that - at least seemingly - does not require sanitization or ring generation. Does it produce correct PLEC fingerprints?

def split_complex_pdb(complex_pdb, ligand_resname='LIG'):
    protein_f = open(complex_pdb.replace('.pdb', '_prot.pdb'), 'w')
    ligand_f = open(complex_pdb.replace('.pdb', '_lig.pdb'), 'w')
    with open(complex_pdb, 'r') as f:
        for line in f:
            if line[17:20] == ligand_resname:
                ligand_f.write(line)
            else:
                protein_f.write(line)
    protein_f.close()
    ligand_f.close()
    return protein_f.name, ligand_f.name


protein_pdb, ligand_pdb = split_complex_pdb(complex_pdb, ligand_resname)
oddt_protein = oddt.toolkit.Molecule(list(oddt.toolkit.readfile('pdb', protein_pdb, sanitize=False))[0])
oddt_ligand = oddt.toolkit.Molecule(list(oddt.toolkit.readfile('pdb', ligand_pdb, sanitize=False))[0])
plec_vector = PLEC(oddt_ligand, oddt_protein, depth_ligand=2, depth_protein=4, distance_cutoff=4.5,
                    size=16384, count_bits=True, sparse=False, ignore_hoh=True)

tevang · 2021-08-16T08:54:38Z

@mwojcikowski essentially my query is whether I am getting correct PLEC vectors without sanitizing the ligands or generating rings with FastFindRings() as you stated. Otherwise, it works.

mwojcikowski · 2021-08-17T07:21:41Z

@tevang I would say that sanitization is highly recommended. That said being consistent is the key for most use cases, so you might get away with it. Keep in mind that the input is also important (e.g. kekulization), that is why sanitization helps normalising the data.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calculating PLEC by bypassing completely OpenBabel #155

Calculating PLEC by bypassing completely OpenBabel #155

tevang commented Aug 4, 2021 •

edited

mwojcikowski commented Aug 5, 2021

tevang commented Aug 8, 2021

mwojcikowski commented Aug 10, 2021

tevang commented Aug 10, 2021

tevang commented Aug 16, 2021

mwojcikowski commented Aug 17, 2021

Calculating PLEC by bypassing completely OpenBabel #155

Calculating PLEC by bypassing completely OpenBabel #155

Comments

tevang commented Aug 4, 2021 • edited

mwojcikowski commented Aug 5, 2021

tevang commented Aug 8, 2021

mwojcikowski commented Aug 10, 2021

tevang commented Aug 10, 2021

tevang commented Aug 16, 2021

mwojcikowski commented Aug 17, 2021

tevang commented Aug 4, 2021 •

edited