Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculating PLEC by bypassing completely OpenBabel #155

Open
tevang opened this issue Aug 4, 2021 · 6 comments
Open

Calculating PLEC by bypassing completely OpenBabel #155

tevang opened this issue Aug 4, 2021 · 6 comments

Comments

@tevang
Copy link

tevang commented Aug 4, 2021

Hi Maciek,

I want to create PLEC vectors without using OpenBabel at all. The first step is to load the receptor and ligand files, which by convenience I have them in a single PDB file. So according to your explanation on RDKit's mailing list, I extract the ligand's Mol like this:

In [11]: mol = list(oddt.toolkit.readfile('pdb', "complex_forscoring_frm.1.min.pdb", sanitize=False))[0]
In [15]: Chem.rdmolops.SplitMolByPDBResidues(mol.Mol)
Out[15]: 
{' CL': <rdkit.Chem.rdchem.Mol at 0x14a64b65a0f0>,
 'ALA': <rdkit.Chem.rdchem.Mol at 0x14a64b66c0f0>,
 'ARG': <rdkit.Chem.rdchem.Mol at 0x14a64b66cd50>,
 'ASN': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1570>,
 'ASP': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1d50>,
 'CYS': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1b10>,
 'GLN': <rdkit.Chem.rdchem.Mol at 0x14a64b6a14b0>,
 'GLU': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1f30>,
 'GLY': <rdkit.Chem.rdchem.Mol at 0x14a64b6a13f0>,
 'HID': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1870>,
 'HIE': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1330>,
 'ILE': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1c30>,
 'LEU': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1db0>,
 'LIG': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1b70>,
 'LYS': <rdkit.Chem.rdchem.Mol at 0x14a64b6a1ab0>,
 'MET': <rdkit.Chem.rdchem.Mol at 0x14a6840914b0>,
 'Na+': <rdkit.Chem.rdchem.Mol at 0x14a684091210>,
 'PHE': <rdkit.Chem.rdchem.Mol at 0x14a684091c30>,
 'PRO': <rdkit.Chem.rdchem.Mol at 0x14a6840913f0>,
 'SER': <rdkit.Chem.rdchem.Mol at 0x14a685941810>,
 'THR': <rdkit.Chem.rdchem.Mol at 0x14a64b662090>,
 'TRP': <rdkit.Chem.rdchem.Mol at 0x14a64b662930>,
 'TYR': <rdkit.Chem.rdchem.Mol at 0x14a64b6623f0>,
 'VAL': <rdkit.Chem.rdchem.Mol at 0x14a64b662db0>}
In [19]: ligMol = Chem.rdmolops.SplitMolByPDBResidues(mol.Mol)['LIG']

How can I extract the receptor's Mol object (everything but 'LIG' residue)? Is there a command to fuse all these individual RDKit Mol objects and reconstruct an intact receptor Mol object?

Thanks in advance.
Thomas

@mwojcikowski
Copy link
Contributor

Hi Thomas,

In oddt we have created additional module for such operations in RDKit in oddt.toolkits.extras.rdkit.fixer. Note it requires Chem.Mol, not oddt.Molecule.

from oddt.toolkits.extras.rdkit.fixer import ExtractPocketAndLigand
mol = Chem.MolFromPDBFile("complex_forscoring_frm.1.min.pdb", sanitize=False)
pocket, ligand = ExtractPocketAndLigand(mol, cutoff=12., ligand_residue='LIG')

Pocket and ligand are still Chem.Mol objects, which you can convert back to ODDT:

oddt_pocket = oddt.toolkit.Molecule(pocket)

If you wish to get wole protein you can set high cutoff.

def ExtractPocketAndLigand(mol, cutoff=12., expandResidues=True,
ligand_residue=None, ligand_residue_blacklist=None,
append_residues=None):
"""Function extracting a ligand (the largest HETATM residue) and the protein
pocket within certain cutoff. The selection of pocket atoms can be expanded
to contain whole residues. The single atom HETATM residues are attributed
to pocket (metals and waters)
Parameters
----------
mol: rdkit.Chem.rdchem.Mol
Molecule with a protein ligand complex
cutoff: float (default=12.)
Distance cutoff for the pocket atoms
expandResidues: bool (default=True)
Expand selection to whole residues within cutoff.
ligand_residue: string (default None)
Residue name which explicitly pint to a ligand(s).
ligand_residue_blacklist: array-like, optional (default None)
List of residues to ignore during ligand lookup.
append_residues: array-like, optional (default None)
List of residues to append to pocket, even if they are HETATM, such
as MSE, ATP, AMP, ADP, etc.
Returns
-------
pocket: rdkit.Chem.rdchem.RWMol
Pocket constructed of protein residues/atoms around ligand
ligand: rdkit.Chem.rdchem.RWMol
Largest HETATM residue contained in input molecule
"""

@tevang
Copy link
Author

tevang commented Aug 8, 2021

Hi Maciek,

I wrote the following function in line with your suggestion, but I get an error.

import oddt
from oddt.fingerprints import PLEC
from oddt.toolkits.extras.rdkit.fixer import ExtractPocketAndLigand
from rdkit import Chem


def calc_PLEC(complex_pdb, ligand_resname='LIG', cutoff=12.0, depth_ligand=2, depth_protein=4, distance_cutoff=4.5,
                        size=16384, count_bits=True, sparse=False, ignore_hoh=True):
    mol = Chem.MolFromPDBFile(complex_pdb, sanitize=False)
    pocket, ligand = ExtractPocketAndLigand(mol, cutoff=cutoff, ligand_residue=ligand_resname)
    oddt_protein = oddt.toolkit.Molecule(pocket)
    oddt_ligand = oddt.toolkit.Molecule(ligand)
    return PLEC(oddt_ligand, oddt_protein, depth_ligand=depth_ligand, depth_protein=depth_protein,
                    distance_cutoff=distance_cutoff,
                    size=size, count_bits=count_bits, sparse=sparse, ignore_hoh=ignore_hoh)

calc_PLEC('complex_forscoring_frm.1.min.pdb')

The input PDB file is this complex_forscoring_frm.1.min.pdb.gz and the error I get is the following:

RuntimeError: Pre-condition Violation
	RingInfo not initialized
	Violation occurred on line 66 in file Code/GraphMol/RingInfo.cpp
	Failed Expression: df_init
	RDKIT: 2019.03.4
	BOOST: 1_70

Do I need to upgrade RDKit or is it due to something else?

@mwojcikowski
Copy link
Contributor

@tevang It looks like you need to sanitize the molecule first. Or at least generate rings with Chem.FastFindRings(...)

@tevang
Copy link
Author

tevang commented Aug 10, 2021

@mwojcikowski I use the following workaround that - at least seemingly - does not require sanitization or ring generation. Does it produce correct PLEC fingerprints?

def split_complex_pdb(complex_pdb, ligand_resname='LIG'):
    protein_f = open(complex_pdb.replace('.pdb', '_prot.pdb'), 'w')
    ligand_f = open(complex_pdb.replace('.pdb', '_lig.pdb'), 'w')
    with open(complex_pdb, 'r') as f:
        for line in f:
            if line[17:20] == ligand_resname:
                ligand_f.write(line)
            else:
                protein_f.write(line)
    protein_f.close()
    ligand_f.close()
    return protein_f.name, ligand_f.name


protein_pdb, ligand_pdb = split_complex_pdb(complex_pdb, ligand_resname)
oddt_protein = oddt.toolkit.Molecule(list(oddt.toolkit.readfile('pdb', protein_pdb, sanitize=False))[0])
oddt_ligand = oddt.toolkit.Molecule(list(oddt.toolkit.readfile('pdb', ligand_pdb, sanitize=False))[0])
plec_vector = PLEC(oddt_ligand, oddt_protein, depth_ligand=2, depth_protein=4, distance_cutoff=4.5,
                    size=16384, count_bits=True, sparse=False, ignore_hoh=True)

@tevang
Copy link
Author

tevang commented Aug 16, 2021

@mwojcikowski essentially my query is whether I am getting correct PLEC vectors without sanitizing the ligands or generating rings with FastFindRings() as you stated. Otherwise, it works.

@mwojcikowski
Copy link
Contributor

@tevang I would say that sanitization is highly recommended. That said being consistent is the key for most use cases, so you might get away with it. Keep in mind that the input is also important (e.g. kekulization), that is why sanitization helps normalising the data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants