TautomerCanonicalizer gives unexpected/forbidden form of phosphoric acid #20

Open
benbowen opened this issue Feb 8, 2018 · 3 comments

benbowen commented Feb 8, 2018

I'm converting all the molecules in my database to canonical tautomers and noticed that things like NADH looked weird. You can see it most plainly for phosphoric acid. I didn't expect the hydrogen on the phosphorus. Is this the correct/expected behavior?

from rdkit import Chem
from rdkit.Chem import Draw
from molvs.tautomer import TautomerCanonicalizer

original_smiles = 'OP(=O)(O)O'

original_mol = Chem.MolFromSmiles(original_smiles)
tautomerized_mol = TautomerCanonicalizer().canonicalize(original_mol)

Draw.MolsToGridImage([original_mol,tautomerized_mol],
                     molsPerRow=3,subImgSize=(200,200),
                     legends=['original','tautomer'])

image

benbowen commented Feb 8, 2018

NADH looks like this:

original_smiles = 'NC(=O)C1=CN([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](N4C=NC5=C4N=CN=C5N)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)C=CC1'

original_mol = Chem.MolFromSmiles(original_smiles)
tautomerized_mol = TautomerCanonicalizer().canonicalize(original_mol)

Draw.MolsToGridImage([original_mol,tautomerized_mol],
                     molsPerRow=1,subImgSize=(600,300),
                     legends=['original','tautomer'])

image

mcs07 (Owner) commented Feb 9, 2018

I think this is caused by the phosphonic acid rules: https://github.com/mcs07/MolVS/blob/master/molvs/tautomer.py#L130

It can probably be fixed by making the SMARTS pattern more strict to match only the intended target:
https://en.wikipedia.org/wiki/Phosphorous_acid
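
As a rough illustration of what a stricter pattern could look like, the sketch below compares a loose and a tightened donor-side pattern using RDKit substructure matching. `[OH]-[PH0]` is assumed to be the current forward pattern in `tautomer.py` (not verified here), and `[OH]-[PD3X3H0]` is only one possible way to tighten it; the helper `matches` is a hypothetical name used for this example.

```python
from rdkit import Chem

# Assumed current forward pattern (not checked against tautomer.py).
LOOSE = '[OH]-[PH0]'
# One possible stricter variant: P with exactly 3 explicit (D3)
# and 3 total (X3) connections, and no attached hydrogen.
STRICT = '[OH]-[PD3X3H0]'


def matches(smarts, smiles):
    """Return True if the SMARTS pattern occurs in the molecule."""
    pattern = Chem.MolFromSmarts(smarts)
    mol = Chem.MolFromSmiles(smiles)
    return mol.HasSubstructMatch(pattern)


for name, smi in [('phosphoric acid', 'OP(=O)(O)O'), ('P(OH)3', 'OP(O)O')]:
    print(name, matches(LOOSE, smi), matches(STRICT, smi))
```

Under these assumptions the loose pattern matches both molecules, while the strict one should match only the trivalent P(OH)3 form, so phosphoric acid would be left untouched.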

mcs07 added the bug label Feb 9, 2018

benbowen commented Feb 9, 2018

You are correct: removing that rule stops that moiety from being modified. When you say "more strict", do you mean specifying an explicit number of bonds on the phosphorus in the SMARTS pattern?

Why does RDKit allow 7 bonds on the phosphorus? RDKit is a vast package, but looking at its definition of phosphorus, the maximum number of bonds is 5.

If I run SanitizeMol, the hydrogen stays put. When I paste the structure into ChemDraw, it's not valid.
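
The figure of 7 can be reproduced by summing bond orders around phosphorus by hand. This is a minimal sketch that assumes the rule converts one P-OH into P=O and moves that hydrogen onto phosphorus; the bond-order lists and the `total_valence` helper are written out for illustration and are not taken from RDKit or MolVS.

```python
# Bond orders around phosphorus, written out by hand (no RDKit needed).
# Phosphoric acid, OP(=O)(O)O: one P=O and three P-OH single bonds.
PHOSPHORIC_ACID = [2, 1, 1, 1]

# Assumed product of the rule: one P-OH becomes P=O and its hydrogen
# moves onto phosphorus, adding a P-H single bond.
UNEXPECTED_TAUTOMER = [2, 2, 1, 1, 1]


def total_valence(bond_orders):
    """Sum of bond orders around a single atom."""
    return sum(bond_orders)


print(total_valence(PHOSPHORIC_ACID))      # 5
print(total_valence(UNEXPECTED_TAUTOMER))  # 7
```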

gjgetzinger added a commit to gjgetzinger/MolVS that referenced this issue Apr 16, 2020
Updates SMARTS definitions for phosphinic acids. Requires 3 explicit (D3) and 3 total (X3) connections for tautomerizing phosphinic acids. New behavior properly handles compounds with 4 connections (e.g., phosphates, phosphonic acids).

```python
from rdkit import Chem
from molvs.tautomer import TautomerCanonicalizer, TautomerTransform
import pandas as pd

# Stricter phosphorus rules: P must have 3 explicit (D3) and 3 total (X3) connections.
# Passing transforms= replaces the default rule set, so only these two rules are applied.
my_transforms = (
    TautomerTransform('phosphonic acid f', '[OH]-[PD3X3H0]', bonds='='),
    TautomerTransform('phosphonic acid r', '[PD3X3H1]=[O]', bonds='-'),
)

cpds = ['methylphosphinic acid', 'methylphosphonous acid', 'methylphosphonic acid', 'NADH']
smiles = [
    'CP(=O)O',
    'CP(O)O',
    'CP(=O)(O)O',
    'NC(=O)C1=CN([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](N4C=NC5=C4N=CN=C5N)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)C=CC1',
]
mols = [Chem.MolFromSmiles(smi) for smi in smiles]

# Build the canonicalizer once and reuse it for every molecule.
canonicalizer = TautomerCanonicalizer(transforms=my_transforms)
can_taut = [canonicalizer.canonicalize(mol) for mol in mols]
smiles_taut = [Chem.MolToSmiles(mol) for mol in can_taut]

df = pd.DataFrame({'cpd': cpds, 'smi': smiles, 'taut_smi': smiles_taut})
```

Resulting `df`:

| | cpd | smi | taut_smi |
|---|---|---|---|
| 0 | methylphosphinic acid | `CP(=O)O` | `C[PH](=O)O` |
| 1 | methylphosphonous acid | `CP(O)O` | `C[PH](=O)O` |
| 2 | methylphosphonic acid | `CP(=O)(O)O` | `CP(=O)(O)O` |
| 3 | NADH | `NC(=O)C1=CN([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](N4C=NC5=C4N=CN=C5N)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)C=CC1` | `NC(=O)C1=CN([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](n4cnc5c(N)ncnc54)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)C=CC1` |