Unable to standardize some PubChem molecules #39

Open
VladislavChernykh opened this issue Mar 24, 2021 · 1 comment

Comments

@VladislavChernykh

Hello,

I was running the molvs Standardizer on PubChem molecules and found several that cannot be standardized:

  1. SMILES: CC(S(=O)CC1=CC=C(C=C1)C(S(=O)CC2=CC=C(C=C2)C(S(=O)CC3=CC=C(C=C3)C(S(=O)C4=CC=C(C=C4)Br)S(=O)C5=CC=C(C=C5)Br)S(=O)CC6=CC=C(C=C6)C(S(=O)C7=CC=C(C=C7)Br)S(=O)C8=CC=C(C=C8)Br)S(=O)CC9=CC=C(C=C9)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br

Link: https://pubchem.ncbi.nlm.nih.gov/compound/59827358

  2. SMILES: CC1=CC=C(C=C1)C(S(=O)CC2=CC=C(C=C2)C(S(=O)CC3=CC=C(C=C3)C(S(=O)CC4=CC=C(C=C4)C(S(=O)C5=CC=C(C=C5)Br)S(=O)C6=CC=C(C=C6)Br)S(=O)CC7=CC=C(C=C7)C(S(=O)C8=CC=C(C=C8)Br)S(=O)C9=CC=C(C=C9)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br

Link: https://pubchem.ncbi.nlm.nih.gov/compound/59827349

Code to reproduce:

from rdkit import Chem
from molvs import Standardizer

smiles = "CC1=CC=C(C=C1)C(S(=O)CC2=CC=C(C=C2)C(S(=O)CC3=CC=C(C=C3)C(S(=O)CC4=CC=C(C=C4)C(S(=O)C5=CC=C(C=C5)Br)S(=O)C6=CC=C(C=C6)Br)S(=O)CC7=CC=C(C=C7)C(S(=O)C8=CC=C(C=C8)Br)S(=O)C9=CC=C(C=C9)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br)S(=O)CC1=CC=C(C=C1)C(S(=O)C1=CC=C(C=C1)Br)S(=O)C1=CC=C(C=C1)Br"
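# Parse the SMILES into an RDKit molecule
mol = Chem.MolFromSmiles(smiles)
# Standardize it; for this SMILES the call did not return within 10 minutes
res = Standardizer().standardize(mol)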

It seems that execution enters an infinite loop in the function _apply_transform() (normalize.py). After 10 minutes of running, the call still had not returned a result.
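One way to keep a hang like this from blocking a whole run is to standardize each molecule in a child Python process and kill it after a time limit. The following is only a minimal sketch under assumptions: the helper name standardize_with_timeout, the 60-second default, and the child script are illustrative and not part of molvs, and the child process needs rdkit and molvs installed.

```python
import subprocess
import sys

# Child script (an assumption for illustration): standardizes the SMILES
# passed as its first argument and prints the standardized SMILES.
STANDARDIZE_SCRIPT = """
import sys
from rdkit import Chem
from molvs import Standardizer

mol = Chem.MolFromSmiles(sys.argv[1])
if mol is None:
    sys.exit("could not parse SMILES")
print(Chem.MolToSmiles(Standardizer().standardize(mol)))
"""


def standardize_with_timeout(smiles, timeout_s=60.0, script=STANDARDIZE_SCRIPT):
    """Run `script` on `smiles` in a child process with a time limit.

    Returns ("ok", output), ("timeout", None) or ("error", message).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script, smiles],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    if proc.returncode != 0:
        return ("error", proc.stderr.strip())
    return ("ok", proc.stdout.strip())
```

Calling standardize_with_timeout(smiles, timeout_s=60) on the SMILES above should then report a timeout instead of hanging, if the standardizer really does not terminate. Starting a process per molecule is slow, but it is the simplest way to interrupt code that is stuck inside a library call.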

Thanks,
Vladislav

@UnixJunkie

It might be nice to run this on the whole of PubChem, to flag all such molecules at once.
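A rough sketch of what such a scan could look like, under stated assumptions: the input is taken to be tab-separated lines of CID and SMILES, and `check` is a hypothetical callable that returns "ok" for a molecule that standardizes and any other status string (for example "timeout") otherwise. Neither function name comes from molvs or PubChem.

```python
def read_cid_smiles(lines):
    """Yield (cid, smiles) pairs from tab-separated 'CID<TAB>SMILES' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cid, smiles = line.split("\t", 1)
        yield cid, smiles


def flag_problem_molecules(records, check):
    """Yield (cid, status) for every record whose status is not "ok".

    `check` is a hypothetical callable taking a SMILES string and
    returning a status string; exceptions are reported as errors.
    """
    for cid, smiles in records:
        try:
            status = check(smiles)
        except Exception as exc:
            status = f"error: {exc}"
        if status != "ok":
            yield cid, status
```

Both functions are generators, so a very large file can be streamed line by line, and the flagged CIDs can be written out as they are found rather than collected in memory.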
