Figure out best practice way to take in smiles from standard in #67

Open
mikemhenry opened this issue Jun 29, 2021 · 6 comments

@mikemhenry
Contributor

We have issues when using containers that have a conda env as an entry point:

# works by adding white space to end of SMILES string
docker run -it --rm -v $(pwd):/data adv -r "inputs/4w51-cryo.pdb" -c -32.355 7.263 2.207 -b 14 14 14 -e 50 -s "CC(C)Cc1ccccc1 "
# works, but SMILES strings can themselves contain backslashes
docker run -it --rm -v $(pwd):/data adv -r "inputs/4w51-cryo.pdb" -c -32.355 7.263 2.207 -b 14 14 14 -e 50 -s "CC\(C\)Cc1ccccc1"
# doesn't work: causes a parse error in OpenEye
docker run -it --rm -v $(pwd):/data adv -r "inputs/4w51-cryo.pdb" -c -32.355 7.263 2.207 -b 14 14 14 -e 50 -s "\"CC\(C\)Cc1ccccc1\""
Warning: Problem parsing SMILES:
Warning: "CC\(C\)Cc1ccccc1"
Warning: ^

We need to figure out a robust way to handle this.

List of test SMILES cases:

CSC1=CC=C(C=C1)[N+][O-]
CC1=CC=C(C=C1)N=[N+]=[N-]
CC#Cc1ccccc1
[N-]=[N+]=NC1=CC=CC=C1
c1noc2ccccc12
C1=CC(=C(C(=C1)F)CBr)F
C#Cc1ccccc1
[N-]=[N+]=NC1=CC=CC=C1
C1=CC=C(C(=C1)Cl)Cl
CCBr
CC1=CC(=CC=C1)CN=[N+]=[N-]
CC1=CC(=CC=C1)CN=[N+]=[N-]
SCCCc1ccccc1
CC/C=C\\CCOC=O
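
Since the title asks about taking SMILES from standard in, here is a minimal sketch of what that could look like on the Python side. The function name `read_smiles` and the fallback rule (use `-s` when given, otherwise read one line from stdin) are assumptions for illustration, not something the existing containers do. Whether stdin is actually forwarded through the conda entry point would still need to be checked.

```python
import argparse
import io
import sys


def read_smiles(argv=None, stdin=sys.stdin):
    """Return the SMILES from -s if given, otherwise the first line of stdin.

    Hypothetical helper: the name and the fallback rule are assumptions.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-s", "--smiles", default=None)
    # Ignore the other docking flags (-r, -c, -b, ...) in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    raw = args.smiles if args.smiles is not None else stdin.readline()
    # Drop the trailing newline and any padding whitespace added as a workaround.
    return raw.strip()


# Simulate a SMILES string piped in on stdin.
piped = read_smiles([], stdin=io.StringIO("C1=CC=C(C(=C1)Cl)Cl\n"))
print(piped)
```

Reading from stdin means the string never appears on a command line, so no layer gets a chance to re-tokenize the parentheses or backslashes.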
@megosato
Collaborator

Here is the exact error that occurs:

$ docker run -it --rm -v $(pwd):/data adv -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14 -e 50 -s "CC(C)Cc1ccccc1"
ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['dock', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14', '-e', '50', '-s', 'CC(C)Cc1ccccc1']' command failed.  (See above for error)
/opt/conda/envs/ADVenv/.tmp776pf1ux: line 3: syntax error: unexpected "("
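
The `unexpected "("` message comes from a shell parsing a temporary script (`.tmp776pf1ux`), which suggests the arguments reach that script without quoting. Below is a minimal sketch of that failure mode. The function `whitespace_only_quote` is a hypothetical model of a quoting rule that only wraps arguments containing whitespace; it is an assumption that would explain why the trailing-space workaround helps, not conda's confirmed implementation. `shlex.quote` from the Python standard library is shown for comparison.

```python
import shlex


def whitespace_only_quote(arg):
    """Hypothetical rule: wrap in double quotes only if whitespace is present."""
    return f'"{arg}"' if any(ch.isspace() for ch in arg) else arg


smiles_plain = "CC(C)Cc1ccccc1"
smiles_padded = "CC(C)Cc1ccccc1 "

# Under the assumed rule the parentheses reach the shell unquoted.
print(whitespace_only_quote(smiles_plain))
# The padded string is quoted, so the shell sees a single word.
print(whitespace_only_quote(smiles_padded))
# shlex.quote quotes any argument containing shell metacharacters.
print(shlex.quote(smiles_plain))
```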

@robbason
Collaborator

robbason commented Jun 29, 2021 via email

@mikemhenry
Contributor Author

mikemhenry commented Jun 29, 2021

This is how they are doing it (I've linked the line throwing the error):
https://github.com/conda/conda/blob/master/conda/cli/main_run.py#L33

@megosato
Collaborator

We should try calling this from python as opposed to the command line to see how it works.

@robbason it works fine when the script is called directly with python from the command line; it only fails through docker run:

# CALLED USING DOCKER RUN
oedock % docker run -it --rm -v $(pwd):/data oedock -r /data/sEH-apo.pdb -s "C1=CC=C(C(=C1)Cl)Cl" -b -5 -5 -5 5 5 5
ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['oedock', '-r', '/data/sEH-apo.pdb', '-s', 'C1=CC=C(C(=C1)Cl)Cl', '-b', '-5', '-5', '-5', '5', '5', '5']' command failed.  (See above for error)
/opt/conda/envs/oepy37/.tmpicnzha6z: line 3: syntax error: unexpected "("

# CALLED FROM COMMAND LINE
oedock % python oedock_process.py -r sEH-apo.pdb -s "C1=CC=C(C(=C1)Cl)Cl" -b -5 -5 -5 5 5 5 
score: 1.8279122114181519	type: <class 'float'>

@megosato
Collaborator

megosato commented Jun 29, 2021

Just so this information is consolidated here:
The issue seems to occur only with docker containers that inherit from miniconda. The rdkit logp container I made, which inherits from mcs07/rdkit:latest rather than continuumio/miniconda3:4.9.2-alpine, handles the quoted SMILES strings properly:

predict-rdkitlogp % docker run -it --rm rdlogp "C1=CC=C(C(=C1)Cl)Cl"
2.9934000000000003

Below is how each command is tokenized when run through conda run inside the container:

"CCc1ccc(C)cc1"  - No special treatment (error)
/opt/app # conda run -n ADVenv dock -s "CCc1ccc(C)cc1" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CCc1ccc(C)cc1', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CCc1ccc\(C\)cc1" - escape character for () adds an extra escape character (works and doesn't seem to cause a change in structure)
conda run -n ADVenv dock -s "CCc1ccc\(C\)cc1" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CCc1ccc\\(C\\)cc1', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"\"CCc1ccc(C)cc1\"" - escaped quotations not evaluated into string (error)
conda run -n ADVenv dock -s "\"CCc1ccc(C)cc1\"" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', '"CCc1ccc(C)cc1"', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CCc1ccc(C)cc1 " - extra white space character tacked on (works)
conda run -n ADVenv dock -s "CCc1ccc(C)cc1 " -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CCc1ccc(C)cc1 ', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"C1=CC=C(C(=C1)Cl)Cl" - No special treatment (error)
/opt/app # conda run -n ADVenv dock -s "C1=CC=C(C(=C1)Cl)Cl" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'C1=CC=C(C(=C1)Cl)Cl', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']
ERROR conda.cli.main_run:execute(36): Subprocess for 'conda run ['dock', '-s', 'C1=CC=C(C(=C1)Cl)Cl', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']' command failed.  (See above for error)
/opt/conda/envs/ADVenv/.tmpxat7qbd8: line 3: syntax error: unexpected "("

"C1=CC=C\(C\(=C1\)Cl\)Cl"  - escape character for () adds an extra escape character (works and doesn't seem to cause a change in structure)
/opt/app # conda run -n ADVenv dock -s "C1=CC=C\(C\(=C1\)Cl\)Cl" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'C1=CC=C\\(C\\(=C1\\)Cl\\)Cl', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"C1=CC=C(C(=C1)Cl)Cl " - extra white space character tacked on (works)
conda run -n ADVenv dock -s "C1=CC=C(C(=C1)Cl)Cl " -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'C1=CC=C(C(=C1)Cl)Cl ', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CCC\(=O\)/C\(=C\(/F\)\Cl\)/F" - escape character for () adds an extra escape character (works and doesn't seem to cause a change in structure)
conda run -n ADVenv dock -s "CCC\(=O\)/C\(=C\(/F\)\Cl\)/F" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CCC\\(=O\\)/C\\(=C\\(/F\\)\\Cl\\)/F', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CC/C=C\\CCOC=O" - No special treatment (error maybe because it treats \ as an escape character)
conda run -n ADVenv dock -s "CC/C=C\\CCOC=O" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CC/C=C\\CCOC=O', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']
ERROR conda.cli.main_run:execute(36): Subprocess for 'conda run ['dock', '-s', 'CC/C=C\\CCOC=O', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']' command failed.  (See above for error)
Warning: : Failed due to unspecified stereochemistry

"CC/C=C\\CCOC=O " - white space added to end (works)
conda run -n ADVenv dock -s "CC/C=C\\CCOC=O " -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CC/C=C\\CCOC=O ', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CC/C=C\\\\CCOC=O" - esc characters added (works and seems like structure is correct)
/opt/app # conda run -n ADVenv dock -s "CC/C=C\\\\CCOC=O" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CC/C=C\\\\CCOC=O', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']
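
The manual checks above could be turned into an automated round-trip test. This is a minimal sketch using only the standard library: it quotes each test SMILES with `shlex.join` (Python 3.8+), tokenizes the result again with `shlex.split`, and reports any string that comes back changed. It models POSIX shell tokenization only and does not exercise `conda run` itself; the helper name `survives_shell_round_trip` is made up for this example.

```python
import shlex

# Test cases copied from the list earlier in this issue (duplicates removed).
TEST_SMILES = [
    "CSC1=CC=C(C=C1)[N+][O-]",
    "CC1=CC=C(C=C1)N=[N+]=[N-]",
    "CC#Cc1ccccc1",
    "[N-]=[N+]=NC1=CC=CC=C1",
    "c1noc2ccccc12",
    "C1=CC(=C(C(=C1)F)CBr)F",
    "C#Cc1ccccc1",
    "C1=CC=C(C(=C1)Cl)Cl",
    "CCBr",
    "CC1=CC(=CC=C1)CN=[N+]=[N-]",
    "SCCCc1ccccc1",
    r"CC/C=C\\CCOC=O",  # raw string: backslashes kept exactly as listed
]


def survives_shell_round_trip(smiles):
    """Quote an argv list for a POSIX shell, re-tokenize it, and compare."""
    argv = ["dock", "-s", smiles]
    script_line = shlex.join(argv)           # a safely quoted command line
    return shlex.split(script_line) == argv  # what a POSIX shell would see


failures = [s for s in TEST_SMILES if not survives_shell_round_trip(s)]
print(failures)
```

Any entry left in `failures` would point at a string that needs extra handling before it is written into a shell script.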

@mikemhenry
Contributor Author

Something else we can try is using the Python API for conda:
https://docs.conda.io/projects/conda/en/latest/api/python_api.html

Then we can try using that to pass a command into a conda env, which should help to get around the way they are using subprocess.
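
Independent of which conda API is used, the property we want is that the arguments are handed over as a list and never pass through a shell. Here is a minimal sketch of that idea with `subprocess.run`; `sys.executable` stands in for the interpreter inside the conda env, and the child process just echoes back the argument it received.

```python
import subprocess
import sys

smiles = "C1=CC=C(C(=C1)Cl)Cl"

# sys.executable is a stand-in for the environment's own interpreter. In the
# container this might be something like /opt/conda/envs/ADVenv/bin/python,
# which is an assumed path and not confirmed in this thread.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", smiles],
    capture_output=True,
    text=True,
    check=True,
)

# With an argv list and no shell, the parentheses arrive untouched.
received = result.stdout.rstrip("\n")
print(received == smiles)
```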

I was also thinking that we could provide a template setup.py and cmd.py showing how to import an external package and then run a command in that conda env. That would give people a more standard way to build their containers. As a bonus, if we do this per challenge, the template could include the command-line arguments we expect them to handle.
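
As a rough idea of what the argument-handling part of such a template could look like, here is a sketch built with `argparse`. The short flags mirror the `adv` docking commands earlier in this thread; the long option names, the `build_parser` function, and the whitespace stripping on `-s` are all assumptions for illustration. The meaning of `-e` is not stated in this thread, so it is passed through as a plain integer.

```python
import argparse


def build_parser():
    """Hypothetical per-challenge parser mirroring the adv commands above."""
    parser = argparse.ArgumentParser(prog="dock")
    parser.add_argument("-r", "--receptor", required=True,
                        help="path to the receptor PDB file")
    parser.add_argument("-c", "--center", nargs=3, type=float, required=True,
                        help="three coordinates, e.g. -32.355 7.263 2.207")
    parser.add_argument("-b", "--boxsize", nargs=3, type=float, required=True,
                        help="three box dimensions, e.g. 14 14 14")
    # Meaning of -e is not stated in the thread; kept as an optional integer.
    parser.add_argument("-e", dest="e_param", type=int, default=None)
    # str.strip tolerates the trailing-space workaround used above.
    parser.add_argument("-s", "--smiles", type=str.strip, required=True)
    return parser


args = build_parser().parse_args([
    "-r", "inputs/4w51-cryo.pdb",
    "-c", "-32.355", "7.263", "2.207",
    "-b", "14", "14", "14",
    "-e", "50",
    "-s", "CC(C)Cc1ccccc1 ",
])
print(args.center, args.smiles)
```

Other containers would need their own variant; the oedock example above passes six values to `-b`, for instance.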
