The SAMPL8 physical properties challenge

We recently finalized work with GSK on data collection for a physical properties challenge. The data has cleared legal review at GSK, and challenge details and input files are now available. The challenge will include pKa prediction as well as prediction of logD between (diverse) organic phases.

For details on SAMPL8 physical properties dataset collection, refer to this GCC/EuroSAMPL talk by Aakankschit Nandkeolyar and Matthew Bahr. Full details of the experiments, along with the results of the measurements, are available in a paper draft which will be submitted after the challenge closes.

Overview

We have collected pKa data for 23 diverse compounds, along with pH-dependent solubility (which was used to determine pKa). The pKa data will form the basis for an initial pKa challenge, followed by release of the pKa data.

We also measured logD for 11 of these compounds for distribution between different phases: water-octanol, water-cyclohexane, water-ethyl acetate, water-heptane, water-MEK, water-TBME, and cyclohexane-DMF. Not every combination of compound and phase system is available because of compound solubility in the different phases. The total number of data points, i.e. (compound) x (phase system) combinations, is between 40 and 50. These logD values will form the basis for a logD challenge which will run after the pKa challenge.

We are planning on a deadline of Aug. 3, 2021 for the pKa challenge and Aug. 25, 2021 for the logD challenge. We will release pKa values immediately upon the close of the pKa challenge to allow these to be used in logD predictions if desired.

The general format of the challenge will follow that of SAMPL7, so refer to the SAMPL7 physical properties overview paper for details. Challenge instructions are also being made available here in the relevant subdirectories.

A view of the compounds

23 SAMPL8 molecules

Fig 1. SAMPL8 Challenge molecules.

The pKa challenge

The pKa challenge involves predicting pKa values for SAMPL8 compounds SAMPL8-1 through SAMPL8-23 (except SAMPL8-11 and SAMPL8-13, for which we were only able to measure pH-dependent solubility and not pKa). pH-dependent solubility measurements are also available, so please reach out if you are interested in predicting these. Preliminary details of the experiments are available in this talk.

As in the SAMPL7 physical properties challenge, we will collect pKa predictions as free energies of transition between microstates, each expressed relative to a reference microstate for that compound; these can be estimated from predicted microstate populations. Refer to our SAMPL7 report for details.

In some cases, compounds have multiple measured pKa values; submissions for these cases should still follow the same format. Details and submission file format will be posted shortly.
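The relationship between microstate populations and relative free energies mentioned above can be sketched as follows. This is a minimal illustration, not the official submission tooling: it assumes populations are Boltzmann-weighted at the temperature given, and the function name, microstate labels, and default temperature are hypothetical choices made for the example.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872041


def relative_free_energies(populations, reference, temperature=298.15):
    """Convert microstate populations into free energies (kcal/mol)
    relative to a chosen reference microstate.

    Uses dG_i = -RT * ln(p_i / p_ref), which assumes the populations
    follow a Boltzmann distribution at the given temperature.
    Microstates with zero population are skipped (dG is undefined).
    """
    p_ref = populations[reference]
    if p_ref <= 0:
        raise ValueError("reference microstate must have a positive population")
    rt = R_KCAL * temperature
    return {
        name: -rt * math.log(p / p_ref)
        for name, p in populations.items()
        if p > 0
    }


# Hypothetical labels and populations, for illustration only.
example = relative_free_energies({"ref": 0.5, "micro001": 0.05}, reference="ref")
```

A microstate ten times less populated than the reference comes out roughly 1.36 kcal/mol higher at 298.15 K. The populations must be evaluated at whatever reference condition the challenge instructions specify.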

The logD challenge

The logD challenge involves distribution of each compound between phases in a series of biphasic systems. For this to occur, each compound must have some solubility in each phase, and we only included compounds with measurable pKa, which limited the number of possible compounds.

Thus, the logD challenge covers compounds for which pKa values were measured, which were adequately soluble in both solvents, and which partitioned adequately in the seven biphasic systems considered.

Our full dataset includes partitioning for these biphasic systems:

  • octanol-water
  • cyclohexane-water
  • ethyl acetate-water
  • heptane-water
  • MEK-water
  • TBME-water
  • cyclohexane-DMF

In all cases the water phase was Britton-Robinson buffer from Ricca. The buffer was at pH 3 or pH 8 depending on the pKa of the compound, and the pH will be specified when the logD challenge is launched (e.g. for one compound the water might have been at pH 3 and for another at pH 8). (Update 2021-08-20: pH 8 was used for SAMPL8-1, 3, 5 and 6; all other compounds were done at pH 3.)
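Because the released pKa values may be used in logD predictions, a small sketch of how the buffer pH enters such an estimate may be useful. The pH assignment below comes from the update above; the logD formula is the standard textbook approximation for a monoprotic compound, assuming only the neutral species partitions into the organic phase. It is not the challenge's analysis method, it applies only to the water-containing systems (not cyclohexane-DMF), and the function names are hypothetical.

```python
import math

# Compounds measured at pH 8 (per the 2021-08-20 update); all others at pH 3.
PH8_COMPOUNDS = {"SAMPL8-1", "SAMPL8-3", "SAMPL8-5", "SAMPL8-6"}


def buffer_ph(compound_id):
    """Return the aqueous buffer pH used for a given compound ID."""
    return 8.0 if compound_id in PH8_COMPOUNDS else 3.0


def logd_monoprotic(logp, pka, ph, acid=True):
    """Estimate logD from logP and a single pKa.

    Assumption: only the neutral form partitions into the organic phase.
    For an acid the ionized fraction grows with (pH - pKa); for a base,
    with (pKa - pH).
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


# Illustrative numbers only, not challenge data.
estimate = logd_monoprotic(logp=2.0, pka=9.0, ph=buffer_ph("SAMPL8-3"), acid=False)
```

Compounds with several ionizable groups, or with ion partitioning into the organic phase, need a fuller treatment based on the microstate free energies.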

For our compounds, we have measurements for these combinations of solute and solvent system:

octanol-water cyclohexane-water ethyl acetate-water heptane-water MEK-water TBME-water cyclohexane-DMF
SAMPL8-1 x
SAMPL8-3 x x x x x
SAMPL8-5 x
SAMPL8-6 x x
SAMPL8-7 x x x x x
SAMPL8-9 x x x x
SAMPL8-10 x x
SAMPL8-12 x x x x
SAMPL8-14 x x x x
SAMPL8-16 x
SAMPL8-17 x x x x x

What's here?

  • A PowerPoint file (and PDF thereof) from GSK giving the identity of the compounds under consideration
  • Submission formats
  • Instructions for the pKa and logD challenges
  • Submission links:

Manifest

  • source_data/: Files provided by GSK
  • SAMPL8_molecule_ID_and_SMILES.csv: A .CSV file containing SAMPL8 challenge molecule IDs and isomeric SMILES. SMILES were provided by GSK.
  • microstates/: This directory currently contains molecules in Tripos MOL2 (.mol2), SDF (.sdf), and PDB (.pdb) file formats (generated from the SMILES in SAMPL8_molecule_ID_and_SMILES.csv), as well as enumerated microstates for each molecule. Optional additional microstates (from Stefan Kast and Nicolas Tielker) were added 2021-08-03 for molecules SAMPL8-1, SAMPL8-3, SAMPL8-12, SAMPL8-14, SAMPL8-21, and SAMPL8-22; as these were added late, they use the "_dortmundYYY" extension in the SAMPL8-XX_microstates.csv files.
  • images/: Folder containing images related to this challenge in various formats.
  • pKa: This folder contains instructions, a submission template, and the analysis for the pKa challenge. It also contains challenge input files in .CSV format with relative microstate free energies at a reference pH of 0.
  • logD: logD challenge details and instructions. Also contains the results of the analysis of logD data.
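
As a starting point for working with the manifest files, the sketch below loads a molecule ID/SMILES table into a dictionary. The column layout of SAMPL8_molecule_ID_and_SMILES.csv is an assumption here (ID in the first column, SMILES in the second, with any header or blank rows skipped), so check the file before relying on it; the function name is hypothetical.

```python
import csv
import io


def read_molecule_table(handle):
    """Read rows of (molecule ID, SMILES) from an open CSV file handle.

    Assumed layout: ID in column 1, SMILES in column 2. Rows whose first
    field does not start with "SAMPL8-" (headers, blank lines) are skipped.
    """
    molecules = {}
    for row in csv.reader(handle):
        if len(row) < 2 or not row[0].strip().startswith("SAMPL8-"):
            continue
        molecules[row[0].strip()] = row[1].strip()
    return molecules


# Placeholder content for illustration; these are NOT the real challenge SMILES.
sample = io.StringIO("ID,SMILES\nSAMPL8-1,CCO\nSAMPL8-2,c1ccccc1\n")
table = read_molecule_table(sample)
```

With the repository checked out, the same function could be called on `open("SAMPL8_molecule_ID_and_SMILES.csv", newline="")`.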