
Think through challenge design #12

Open
davidlmobley opened this issue Dec 7, 2021 · 3 comments

Comments

@davidlmobley
Contributor

Migrating this comment from @procacci in #9 here for better tracking:

My feelings on this, future (bCDs), and on past (CB8) host-guest
challenges are that the focus is too much on force fields while being
somehow unfaithful to the original commitment of the challenge. In my
understanding, that commitment was aimed at identifying the most
appropriate methodology in dealing with the SAMPLing of systems with
disparate time scales/complex collective variables and hence in
producing sufficiently reliable (and reproducible) binding free
energies. WP6 is a mostly rigid host that does not involve any
serious sampling issue. The very same can be said for the guest
molecules. This is at variance with the reality of ligand-protein
systems where both the host (the binding pocket) and the guest
(drug-like compounds with up to 9 rotatable bonds) are characterized
by a complex conformational landscape. Besides, the issue of the
protonation state of heavily charged WP6 is an extra complication that
can severely affect the prediction. If WP6 is partially protonated at pH
7.4, then the BFEs of the guest cations (12 out of 13) computed
assuming a fully deprotonated state for WP6 are expected to be
overestimated. On top of all this, electrostatic interactions certainly
play a prevailing role in shaping the affinity. So, despite all
these modeling challenges, it was quite a surprise for me to see
that all the MD-based methodologies performed decently, whether they
used OpenFF, GAFF, or AMOEBA.

I do think that testing FF is important but I also think that the
SAMPLing issue in BFE determination is still far from being solved for
protein-ligand systems, even in the GPU era. My hope is that in the
future challenges these two crucial issues (FF and SAMPLing) are
addressed in separate specialized sessions: i) test the FF (and any
other physical methods) for rigid host and guest challenges ii) test
MD-based methodologies for protein-ligand systems, requesting that all
participants use only one (well-established) FF. The common set of
topological/parameter files for the most popular MD codes could be
provided by the organizers with little work.

Using a fancy and expensive FF (e.g. AMOEBA or QM/MM) with a poor
sampling methodology in a system where sampling is important makes no
real sense in my view. By the same token, using a fancy sampling
methodology with a poor FF in systems where sampling is not an issue
makes no sense either. I am quite convinced that disentangling the FF
issue (systematic errors) from the sampling issue (reproducibility)
will allow us to move forward more quickly.

Piero Procacci

I'll try and take this point-by-point for the record:

My feelings on this, future (bCDs), and on past (CB8) host-guest
challenges are that the focus is too much on force fields while being
somehow unfaithful to the original commitment of the challenge. In my
understanding, that commitment was aimed at identifying the most
appropriate methodology in dealing with the SAMPLing of systems with
disparate time scales/complex collective variables and hence in
producing sufficiently reliable (and reproducible) binding free
energies.

Honestly, the SAMPL challenges are not staged with a particular purpose in mind other than to (a) allow fair assessment of the state of the art via prospective challenges, and (b) drive progress in the field. Whether that progress comes from force fields or from sampling methods depends on participation and innovation, not necessarily on the structure of the challenges.

That said, the SAMPL6 "SAMPLing" challenge did have a clear focus specifically on sampling methods. One thing on my list has been to stage another "SAMPLing" challenge, as I think it is important to focus some attention directly on sampling issues (as opposed to overall accuracy, which can be the result of many factors), but I haven't had the bandwidth to orchestrate another such challenge recently, as it requires a great deal of engagement with the community. Are you interested in helping organize one?

WP6 is a mostly rigid host that does not involve any
serious sampling issue. The very same can be said for the guest
molecules. This is at variance with the reality of ligand-protein
systems where both the host (the binding pocket) and the guest
(drug-like compounds with up to 9 rotatable bonds) are characterized
by a complex conformational landscape.

This seems like it's an argument for NOT using WP6 for a SAMPLing challenge, but not necessarily for not including WP6 in SAMPL, unless I'm missing something.

Besides, the issue of the
protonation state of heavily charged WP6 is an extra complication that
can severely affect the prediction. If WP6 is partially protonated at pH
7.4, then the BFEs of the guest cations (12 out of 13) computed
assuming a fully deprotonated state for WP6 are expected to be
overestimated.

I just added some experimental data on WP6 protonation states in #11.

I do think that testing FF is important but I also think that the
SAMPLing issue in BFE determination is still far from being solved for
protein-ligand systems, even in the GPU era. My hope is that in the
future challenges these two crucial issues (FF and SAMPLing) are
addressed in separate specialized sessions: i) test the FF (and any
other physical methods) for rigid host and guest challenges ii) test
MD-based methodologies for protein-ligand systems, requesting that all
participants use only one (well-established) FF. The common set of
topological/parameter files for the most popular MD codes could be
provided by the organizers with little work.

Completely isolating FF from sampling is tricky, in my experience, since even "simple" systems often have some amount of sampling problems. However, as noted, one can pose a "SAMPLing" challenge (as in SAMPL6/as you suggest) to specifically focus on sampling issues. Note, however, that such challenges do not necessarily require new data, since in such cases the goal is to get the "right" answer (the force field gold standard) efficiently, rather than to get the most accurate result.

It is probably time for another such challenge; let me know if you're interested in lending a hand with organization.

Using a fancy and expensive FF (e.g. AMOEBA or QM/MM) with a poor
sampling methodology in a system where sampling is important makes no
real sense in my view. By the same token, using a fancy sampling
methodology with a poor FF in systems where sampling is not an issue
makes no sense either. I am quite convinced that disentangling the FF
issue (systematic errors) from the sampling issue (reproducibility)
will allow us to move forward more quickly.

In general I agree with you that one needs to address both sampling and force field quality, and if sampling is the limiting factor, force field quality won't fix it. That said, my role in running these challenges is... to make them available to the community and let participants use them to drive progress, not to decide who gets to use which methods and how. I would note that, over the years, far fewer pure QM methods have participated in SAMPL host-guest challenges and far more methods have begun incorporating extensive sampling, as this has proven critical for decent performance. Yes, we still see more expensive force fields used -- but there are far fewer methods now which just do expensive QM calculations on host-guest complexes.

Anyway, let me know if you're interested in helping orchestrate another round of SAMPLing challenges, and what system(s) you think would be best suited for those.

@procacci

Hi David,

I have seen the pKa data (by the way, the file WP6_pKa_report.pdf does
not display correctly). In my submission, as a justification for
computing the BFE with the -12 host, I used the statistical-factor
formula pKa(n) = pKa(1) - log(n/(13-n)), where the pKa(1) of 3.23
refers to the monobasic acid 2-(2,5-dimethylphenoxy)acetic acid (from
SciFinder). This yielded 5.71 as the largest pKa, in surprisingly good
agreement with the experimental data. I hence guess that there must be
some other reason for the systematic overestimation of the BFE in my
ranked submission. Probably electrostatics again plays a primary
role. I am saying this because only the AMOEBA/Ponder submission gave a
negative MSE among the MD submissions, while all the other submissions
(including mine) seem to consistently overestimate the BFE. Paraquat
(g13) is an exception, since I already knew that the BFE would be
underestimated using the GAFF2 parameterization (see the FSDAM results
for g19 on Isaacs' clips in SAMPL7). In any case, this challenge with
-12 charges on the host, like those of the calixarenes, puts a very hard
strain on the electrostatic model.

Concerning the possibility of organizing a new SAMPLing
challenge specifically addressing the sampling issue and assessing
the efficiency and convergence as you did in the past SAMPLing, I am
available for helping as far as I can. Speaking of host-guest systems, I
came across two interesting macrocyclic systems recently
reported in the literature (see the attached file). Unlike the
relatively rigid WP6, CB8, or OA, these two hosts are characterized by
high conformational flexibility, making them suitable for achieving
good selectivity and affinity. For host1 (10.1002/cjoc.202000738),
the interconversion between conformers is fast (on the ns timescale), and
binding for a given guest is specific and more efficient for just one or two host1
conformations (induced fit). In host2, the five conformers are separated by
large energetic barriers, so that they can even be distinguished by NMR
(10.1038/s41467-020-16534-9). host2 follows "conformational
selection", in the sense that the binding affinities of the metastable
conformers can be quite different. Induced fit and conformational
selection are quite common in drug-receptor systems, and I think that
these two hosts could be an excellent platform for assessing the
precision/reproducibility of MD-based approaches. In the attached tgz,
for each host you will find two figures referring to a standard MD
and an HREM MD in vacuo at 300 K (with a Nosé thermostat). Conformations
were examined using two distances for host1 (between oxygen atoms 118 and 130,
and between oxygen atoms 123 and 120), and two dihedrals for host2 (involving
oxygen atoms 181 196 165 194 and oxygen atoms 172 193 188 195). The measurements
for these host systems were done in 1:1 acetonitrile:dichloroethane (for host1) and
in CD2Cl2 (for host2).
These hosts are not soluble in water (see attached swissadme.pdf), but
the synthesis seems to be pretty straightforward, and maybe they can be
modified (as was done with WP6) by introducing some polar groups (e.g. OH
or COOH) to enhance solubility, if we feel that the water solvent is
essential for the challenge. On the other hand, simulation in CH2Cl2
or acetonitrile should pose no extra problems for MD-based free energy
calculations. I am looking forward to hearing your comments.

Piero

hosts1-2.tar.gz

@procacci

There was a mistake in the tar file: host2.png was a copy of host1.png. The attached png is the correct version
for the host2 MD/HREM comparison.
host2

@davidlmobley
Contributor Author

Hi, sorry for the very long delay on this. I'm in a bit of a rush but I'll try and take a couple of your questions:

Concerning the possibility of organizing a new SAMPLing
challenge specifically addressing the sampling issue and assessing
the efficiency and convergence as you did in the past SAMPLing, I am
available for helping as far as I can.

The hard part of this is, essentially, identifying a system we think people can converge with somewhat reasonable efficiency and then getting enough folks to agree to do it, as well as finding an approach to serve as a reference method and getting someone to run it. It's a major community initiative/a large cat-herding exercise, and I haven't had the bandwidth to lead this. One route forward might be for you to put together a draft plan for such a challenge, such as in an editable Google Doc; I could circulate it to our e-mail list for feedback, and depending on the feedback we'd know how hard it would be to push this forward/herd the cats.

The hosts you mention sound interesting, and they seem to share some similarity with the CBclip/TrimerTrip-type systems that have appeared in past challenges, in that they are significantly flexible. It's interesting that the timescales are already known. Are you proposing those as the focus of a SAMPLing challenge? At some level, I thought that the right level of difficulty would be to essentially revisit something like the systems from the SAMPL6 SAMPLing challenge, as at that point the field still did not quite reach complete agreement on the right answer for the calculated binding free energies, even with equivalent methods. Originally the plan had been to compare method efficiency by checking which converged to the known answer most rapidly, but in the end we were unable to do this since they didn't all converge to the same answer. Before going to systems with more difficult sampling, I'd hope we could do an efficiency comparison in a case where we do all get the same answer.
