Test & improve metrics for removing low density conformations #346

Open
stephaniewankowicz opened this issue Jun 8, 2023 · 2 comments

@stephaniewankowicz
Collaborator

Currently, qFit's rule for removing conformers below a certain density level is: if any voxel in the conformer has a density intensity < 0.3 e⁻ Å⁻³, the conformer is removed.

This is turned off by default.

We should include something like this, but the criterion should instead check whether several atoms (somewhere around >= 5) lack density support.

This should be tested with different atom-count cutoff values.
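A minimal sketch of the proposed criterion. The function name, the `atom_densities` input (map values interpolated at each atom position), and both default cutoffs are illustrative assumptions for testing, not qFit's actual API or defaults:

```python
import numpy as np

def keep_conformer(atom_densities, density_cutoff=0.3, max_unsupported_atoms=5):
    """Reject a conformer only when several atoms lack density support.

    atom_densities : 1D array-like of density values (e⁻ Å⁻³) sampled at
        each atom position of the conformer.
    density_cutoff : density below which an atom counts as "unsupported"
        (0.3 is the value from the current all-or-nothing rule).
    max_unsupported_atoms : how many unsupported atoms it takes to reject
        the conformer (the ~5 suggested above; the value to sweep in tests).
    """
    n_unsupported = int(np.sum(np.asarray(atom_densities) < density_cutoff))
    return n_unsupported < max_unsupported_atoms
```

Sweeping `max_unsupported_atoms` over 1–5 would reproduce the experiments described in the comments below.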

@stephaniewankowicz
Collaborator Author

I tested removing low-density conformers and putting a QP test in the angle sampling. Both removed almost all of the 'too many conformers, reverting' messages. However, the QP test in the angle sampling blew up the R-free, while removing a conformer if even one atom fell below a certain threshold tended to increase the number of residues where we could not find a solution. I tested removal when 1, 2, or 3 atoms are below the cutoff value. All of these removed the angle issue but increased the number of residues for which we could not find a good conformer. I am going to try 4 or 5 atoms, as this should still eliminate many aromatic conformations.

@blake-riley
Contributor

OK, perhaps this comment is a little broader than this specific issue --- lmk if I should create a new issue to track it?

Problem: overfitting / trying to fit against an "oversampled" model
I've seen a bunch of "too many conformers" warnings in which there are over 2000 conformers (and sometimes over 10000!).
In these circumstances, we know in advance that we will be trying to find a best fit for 2000 conformers to ~1500 voxels (or so).

To my ears: that's at best an overfit QP solution (more parameters than datapoints), at worst an unsolvable QP.
As you highlighted in #378, and as you're trying to address here in this issue (#346), the change to the angle sampling will make this yet more common.

Suggestions

  1. An interim measure: qFit should emit a logger.warning from qfit._BaseQFit._solve_qp() if it notices that it has been asked to solve an "over-fit" situation (more conformers, so more ω than voxel datapoints --- i.e. self._models.shape[0] > self._target.shape[0]).
  2. Ultimately, I think that if qFit notices it's attempting to solve an "over-fit" situation (more conformers than voxel datapoints), it would be a good idea to reduce the sampling, and do a more coarse sampling than what the user requested.
    Already, we try to backtrack, and I see messages like:
    Too many conformers generated (29720). Reverting to a previous iteration of degrees of freedom: item 0. n_coords: [29720]
    That's ... not really reverting, tbh. Is this another bug?
    This feels like the sampling code might need a pretty deep rewrite, but I think it would be good to have this in place if you're gonna get backbone sampling working? (I'm excited for this!)
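The two suggestions above could be sketched roughly as follows. This is a stand-alone illustration, not qFit's code: `sample_fn` is a hypothetical stand-in for the angle-sampling step, the step-doubling policy is just one possible coarsening scheme, and only the `self._models.shape[0] > self._target.shape[0]` comparison comes from the actual `_solve_qp()` internals mentioned in suggestion 1:

```python
import logging

import numpy as np

logger = logging.getLogger("qfit")

def sample_with_overfit_guard(sample_fn, target, step, max_step=30.0):
    """Warn on over-fit QPs, then coarsen sampling until tractable.

    sample_fn : callable taking an angular step size (degrees) and
        returning an (n_conformers, n_voxels) array of calculated
        densities (the role of self._models in _solve_qp()).
    target : 1D array of observed densities, one per voxel
        (the role of self._target).
    """
    models = sample_fn(step)
    if models.shape[0] > target.shape[0]:
        # Suggestion 1: at minimum, tell the user the QP is over-fit.
        logger.warning(
            "Over-fit QP: %d conformers vs %d voxel datapoints.",
            models.shape[0], target.shape[0],
        )
    # Suggestion 2: coarsen the sampling rather than solve an over-fit
    # (or unsolvable) QP, instead of the current non-reverting "revert".
    while models.shape[0] > target.shape[0] and step < max_step:
        step *= 2.0  # coarser sampling -> fewer conformers
        models = sample_fn(step)
    return models, step
```

Doubling the step until `n_conformers <= n_voxels` is the simplest policy; a real fix would probably need to live inside the sampling loop itself, per the rewrite concern above.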
