Test & improve metrics for removing low density conformations #346

Open
stephaniewankowicz opened this issue Jun 8, 2023 · 2 comments

@stephaniewankowicz
Collaborator

Currently, qFit's rule for removing conformers below a certain density level is: if any voxel in the conformer has a density intensity < 0.3 e⁻ Å⁻³, the conformer is removed.

This is turned off by default.

We should include something like this, but the criterion should instead check whether several atoms (somewhere around >= 5) lack density support.

This should be tested with different atom-count cutoff values.
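A minimal sketch of the proposed criterion. The function name, the `atom_densities` input (map values interpolated at each atom position), and both default cutoffs are illustrative assumptions for testing, not qFit's actual API or defaults:

```python
import numpy as np

def keep_conformer(atom_densities, density_cutoff=0.3, max_unsupported_atoms=5):
    """Reject a conformer only when several atoms lack density support.

    atom_densities : 1D array-like of density values (e⁻ Å⁻³) sampled at
        each atom position of the conformer.
    density_cutoff : density below which an atom counts as "unsupported"
        (0.3 is the value from the current all-or-nothing rule).
    max_unsupported_atoms : how many unsupported atoms it takes to reject
        the conformer (the ~5 suggested above; the value to sweep in tests).
    """
    n_unsupported = int(np.sum(np.asarray(atom_densities) < density_cutoff))
    return n_unsupported < max_unsupported_atoms
```

Sweeping `max_unsupported_atoms` over 1–5 would reproduce the experiments described in the comments below.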

@stephaniewankowicz
Collaborator Author

I tested removing low-density conformers and putting a QP test in the angle sampling. Both removed almost all of the 'too many conformers, reverting' messages. However, the QP test in the angle sampling blew up the R-free, while removing a conformer if even one atom fell below a certain threshold tended to increase the number of residues where we could not find a solution. I tested removal when 1, 2, or 3 atoms are below the cutoff value. All of these removed the angle issue but increased the number of residues for which we could not find a good conformer. I am going to try 4 or 5 atoms, as this should still eliminate many aromatic conformations.

@blake-riley
Contributor

OK, perhaps this comment is a little broader than this specific issue --- lmk if I should create a new issue to track it?

Problem: overfitting / trying to fit against an "oversampled" model
I've seen a bunch of "too many conformers" warnings in which there are over 2000 conformers (and sometimes over 10000!).
In these circumstances, we know in advance that we will be trying to find a best fit for 2000 conformers to ~1500 voxels (or so).

To my ears: that's at best an overfit QP solution (more parameters than datapoints), at worst an unsolvable QP.
As you highlighted in #378, and as you're trying to address here in this issue (#346), the change to the angle sampling will make this yet more common.

Suggestions

  1. An interim measure: qFit should emit a logger.warning from qfit._BaseQFit._solve_qp() if it notices that it has been asked to solve an "over-fit" situation (more conformers, so more ω than voxel datapoints --- i.e. self._models.shape[0] > self._target.shape[0]).
  2. Ultimately, I think that if qFit notices it's attempting to solve an "over-fit" situation (more conformers than voxel datapoints), it would be a good idea to reduce the sampling, and do a more coarse sampling than what the user requested.
    Already, we try to backtrack, and I see messages like:
    Too many conformers generated (29720). Reverting to a previous iteration of degrees of freedom: item 0. n_coords: [29720]
    That's ... not really reverting, tbh. Is this another bug?
    This feels like the sampling code might need a pretty deep rewrite, but I think it would be good to have this in place if you're gonna get backbone sampling working? (I'm excited for this!)
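The two suggestions above could be sketched roughly as follows. This is a stand-alone illustration, not qFit's code: `sample_fn` is a hypothetical stand-in for the angle-sampling step, the step-doubling policy is just one possible coarsening scheme, and only the `self._models.shape[0] > self._target.shape[0]` comparison comes from the actual `_solve_qp()` internals mentioned in suggestion 1:

```python
import logging

import numpy as np

logger = logging.getLogger("qfit")

def sample_with_overfit_guard(sample_fn, target, step, max_step=30.0):
    """Warn on over-fit QPs, then coarsen sampling until tractable.

    sample_fn : callable taking an angular step size (degrees) and
        returning an (n_conformers, n_voxels) array of calculated
        densities (the role of self._models in _solve_qp()).
    target : 1D array of observed densities, one per voxel
        (the role of self._target).
    """
    models = sample_fn(step)
    if models.shape[0] > target.shape[0]:
        # Suggestion 1: at minimum, tell the user the QP is over-fit.
        logger.warning(
            "Over-fit QP: %d conformers vs %d voxel datapoints.",
            models.shape[0], target.shape[0],
        )
    # Suggestion 2: coarsen the sampling rather than solve an over-fit
    # (or unsolvable) QP, instead of the current non-reverting "revert".
    while models.shape[0] > target.shape[0] and step < max_step:
        step *= 2.0  # coarser sampling -> fewer conformers
        models = sample_fn(step)
    return models, step
```

Doubling the step until `n_conformers <= n_voxels` is the simplest policy; a real fix would probably need to live inside the sampling loop itself, per the rewrite concern above.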
