Adaptive-depth anderson #939
base: master
Conversation
Force-pushed from de1f480 to a9727e1.
From a few experiments, just limiting the window size using adaptive-depth Anderson is rarely a good idea, as the obtained sizes can get too large to be practical. Thus we for sure need both a maximal […]. For now I've left […]
Yeah, I think mmax is always going to make sense. I have to reread the paper, but I think I remember that it's just an alternative way to monitor the condition number (both are essentially measuring the size of the extrapolation coefficients). I think the version of the code in master is basically sound. @msdupuy might know better. One thing I wanted to explore was to annotate each residual vector with an error bar, coming additively both from the approximate eigensolve (which we can try to estimate from the eigenvalue residuals) and from some estimate of the nonlinear effects (which we could maybe try to model as an isotropic quadratic), and take that into account in the least squares. I never got around to it, but do ping me if you're interested in exploring this kind of thing further.
So you suggest we don't do this PR at all? I mean, what I like about it is that it catches the issues before computing the considerably more costly dot products, and it overall seems to do a better job of shrinking the history when needed (from the few experiments I did).
Hmm. Are you sure that would be worth it, given that it probably turns out to be quite crude (especially closer to convergence)? Or was your thinking to only employ this for the initial phase, to avoid Anderson putting too much trust into a potentially sketchy result? For that it could be useful, given that the bounds are not completely random. How do you want to get the insight on the isotropic quadratic — just from the points the SCF follows anyway? Happy to discuss at some point.
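One way the error-bar idea floated above could be prototyped: treat each history column's assumed error estimate as a Tikhonov penalty on its coefficient, so the least squares trusts noisy residuals less. This is only an illustrative sketch in Python/NumPy (the package itself is Julia); the function name and the per-column `sigma` estimates are hypothetical, not part of any existing API:

```python
import numpy as np

def error_weighted_coeffs(M, r, sigma):
    """Anderson coefficients from min_beta ||M beta + r||^2 + ||diag(sigma) beta||^2.

    M:     (n, m) matrix of residual differences; r: (n,) current residual.
    sigma: (m,) assumed per-column error bars (hypothetical estimates).
    A large sigma[i] pushes beta[i] toward 0, i.e. that history entry
    is distrusted by the extrapolation.
    """
    m = M.shape[1]
    A = np.vstack([M, np.diag(sigma)])       # augmented least-squares system
    b = np.concatenate([r, np.zeros(m)])
    beta, *_ = np.linalg.lstsq(A, -b, rcond=None)
    return beta
```

With `sigma = 0` this reduces to the plain least squares already in the code; with large `sigma` the corresponding coefficients are damped toward zero, which is one concrete way to get "something in between" keep and discard.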
Force-pushed from 5330a6d to a2b18e9.
@antoine-levitt Thoughts on the above?
# We need to solve 0 = M' Pfxₙ + M'M βs <=> βs = -(M'M)⁻¹ M' Pfxₙ
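For what it's worth, the identity in that comment is just the normal equations of a least-squares problem, so βs can be obtained without ever forming M'M. A quick NumPy check (illustrative names, not the actual Julia code):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 3))    # stand-in for the residual-difference matrix
Pfx = rng.standard_normal(6)       # stand-in for the preconditioned residual Pfxₙ

# Normal-equation form from the comment: βs = -(M'M)⁻¹ M' Pfxₙ
beta_normal = -np.linalg.solve(M.T @ M, M.T @ Pfx)

# Equivalent, better-conditioned form: least squares on M βs ≈ -Pfxₙ
beta_lstsq, *_ = np.linalg.lstsq(M, -Pfx, rcond=None)
```

Forming M'M explicitly squares the condition number, which is one reason the maxcond check operates on a QR factorization of M directly.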
# Ensure the condition number of M stays below maxcond, else prune the history
# TODO This is too be tested, but in theory the adaptive-depth DIIS mechanism
typo
# Ensure the condition number of M stays below maxcond, else prune the history
# TODO This is too be tested, but in theory the adaptive-depth DIIS mechanism
# we implement, should ensure the condition number to stay bounded as well.
no comma
# we implement, should ensure the condition number to stay bounded as well.
Mfac = qr(M)
while size(M, 2) > 1 && cond(Mfac.R) > anderson.maxcond
    M = M[:, 2:end] # Drop oldest entry in history
view?
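As quoted, the loop drops columns but doesn't show the QR factorization being recomputed after each drop; the intended mechanism, with the factorization refreshed each iteration, looks roughly like this NumPy transcription (illustrative, not the DFTK source):

```python
import numpy as np

def prune_oldest(M, maxcond):
    """Drop the oldest (leftmost) history columns until cond(R) <= maxcond,
    keeping at least one column. Mirrors the quoted Julia loop, but
    recomputes the QR factorization after every drop."""
    while M.shape[1] > 1:
        R = np.linalg.qr(M, mode='r')
        if np.linalg.cond(R) <= maxcond:
            break
        M = M[:, 1:]   # drop oldest entry in history
    return M
```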
Merge the two mechanisms? I.e. drop the vectors with the biggest errors (instead of the oldest ones) until the conditioning decreases below the acceptable threshold. For safety, don't drop, say, the last iterate and the one before.
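A sketch of that merged mechanism (illustrative Python/NumPy, not DFTK code; `errors` is an assumed per-column error estimate, and the two newest columns are protected as suggested):

```python
import numpy as np

def prune_worst(M, errors, maxcond):
    """Drop the column with the largest error estimate (never the two newest,
    assumed to be the rightmost columns) until cond(M) <= maxcond."""
    errors = list(errors)
    while M.shape[1] > 2 and np.linalg.cond(M) > maxcond:
        worst = int(np.argmax(errors[:-2]))  # protect the two newest entries
        M = np.delete(M, worst, axis=1)
        errors.pop(worst)
    return M
```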
Are they really expensive? What I liked in the current version is its simplicity, but if dropping based on residual norm is better then sure, go ahead. Also it might be better to monitor the norm of the extrapolation coefficients rather than the conditioning. Idk, there are so many possibilities it scares me.
The point is to have something a bit more principled than the binary keep/discard mechanism; instead of having a weight of 1 (keep) or 0 (discard), have something in between. But it's pretty hazy how to do this properly, so more research is needed (tm).
Supersedes #719.
Todo: