Consider scale-invariant convergence criterion for eigenproblems? #60

Open
danielwe opened this issue Aug 23, 2022 · 7 comments

danielwe commented Aug 23, 2022

KrylovKit.jl uses an absolute convergence criterion norm(residual) < tol. Some packages, like ArnoldiMethod.jl and Arpack.jl, use a slightly different criterion norm(residual) < tol * |λ| where λ is the eigenvalue associated with the residual vector (modulo some absolute tolerance for when |λ| ≈ 0). To me, this makes a lot of sense, as it makes convergence of an eigenpair invariant to the overall scale of A, inversion of A, and the size of A on the eigenspace in question. I suppose you could think of it as setting a relative tolerance for eigenvalues, or equivalently an absolute tolerance for eigenvector directions*. It's particularly relevant to my current application, where I'm solving for the smallest eigenvalue of Hermitian positive definite matrices and the numbers can get quite small (or large if I invert).

Is modifying the convergence criterion something you might consider for KrylovKit.jl?


*The intuition I'm describing here applies to symmetric/Hermitian problems where Schur vectors and eigenvectors are the same, so norm(residual) = ‖Ax - λx‖₂ with normalized x. Not sure exactly how the intuition translates to Schur vector residuals.
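
As a plain-Julia sketch of what I mean (illustrative only, not the actual internals of any of these packages):

```julia
# `resnorm` is the residual norm of a Ritz pair and `λ` its Ritz value.

# Absolute criterion (current KrylovKit.jl behavior):
converged_abs(resnorm, tol) = resnorm < tol

# Scale-invariant criterion (ArnoldiMethod.jl/Arpack.jl style), with a small
# absolute floor so that eigenvalues near zero can still converge:
converged_rel(resnorm, λ, tol; floor=eps()^(2 / 3)) = resnorm < tol * max(floor, abs(λ))
```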

danielwe commented Aug 24, 2022

I'm not an expert in numerical linear algebra btw., so let me rephrase the question: Is there a reason you prefer the current convergence criterion over a scale-invariant one?

Jutho commented Aug 29, 2022

Thanks for this question. I don't think there is a well-established reason for preferring an absolute tolerance as I currently do. The only problem is indeed with eigenvalues close to zero, as you indicate. Having to special-case this, and then rely on an absolute tolerance for that special case anyway, together with a relative tolerance for the 'normal' case, would make the code more complicated.

I admit that in many of the use cases where I use eigsolve, I have enough information about the linear operator to know what the proper scale is and which absolute tolerance I want to impose.

Another reason for the current implementation is the analogy with linsolve, where all the code internally also uses an absolute tolerance. In that case, it is easier to convert a relative tolerance to an absolute tolerance, as it amounts to multiplying by the norm of the right hand side.

danielwe commented

Thanks for responding! I understand the concern about code complexity. I'm not sure this needs to add that much complexity, though: Arpack.jl's criterion is resnorm < tol * max(eps^(2/3), |λ|), and ArnoldiMethod.jl's is resnorm < max(ε‖H‖, tol * |λ|), where ‖H‖ is the Frobenius norm of some intermediate matrix in the iterations (you probably understand this better than me). That is, both simply hardcode the abstol parameter at the point of evaluating the criterion, although they differ in how that absolute floor works.

Then there's of course the option of making both reltol and abstol customizable and using resnorm < max(abstol, reltol * |λ|). That certainly adds complexity to both the code and the interface.
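
In code, that combined check would look something like this (the names are hypothetical, not an existing KrylovKit.jl interface):

```julia
# Converged if the residual passes either the absolute or the relative test.
# Setting reltol = 0 recovers the current purely absolute behavior, and
# setting abstol = 0 gives the purely scale-invariant one.
isconverged(resnorm, λ; abstol=0.0, reltol=0.0) = resnorm < max(abstol, reltol * abs(λ))
```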

I haven't thought about similarities and differences to linsolve, so can't comment on that.


Adding a bit of rambling about what I'm doing and thinking that makes me care about this issue:

One thing I've been trying to do is to benchmark different packages and algorithms for finding the minimum eigenvalue of Hermitian positive definite operators of varying sparsity. However, it's a little tricky to make everything comparable when the convergence criteria are different. Right now my solution is to only test with matrices where the smallest eigenvalue is close to 1.

However, for my actual use case, I need reliable results as the spectral gap shrinks through orders of magnitude. When solving directly for the smallest eigenvalue I can always solve in two stages: first with tol=myreltol to get in the ballpark, and then with tol=myreltol * |λ|, where λ and the starting vector are taken from the first solve, to get a more accurate solution.
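
Something like this sketch, using KrylovKit.eigsolve (the matrix here is just a toy stand-in for my actual operator):

```julia
using KrylovKit, LinearAlgebra

# Toy Hermitian positive definite matrix standing in for the real operator.
B = randn(200, 200)
A = Hermitian(B' * B + I)

myreltol = 1e-6
x0 = randn(size(A, 1))

# Stage 1: treat `tol` as an absolute tolerance just to get in the ballpark.
vals, vecs, _ = eigsolve(A, x0, 1, :SR; tol=myreltol)

# Stage 2: rescale the tolerance by the estimated eigenvalue and restart
# from the stage-1 eigenvector to refine it.
vals, vecs, _ = eigsolve(A, vecs[1], 1, :SR; tol=myreltol * abs(vals[1]))
```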

However, it usually seems preferable to invert, in which case the problem is that the tolerance is too strict rather than too lenient. That's harder to work around.

ArnoldiMethod.jl seems to work very well, so it's not at all critical for my work to get this change into KrylovKit.jl. The only drawback there is that it doesn't implement the Lanczos algorithm for Hermitian matrices, but that's not really a problem.

Jutho commented Oct 28, 2022

I am still not fully convinced. I prefer that the algorithm uses some absolute tolerance. If you want something that scales with the overall scaling of A, then the absolute tolerance that is specified can easily be chosen to depend on A. However, I don't see a good motivation for having a relative tolerance as proposed here, which depends on the eigenvalue. While indeed it means that it scales with an overall scaling in A, it also means that different eigenvalues of the same matrix A are converged with different tolerances? I don't see a good motivation for that.

Letting the tolerance only depend on some norm estimate for A (for example, via the projection of A onto the Krylov subspace) would resolve that issue, but it still prevents using an absolute tolerance in the cases where that is wanted.

The only solution that then remains is to have both an absolute and relative tolerance, and use e.g. the least strict of both (thereby assuming that if you don't want to use one of the two tolerances, you set it to zero exactly). That could be an option, but requires some more work in the implementation.

danielwe commented

> However, I don't see a good motivation for having a relative tolerance as proposed here, which depends on the eigenvalue. While indeed it means that it scales with an overall scaling in A, it also means that different eigenvalues of the same matrix A are converged with different tolerances? I don't see a good motivation for that.

The motivation is precisely that it depends on each eigenvalue, meaning that each converged eigenvalue is correct to a specified number of significant digits. This also means that the accuracy of a converged eigenpair is invariant to whether you use forward or shift-and-invert iterations. Finally, it means that each converged eigenvector is correct to some angular tolerance, regardless of whether the associated eigenvalue is large or small (with the current behavior, eigenvectors associated with smaller eigenvalues have a looser tolerance).

As I mentioned, I'm using inverse iterations to find the minimum eigenvalue of a Hermitian positive definite operator. If this is close to zero, the corresponding eigenvalue of the inverse will be very large, so I'll need to set a fairly loose absolute tolerance in order to get convergence (at least within a reasonable number of iterations). However, I don't know a priori what the magnitude of this eigenvalue will be, and hence what the appropriate tolerance is, so if I were to use KrylovKit I'd have to employ some kind of two-stage scheme as sketched above (perhaps using forward iterations in the first stage).

I definitely think the ability to set both an absolute and relative tolerance would be the best solution, with a behavior like norm < max(abstol, reltol * |λ|). This seems to be a common pattern in numerical software.

I totally understand concerns about code complexity and dev effort, and unfortunately, I don't think I'd be able to help out with implementation right now. Just offering this up as a suggestion---it's something I need in my current project, but I've been quite happy using ArnoldiMethod.

Jutho commented Oct 29, 2022

I don't understand why you use inverse iterations for the minimum eigenvalue of a Hermitian positive definite operator? It's not because it is close to zero that you need the inverse. It is still an extremal eigenvalue, and you should be able to find it using just :SR (smallest real). (Shift-and-)invert is used if you want an eigenvalue in the middle of the spectrum, which a Krylov method would otherwise not be able to find.

I also don't understand the argument that the relative tolerance would then be the same for normal and shift-and-invert iterations; I don't see how to relate ||A v - lambda v|| to ||(A - mu)^{-1} v - (lambda - mu)^{-1} v||. Could you elaborate on that?

Krylov methods are a generalisation of the power method, and as such they converge best for extremal eigenvalues, in particular those of largest magnitude. So there are many applications where what you want to find is the largest-magnitude eigenvalues, and the largest will converge the most quickly, and so forth. However, with a tolerance that scales with |lambda|, the largest would actually have the loosest tolerance, and then the second one a slightly stricter tolerance, which seems contradictory.

danielwe commented

> I don't understand why you use inverse iterations for the minimum eigenvalue of a Hermitian positive definite operator? It's not because it is close to zero that you need the inverse. It is still an extremal eigenvalue, and you should be able to find it using just :SR (smallest real). (Shift-and-)invert is used if you want an eigenvalue in the middle of the spectrum, which a Krylov method would otherwise not be able to find.

I profiled both ways: just passing the matrix with :SR, or computing the Cholesky factorization F and passing x -> F \ x with :LM. The latter is much faster for many of the matrix sizes and sparsities I care about. I made sure that the desired eigenvalue is approximately 1 in these benchmarks to avoid issues with what we're discussing here.
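
Roughly like this, in KrylovKit.eigsolve terms (toy matrix as a stand-in; the same pattern works with ArnoldiMethod/Arpack):

```julia
using KrylovKit, LinearAlgebra

# Toy Hermitian positive definite matrix standing in for the real operator.
B = randn(200, 200)
A = Hermitian(B' * B + I)
x0 = randn(size(A, 1))

# Forward iterations: ask directly for the smallest real eigenvalue of A.
vals_fwd, vecs_fwd, _ = eigsolve(A, x0, 1, :SR; tol=1e-6)
λmin_fwd = vals_fwd[1]

# Inverse iterations: factorize once, pass the solve as the linear map, and
# ask for the largest-magnitude eigenvalue of A⁻¹ (Hermitian, so Lanczos applies).
F = cholesky(A)
vals_inv, vecs_inv, _ = eigsolve(x -> F \ x, x0, 1, :LM; tol=1e-6, ishermitian=true)
λmin_inv = 1 / vals_inv[1]
```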

These two approaches illustrate the issue quite well: With the current convergence criterion, if the smallest eigenvalue is much smaller than 1, they obtain vastly different precision, the inverse method being much more accurate. Specifically, say the smallest eigenvalue is 1e-3 and I set tol=1e-6. With forward iterations, I'll get 3 converged digits, while with inverse iterations I'll get 9 converged digits. With the relative convergence criterion, I'd get 6 converged digits with both methods.
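
The back-of-the-envelope accounting behind those digit counts (a sketch using the standard Hermitian bound that the eigenvalue error is at most the residual norm):

```latex
\text{Forward (operator } A\text{):}\quad
  |\hat\lambda - \lambda| \le \|A\hat{x} - \hat\lambda\hat{x}\|_2 < 10^{-6},\quad
  \lambda \approx 10^{-3}
  \;\Rightarrow\; \frac{|\hat\lambda - \lambda|}{|\lambda|} \lesssim 10^{-3}
  \quad (\sim 3 \text{ digits})
\\[6pt]
\text{Inverse (operator } A^{-1}\text{):}\quad
  |\hat\mu - \lambda^{-1}| < 10^{-6},\quad
  \lambda^{-1} \approx 10^{3}
  \;\Rightarrow\; \frac{|\hat\mu - \lambda^{-1}|}{|\lambda^{-1}|} \lesssim 10^{-9}
  \quad (\sim 9 \text{ digits, carried over to } \lambda = 1/\hat\mu \text{ to first order})
```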

> I also don't understand the argument that the relative tolerance would then be the same for normal and shift-and-invert iterations; I don't see how to relate ||A v - lambda v|| to ||(A - mu)^{-1} v - (lambda - mu)^{-1} v||. Could you elaborate on that?

I suppose I was mainly referring to forward and inverse iterations, like in the example above (otherwise, you usually don't have a choice). But to take the shift-and-invert case: if you use the relative convergence criterion, tol becomes the relative tolerance on both (lambda - mu)^{-1} and (lambda - mu), i.e., both are correct to about a factor (1 + tol); however, with the absolute convergence criterion, tol is the absolute tolerance on (lambda - mu)^{-1}, which means that the tolerance on (lambda - mu) is actually tol * (lambda - mu)^2. That's not very intuitive---the larger the number, the fewer the significant digits; the smaller, the more. That's what I mean when I say that a relative tolerance is invariant to whether you are looking at inverses or not, while an absolute tolerance isn't. (I'd write out the math if I could use TeX; let me know if the derivation is unclear.)
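
Roughly, the first-order propagation behind that (lambda - mu)^2 factor, sketched in TeX:

```latex
f(\lambda) = (\lambda - \mu)^{-1}, \qquad |f'(\lambda)| = (\lambda - \mu)^{-2}
\\[4pt]
\delta_{\lambda - \mu} \;\approx\; \frac{\delta_f}{|f'(\lambda)|}
  \;=\; \delta_f\,(\lambda - \mu)^{2}
  \;=\; \mathrm{tol}\,(\lambda - \mu)^{2}
  \qquad \text{when the absolute tolerance on } f \text{ is } \delta_f = \mathrm{tol}
```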

> However, with a tolerance that scales with |lambda|, the largest would actually have the loosest tolerance, and then the second one a slightly stricter tolerance, which seems contradictory.

I prefer to be able to say "I want the two largest eigenvalues, both of them with 9 significant figures." Absolute tolerances are sometimes needed because you need to define the concept of "approximately zero", but for all other purposes, relative precision is more meaningful. Float64 spans 632 orders of magnitude(!); there's no a priori way for me to choose a good absolute tolerance for a maximum/minimum eigenvalue.

I hope that helped explain my reasoning a little better.
