
Highly scale-dependent efficiency of ZHEEVR and DSYEVR #995

Open · 1 of 2 tasks
amilsted opened this issue Mar 6, 2024 · 3 comments


@amilsted commented Mar 6, 2024

Description

Requesting all eigenvalues and eigenvectors of a large matrix M = A * c with ZHEEVR can take an amount of time that depends strongly on the overall real scale factor c.

For example, a Gaussian random matrix of size 4096 x 4096 with values close to 1.0 takes ~7.5s on my system with ZHEEVR. Scaling the matrix by a factor of 1e13 makes the process take ~14s, an increase of almost 100%. There is a similar slowdown with DSYEVR.
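For reference, a minimal sketch of this kind of benchmark via SciPy (the matrix construction, size, and seed are illustrative choices, not the exact case above; eigh with driver="evr" dispatches to ZHEEVR for complex Hermitian input):

import time
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 4096
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = (a + a.conj().T) / 2  # Hermitian, entries of order 1

for scale in (1.0, 1e13):
    m = h * scale
    t0 = time.perf_counter()
    w, v = eigh(m, driver="evr")  # ZHEEVR under the hood
    dt = time.perf_counter() - t0
    # The residual stays small at both scales, matching the
    # observation below that accuracy is not strongly affected.
    resid = np.linalg.norm(m @ v - v * w) / np.linalg.norm(m)
    print(f"scale={scale:g}: {dt:.1f}s, relative residual {resid:.1e}")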

In a more extreme case, with a different example matrix of the same size, ZHEEVR takes 180s when the scale of the eigenvalues is ~1e12 vs. 8s when the scale is ~1, a ~20x slowdown. In this case the matrix is real, but stored in complex representation. I am happy to upload the matrix if you can tell me where you want it. It is sparse and may have degeneracies (computed eigenvalues agree to about 1e-9 relative). Random unitary transformations of the same matrix show the same slowdown.

With the same matrix, using DSYEVR still shows a slowdown with the "bad" scaling, but not by as much (~20s vs 6s).

The slowdown only occurs when eigenvectors are requested. The ABSTOL parameter is set to 0 (I am calling via Julia and SciPy; Julia definitely sets ABSTOL to 0, and SciPy probably does too).

The accuracy of the eigendecomposition does not seem to be strongly related to the scale. I get sufficiently accurate results in all cases.

The extra time is taken in a single-threaded part of the algorithm (I have seen the problem with both MKL and OpenBLAS).

HEEV does not seem to exhibit the problem and solves quickly.

I was surprised that an overall scale factor makes such a big difference to solution speed; I would have expected scaling not to matter.

This seems like a significant problem, since both Julia and SciPy now use syevr/heevr as the default symmetric/Hermitian eigensolvers.

Checklist

  • I've included a minimal example to reproduce the issue
  • I'd be willing to make a PR to solve this issue
@thijssteel (Collaborator) commented

That is a very interesting observation.

I don't know MRRR that well (though I don't think any of the currently active contributors does).

My initial hunch is that it is just the selection of the algorithm. The documentation of zstemr says this:

If TRYRAC = .TRUE., indicates that the code should check whether
the tridiagonal matrix defines its eigenvalues to high relative
accuracy. If so, the code uses relative-accuracy preserving
algorithms that might be (a bit) slower depending on the matrix.
If the matrix does not define its eigenvalues to high relative
accuracy, the code can use possibly faster algorithms.

Perhaps it is using the more expensive algorithm for the scaled matrix?

Definitely worth investigating.
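One way to probe this hypothesis from a high-level language is to time the tridiagonal stage in isolation. A sketch assuming SciPy's eigh_tridiagonal, whose lapack_driver="stemr" option calls DSTEMR (TRYRAC itself is not exposed here, so this only checks whether the tridiagonal solver shows the same scale dependence):

import time
import numpy as np
from scipy.linalg import eigh_tridiagonal

rng = np.random.default_rng(0)
n = 4096
d = rng.standard_normal(n)      # diagonal of the tridiagonal matrix
e = rng.standard_normal(n - 1)  # off-diagonal

for scale in (1.0, 1e13):
    t0 = time.perf_counter()
    # eigenvalues and eigenvectors via DSTEMR
    w, v = eigh_tridiagonal(d * scale, e * scale, lapack_driver="stemr")
    print(f"scale={scale:g}: {time.perf_counter() - t0:.1f}s")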

@RalphAS commented Mar 10, 2024

When asked for all vectors, DSYEVR tries the MRRR scheme, and if it fails it runs the more expensive inverse-iteration approach. It appears that the jump in runtime for at least some cases like the ones reported above is due to this fallback. (Inverse iteration is especially slow for clustered eigenvalues.)

In a few cases I tried, the internal MRRR routine DLARRV (which is only used when vectors are wanted) fails for a magnified copy of a matrix for which it gave good results.

A quick fix would be to recalibrate the existing scaling logic in DSTEMR.
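Until such a recalibration lands, a user-level workaround in the same spirit is to normalize the matrix before calling the driver and rescale the eigenvalues afterwards; a sketch of the idea (a mitigation at the call site, not the internal DSTEMR fix):

import numpy as np
from scipy.linalg import eigh

def eigh_rescaled(m):
    # Normalize to unit scale before the MRRR driver, then undo.
    s = np.abs(m).max()  # cheap estimate of the matrix scale
    if s == 0.0:
        return eigh(m, driver="evr")
    w, v = eigh(m / s, driver="evr")
    return w * s, v  # eigenvectors are unaffected by the scaling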

@RalphAS commented Mar 17, 2024

The cause of the failure reported by DSTEMR for certain matrices of large norm is an inconsistency in scaling used in a test in DLARRF:

lapack/SRC/dlarrf.f, lines 467 to 477 at commit 5426147:

*     None of the representations investigated satisfied our
*     criteria. Take the best one we found.
      IF((SMLGROWTH.LT.FAIL).OR.NOFAIL) THEN
         LSIGMA = BESTSHIFT
         RSIGMA = BESTSHIFT
         FORCER = .TRUE.
         GOTO 5
      ELSE
         INFO = 1
         RETURN
      ENDIF

SMLGROWTH scales with the norm of the input data, but FAIL is a relative measure:

lapack/SRC/dlarrf.f, lines 289 to 292 at commit 5426147:

      S = DLAMCH( 'S' )
      SMLGROWTH = ONE / S
      FAIL = DBLE(N-1)*MINGAP/(SPDIAM*EPS)
      FAIL2 = DBLE(N-1)*MINGAP/(SPDIAM*SQRT(EPS))

Presumably the test at line 491 should use FAIL * SPDIAM.
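To see why the current comparison is scale-dependent, here is the test with made-up magnitudes (the names mirror dlarrf.f, but the values are purely illustrative): SMLGROWTH, MINGAP, and SPDIAM all scale with the data, so FAIL is scale-invariant while SMLGROWTH is not, and the proposed FAIL * SPDIAM restores consistency.

n, eps = 4096, 2.2e-16

for scale in (1.0, 1e12):
    spdiam = 4.0 * scale      # spectral diameter: scales with the data
    mingap = 1e-9 * scale     # minimal eigenvalue gap: scales with the data
    smlgrowth = 50.0 * scale  # element growth: scales with the data
    fail = (n - 1) * mingap / (spdiam * eps)  # relative: scale cancels
    print(f"scale={scale:g}: "
          f"current test {smlgrowth < fail}, "
          f"rescaled test {smlgrowth < fail * spdiam}")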

This may account for some of the failures noted by @oamarques in #100.
