
Highly scale-dependent efficiency of ZHEEVR and DSYEVR #995

Open · 1 of 2 tasks
amilsted opened this issue Mar 6, 2024 · 3 comments


@amilsted commented Mar 6, 2024

Description

Requesting all eigenvalues and eigenvectors of a large matrix M = A * c with ZHEEVR can take an amount of time that depends strongly on the overall real scale factor c.

For example, a Gaussian random matrix of size 4096 x 4096 with values close to 1.0 takes ~7.5s on my system with ZHEEVR. Scaling the matrix by a factor of 1e13 makes the process take ~14s, an increase of almost 100%. There is a similar slowdown with DSYEVR.
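For reference, a minimal sketch of this kind of benchmark via SciPy (the matrix construction, size, and seed are illustrative choices, not the exact case above; eigh with driver="evr" dispatches to ZHEEVR for complex Hermitian input):

import time
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 4096
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = (a + a.conj().T) / 2  # Hermitian, entries of order 1

for scale in (1.0, 1e13):
    m = h * scale
    t0 = time.perf_counter()
    w, v = eigh(m, driver="evr")  # ZHEEVR under the hood
    dt = time.perf_counter() - t0
    # The residual stays small at both scales, matching the
    # observation below that accuracy is not strongly affected.
    resid = np.linalg.norm(m @ v - v * w) / np.linalg.norm(m)
    print(f"scale={scale:g}: {dt:.1f}s, relative residual {resid:.1e}")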

In a more extreme case, with a different example matrix of the same size, ZHEEVR takes 180s when the scale of the eigenvalues is ~1e12 vs. 8s when the scale is ~1, a ~20x slowdown. In this case the matrix is real, but stored in complex representation. I am happy to upload the matrix if you can tell me where you want it. It is sparse and may have degeneracies (computed eigenvalues agree to about 1e-9 relative). Random unitary transformations of the same matrix show the same slowdown.

With the same matrix, using DSYEVR still shows a slowdown with the "bad" scaling, but not by as much (~20s vs 6s).

The slowdown only occurs when eigenvectors are requested. The ABSTOL parameter is set to 0 (I am calling via Julia and SciPy; Julia definitely sets ABSTOL to 0, and SciPy probably does too).

The accuracy of the eigendecomposition does not seem to be strongly related to the scale. I get sufficiently accurate results in all cases.

The extra time is taken in a single-threaded part of the algorithm (I have seen the problem with both MKL and OpenBLAS).

HEEV does not seem to exhibit the problem and solves quickly.

I was surprised that an overall scale factor makes such a big difference to solution speed; I would have expected scaling not to matter.

This seems like a significant problem, since both Julia and SciPy now use syevr/heevr as the default symmetric/Hermitian eigensolvers.

Checklist

  • I've included a minimal example to reproduce the issue
  • I'd be willing to make a PR to solve this issue
@thijssteel (Collaborator) commented

That is a very interesting observation.

I don't know MRRR that well (though I don't think any of the currently active contributors does).

My initial hunch is that it is just the selection of the algorithm. The documentation of zstemr says this:

If TRYRAC = .TRUE., indicates that the code should check whether
the tridiagonal matrix defines its eigenvalues to high relative
accuracy. If so, the code uses relative-accuracy preserving
algorithms that might be (a bit) slower depending on the matrix.
If the matrix does not define its eigenvalues to high relative
accuracy, the code can use possibly faster algorithms.

Perhaps it is using the more expensive algorithm for the scaled matrix?

Definitely worth investigating.
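One way to probe this hypothesis from a high-level language is to time the tridiagonal stage in isolation. A sketch assuming SciPy's eigh_tridiagonal, whose lapack_driver="stemr" option calls DSTEMR (TRYRAC itself is not exposed here, so this only checks whether the tridiagonal solver shows the same scale dependence):

import time
import numpy as np
from scipy.linalg import eigh_tridiagonal

rng = np.random.default_rng(0)
n = 4096
d = rng.standard_normal(n)      # diagonal of the tridiagonal matrix
e = rng.standard_normal(n - 1)  # off-diagonal

for scale in (1.0, 1e13):
    t0 = time.perf_counter()
    # eigenvalues and eigenvectors via DSTEMR
    w, v = eigh_tridiagonal(d * scale, e * scale, lapack_driver="stemr")
    print(f"scale={scale:g}: {time.perf_counter() - t0:.1f}s")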

@RalphAS commented Mar 10, 2024

When asked for all vectors, DSYEVR tries the MRRR scheme, and if it fails it runs the more expensive inverse-iteration approach. It appears that the jump in runtime for at least some cases like the ones reported above is due to this fallback. (Inverse iteration is especially slow for clustered eigenvalues.)

In a few cases I tried, the internal MRRR routine DLARRV (which is only used when vectors are wanted) fails for a magnified copy of a matrix for which it gave good results.

A quick fix would be to recalibrate the existing scaling logic in DSTEMR.
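Until such a recalibration lands, a user-level workaround in the same spirit is to normalize the matrix before calling the driver and rescale the eigenvalues afterwards; a sketch of the idea (a mitigation at the call site, not the internal DSTEMR fix):

import numpy as np
from scipy.linalg import eigh

def eigh_rescaled(m):
    # Normalize to unit scale before the MRRR driver, then undo.
    s = np.abs(m).max()  # cheap estimate of the matrix scale
    if s == 0.0:
        return eigh(m, driver="evr")
    w, v = eigh(m / s, driver="evr")
    return w * s, v  # eigenvectors are unaffected by the scaling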

@RalphAS commented Mar 17, 2024

The cause of the failure reported by DSTEMR for certain matrices of large norm is an inconsistency in scaling used in a test in DLARRF:

lapack/SRC/dlarrf.f, lines 467 to 477 at commit 5426147:

*     None of the representations investigated satisfied our
*     criteria. Take the best one we found.
      IF((SMLGROWTH.LT.FAIL).OR.NOFAIL) THEN
         LSIGMA = BESTSHIFT
         RSIGMA = BESTSHIFT
         FORCER = .TRUE.
         GOTO 5
      ELSE
         INFO = 1
         RETURN
      ENDIF

SMLGROWTH scales with the norm of the input data, but FAIL is a relative measure:

lapack/SRC/dlarrf.f, lines 289 to 292 at commit 5426147:

      S = DLAMCH( 'S' )
      SMLGROWTH = ONE / S
      FAIL = DBLE(N-1)*MINGAP/(SPDIAM*EPS)
      FAIL2 = DBLE(N-1)*MINGAP/(SPDIAM*SQRT(EPS))

Presumably the test at line 491 should use FAIL * SPDIAM.
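To see why the current comparison is scale-dependent, here is the test with made-up magnitudes (the names mirror dlarrf.f, but the values are purely illustrative): SMLGROWTH, MINGAP, and SPDIAM all scale with the data, so FAIL is scale-invariant while SMLGROWTH is not, and the proposed FAIL * SPDIAM restores consistency.

n, eps = 4096, 2.2e-16

for scale in (1.0, 1e12):
    spdiam = 4.0 * scale      # spectral diameter: scales with the data
    mingap = 1e-9 * scale     # minimal eigenvalue gap: scales with the data
    smlgrowth = 50.0 * scale  # element growth: scales with the data
    fail = (n - 1) * mingap / (spdiam * eps)  # relative: scale cancels
    print(f"scale={scale:g}: "
          f"current test {smlgrowth < fail}, "
          f"rescaled test {smlgrowth < fail * spdiam}")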

This may account for some of the failures noted by @oamarques in #100.
