lapack_sgeev routine not going to all routines shown in the call graph while debugging. #1013
The call graph contains everything that may be called depending on the properties of your input, not necessarily what will be called on every invocation. Also, the call graphs you are using are for the reference implementations of both LAPACK and BLAS, while the "kernel/arm64" path that you mentioned strongly suggests that you are using OpenBLAS, which re-implements some functions in a different way.
Yes, I am using OpenBLAS. When debugging the LAPACKE_sgesv routine, it calls functions under "kernel/arm64" such as sgemv and sscal, where the basic operations are performed. For other routines like ssyev or sgeev, however, I don't see any functions under "kernel/arm64" being called. There must be routines that do the algebraic operations, and they should be called (correct me if I am wrong). I tried different data sizes, but those routines are never called.
For a run of ssyev or sgeev, what routines do you see? For SYEV, I would expect to see SYMV, SYR2, SYR2K for the symmetric tridiagonal reduction. Then it gets complicated, but yes, you should see some GEMM, TRMM, GEMV, TRMV. You can see some of the call graph at:
As Martin said: (1) during a specific run, not all functions in the call tree are used; (2) we can only speak for what is done in reference LAPACK. There is nothing wrong with using OpenBLAS. It is great, actually, but I am not sure whether they changed the LAPACK algorithm or not.
All this being said, I agree it is weird that you do not see more routines. We can explain it, but I find it surprising. Maybe some libraries were not compiled with the correct debug flag, so that GDB cannot "see" the routines in these libraries and only reports the higher-level drivers? (I am not exactly sure about this!) Maybe you can tell us all the routines that are called in a given run, and we start from there.
GESV in OpenBLAS is reimplemented (parallelized) as GETRF/GETRS, but you should eventually find some GEMM/GEMV in your backtrace. There are no dirty tricks in GEEV, so it should follow the call graph of the reference implementation. I trust you (re)built all of OpenBLAS with DEBUG=1 or the equivalent "-g" compiler flag?
Rebuilt the library and debugged ssyev. The following routines are seen while debugging:
In OpenBLAS-0.3.26, I changed some of the routines under kernel/arm64 from .S to my own SVE .c routines, for example gemv_n.S to gemv_n.c, as well as swap.c, scal.c, copy.c, etc. When I run the ssyev routine on this SVE-based BLAS, it runs fine with matrix size 64, but beyond that it gives the error "failed to calculate the eigenvalues", and with a further increase in size, to 600 or 1000, it gives a segmentation fault. I am not able to figure out why it gives a segmentation fault or fails to calculate the eigenvalues. Any help is appreciated.
If you are already debugging with gdb, it should be able to tell you where in the code the segmentation fault occurs (and whether it is in any of the functions you wrote, or a pre-existing problem in OpenBLAS or LAPACK). At the gdb prompt, enter "handle 11 nopass" so that the program does not terminate on the segfault, and use "bt" to see the call stack when the segfault occurs.
While debugging with size 200, it gives the following error in OpenBLAS-0.3.26/lapack-netlib/SRC/ilaenv.f:
ssyev (jobz=..., uplo=..., n=200, a=..., lda=200, w=..., work=..., lwork=6800, info=0, _jobz=1, _uplo=1) at ssyev.f:184
Looks like part of the stack got overwritten; try going "up" until you reach the last call that had meaningful arguments.
ilaenv (ispec=1, name=<error reading variable: Cannot access memory at address 0xffffbe790000>,
Strange, this does not even look as if one of your newly written BLAS kernels got called.
Did you rebuild everything (make clean; make) after making your code changes?
Yes.
info = LAPACKE_ssyev( LAPACK_ROW_MAJOR, 'V', 'U', n, a, lda, w );
RELATIVE MACHINE PRECISION IS TAKEN TO BE 1.2E-07
cblas_sgemv PASSED THE TESTS OF ERROR-EXITS
******* FATAL ERROR - PARAMETER NUMBER 7 WAS CHANGED INCORRECTLY *******
******* FATAL ERROR - TESTS ABANDONED *******
THE FOLLOWING PARAMETER VALUES WILL BE USED:
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
COLUMN-MAJOR AND ROW-MAJOR DATA LAYOUTS ARE TESTED
RELATIVE MACHINE PRECISION IS TAKEN TO BE 2.2D-16
cblas_dgemv PASSED THE TESTS OF ERROR-EXITS
******* FATAL ERROR - PARAMETER NUMBER 7 WAS CHANGED INCORRECTLY *******
******* FATAL ERROR - TESTS ABANDONED *******
THE FOLLOWING PARAMETER VALUES WILL BE USED:
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
COLUMN-MAJOR AND ROW-MAJOR DATA LAYOUTS ARE TESTED
RELATIVE MACHINE PRECISION IS TAKEN TO BE 1.2E-07
While building the library, it fails this test case. Could you please explain the meaning of incx and incy being negative, and is K used for the number of rows? Why does it show "FATAL ERROR - PARAMETER NUMBER 7 WAS CHANGED INCORRECTLY", and what is this parameter number 7? If I see the cblas_sgemv( order, transa, m, n, alpha, a, lda, x, incx, beta,
Negative increments mean stepping over the array elements backwards. There is no K in GEMV, so this parameter is most likely unused in this particular test (cin2 contains inputs for a number of different level 2 BLAS functions that xccblat2 checks one after the other). "order" is relevant for CBLAS only (BLAS assumes the default Fortran matrix order; CBLAS offers you a choice of row-major and column-major and transforms the input accordingly for the actual BLAS call). Indeed, the error message suggests that one of the input-only arguments was overwritten, which should not happen.
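To make the increment rule concrete, here is a minimal sketch of a no-transpose GEMV written the way reference BLAS indexes its vectors. It is not OpenBLAS code: the names ref_sgemv_n and vec_unchanged are hypothetical, and the kernel assumes a column-major matrix. The point is that with a negative increment the logical first element sits at the far end of the storage, and that a, x and the scalars are inputs only, so a test harness may snapshot them and compare after the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference kernel (illustration only, not OpenBLAS code):
 * y := alpha*A*x + beta*y for a column-major m-by-n matrix A. */
static void ref_sgemv_n(int m, int n, float alpha, const float *a, int lda,
                        const float *x, int incx, float beta,
                        float *y, int incy)
{
    assert(incx != 0 && incy != 0);   /* BLAS rejects zero increments */

    /* With a negative increment the logical first element is stored last:
     * start at offset (len-1)*|inc| and walk backwards from there. */
    ptrdiff_t kx = incx > 0 ? 0 : (ptrdiff_t)(n - 1) * -incx;
    ptrdiff_t ky = incy > 0 ? 0 : (ptrdiff_t)(m - 1) * -incy;

    for (int i = 0; i < m; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++)
            acc += a[i + (ptrdiff_t)j * lda] * x[kx + (ptrdiff_t)j * incx];

        /* y is the only argument that is written. */
        float *yi = &y[ky + (ptrdiff_t)i * incy];
        *yi = alpha * acc + (beta == 0.0f ? 0.0f : beta * *yi);
    }
}

/* Hypothetical helper: returns 1 if an input-only vector is bit-for-bit
 * identical to a snapshot taken before the call, 0 otherwise. */
static int vec_unchanged(const float *before, const float *after, size_t len)
{
    return memcmp(before, after, len * sizeof(float)) == 0;
}
```

A custom kernel can be compared against such a reference by copying x (and a) before the call, running the kernel with incx = -1 or incy = -1, and then checking vec_unchanged on each input. A kernel that writes past the end of y, or that uses x as scratch space, would fail that check in much the same way the test program reports a changed parameter.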
THE FOLLOWING PARAMETER VALUES WILL BE USED:
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
COLUMN-MAJOR AND ROW-MAJOR DATA LAYOUTS ARE TESTED
RELATIVE MACHINE PRECISION IS TAKEN TO BE 1.2E-07
cblas_sgemv PASSED THE TESTS OF ERROR-EXITS
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
How should row-major and column-major matrices be handled when doing an operation? If I write a program for row major, will it work for both column major and row major? As you mentioned, BLAS assumes the default Fortran matrix order, which means it is column major. I tried with column major, but it still fails the test cases while building the library. So the order should not be the problem?
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
******* FATAL ERROR - TESTS ABANDONED *******
It is able to calculate one value correctly but fails on the other.
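On the row-major question, a minimal sketch may help, assuming unit increments, alpha = 1 and beta = 0 for brevity. The function names colmajor_gemv and rowmajor_gemv_n are hypothetical and this is not the actual CBLAS wrapper code. It illustrates the usual reduction: a row-major m-by-n matrix occupies the same memory as a column-major n-by-m matrix that is its transpose, so a row-major "no transpose" product can be served by a column-major "transpose" kernel with m and n swapped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column-major reference (illustration only):
 * y := A*x if trans == 0, y := A^T*x otherwise.
 * A is m-by-n with leading dimension lda; element (r, c) is a[r + c*lda]. */
static void colmajor_gemv(int trans, int m, int n, const float *a, int lda,
                          const float *x, float *y)
{
    assert(lda >= m);                 /* a column must hold m rows */
    int ylen = trans ? n : m;
    int xlen = trans ? m : n;

    for (int i = 0; i < ylen; i++) {
        float acc = 0.0f;
        for (int j = 0; j < xlen; j++) {
            /* A^T(i, j) is A(j, i), so the two indices swap roles. */
            float aij = trans ? a[j + (ptrdiff_t)i * lda]
                              : a[i + (ptrdiff_t)j * lda];
            acc += aij * x[j];
        }
        y[i] = acc;
    }
}

/* Hypothetical row-major entry point: y := A*x where A is m-by-n and
 * element (i, j) is a[i*lda + j]. The same buffer read as column-major
 * is the n-by-m matrix A^T, so call the transposed kernel with the
 * dimensions swapped. */
static void rowmajor_gemv_n(int m, int n, const float *a, int lda,
                            const float *x, float *y)
{
    assert(lda >= n);                 /* a row must hold n columns */
    colmajor_gemv(1, n, m, a, lda, x, y);
}
```

Under this scheme a kernel only has to be correct for column-major input in both its transposed and non-transposed variants. If the test program reports wrong results for one layout only, that would point at one of the two variants (for example gemv_t versus gemv_n) rather than at the layout handling itself.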
Hello all,
While debugging the sgeev or ssyev functions of LAPACK using gdb, execution does not go into all the functions shown in the call graph.
It goes into only some LAPACK functions, such as lapack_dge_trans.c, lapacke_nancheck.c, lapacke_sgeev.c, lapacke_sge_nancheck.c, lapacke_sgeev_work.c and ../kernel/arm64/../generic/lsame.c.
It does not go into functions like sscal.c, sgemv.c, scopy, strmv, etc.
I am not able to figure out why it does not go into these routines. It should reach some routines where the basic algebraic operations are performed.
How can I get the details of those routines? I am using gdb for debugging.
Thank You.