Segfault in fsspmdm #805

Open
semi-h opened this issue Aug 8, 2023 · 6 comments

semi-h commented Aug 8, 2023

libxsmm_fsspmdm_create segfaults when ldb and ldc are large. The ldb/ldc value at which the segfault starts varies somewhat with the size of the A matrix.

I managed to reproduce the issue with the pyfr samples in the libxsmm repository. Below are three examples of segfaults. In the first two cases the A matrices are roughly the same size but have different nnz. Both fail at ldb=ldc=2,400,000 as shown, and both run successfully when ldb/ldc is halved. The last case uses a larger A matrix with roughly the same nnz as the first example, and it fails at ldb=ldc=1,200,000.

$ ./pyfr_driver_asp_reg mats/p5/pri/m0-sp.mtx 2400000 1
CSR matrix data structure we just read (mats/p5/pri/m0-sp.mtx):
rows: 150, columns: 126, elements: 2520

LIBXSMM_VERSION: main_stable-1.17-3674 (25693786)
CLX/DP      TRY    JIT    STA    COL
   0..13      0      0      0      0 
  14..23      0      0      0      0 
  24..64      1      1      0      0 
Registry and code: 13 MB + 8 KB (gemm=1 spmdm=1)
Command: ./pyfr_driver_asp_reg mats/p5/pri/m0-sp.mtx 2400000 1
Uptime: 6.499610 s
Segmentation fault (core dumped)
$ ./pyfr_driver_asp_reg mats/p4/hex/m0-sp.mtx 2400000 1
CSR matrix data structure we just read (mats/p4/hex/m0-sp.mtx):
rows: 150, columns: 125, elements: 750

LIBXSMM_VERSION: main_stable-1.17-3674 (25693786)
CLX/DP      TRY    JIT    STA    COL
   0..13      0      0      0      0 
  14..23      0      0      0      0 
  24..64      1      1      0      0 
Registry and code: 13 MB + 8 KB (gemm=1 spmdm=1)
Command: ./pyfr_driver_asp_reg mats/p4/hex/m0-sp.mtx 2400000 1
Uptime: 5.788157 s
Segmentation fault (core dumped)
$ ./pyfr_driver_asp_reg mats/p6/hex/m0-sp.mtx 1200000 1
CSR matrix data structure we just read (mats/p6/hex/m0-sp.mtx):
rows: 294, columns: 343, elements: 2058

LIBXSMM_VERSION: main_stable-1.17-3674 (25693786)
CLX/DP      TRY    JIT    STA    COL
   0..13      0      0      0      0 
  14..23      0      0      0      0 
  24..64      0      0      0      0 
    > 64      1      1      0      0 
Registry and code: 13 MB + 12 KB (gemm=1 spmdm=1)
Command: ./pyfr_driver_asp_reg mats/p6/hex/m0-sp.mtx 1200000 1
Uptime: 10.935839 s
Segmentation fault (core dumped)

I used the latest available version of libxsmm, but I first observed a segfault a few months ago when running PyFR on Intel Skylake and ARM (Graviton2/3). I just wasn't able to pinpoint where the segfault originated until now. I believe the present issue was the root cause of those crashes, so it has probably been around for at least a few months.

I ran the above examples on an i7-1185G7 (Willow Cove). To build libxsmm I just ran 'make' in the main folder and then 'make' again in samples/pyfr.

@FreddieWitherden
Contributor

x86 displacements are limited to 32-bit signed integers, but log2(150*2400000*8) ≈ 31.4, which is more than the 31 bits available for a positive displacement. The matrix is large enough that a single instruction cannot reference the entire region. I can play some tricks to get us the full 32 bits by pre-displacing the base pointer, but the right solution is to avoid jumbo matrices.
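
To make the arithmetic above concrete, here is a small sketch that computes the byte span of B (k*ldb*itemsize) and C (m*ldc*itemsize) for the three reported runs and compares it with the 2**31 limit. The helper name `span_bytes` is made up for illustration, and the item size of 8 bytes assumes double precision, as the "DP" in the log output suggests.

```python
import math

# Signed 32-bit displacements can reach at most 2**31 - 1 bytes forward.
DISP_LIMIT = 2**31


def span_bytes(rows, ld, itemsize=8):
    """Bytes spanned by a matrix with `rows` rows and leading dimension `ld`."""
    return rows * ld * itemsize


# (label, m, k, ldb == ldc) taken from the three reported runs.
cases = [
    ("p5/pri", 150, 126, 2_400_000),
    ("p4/hex", 150, 125, 2_400_000),
    ("p6/hex", 294, 343, 1_200_000),
]

for label, m, k, ld in cases:
    b_bytes = span_bytes(k, ld)  # B is k x n with leading dimension ldb
    c_bytes = span_bytes(m, ld)  # C is m x n with leading dimension ldc
    fits = max(b_bytes, c_bytes) < DISP_LIMIT
    print(
        f"{label}: B spans 2^{math.log2(b_bytes):.2f} bytes, "
        f"C spans 2^{math.log2(c_bytes):.2f} bytes, fits in disp32: {fits}"
    )
```

With ldb/ldc halved, the first two cases drop below 2**31 bytes again, which matches the observation that those runs succeed.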

Contributor

FreddieWitherden commented Aug 8, 2023

@hfp Are you okay to add a check limiting the total size (k*ldb*sizeof(dtype) and m*ldc*sizeof(dtype)) to be less than 2**31? Technically, it is only needed on x86 but I would apply it to ARM too for consistency. This way the user gets a warning rather than a segfault.
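
A minimal sketch of what such a check could look like, written in Python purely for illustration. The function `check_fsspmdm_sizes` is hypothetical and not part of the libxsmm API; it only mirrors the two conditions named above and warns instead of crashing.

```python
import warnings

# Limit proposed in the comment above: total size must stay below 2**31 bytes.
MAX_SPAN = 2**31


def check_fsspmdm_sizes(m, k, ldb, ldc, itemsize):
    """Return True if B and C both span fewer than MAX_SPAN bytes.

    Hypothetical helper for illustration only; not a libxsmm function.
    Emits a warning and returns False when either operand is too large.
    """
    b_bytes = k * ldb * itemsize
    c_bytes = m * ldc * itemsize
    ok = b_bytes < MAX_SPAN and c_bytes < MAX_SPAN
    if not ok:
        warnings.warn(
            f"operands too large for 32-bit displacements: "
            f"B spans {b_bytes} bytes, C spans {c_bytes} bytes, "
            f"limit is {MAX_SPAN} bytes"
        )
    return ok


# First reported case (m=150, k=126, double precision) at the failing size.
if not check_fsspmdm_sizes(150, 126, 2_400_000, 2_400_000, 8):
    print("kernel creation would be refused")
```

Returning a flag rather than raising keeps the caller free to fall back to another code path, which is one way to give the user a warning rather than a segfault.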

Collaborator

alheinecke commented Aug 8, 2023

In my eyes, this is a hotfix.

I would like to see where the bug is in the code generator. We can easily fix the large displacement issue with the SIB addressing mode.
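
For readers unfamiliar with SIB addressing: an x86 memory operand can be formed as base + index*scale + disp, where scale is 1, 2, 4, or 8 and disp is a signed 32-bit value. The sketch below models that formula and shows one possible reading of the suggestion, namely moving the large row offset into a 64-bit index register. The helper names are invented for illustration, and this is not taken from the libxsmm code generator.

```python
DISP_MIN, DISP_MAX = -(2**31), 2**31 - 1


def effective_address(base, index, scale, disp):
    """Model of an x86 SIB operand: base + index*scale + disp32."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if not DISP_MIN <= disp <= DISP_MAX:
        raise ValueError("displacement does not fit in a signed 32-bit field")
    return base + index * scale + disp


def c_element_operand(row, col, ldc, itemsize=8):
    """Hypothetical split: row offset in the index register, column offset in disp."""
    index = row * ldc * itemsize  # lives in a 64-bit register, so no 32-bit limit
    disp = col * itemsize         # small as long as the column offset is small
    return index, 1, disp


base = 0x1000  # arbitrary example base address
index, scale, disp = c_element_operand(row=149, col=5, ldc=2_400_000)
print(hex(effective_address(base, index, scale, disp)))
```

Encoding the same element with the whole offset in the displacement field would fail the range check, which is the situation described earlier in this thread.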

Collaborator

hfp commented Aug 14, 2023

> @hfp Are you okay to add a check limiting the total size (k*ldb*sizeof(dtype) and m*ldc*sizeof(dtype)) to be less than 2**31? Technically, it is only needed on x86 but I would apply it to ARM too for consistency. This way the user gets a warning rather than a segfault.

I am OK with it (my first thought was also to check the input). However, with Alex's fix this is not necessary except as a hotfix. If support for the full/anticipated range keeps slipping, we can still deploy a range check.

@FreddieWitherden
Contributor

So the most efficient way to support this is probably to use several registers to store the b and c pointers, each displaced by 4 GiB. By burning 6 GPRs we can support 12 GiB of input and 12 GiB of output without any real additional cost. (I think that we have the GPRs to spare.)
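
The idea can be sketched as follows, assuming each base register is pre-displaced to the middle of its own 4 GiB window so that a signed 32-bit displacement covers the whole window. The functions `base_offsets` and `locate` are illustrative names, and the layout is an assumption about how the scheme might work rather than the actual implementation.

```python
WINDOW = 2**32  # bytes reachable from one base register with a signed disp32
HALF = 2**31    # pre-displacement that centres the base inside its window


def base_offsets(n_regs):
    """Byte offsets from the start of the matrix at which each base register points."""
    return [i * WINDOW + HALF for i in range(n_regs)]


def locate(offset, n_regs=3):
    """Map a byte offset to (register index, signed 32-bit displacement)."""
    if not 0 <= offset < n_regs * WINDOW:
        raise ValueError("offset outside the range covered by the base registers")
    reg = offset // WINDOW
    disp = offset - (reg * WINDOW + HALF)  # always within [-2**31, 2**31 - 1]
    return reg, disp


# Three registers per matrix cover 3 * 4 GiB = 12 GiB.
print(base_offsets(3))
print(locate(150 * 2_400_000 * 8 - 8))  # last double in a 150 x 2,400,000 matrix
```

Using three such registers for b and three for c gives the 6 GPRs and the 12 GiB per matrix mentioned above.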
