Segmentation Fault when running with HSA #4217

Open
Hprairie opened this issue Apr 19, 2024 · 8 comments

@Hprairie

Running the following in a fresh conda environment with Python 3.11. CPU: 3970X Threadripper. GPU: 7900 XTX. OS: Ubuntu 22.04.1.

DEBUG=3 python3 -c "from tinygrad import Tensor;
N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);
c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"

This results in:

opening device METAL from pid:9140
opening device HSA from pid:9140
opening device NPY from pid:9140
scheduled 4 kernels
*** HSA   rand  seed 1713494129 size 1048576         dtype dtypes.float
*** HSA   rand  seed 1713494130 size 1048576         dtype dtypes.float
  0 ━┳ STORE MemBuffer(idx=0, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1), strides=(1024, 1, 0), offset=0, mask=None, contiguous=True),)))
  1  ┗━┳ SUM ((2,), dtypes.float)
  2    ┗━┳ MUL 
  3      ┣━━ LOAD MemBuffer(idx=1, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(1024, 0, 1), offset=0, mask=None, contiguous=False),)))
  4      ┗━━ LOAD MemBuffer(idx=2, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(0, 1, 1024), offset=0, mask=None, contiguous=False),)))
Segmentation fault (core dumped)
@nimlgen
Collaborator

nimlgen commented Apr 20, 2024

Can you run with gdb and share backtrace (py-bt)?
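
If attaching gdb is inconvenient, the standard-library `faulthandler` module is another way to see which Python line was executing when the SIGSEGV arrived. A minimal sketch; the helper name `enable_crash_traceback` is hypothetical:

```python
import faulthandler


def enable_crash_traceback():
    """Ask CPython to dump the Python traceback on fatal signals such as SIGSEGV."""
    if not faulthandler.is_enabled():
        # Defaults: write to sys.stderr and include all threads.
        faulthandler.enable()


enable_crash_traceback()
# ...then run the failing code, e.g. the Tensor one-liner from the report.
```

The same behaviour is available without editing the script by running `python3 -X faulthandler script.py` or by setting `PYTHONFAULTHANDLER=1`.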

@Hprairie
Author

Hprairie commented Apr 20, 2024

Yeah, it looks like a conda issue. Here is the stack trace.

GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
Starting program: /home/prairie/anaconda3/envs/tiny/bin/python -c from\ tinygrad\ import\ Tensor\;'
'N\ =\ 1024\;\ a,\ b\ =\ Tensor.rand\(N,\ N\),\ Tensor.rand\(N,\ N\)\;'
'c\ =\ \(a.reshape\(N,\ 1,\ N\)\ \*\ b.T.reshape\(1,\ N,\ N\)\).sum\(axis=2\)\;'
'print\(\(c.numpy\(\)\ -\ \(a.numpy\(\)\ @\ b.numpy\(\)\)\).mean\(\)\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3fff640 (LWP 11711)]
[New Thread 0x7ffff37fe640 (LWP 11712)]
[New Thread 0x7ffff0ffd640 (LWP 11713)]
[New Thread 0x7fffee7fc640 (LWP 11714)]
[New Thread 0x7fffebffb640 (LWP 11715)]
[New Thread 0x7fffe97fa640 (LWP 11716)]
[New Thread 0x7fffe6ff9640 (LWP 11717)]
[New Thread 0x7fffe47f8640 (LWP 11718)]
[New Thread 0x7fffe1ff7640 (LWP 11719)]
[New Thread 0x7fffdf7f6640 (LWP 11720)]
[New Thread 0x7fffdcff5640 (LWP 11721)]
[New Thread 0x7fffda7f4640 (LWP 11722)]
[New Thread 0x7fffd7ff3640 (LWP 11723)]
[New Thread 0x7fffd57f2640 (LWP 11724)]
[New Thread 0x7fffd2ff1640 (LWP 11725)]
[New Thread 0x7fffce7f0640 (LWP 11726)]
[New Thread 0x7fffcbfef640 (LWP 11727)]
[New Thread 0x7fffcb7ee640 (LWP 11728)]
[New Thread 0x7fffc8fed640 (LWP 11729)]
[New Thread 0x7fffc67ec640 (LWP 11730)]
[New Thread 0x7fffc3feb640 (LWP 11731)]
[New Thread 0x7fffc17ea640 (LWP 11732)]
[New Thread 0x7fffbefe9640 (LWP 11733)]
[New Thread 0x7fffbc7e8640 (LWP 11734)]
[New Thread 0x7fffb9fe7640 (LWP 11735)]
[New Thread 0x7fffb77e6640 (LWP 11736)]
[New Thread 0x7fffb4fe5640 (LWP 11737)]
[New Thread 0x7fffb27e4640 (LWP 11738)]
[New Thread 0x7fffaffe3640 (LWP 11739)]
[New Thread 0x7fffad7e2640 (LWP 11740)]
[New Thread 0x7fffa8fe1640 (LWP 11741)]
[New Thread 0x7fffa67e0640 (LWP 11742)]
[New Thread 0x7fffa5fdf640 (LWP 11743)]
[New Thread 0x7fffa37de640 (LWP 11744)]
[New Thread 0x7fffa0fdd640 (LWP 11745)]
[New Thread 0x7fff9e7dc640 (LWP 11746)]
[New Thread 0x7fff9bfdb640 (LWP 11747)]
[New Thread 0x7fff977da640 (LWP 11748)]
[New Thread 0x7fff96fd9640 (LWP 11749)]
[New Thread 0x7fff947d8640 (LWP 11750)]
[New Thread 0x7fff91fd7640 (LWP 11751)]
[New Thread 0x7fff8f7d6640 (LWP 11752)]
[New Thread 0x7fff8cfd5640 (LWP 11753)]
[New Thread 0x7fff8a7d4640 (LWP 11754)]
[New Thread 0x7fff87fd3640 (LWP 11755)]
[New Thread 0x7fff837d2640 (LWP 11756)]
[New Thread 0x7fff82fd1640 (LWP 11757)]
[New Thread 0x7fff807d0640 (LWP 11758)]
[New Thread 0x7fff7bfcf640 (LWP 11759)]
[New Thread 0x7fff7b7ce640 (LWP 11760)]
[New Thread 0x7fff78fcd640 (LWP 11761)]
[New Thread 0x7fff747cc640 (LWP 11762)]
[New Thread 0x7fff71fcb640 (LWP 11763)]
[New Thread 0x7fff717ca640 (LWP 11764)]
[New Thread 0x7fff6efc9640 (LWP 11765)]
[New Thread 0x7fff6c7c8640 (LWP 11766)]
[New Thread 0x7fff69fc7640 (LWP 11767)]
[New Thread 0x7fff697c6640 (LWP 11768)]
[New Thread 0x7fff64fc5640 (LWP 11769)]
[New Thread 0x7fff627c4640 (LWP 11770)]
[New Thread 0x7fff5ffc3640 (LWP 11771)]
[New Thread 0x7fff5b7c2640 (LWP 11772)]
[New Thread 0x7fff5afc1640 (LWP 11773)]
[New Thread 0x7fff4bfff640 (LWP 11774)]
[New Thread 0x7fff4b7fe640 (LWP 11775)]
[Thread 0x7fff4b7fe640 (LWP 11775) exited]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
P_set (ptr=0x7ffe3ae00000, value=<optimized out>, size=<optimized out>) at /usr/local/src/conda/python-3.11.8/Modules/_ctypes/cfield.c:1463
1463	/usr/local/src/conda/python-3.11.8/Modules/_ctypes/cfield.c: No such file or directory.

Is there a way to use it with conda, so that it works within an env?

@Hprairie Hprairie reopened this Apr 20, 2024
@Notnaton

Notnaton commented Apr 26, 2024

I ran into this issue outside conda, after installing with pip install git+https....
I ran the example code and got a segfault:

Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".

(gdb) run
Starting program: /usr/bin/python3 script.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3bff640 (LWP 227236)]
[New Thread 0x7ffff33fe640 (LWP 227237)]
[New Thread 0x7ffff0bfd640 (LWP 227238)]
[New Thread 0x7fffee3fc640 (LWP 227239)]
[New Thread 0x7fffebbfb640 (LWP 227240)]
[New Thread 0x7fffe93fa640 (LWP 227241)]
[New Thread 0x7fffe4bf9640 (LWP 227242)]
[New Thread 0x7fffe43f8640 (LWP 227243)]
[New Thread 0x7fffe1bf7640 (LWP 227244)]
[New Thread 0x7fffdf3f6640 (LWP 227245)]
[New Thread 0x7fffdcbf5640 (LWP 227246)]
[New Thread 0x7fffda3f4640 (LWP 227247)]
[New Thread 0x7fffd7bf3640 (LWP 227248)]
[New Thread 0x7fffd53f2640 (LWP 227249)]
[New Thread 0x7fffd0bf1640 (LWP 227250)]
[New Thread 0x7fffc37ff640 (LWP 227251)]
[New Thread 0x7ffec2dff640 (LWP 227252)]
[Thread 0x7ffec2dff640 (LWP 227252) exited]

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
P_set (ptr=0x7ffec0a00000, value=<optimized out>, size=<optimized out>) at ./Modules/_ctypes/cfield.c:1462
1462 ./Modules/_ctypes/cfield.c: No such file or directory.
(gdb) py-bt
Traceback (most recent call first):
File "/home/anton/.local/lib/python3.10/site-packages/tinygrad/runtime/ops_hsa.py", line 89, in __call__
for i in range(len(args)): args_st.__setattr__(f'f{i}', args[i])
File "/home/anton/.local/lib/python3.10/site-packages/tinygrad/device.py", line 182, in __call__
return self.clprg(*[x._buf for x in rawbufs], **lra, vals=tuple(var_vals[k] for k in self.vars), wait=wait)
File "/home/anton/.local/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 16, in run
et = self.prg([cast(Buffer, x).ensure_allocated() for x in self.bufs], var_vals if var_vals is not None else {}, wait=wait or DEBUG >= 2)
File "/home/anton/.local/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 90, in run_schedule
ei.run(var_vals)
File "/home/anton/.local/lib/python3.10/site-packages/tinygrad/tensor.py", line 165, in realize
run_schedule(*self.schedule_with_vars(*lst))
File "/home/anton/.local/lib/python3.10/site-packages/tinygrad/tensor.py", line 198, in _data
cpu = self.cast(self.dtype.scalar()).contiguous().to("CLANG").realize()
File "/home/anton/.local/lib/python3.10/site-packages/tinygrad/tensor.py", line 245, in numpy
return np.frombuffer(self._data(), dtype=self.dtype.np).reshape(self.shape)
File "/home/anton/script.py", line 4, in <module>
print((c.numpy() - (a.numpy() @ b.numpy())).mean())
(gdb)

@Notnaton

Version 0.8.0, installed with pip, works.

@Hprairie
Author

Hprairie commented May 1, 2024

@nimlgen I have been looking through the code base to try to see how it is generating the source of "/usr/local/src/conda/python-3.11.8/Modules/_ctypes/cfield.c", but have been unsuccessful. Any ideas on where to start debugging? I would prefer to use the main branch of tinygrad rather than 0.8.0.

@nimlgen
Collaborator

nimlgen commented May 1, 2024

      args_st = self.args_struct_t.from_address(kernargs)
      for i in range(len(args)): args_st.__setattr__(f'f{i}', args[i])

Is kernargs 0 here?
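
For context, here is a minimal sketch of what the two lines above do, using only the standard-library `ctypes` module. The struct layout (every field a `c_void_p`) and the `make_args_struct` helper are assumptions for illustration, not tinygrad's actual code, and an ordinary `ctypes` buffer stands in for the kernarg memory. `from_address` neither allocates nor validates anything, so each attribute assignment writes straight to the given address. If that address is 0, unmapped, or not writable from the CPU, the write faults inside the ctypes field setter (`P_set` is the setter for pointer-typed fields).

```python
import ctypes


def make_args_struct(n):
    """Build a Structure with n pointer-sized fields named f0..f{n-1} (hypothetical layout)."""
    fields = [(f"f{i}", ctypes.c_void_p) for i in range(n)]
    return type("args_struct_t", (ctypes.Structure,), {"_fields_": fields})


args_struct_t = make_args_struct(3)

# Stand-in for the kernarg segment: ordinary, CPU-writable memory.
backing = ctypes.create_string_buffer(ctypes.sizeof(args_struct_t))
kernargs = ctypes.addressof(backing)

# Overlay the struct on the raw address; no copy and no bounds check happen here.
args_st = args_struct_t.from_address(kernargs)

args = [0x1000, 0x2000, 0x3000]  # made-up buffer addresses
for i in range(len(args)):
    setattr(args_st, f"f{i}", args[i])  # writes directly into `backing`
```

Because the write goes through valid memory here, it succeeds. The same loop against an address the process cannot write to would end in a SIGSEGV rather than a Python exception.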

Do you have an integrated GPU? If so, you can limit which GPUs are visible with ROCR_VISIBLE_DEVICES: https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html#rocr-visible-devices
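
A small sketch of how that variable could be set from Python when launching the repro in a child process. The helper name `restrict_rocm_devices` and the choice of index `0` are assumptions; which index corresponds to the discrete GPU depends on the machine.

```python
import os


def restrict_rocm_devices(indices, env=None):
    """Return a copy of the environment with ROCR_VISIBLE_DEVICES set to the given indices."""
    new_env = dict(os.environ if env is None else env)
    new_env["ROCR_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return new_env


# Assumption: device 0 is the GPU to keep visible.
env = restrict_rocm_devices([0])
```

The resulting mapping can be passed as `env=env` to `subprocess.run([...])`. Setting the variable in the shell before starting Python (`ROCR_VISIBLE_DEVICES=0 python3 script.py`) has the same effect; either way it needs to be in place before the ROCm runtime is initialised.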

@Hprairie
Author

Hprairie commented May 1, 2024

kernargs isn't zero for me. I have a dedicated GPU, 7900 XTX.

(tiny) prairie@TRX40:~/Projects$ DEBUG=3 python3 -c "from tinygrad import Tensor;
N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);
c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
opening device METAL from pid:8864
opening device HSA from pid:8864
opening device NPY from pid:8864
*** CUSTOM     1 custom_random                          arg   1 mem  0.00 GB 
*** CUSTOM     2 custom_random                          arg   1 mem  0.01 GB 
  0 ━┳ STORE MemBuffer(idx=0, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1), strides=(1024, 1, 0), offset=0, mask=None, contiguous=True),)))
  1  ┗━┳ SUM ((2,), dtypes.float)
  2    ┗━┳ MUL 
  3      ┣━━ LOAD MemBuffer(idx=1, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(1024, 0, 1), offset=0, mask=None, contiguous=False),)))
  4      ┗━━ LOAD MemBuffer(idx=2, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(0, 1, 1024), offset=0, mask=None, contiguous=False),)))
> /home/prairie/Projects/tinygrad/tinygrad/runtime/ops_hsa.py(88)__call__()
-> kernargs = self.device.alloc_kernargs(self.kernargs_segment_size)
(Pdb) n
> /home/prairie/Projects/tinygrad/tinygrad/runtime/ops_hsa.py(89)__call__()
-> args_st = self.args_struct_t.from_address(kernargs)
(Pdb) p kernargs
130574266138624
(Pdb) 
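
A non-zero address is not necessarily one the CPU can write to. One possible follow-up, sketched here under assumptions, is to look the address up in the process's memory map on Linux (`/proc/self/maps`). The helper names and the sample map text are made up for illustration and are not output from this machine.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, perms, path) tuples."""
    regions = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        start_s, end_s = parts[0].split("-")
        path = parts[5] if len(parts) > 5 else ""
        regions.append((int(start_s, 16), int(end_s, 16), parts[1], path))
    return regions


def find_region(addr, regions):
    """Return the region containing addr, or None if the address is unmapped."""
    for region in regions:
        start, end = region[0], region[1]
        if start <= addr < end:
            return region
    return None


# Hypothetical sample in the /proc/self/maps format.
SAMPLE = """\
7f0000000000-7f0000200000 rw-s 00000000 00:06 123 /dev/example
7f0000200000-7f0000201000 r--p 00000000 08:01 456 /usr/lib/libexample.so
"""
sample_regions = parse_maps(SAMPLE)
```

At the `(Pdb)` prompt, `find_region(kernargs, parse_maps(open("/proc/self/maps").read()))` would show whether the address is mapped and whether its permissions include `w`.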

@Hprairie
Author

Hprairie commented May 1, 2024

I could be wrong, but it seems like the pointer to the function definition of __setattr__ for args_struct_t is looking at /usr/local/src rather than correctly identifying where _ctypes is located.
