CUDA out of memory during saving to phy #670
Comments
Thanks for catching this; we're deciding how to fix it. If you want to be able to sort in the meantime, since you mentioned you're comfortable modifying code, you could install from source and comment out a few lines to skip this step for now. It's only used for plotting the spike positions on the probe (which is nice, but not necessary). The relevant lines are 214 and 215 in
172, 173, 181, and 185 in
Thanks. I am still struggling with the KS implementation of CUDA. I have found that unless the cache is explicitly released with torch.cuda.empty_cache(), it is not emptied and "CUDA out of memory" causes the program to exit. By calling torch.cuda.empty_cache() before and after every phase of the sorting, I successfully managed to sort an NP1 recording of 5 hours. I think you should add empty_cache() to your code, or add a flag to enable it when desired. Unfortunately, KS4 still crashed with "CUDA out of memory" when I tried sorting a longer recording of 49 hours. There was a warning about scalar overflow, and then it tried to allocate 2.6 TB of GPU memory. See the (truncated) trace log below: Interpreting binary file as default dtype='int16'. If data was saved in a different format, specify ... computing drift ... warnings.warn(msg, RuntimeWarning)
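The workaround described above can be sketched roughly like this (the phase wrapper and its use are hypothetical illustrations, not KS4's actual API):

```python
import torch

def free_gpu_cache():
    # Hand cached allocator blocks back to the CUDA driver so the next
    # phase starts from a clean state; a no-op on CPU-only machines.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

def run_phase(phase_fn, *args, **kwargs):
    # Hypothetical wrapper: empty the cache before and after each
    # sorting phase (preprocessing, drift, extraction, clustering, ...).
    free_gpu_cache()
    result = phase_fn(*args, **kwargs)
    free_gpu_cache()
    return result
```

Note that empty_cache() only releases memory that PyTorch has cached but is no longer using; it does not free tensors that are still referenced.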
Okay thanks, looking into it. Just to clarify, are you using the default settings to sort this? I.e. no changes to batch size, detection thresholds, etc.
I did not change any parameter. Thanks for looking into it. |
Hi. Any news regarding this issue? |
Not yet. |
Re: the last error you described, I have a fix working. I'll push it after I test a few more things (probably today). The problem was an integer overflow caused by the very large number of samples, which made the program try to load many batches at once. As for the other memory issues you brought up, those will take longer to work on, but they're on the to-do list. It sounds like using
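For context, a back-of-the-envelope check (with assumed, illustrative numbers: 30 kHz AP-band sampling and a 49-hour recording; this is not KS4's actual batch arithmetic) shows why a sample count this large can wrap a 32-bit integer:

```python
import numpy as np

fs = 30_000                    # assumed sampling rate, Hz
hours = 49
n_samples = hours * 3600 * fs  # 5,292,000,000 samples

int32_max = np.iinfo(np.int32).max   # 2,147,483,647
assert n_samples > int32_max         # the count does not fit in int32

# Casting wraps modulo 2**32, turning the count into a meaningless value;
# batch arithmetic done on a wrapped number can request absurd allocations.
wrapped = int(np.int64(n_samples).astype(np.int32))
print(wrapped)   # 997032704 -- not the real sample count
```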
Thanks for the overflow fix. I will wait for your push and test it on my data. Regarding the GPU memory usage, I understand that it is not critical for most users, but I hope the KS team will find time to optimize it, as recording lengths will surely grow fast in the near future. Thanks again for putting in the effort to solve these problems. Much obliged.
Describe the issue:
Hi,
Thanks for the great work. I've been using KS to sort long recordings (days) of NP1. I managed to edit KS3 so that some of the memory-heavy computations are done in CPU memory, at the cost of slower run time. Moving to KS4, running on the GPU was possible again, but it failed oddly during saving to phy. I would be happy if you could help me resolve this issue.
Thanks
Anan
The following is the output of KS4
Interpreting binary file as default dtype='int16'. If data was saved in a different format, specify data_dtype.
Using GPU for PyTorch computations. Specify device to change this.
sorting G:\NDR21\NDR21_hab3ToExt_g0\NDR21_hab3ToExt_g0_imec0\NDR21_hab3ToExt_g0_t0.imec0.ap.bin
using probe neuropixPhase3B1_kilosortChanMap.mat
Preprocessing filters computed in 227.77s; total 227.85s
computing drift
Re-computing universal templates from data.
H:\envs\kilosort4_1\lib\site-packages\threadpoolctl.py:1223: RuntimeWarning:
Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
the same time. Both libraries are known to be incompatible and this
can cause random crashes or deadlocks on Linux when loaded in the
same Python program.
Using threadpoolctl may cause crashes or deadlocks. For more
information and possible workarounds, please see
https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
warnings.warn(msg, RuntimeWarning)
100%|██████████████████████████████████████████████████████████████████████████| 21600/21600 [5:29:56<00:00, 1.09it/s]
drift computed in 24592.70s; total 24820.55s
Extracting spikes using templates
Re-computing universal templates from data.
100%|██████████████████████████████████████████████████████████████████████████| 21600/21600 [5:34:03<00:00, 1.08it/s]
101617684 spikes extracted in 20305.22s; total 45127.25s
First clustering
100%|██████████████████████████████████████████████████████████████████████████████| 96/96 [15:20:35<00:00, 575.37s/it]
742 clusters found, in 55302.18s; total 100429.43s
Extracting spikes using cluster waveforms
100%|██████████████████████████████████████████████████████████████████████████| 21600/21600 [3:41:37<00:00, 1.62it/s]
119437390 spikes extracted in 13482.27s; total 113911.70s
Final clustering
100%|██████████████████████████████████████████████████████████████████████████████| 96/96 [22:33:55<00:00, 846.20s/it]
492 clusters found, in 81236.76s; total 195148.93s
Merging clusters
471 units found, in 362.79s; total 195511.72s
Saving to phy and computing refractory periods
Traceback (most recent call last):
File "H:\envs\kilosort4_1\lib\runpy.py", line 187, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "H:\envs\kilosort4_1\lib\runpy.py", line 110, in _get_module_details
__import__(pkg_name)
File "F:\PycharmProjects\kilosort4\ks4_main.py", line 23, in <module>
run_kilosort(settings=settings, probe_name='neuropixPhase3B1_kilosortChanMap.mat')
File "H:\envs\kilosort4_1\lib\site-packages\kilosort\run_kilosort.py", line 146, in run_kilosort
save_sorting(ops, results_dir, st, clu, tF, Wall, bfile.imin, tic0,
File "H:\envs\kilosort4_1\lib\site-packages\kilosort\run_kilosort.py", line 472, in save_sorting
results_dir, similar_templates, is_ref, est_contam_rate = io.save_to_phy(
File "H:\envs\kilosort4_1\lib\site-packages\kilosort\io.py", line 172, in save_to_phy
xs, ys = compute_spike_positions(st, tF, ops)
File "H:\envs\kilosort4_1\lib\site-packages\kilosort\postprocessing.py", line 39, in compute_spike_positions
chs = ops['iCC'][:, ops['iU'][st[:,1]]].cpu()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.90 GiB. GPU 0 has a total capacity of 11.00 GiB of which 6.65 GiB is free. Of the allocated memory 922.21 MiB is allocated by PyTorch, and 543.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
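The failing line indexes ops['iCC'] by all ~101 M spikes at once, materializing a large (n_neighbors, n_spikes) tensor on the GPU before moving it to the CPU. A generic chunked-indexing pattern (a sketch, not KS4's actual fix; the names iCC/iU are taken from the traceback, and the chunk size is an arbitrary assumption) would keep the peak GPU allocation small:

```python
import torch

def gather_spike_channels(iCC, iU, spike_templates, chunk=1_000_000):
    # Chunked equivalent of `iCC[:, iU[spike_templates]].cpu()`:
    # index one slice of spikes at a time and accumulate on the CPU,
    # so the full (n_neighbors, n_spikes) tensor never lives on the GPU.
    out = []
    for i in range(0, spike_templates.shape[0], chunk):
        idx = iU[spike_templates[i:i + chunk]]
        out.append(iCC[:, idx].cpu())
    return torch.cat(out, dim=1)
```

With the tensor sizes in the traceback, each chunk of a million spikes would need on the order of 90 MB instead of 8.9 GiB at once.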