
min_frames and frames_per_block can not have different values! #512

Open
Lexelius opened this issue Nov 17, 2023 · 1 comment
Lexelius commented Nov 17, 2023

When running reconstructions with min_frames set to a value different from frames_per_block, there is unexpected behaviour, with different outcomes depending on which of the two parameter values is larger!

The reconstructions were run on GPU with the dev branch (and the livescan subclass available in the livescan_maxiv branch).

These parameters:

p.frames_per_block = 50
p.scans.scan00.data.min_frames = 10
p.min_frames_for_recon = 1

give the following error just as the iterations are about to start:

---------------------------------- Autosaving ----------------------------------
Generating copies of probe, object and parameters and runtime
Saving to /home/reblex/Documents/Reconstructions/NM_scannr_1190/livesim_scannr1190_fpb50_minframes10_startframe1____itcont1_00/dumps/dump_scan_000000_None_0000.ptyr
--------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/reblex/Documents/Scripts/Reconstruct_livescan_nanomax_scannr1190.py", line 427, in <module>
    P = Ptycho(p, level=5)
  File "/home/reblex/.local/lib/python3.8/site-packages/ptypy/core/ptycho.py", line 396, in __init__
    self.run()
  File "/home/reblex/.local/lib/python3.8/site-packages/ptypy/core/ptycho.py", line 784, in run
    self.run(engine=engine)
  File "/home/reblex/.local/lib/python3.8/site-packages/ptypy/core/ptycho.py", line 713, in run
    engine.iterate()
  File "/home/reblex/.local/lib/python3.8/site-packages/ptypy/engines/base.py", line 233, in iterate
    self.error = self.engine_iterate(niter_contiguous)
  File "/home/reblex/.local/lib/python3.8/site-packages/ptypy/accelerate/cuda_pycuda/engines/projectional_pycuda_stream.py", line 215, in engine_iterate
    ev_ex, ex, data_ex = self.ex_data.to_gpu(prep.ex, dID, self.qu_htod)
  File "/home/reblex/.local/lib/python3.8/site-packages/ptypy/accelerate/cuda_pycuda/mem_utils.py", line 285, in to_gpu
    ev, gpu = m.to_gpu(cpu, id, stream)
  File "/home/reblex/.local/lib/python3.8/site-packages/ptypy/accelerate/cuda_pycuda/mem_utils.py", line 148, in to_gpu
    self.gpu = gpuarray.to_gpu_async(cpu, allocator=self._allocator, stream=stream)
  File "/sw/easybuild/software/PyCUDA/2020.1-fosscuda-2020b/lib/python3.8/site-packages/pycuda/gpuarray.py", line 1056, in to_gpu_async
    result = GPUArray(ary.shape, ary.dtype, allocator, strides=_compact_strides(ary))
  File "/sw/easybuild/software/PyCUDA/2020.1-fosscuda-2020b/lib/python3.8/site-packages/pycuda/gpuarray.py", line 210, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
  File "/home/reblex/.local/lib/python3.8/site-packages/ptypy/accelerate/cuda_pycuda/mem_utils.py", line 42, in _allocator
    raise Exception('requested more bytes than maximum given before: {} vs {}'.format(nbytes, self.nbytes))
Exception: requested more bytes than maximum given before: 26214400 vs 5242880
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
[gn10:2417591] *** Process received signal ***
[gn10:2417591] Signal: Aborted (6)
[gn10:2417591] Signal code:  (-6)
[gn10:2417591] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x7f84c8dcbcf0]
[gn10:2417591] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f84c883eacf]
[gn10:2417591] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f84c8811ea5]
[gn10:2417591] [ 3] /sw/easybuild/software/PyCUDA/2020.1-fosscuda-2020b/lib/python3.8/site-packages/pycuda/_driver.cpython-38-x86_64-linux-gnu.so(+0xd8f88)[0x7f836cebcf88]
[gn10:2417591] [ 4] /sw/easybuild/software/PyCUDA/2020.1-fosscuda-2020b/lib/python3.8/site-packages/pycuda/_driver.cpython-38-x86_64-linux-gnu.so(_ZN5boost19thread_specific_ptrIN6pycuda13context_stackEE15default_deleterEPS2_+0xe)[0x7f836cebcf9e]
[gn10:2417591] [ 5] /sw/easybuild/software/Boost/1.74.0-GCC-10.2.0/lib/libboost_thread.so.1.74.0(_ZN5boost6detail12set_tss_dataEPKvPFvPFvPvES3_ES5_S3_b+0x4e)[0x7f836cd9364e]
[gn10:2417591] [ 6] /sw/easybuild/software/PyCUDA/2020.1-fosscuda-2020b/lib/python3.8/site-packages/pycuda/_driver.cpython-38-x86_64-linux-gnu.so(_ZN5boost19thread_specific_ptrIN6pycuda13context_stackEED1Ev+0x15)[0x7f836cebc7c5]
[gn10:2417591] [ 7] /lib64/libc.so.6(+0x5126c)[0x7f84c884126c]
[gn10:2417591] [ 8] /lib64/libc.so.6(on_exit+0x0)[0x7f84c88413a0]
[gn10:2417591] [ 9] /lib64/libc.so.6(__libc_start_main+0xec)[0x7f84c882ad8c]
[gn10:2417591] [10] python[0x400799]
[gn10:2417591] *** End of error message ***
Aborted (core dumped)
[reblex@gn10 ~]$ 

Whereas running with these parameters:

p.frames_per_block = 50
p.scans.scan00.data.min_frames = 100
p.min_frames_for_recon = 1

does not produce an error, but the loading does not start until end_of_scan = True, presumably because the input 'frames' in the check() function is equal to p.frames_per_block; since min_frames has a higher value, the code gets stuck at lines 619-620 in /ptypy/core/data.py:

    if frames_accessible < self.min_frames and not self.end_of_scan:
        return WAIT
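
For illustration, a minimal, self-contained sketch of this wait condition (the function and variable names are hypothetical stand-ins, not the real loading API; the numbers are from the run above):

frames_per_block = 50
min_frames = 100

def loading_status(received_frames, end_of_scan):
    # frames_accessible is capped at frames_per_block per call, so it can
    # never reach a min_frames that is larger than the block size.
    frames_accessible = min(received_frames, frames_per_block)
    if frames_accessible < min_frames and not end_of_scan:
        return 'WAIT'  # 50 < 100 always holds, so this waits until end of scan
    return frames_accessible

print(loading_status(received_frames=500, end_of_scan=False))  # -> 'WAIT'
print(loading_status(received_frames=500, end_of_scan=True))   # -> 50
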
Lexelius added the bug label Nov 17, 2023
daurer (Contributor) commented Dec 18, 2023

@Lexelius thanks for producing this report and sorry for the long wait.

Not sure if I would describe the behaviour that you observed as unexpected - the BlockScan model together with the GPU-accelerated PyCUDA/CuPy engines has been designed to expect block-wise loading, where each block is of size p.frames_per_block with the exception of the final block, which can be smaller. We do recognize that this limits the flexibility for on-demand live data loading and data processing.

Let me quickly explain the two specific cases mentioned above before suggesting a potential solution to your use case.

  1. The check() function in LiveScan seems to return a non-zero count of frames_accessible as soon as any data is received from your live data service. As soon as you are above the min_frames threshold of 10, the load() function is triggered, which will then create the first data block with a size of 10. The next time the loading routine returns a WAIT, the code returns to the main ptycho level and starts initialising the engine. As part of the pycuda stream engine initialisation, a GpuDataManager instance is created for the exit wave buffer, see here. The size of this exit buffer is calculated from the maximum block size loaded so far, which in this case would be 10. This means that the exit wave buffer created in GPU memory can only hold 10 exit waves. But the LiveScan loader is very likely to produce subsequent blocks that are larger than 10 frames, and as soon as this happens, it will produce a very predictable error when the GpuDataManager tries to copy to the GPU a block of exit waves that is larger than the originally allocated buffer of 10 frames (see the sketch after this list).

  2. As correctly mentioned, the maximum of frames_accessible in this scenario is frames_per_block (50), which is always smaller than min_frames (100), and therefore the loading routine returns WAIT until the end of the scan is reached. I would say this is expected behaviour: min_frames should be smaller than frames_per_block.
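
To make the failure in case 1 concrete, here is a minimal, runnable sketch of a fixed-capacity buffer in the spirit of the allocator in ptypy/accelerate/cuda_pycuda/mem_utils.py (the FixedSizeBuffer class is hypothetical; with 256x256 complex64 frames the byte counts match the traceback above):

import numpy as np

class FixedSizeBuffer:
    """Hypothetical stand-in for the GPU exit-wave buffer."""
    def __init__(self, max_frames, frame_shape, dtype=np.complex64):
        # Capacity is fixed at creation time, based on the largest block
        # loaded so far (10 frames in the report above).
        self.nbytes = max_frames * int(np.prod(frame_shape)) * np.dtype(dtype).itemsize

    def push(self, block):
        if block.nbytes > self.nbytes:
            raise Exception('requested more bytes than maximum given before: '
                            '{} vs {}'.format(block.nbytes, self.nbytes))
        # ... copy block into (GPU) memory ...

buf = FixedSizeBuffer(max_frames=10, frame_shape=(256, 256))
buf.push(np.zeros((10, 256, 256), np.complex64))  # fits
buf.push(np.zeros((50, 256, 256), np.complex64))  # raises: 26214400 vs 5242880
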

To solve your specific issue, there are two possible solutions: A) make the GpuDataManager more flexible so that it automatically adjusts the required GPU memory on-the-fly, or B) modify the check() routine of the LiveScan class to ensure that the first data block is always the largest. While A) is doable in principle, I would question whether such changes are justified purely for the edge case of live data processing. My suggestion is therefore to focus on B) and improve the check() routine. In particular, I would suggest the following:

  • Introduce an internal variable self._waiting_for_initial_block = True
  • Add the following at the end of check():
if self._waiting_for_initial_block and msg[0] < frames:
    # Not enough frames yet for a full first block: report nothing and wait.
    return 0, 0
else:
    # The first full-sized block has been seen; later blocks may be smaller.
    self._waiting_for_initial_block = False
return min(frames, msg[0]), msg[1]
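
Put together, a sketch of how this could look in the LiveScan class (the _receive_status() helper and its msg = (frames_received, end_of_scan) return value are hypothetical stand-ins for however the class actually queries the live data service):

from ptypy.core.data import PtyScan

class LiveScan(PtyScan):
    def __init__(self, pars=None, **kwargs):
        super().__init__(pars, **kwargs)
        # Guard ensuring the first reported block is always a full one.
        self._waiting_for_initial_block = True

    def check(self, frames=None, start=None):
        msg = self._receive_status()  # hypothetical: (frames_received, end_of_scan)
        if self._waiting_for_initial_block and msg[0] < frames:
            # Hold back until a full first block is available, so the GPU
            # exit-wave buffer gets sized for frames_per_block frames.
            return 0, 0
        else:
            self._waiting_for_initial_block = False
        return min(frames, msg[0]), msg[1]
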

Hope this makes sense?
