
100% RAM usage for large batch simulation (simulation ends up CRASHING) #349

Open
pablosreyero opened this issue Mar 7, 2024 · 6 comments

Comments

@pablosreyero
Contributor

Dear Sionna community,

I've been playing around with this tutorial: https://nvlabs.github.io/sionna/examples/Sionna_Ray_Tracing_Introduction.html for quite a while.

The issue is that after computing y = channel([x, h_freq, no]), b_hat = pusch_receiver([y, no]), and BER = compute_ber(b, b_hat).numpy() in a for loop for many iterations, I run out of memory. RAM usage slowly creeps up as the number of iterations increases, which eventually forces the kernel to die. As a result, if I set a batch of 100,000 iterations, the simulation never finishes; we usually only reach about 56,000 iterations.

I have noticed this behaviour not only for the functions mentioned above inside a for loop, but also for the ray-tracing part as a whole (also inside a for loop), i.e., loading the scene, deploying the Tx and Rx, and computing paths and CIRs.

I am running Sionna in a docker container, which is currently running in a machine with the following specifications:

  • Ubuntu 20.04 LTS
  • Intel® Xeon(R) Silver 4309Y CPU @ 2.80GHz × 32
  • 64 GB of RAM

Here's the code to reproduce this issue:

LongBatchCode.zip

Any feedback will be appreciated, thanks in advance!

Pablo.-

@merlinND
Collaborator

merlinND commented Mar 13, 2024

Hello @pablosreyero,

Could you try structuring your code in the following way and let us know if the problem still occurs:

import gc

def iteration(...):
    y = channel([x, h_freq, no])
    b_hat = pusch_receiver([y, no]) 
    BER = compute_ber(b, b_hat)
    return BER.numpy()

def main():
    for it_i in range(...):
        BER_np = iteration(...)
        # Use BER_np as needed

        del BER_np
        gc.collect()

The key thing is that all per-iteration variables must go out of scope before calling the garbage collector.

If this works, you can call the garbage collector less often to reduce the overhead (e.g. once every 500 iterations).
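As a concrete sketch of this pattern (the helper name run_simulation and the iteration callback are illustrative, not part of Sionna or the original code):

```python
import gc

def run_simulation(iteration, num_iterations, gc_every=500):
    """Run `iteration` repeatedly, invoking the garbage collector
    only every `gc_every` steps to amortize its overhead."""
    results = []
    for it_i in range(num_iterations):
        # Per-iteration tensors go out of scope inside `iteration`,
        # so a periodic collect can actually reclaim them.
        results.append(iteration(it_i))
        if (it_i + 1) % gc_every == 0:
            gc.collect()
    return results
```

For example, run_simulation(my_iteration, 100_000, gc_every=500) would collect garbage 200 times over the whole run instead of once per iteration.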

@jhoydis
Collaborator

jhoydis commented Mar 14, 2024

Hi,

I had a quick look at your code.

First of all, it seems that you want to simulate a single transmitter sending the same stream to 5 receivers. However, you only configure a single PUSCHReceiver, so something is wrong in your setup. I also have some doubts that this is a typical PUSCH scenario.

Could it be that you actually want to simulate a distributed MIMO receiver? If this is the case, you would simply need to reshape the tensor of the channel frequency response from [batch_size, num_rx, num_rx_ant, ...] to [batch_size, 1, num_rx * num_rx_ant, ...].

This will probably not solve the memory issue. However, I would recommend that you run your simulations in graph mode. This might resolve it. In any case, it should substantially speed up your simulations, even on CPU.
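To illustrate the suggested shape change (using NumPy and made-up dimension values purely for demonstration; in the actual simulation you would apply tf.reshape to the channel frequency response tensor):

```python
import numpy as np

# Illustrative shapes only (hypothetical values): batch_size=4, num_rx=5,
# num_rx_ant=2; the trailing axes stand in for the "..." dimensions.
batch_size, num_rx, num_rx_ant = 4, 5, 2
h_freq = np.zeros((batch_size, num_rx, num_rx_ant, 1, 16, 14, 76))

# Merge the receiver and receive-antenna axes into one distributed-MIMO
# receiver: [batch_size, num_rx, num_rx_ant, ...]
#        -> [batch_size, 1, num_rx * num_rx_ant, ...]
h_freq_dmimo = h_freq.reshape(
    (batch_size, 1, num_rx * num_rx_ant) + h_freq.shape[3:])

print(h_freq_dmimo.shape)  # (4, 1, 10, 1, 16, 14, 76)
```

The element count and ordering are unchanged; only the grouping of the receive antennas differs, so the downstream receiver sees one receiver with num_rx * num_rx_ant antennas.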

@pablosreyero
Contributor Author

> Could you try structuring your code in the following way and let us know if the problem still occurs: […]

Hello @merlinND,

Thank you very much for your reply. We have already tested both the garbage collector and the del statement with multiple variables in the past, but it did not help. Nevertheless, I reproduced the exact code structure you provided and obtained the same results. I attach the new code and a log file showing the evolution of RAM usage in the zip file.

Thanks again for your help.

Pablo.-

https://github.com/NVlabs/sionna/files/14603871/files.zip

@pablosreyero
Contributor Author

> I had a quick look at your code. […] I would recommend that you run your simulations in graph mode. […]

Hello @jhoydis,

Thanks for pointing out the error regarding my scenario, you are totally right, I rushed and copied an old version of my code to just reproduce the RAM issue in a smaller code. If I'm not mistaken you already mentioned this tensor reshape in another discussion (#269) and that's how I noticed, so thanks again for the reminder.

Now coming back to the RAM issue, I have tried everything: garbage collectors, del statements (at the end of every iteration), limiting memory usage, converting the .ipynb to a .py and running the code from the terminal (without Jupyter Notebook), and analyzing variables and objects with a Python profiler, but nothing seems to reveal the problem. This memory consumption occurs when running simulations without a Keras model and on CPU, and even though Sionna is meant to be run in a Keras layer and on GPU, it is really strange to watch RAM slowly drain away, as if something were accumulating in memory.

Thanks for your help and for bringing the worlds of AI and Wireless Communications even closer together with Sionna!

Pablo.-

@jhoydis
Collaborator

jhoydis commented Mar 14, 2024

Have you tried running your simulations in graph mode?
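For context, "graph mode" here means wrapping the per-iteration computation in tf.function so TensorFlow traces it once and reuses the compiled graph, instead of rebuilding ops eagerly on every pass. A minimal sketch with a stand-in computation (the actual Sionna calls, e.g. channel and pusch_receiver, appear only in comments):

```python
import tensorflow as tf

@tf.function  # traced once per input signature, then reused on later calls
def run_iteration(x, no):
    # In the real simulation these would be the Sionna calls, e.g.
    #   y = channel([x, h_freq, no])
    #   b_hat = pusch_receiver([y, no])
    # Here a trivial computation stands in for them.
    return tf.reduce_mean(x) + no

result = run_iteration(tf.constant([1.0, 3.0]), tf.constant(0.5))
```

Keeping the inputs' shapes and dtypes fixed across iterations matters: if they change, tf.function retraces and builds a new graph each time, which itself accumulates memory.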

@pablosreyero
Contributor Author

> Have you tried running your simulations in graph mode?

Not yet, but I'm going to do so. I'll let you know if we encounter any other errors.
