Describe the bug
I launch 100 clients that are supposed to learn to classify images from the CIFAR-100 dataset.
I have 2 GPUs and 6 CPUs and enable GPU memory growth. Each client has access to 1 CPU and a whole GPU (2x32 GB VRAM), so I expect to have enough GPU memory for this task!
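For context on these reservations: Ray only co-schedules as many client actors as the scarcer resource allows. A tiny hypothetical helper (not part of Flower or Ray) to sanity-check the expected concurrency:

```python
def max_concurrent_clients(total_cpus, total_gpus, cpus_per_client, gpus_per_client):
    """How many Ray actors can run at once given per-client reservations.

    Ray schedules an actor only when both its CPU and GPU reservations fit,
    so concurrency is bounded by the scarcer resource.
    """
    by_cpu = int(total_cpus // cpus_per_client)
    by_gpu = int(total_gpus // gpus_per_client) if gpus_per_client else by_cpu
    return min(by_cpu, by_gpu)

# 6 CPUs, 2 GPUs, each client reserving 1 CPU and a whole GPU:
print(max_concurrent_clients(6, 2, 1, 1.0))  # 2 clients run concurrently
```

So at most 2 of the 100 clients train at a time here; the other 98 wait for a free slot, and each actor reuses the same GPU across rounds.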
At the first round, a device is created with ~32 GB:
2024-04-08 14:49:08.296728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 31141 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1a:00.0, compute capability: 7.0
But later, at the second round, a new one is created with only 494 MB:
(DefaultActor pid=460472) 2024-04-08 14:49:32.679854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 494 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1a:00.0, compute capability: 7.0
The OOM seems to come from a lack of memory on the 494 MB device.
Why isn't the memory released after the first round?
Steps/Code to Reproduce

Expected Results

I expect the memory to be released after each round so that a new round can run on the same GPU.

Actual Results
2024-04-08 14:48:23.651346: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x154d2c8b5aa0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-04-08 14:48:23.651385: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2024-04-08 14:48:23.725837: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
2024-04-08 14:49:08.293991: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-04-08 14:49:08.294123: I tensorflow/core/grappler/clusters/single_machine.cc:361] Starting new session
2024-04-08 14:49:08.296728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 31141 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1a:00.0, compute capability: 7.0
2024-04-08 14:49:08.297003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 31141 MB memory: -> device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1c:00.0, compute capability: 7.0
2024-04-08 14:49:09.506876: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-04-08 14:49:09.507014: I tensorflow/core/grappler/clusters/single_machine.cc:361] Starting new session
2024-04-08 14:49:09.509594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 31141 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1a:00.0, compute capability: 7.0
2024-04-08 14:49:09.509865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 31141 MB memory: -> device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1c:00.0, compute capability: 7.0
2024-04-08 14:49:10.052274: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:186] Calibration with FP32 or FP16 is not implemented. Falling back to use_calibration = False.Note that the default value of use_calibration is True.
2024-04-08 14:49:10.192426: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:970]
TensorRT unsupported/non-converted OP Report:
- NoOp -> 2x
- Cast -> 1x
- Identity -> 1x
- Placeholder -> 1x
- Total nonconverted OPs: 5
- Total nonconverted OP Types: 4
For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops.
2024-04-08 14:49:10.192981: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:1298] The environment variable TF_TRT_MAX_ALLOWED_ENGINES=20 has no effect since there are only 1 TRT Engines with at least minimum_segment_size=3 nodes.
2024-04-08 14:49:10.193039: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:799] Number of TensorRT candidate segments: 1
2024-04-08 14:49:10.300789: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 0 consisting of 79 nodes by TRTEngineOp_000_000.
2024-04-08 14:49:13.086942: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2024-04-08 14:49:13.086983: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2024-04-08 14:49:13.087426: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /linkhome/rech/genwej01/uov71di/experiments/examples/federated/cifar100/my_trials_2024-04-08--14:47:50_federatedListicCFL_strategy_minCl100_minFit100_learningRate0.001_nbEpoch1_procID0_dropout0.2_dataConfig1_isFLserverTrue_num_experiments1/expe_0/exported_models/1
2024-04-08 14:49:13.093707: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-04-08 14:49:13.107495: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2024-04-08 14:49:13.317161: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 229736 microseconds.
INFO flwr 2024-04-08 14:49:27,263 | server.py:104 | FL starting
INFO:flwr:FL starting
DEBUG flwr 2024-04-08 14:49:30,687 | server.py:222 | fit_round 1: strategy sampled 100 clients (out of 100)
DEBUG:flwr:fit_round 1: strategy sampled 100 clients (out of 100)
(DefaultActor pid=460472) if distutils.version.LooseVersion(
(DefaultActor pid=460472) /usr/local/lib/python3.11/dist-packages/tensorflow_probability/python/__init__.py:57: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
(DefaultActor pid=460472) if (distutils.version.LooseVersion(tf.__version__) <
(DefaultActor pid=460472) /usr/local/lib/python3.11/dist-packages/tf_agents/utils/common.py:91: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
(DefaultActor pid=460472) distutils.version.LooseVersion(tf.__version__)
(DefaultActor pid=460472) /gpfsssd/jobscratch/uov71di_1419691/session_2024-04-08_14-48-11_378165_459197/runtime_resources/py_modules_files/_ray_pkg_aa459d48e12ddf8a/deeplearningtools/tools/gpu.py:78: DeprecationWarning: invalid escape sequence '\ '
(DefaultActor pid=460472) f.write('\n -> /!\ Layer not tensorcore compliant (index, name, input, output):'+str((i,layer.name,layer.input_shape, layer.output_shape)))
(DefaultActor pid=460472) /usr/local/lib/python3.11/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning:
(DefaultActor pid=460472)
(DefaultActor pid=460472) TensorFlow Addons (TFA) has ended development and introduction of new features.
(DefaultActor pid=460472) TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
(DefaultActor pid=460472) Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).
(DefaultActor pid=460472)
(DefaultActor pid=460472) For more information see: https://github.com/tensorflow/addons/issues/2807
(DefaultActor pid=460472)
(DefaultActor pid=460472) warnings.warn(
(DefaultActor pid=460472) /usr/local/lib/python3.11/dist-packages/tensorflow_model_optimization/__init__.py:65: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
(DefaultActor pid=460472) if (distutils.version.LooseVersion(tf.version.VERSION) <
(DefaultActor pid=460472) 2024-04-08 14:49:32.679854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 494 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1a:00.0, compute capability: 7.0
(DefaultActor pid=460472) 2024-04-08 14:49:32.681863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 494 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1a:00.0, compute capability: 7.0
(DefaultActor pid=460472) 2024-04-08 14:49:32.683875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 494 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1a:00.0, compute capability: 7.0
(DefaultActor pid=460472) 2024-04-08 14:49:32.685186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 494 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1a:00.0, compute capability: 7.0
(DefaultActor pid=460472) 2024-04-08 14:49:37.205480: I tensorflow/tsl/profiler/lib/profiler_session.cc:104] Profiler session initializing.
(DefaultActor pid=460472) 2024-04-08 14:49:37.205519: I tensorflow/tsl/profiler/lib/profiler_session.cc:119] Profiler session started.
(DefaultActor pid=460472) 2024-04-08 14:49:37.205559: I tensorflow/compiler/xla/backends/profiler/gpu/cupti_tracer.cc:1694] Profiler found 1 GPUs
(DefaultActor pid=460472) 2024-04-08 14:49:37.242056: I tensorflow/tsl/profiler/lib/profiler_session.cc:131] Profiler session tear down.
(DefaultActor pid=460472) 2024-04-08 14:49:37.242199: I tensorflow/compiler/xla/backends/profiler/gpu/cupti_tracer.cc:1828] CUPTI activity buffer flushed
(DefaultActor pid=460472) 2024-04-08 14:49:37.288573: I tensorflow/tsl/profiler/lib/profiler_session.cc:104] Profiler session initializing.
(DefaultActor pid=460472) 2024-04-08 14:49:37.288606: I tensorflow/tsl/profiler/lib/profiler_session.cc:119] Profiler session started.
(DefaultActor pid=460472) 2024-04-08 14:49:37.324481: I tensorflow/tsl/profiler/lib/profiler_session.cc:131] Profiler session tear down.
(DefaultActor pid=460472) 2024-04-08 14:49:37.324669: I tensorflow/compiler/xla/backends/profiler/gpu/cupti_tracer.cc:1828] CUPTI activity buffer flushed
(DefaultActor pid=460472) /usr/local/lib/python3.11/dist-packages/flwr/simulation/ray_transport/ray_actor.py:72: DeprecationWarning: Ensure your client is of type `flwr.client.Client`. Please convert it using the `.to_client()` method before returning it in the `client_fn` you pass to `start_simulation`. We have applied this conversion on your behalf. Not returning a `Client` might trigger an error in future versions of Flower.
(DefaultActor pid=460472) client = check_clientfn_returns_client(client_fn(cid))
(DefaultActor pid=460472) 2024-04-08 14:49:37.446197: I tensorflow/tsl/profiler/lib/profiler_session.cc:104] Profiler session initializing.
(DefaultActor pid=460472) 2024-04-08 14:49:37.446238: I tensorflow/tsl/profiler/lib/profiler_session.cc:119] Profiler session started.
(DefaultActor pid=460472) 2024-04-08 14:49:37.482799: I tensorflow/tsl/profiler/lib/profiler_session.cc:131] Profiler session tear down.
(DefaultActor pid=460472) 2024-04-08 14:49:37.482979: I tensorflow/compiler/xla/backends/profiler/gpu/cupti_tracer.cc:1828] CUPTI activity buffer flushed
(DefaultActor pid=460472) 2024-04-08 14:49:39.050371: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape insequential/dropout/dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
(DefaultActor pid=460472) 2024-04-08 14:49:39.376274: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
(DefaultActor pid=460472) 2024-04-08 14:49:39.650979: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 563.19MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
(DefaultActor pid=460472) 2024-04-08 14:49:39.663984: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 602.12MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
(DefaultActor pid=460472) 2024-04-08 14:49:39.664051: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 602.12MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
(DefaultActor pid=460472) 2024-04-08 14:49:39.667988: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
(DefaultActor pid=460472) 2024-04-08 14:49:39.678127: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 602.12MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
(DefaultActor pid=460472) 2024-04-08 14:49:39.678184: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 602.12MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
(DefaultActor pid=460472) 2024-04-08 14:49:39.682286: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
(DefaultActor pid=460472) 2024-04-08 14:49:40.488444: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x15761a00 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
(DefaultActor pid=460472) 2024-04-08 14:49:40.488503: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
(DefaultActor pid=460472) 2024-04-08 14:49:40.562673: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
(DefaultActor pid=460472) 2024-04-08 14:49:40.667952: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 602.12MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
(DefaultActor pid=460472) 2024-04-08 14:49:40.683735: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
(DefaultActor pid=460472) 2024-04-08 14:49:40.683801: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
Hi @MilowB, I wonder if this is because you are also making use of the model in the main process (i.e., where the strategy/server runs), which causes that process to claim the whole GPU (TensorFlow's default behavior). Note that in examples/simulation-tensorflow we call enable_tf_gpu_growth() a second time outside start_simulation(). See this line. I hope this fixes your issue!
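The suggestion above can be sketched as follows. This is a minimal, untested outline assuming Flower's simulation API as used in the examples/simulation-tensorflow example (the import path of enable_tf_gpu_growth and the actor_kwargs hook may differ between Flower versions); client_fn and strategy are placeholders for the issue author's own objects:

```python
import flwr as fl
# Helper from Flower's simulation utilities, as used in
# examples/simulation-tensorflow (path may vary across Flower releases).
from flwr.simulation.ray_transport.utils import enable_tf_gpu_growth

# 1) Enable memory growth in the MAIN process too, so the server/strategy
#    side does not pre-allocate the whole 32 GB and starve the client actors
#    (leaving them only the ~494 MB seen in the logs).
enable_tf_gpu_growth()

history = fl.simulation.start_simulation(
    client_fn=client_fn,  # placeholder: your existing client factory
    num_clients=100,
    client_resources={"num_cpus": 1, "num_gpus": 1.0},
    config=fl.server.ServerConfig(num_rounds=10),
    strategy=strategy,  # placeholder: your existing strategy
    # 2) Also enable growth inside every Ray actor before any client runs.
    actor_kwargs={"on_actor_init_fn": enable_tf_gpu_growth},
)
```

With growth enabled in both places, neither the main process nor the actors reserve the full GPU up front, so later device creations are not left with a few hundred megabytes.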