Enormous memory usage after batched forward passes with TensorFlow 2.16.1 (CPU) #19500
Comments
try wrapping your model with `tf.function` |
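A minimal sketch of that suggestion, assuming the truncated comment refers to `tf.function` (a later comment in this thread notes the issue disappears when using one):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.applications.ResNet152V2()
images = np.zeros([20, 224, 224, 3], dtype=np.uint8)

# Wrap the forward pass so it runs as a compiled graph instead of
# op-by-op through the TF eager runtime.
predict_fn = tf.function(model)

for _ in range(10):
    predict_fn(images)
```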
I tried running your snippet 20x and added a call to `gc.collect()`.
Memory usage has higher variance than in Keras 2 (and is higher on average), but it is stable within a range (max: 1808 MiB, min: 1117 MiB, reached after 18 iterations), which indicates that there's no leak. Are you able to run a Python profiler to see what's taking memory? For good measure, here's what I get when I do the same with
Although the variance is much lower and the average is lower, this one does look leaky in the sense that it's monotonically increasing. |
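A minimal sketch of one way to run such a profile, using the standard-library `tracemalloc` (an assumption about tooling; any Python memory profiler would do):

```python
import tracemalloc

import numpy as np
import tensorflow as tf

tracemalloc.start()

model = tf.keras.applications.ResNet152V2()
images = np.zeros([20, 224, 224, 3], dtype=np.uint8)

for _ in range(10):
    model(images)

# Print the ten allocation sites holding the most Python-level memory.
# Caveat: tracemalloc only sees Python allocations; memory held by the
# TF C++ runtime won't show up here, so a clean report would itself
# point toward the runtime.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)
```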
@sirfz Replacing |
@fchollet Thanks for checking! I tried to reproduce your test, but failed so far, i.e., the memory usage is still high, even directly after `gc.collect()`:

```python
import gc

import numpy as np
import psutil
import tensorflow as tf

model = tf.keras.applications.ResNet152V2()
images = np.zeros([20, 224, 224, 3], dtype=np.uint8)

for run in range(10):
    memory_usage_in_MiB = psutil.Process().memory_info().rss / (1024 * 1024)
    print(f"Memory usage after {run} run(s) before gc.collect() (in MiB): {memory_usage_in_MiB:.3f}", flush=True)
    gc.collect()
    memory_usage_in_MiB = psutil.Process().memory_info().rss / (1024 * 1024)
    print(f"Memory usage after {run} run(s) after gc.collect() (in MiB): {memory_usage_in_MiB:.3f}", flush=True)
    model(images)
```
(Dockerfile to reproduce)
Sorry, currently no. Are you? |
I suspect that this is more likely an issue on the TensorFlow or Docker environment side.

```python
import psutil

import keras
import keras.applications.resnet_v2

model = keras.applications.resnet_v2.ResNet152V2()
images = keras.ops.zeros([1, 224, 224, 3], dtype="uint8")

for run in range(100):
    model(images)
    memory_usage_in_MiB = psutil.Process().memory_info().rss / (1024 * 1024)
    print(
        f"Memory usage after {run} run(s) (in MiB): {memory_usage_in_MiB:.3f}",
        flush=True,
    )
```

I ran the above script for all backends and here are the numbers:
My environment:
|
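For context on "all backends": in Keras 3 the backend is chosen via the `KERAS_BACKEND` environment variable before `keras` is imported, so a comparison like the one above can be driven from a small wrapper. A sketch (the `benchmark.py` filename is hypothetical):

```python
import os
import subprocess

# Run the memory benchmark once per backend; the variable must be set
# before the child process imports keras.
for backend in ("tensorflow", "jax", "torch"):
    env = dict(os.environ, KERAS_BACKEND=backend)
    subprocess.run(["python", "benchmark.py"], env=env, check=True)
```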
Oh, I ran my tests without any GPU. It's all CPU only. I've just expanded the issue title accordingly.
In the TensorFlow repo, I've been told to open the issue here. 😁 Regarding Docker: The memory problem happens for me not only in Docker, but also when I run on bare metal. |
Thanks for the detailed analysis. The absence of the issue with other eager backends, and the disappearance of the issue when using a `tf.function`, strongly indicate that the leak may be at the level of the TF eager runtime. It is also likely system-dependent, since I can't observe it on my system, nor on Colab (I tried with both TF 2.15 and TF 2.16 with the latest Keras, and while the memory usage differs across the two TF versions, there isn't a leak either way). This isn't the first time we've seen memory leaks with the TF runtime (eager or graph). |
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you. |
If I'm not mistaken, the issue is not solved yet. Or should we close it here because work continues in the corresponding issue in the TensorFlow repo? |
(Dockerfile to reproduce)

And it's not just `tf.keras.applications.ResNet152V2()`. It also happens (for example) with `tf.keras.applications.inception_resnet_v2.InceptionResNetV2` and `tf.keras.applications.inception_v3.InceptionV3`. And it also happens when using Python `3.12.2` instead of `3.11.8`.

With TensorFlow `2.15.1` (instead of `2.16.1`), however, the memory usage does not explode:

(Dockerfile to reproduce)
Workaround: Replacing `model(images).numpy()` with `model.predict(images)` improves the situation, i.e., it only leaks a little bit.
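A minimal sketch contrasting the two call styles; the explanation in the comments is an assumption (Keras routes `predict()` through an internally compiled `tf.function`), not something stated in the report:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.applications.ResNet152V2()
images = np.zeros([20, 224, 224, 3], dtype=np.uint8)

# Eager __call__: each pass goes through the TF eager runtime, where
# the reported memory growth shows up.
preds_eager = model(images).numpy()

# Workaround from the report: predict() dispatches through a compiled
# function and "only leaks a little bit" in comparison.
preds_predict = model.predict(images)
```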