Process memory grows when loading parquet files even with -Xmx #876
Comments
The guide mentions this: https://docs.atoti.io/latest/deployment/deployment_setup.html#off-heap-size
So in my case I would expect the growth to be 32 + 32 ≈ 64 GB. But what I'm seeing instead is 75 GB, and I'm afraid it will continue to rise as I load more files.
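The arithmetic behind that expectation can be sketched as a quick estimate. Note that the 2 GB allowance for metaspace, thread stacks, GC bookkeeping, and other native overhead is my own assumption, not a figure from the Atoti docs:

```python
def expected_jvm_footprint_gb(xmx_gb: float, max_direct_gb: float,
                              native_overhead_gb: float = 2.0) -> float:
    """Rough upper bound on a JVM process's RSS:
    heap cap + direct-buffer cap + an allowance for metaspace,
    thread stacks, GC structures, and other native allocations."""
    return xmx_gb + max_direct_gb + native_overhead_gb

print(expected_jvm_footprint_gb(32, 32))  # 66.0 -- still well below the observed 75 GB
```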
Hello @dufferzafar. There is no trick here: the parameters you passed correctly configure the maximum amount of memory available to the JVM powering Atoti. Unless there are major bugs in the JVM itself (very unlikely), you have correctly limited the available memory and this limit cannot be exceeded.
Unfortunately, I am not enough of an expert in Linux memory measurement to tell you exactly why there is such a difference between what our logs report and what you are seeing in htop.
I tried out smaller numbers as well: 512m, 1G, 2G, 4G, and as you said, Atoti correctly OOMed. I'm now running an instance with -Xmx5G, and RSS was capped at 12.5 GB after loading the same 200 parquet files we were loading before. So maximum memory does seem to be capped. It's also good to know that load times are not really affected when using smaller memory limits. I'll try loading more data concurrently and report if anything else seems amiss.
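As a cross-check against htop, the kernel's own RSS figure for a process can be read directly from /proc. This is a Linux-only helper of my own, not part of Atoti:

```python
def rss_gb(pid: str = "self") -> float:
    """Return the resident set size of a process in GB.

    Linux-only: /proc/<pid>/status reports VmRSS in kB.
    Pass a numeric PID string to inspect another process.
    """
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                kb = int(line.split()[1])
                return kb / (1024 ** 2)  # kB -> GB
    raise RuntimeError("VmRSS not found in /proc status")

print(f"current process RSS: {rss_gb():.3f} GB")
```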
Steps to reproduce
As a continuation of #866, I have this snippet to load parquet files (compressed or otherwise) in a separate thread.
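The snippet itself was not captured here; a minimal sketch of such a loader thread might look like the following, where load_fn stands in for the table's parquet-loading method (the names and structure are illustrative, not the reporter's actual code):

```python
import threading
from pathlib import Path

def load_all(load_fn, directory: str, pattern: str = "*.parquet") -> None:
    """Call load_fn on every file matching pattern, in sorted order."""
    for path in sorted(Path(directory).glob(pattern)):
        load_fn(path)

# Run the loader in a background thread so the main session stays responsive,
# e.g. (assuming an Atoti table object named `table`):
# threading.Thread(target=load_all,
#                  args=(table.load_parquet, "data/"),
#                  daemon=True).start()
```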
Actual Result
Process memory continues to grow when my loader thread runs!
The process starts with 40 GB VIRT & 2.6 GB RSS (observed via htop). The initial server log indicates other values (1 GB heap + 3 GB direct).
Now, as the loader thread runs and loads files, I see that total memory continues to rise, and it only stops rising when the loader stops.
At the end it reached ~85 GB VIRT & 74 GB RSS (seen via htop), but the last line of the server log says the heap in use is 12 GB and direct is 16 GB.
So the numbers don't add up: the sum of all such memory log lines is within the 32 GB upper bound that I'd set initially, but the process is actually taking much more RAM.
Expected Result
I expected the -Xmx option to set a maximum RAM size on the process. But I think that is just the JVM heap max? I read somewhere that Atoti allocates data off-heap as well. Is that what is happening?
Is there a way to restrict the TOTAL memory usage of the process?
Could it perhaps be a "leak" in the parquet loader?
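For what it's worth, -Xmx bounds only the Java heap; the other JVM memory pools have their own flags, and the JVM's Native Memory Tracking can itemize where the rest goes. A config sketch follows (whether and how Atoti lets you pass extra JVM flags is an assumption on my part, so treat the flag list as generic JVM material, not Atoti-specific advice):

```shell
# -Xmx caps only the Java heap. Other pools have separate caps:
#   -XX:MaxDirectMemorySize=32g    # NIO direct buffers (off-heap data)
#   -XX:MaxMetaspaceSize=1g        # class metadata
# Native Memory Tracking must be enabled at JVM startup:
#   -XX:NativeMemoryTracking=summary
# Then, against the running process, ask the JVM to itemize its allocations:
jcmd <pid> VM.native_memory summary
# To hard-cap the whole process regardless of JVM settings, an OS-level
# limit (e.g. a cgroup or container memory limit) is the usual tool.
```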
Environment
atoti: 0.8.10
Python: 3.12.2
Operating system: Linux
Machine being tested on has 32 cores & 256 GB RAM
Logs
I have detailed logs as well; please let me know what additional information you require and I'll be happy to help!