Kernel dynamic memory is not released again #137
Comments
Just adding that I've experienced the same kernel memory leak with kernel 6.5. For me it only happens when the host uses the iGPU for graphics output (my use case is a regular-use desktop PC with a Windows VM, not a Proxmox server) while virtual functions are enabled, even when no VM is using any virtual functions. I never planned on passing through my dedicated NVIDIA card to the VM, so I just made sure the host always uses the NVIDIA GPU and never accesses the iGPU. Then virtual functions work fine without leaking kernel memory. I've since moved to kernel 6.6, but I don't know if the issue still persists since I'm happy with my current setup.
@makoONE @brussig-tud Could you read my post and tell me how I can check whether kernel dynamic memory is being allocated on my system too? (Which command do I use for this?) And if that is the case, do you have any idea how to disable SR-IOV temporarily? Do I uninstall the DKMS module, or do I need to undo all the steps?
@devedse I'm not super knowledgeable about containers and containerizing things. But if I read your post correctly, you have SR-IOV enabled using this driver, but you don't actually use any virtual functions since you're not passing them on to VMs. Instead, you only use the SR-IOV-enabled GPU from the host OS, since containers after all still technically run on the host. So yeah, it very much sounds like you're facing the same issue. You can check your kernel dynamic memory usage using the smem utility:
I don't have any output saved from when I tried, but my "kernel dynamic memory" value was 27 GB once after just running a normal KDE desktop on the iGPU for about 2 hours with this module enabled. Just ...
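For anyone who wants to watch this number over time without smem, here is a minimal Python sketch that estimates it from `/proc/meminfo`. It assumes that "kernel dynamic memory" is roughly the used memory that is not mapped into userspace, which is an approximation and may not match smem's figure exactly. The function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kernel_dynamic_kb(fields):
    """Return (total_kb, noncache_kb) as a rough kernel dynamic memory estimate.

    Assumption: everything that is neither free nor mapped into userspace
    counts as kernel dynamic memory; the reclaimable cache share is then
    subtracted to get the part that would show up as a leak.
    """
    userspace = fields["AnonPages"] + fields["Mapped"]
    total = fields["MemTotal"] - fields["MemFree"] - userspace
    cache = (fields["Buffers"] + fields["SReclaimable"]
             + fields["Cached"] - fields["Mapped"])
    return total, total - cache


def report(path="/proc/meminfo"):
    """Read the live system values and return them in GiB."""
    with open(path) as f:
        total, noncache = kernel_dynamic_kb(parse_meminfo(f.read()))
    return total / 1048576, noncache / 1048576
```

Calling `report()` before starting a VM and again after shutting it down should show whether the non-cache part keeps growing.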
@brussig-tud, that's exactly the answer I was looking for. So I don't need to remove this from GRUB:
And I also don't need to remove this file:
?
@devedse The "vanilla" i915 driver will ignore the max_vfs kernel boot parameter, and the sysfs entry will just silently fail if the driver does not provide the endpoints, so yeah, you can leave them in place. I don't remember whether just not creating VFs via sysfs was enough to fix the memory leak, or if you also had to set max_vfs=0, or if you had to disable GuC scheduling altogether (which should also cause this driver to not leak memory). You can try narrowing it down further like this, but removing the DKMS module will surely prove or disprove the hypothesis that this driver is causing your memory leak, and you can leave the other things in place in case you need them later.
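To see whether any VFs currently exist, the standard PCI sysfs attributes `sriov_numvfs` and `sriov_totalvfs` can be read directly. The sketch below is only illustrative: the function name is made up, and the device path in the usage note is an assumption (the iGPU is commonly at `0000:00:02.0`, but check your own system).

```python
from pathlib import Path


def read_vf_counts(device_dir):
    """Return (numvfs, totalvfs) for a PCI device directory in sysfs.

    A value is None when the attribute is missing, which is what you
    would expect if the loaded driver does not expose SR-IOV endpoints.
    """
    def read_int(name):
        try:
            return int((Path(device_dir) / name).read_text().strip())
        except (FileNotFoundError, ValueError):
            return None

    return read_int("sriov_numvfs"), read_int("sriov_totalvfs")
```

For example, `read_vf_counts("/sys/bus/pci/devices/0000:00:02.0")` returning `(0, 7)` would mean the driver supports VFs but none are created, while `(None, None)` would suggest the SR-IOV endpoints are not there at all.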
@brussig-tud, thanks for the explanation. To keep this thread on topic, do you know of a place to discuss this stuff more casually? IRC/Discord? I'm curious what you all use SR-IOV for. Edit:
So indeed, I also seem to be using quite a lot of kernel dynamic memory.
@devedse In general, I think SR-IOV is mainly used on NICs as a sort of high-performance Ethernet bridge for VMs.
@brussig-tud, I just removed the DKMS module and rebooted the system. Now the whole /dev/dri folder seems to be missing, though. Am I missing the normal drivers or something else needed to get the Intel N100 working again? I played around a bit and found out that reverting to kernel 6.2 seems to solve the issue. Does the 6.5 kernel not actually include an i915 driver?
@devedse I actually have no experience with Proxmox whatsoever, but that seems very unlikely to me (after all, every other Debian-based distro usually packages the i915 driver for every officially available kernel version). You can try to ... Edit: you should definitely check what driver is being assigned to the iGPU using ... If everything else fails, keeping the i915-sriov DKMS driver with ...
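One way to check which driver is bound to a device is to follow the `driver` symlink in its sysfs directory. This is a small sketch rather than the command referred to above; the function name is hypothetical and the device address in the usage note is an assumption.

```python
import os


def bound_driver(device_dir):
    """Return the name of the kernel driver bound to a PCI device, or None.

    sysfs exposes the binding as a symlink named "driver" that points at
    the driver's directory; no symlink means no driver is bound.
    """
    link = os.path.join(device_dir, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))
```

`bound_driver("/sys/bus/pci/devices/0000:00:02.0")` returning `None` would be consistent with a missing /dev/dri, since no driver claimed the iGPU.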
Apparently the problem was that the "i915.ko" file seemed to be missing in the modules folder. I had to reinstall the kernel by doing the following:
That fixed my issues.
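To check for this situation before reinstalling anything, a short Python sketch can search a kernel's modules directory for the i915 module file. It assumes the usual `/lib/modules/<release>` layout and that modules may carry a compression suffix such as `.ko.xz` or `.ko.zst`; the function name is made up for illustration.

```python
import os


def find_module_files(modules_root, name="i915"):
    """Return sorted paths of <name>.ko (plain or compressed) under modules_root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(modules_root):
        for filename in filenames:
            # Match "i915.ko" as well as "i915.ko.xz", "i915.ko.zst", etc.
            if filename == f"{name}.ko" or filename.startswith(f"{name}.ko."):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Running `find_module_files(f"/lib/modules/{os.uname().release}")` and getting an empty list would point to the same missing-file problem. More than one hit is possible when an out-of-tree build sits next to the in-tree module.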
Yes, I also encountered this issue, but after I rolled back the PVE kernel to 6.2.16-20-PVE, memory usage was normal and SR-IOV could still be used normally.
Could this be linked to this issue: https://patchwork.kernel.org/project/intel-gfx/patch/BYAPR03MB4168C6D020B750EAF8021731ADE22@BYAPR03MB4168.namprd03.prod.outlook.com/ ?
Since I have been using PVE 8.1 with kernel 6.5, I have noticed for some time that the kernel dynamic memory is not released again. Whenever I start a VM that has allocated the GPU and shut it down again, the host's memory display remains at about the same value as if the VM was still running. A check with smem shows that the memory is no longer allocated by any processes but to the kernel dynamic memory.
With PVE 8.0 and kernel 6.2 I never experienced the described behavior.
Is anyone else here affected, or does anyone know a solution?