CoreCLR fails to run when mlock is unavailable #10568

Closed
omajid opened this issue Jun 25, 2018 · 3 comments
Labels
area-PAL-coreclr os-linux Linux OS (any supported distro)

Comments

@omajid
Member

omajid commented Jun 25, 2018

CoreCLR calls mlock during startup and fails to initialize if mlock returns EPERM. In most environments, that's not a problem.

However, many Linux distributions are starting to use systemd-nspawn for building code. This creates a chroot where programs have restricted capabilities. Specifically they do not have CAP_IPC_LOCK, which means they can't use mlock.

When mlock doesn't work, coreclr fails to start. This shows up in an strace as something like:

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd542bb000
mlock(0x7fbd542bb000, 4096)       = -1 EPERM (Operation not permitted)
write(2, "Failed to initialize CoreCLR, HR"..., 49) = 49

As a result, it is basically impossible to build coreclr in some Linux distribution build systems.
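For anyone who wants to check whether a given build environment is affected, here is a minimal C sketch that repeats the mmap/mlock pair from the strace output. The helper name `try_lock_one_page` is made up for illustration and this is not CoreCLR's code; it only reports whether locking a single anonymous page is permitted.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not part of CoreCLR): map one anonymous page and
 * try to lock it, mirroring the mmap/mlock pair in the strace output.
 * Returns 0 on success, or the errno value of the call that failed. */
int try_lock_one_page(size_t page_size)
{
    void *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return errno;

    int result = 0;
    if (mlock(page, page_size) != 0)
        result = errno; /* EPERM is the failure seen in the strace above */
    else
        munlock(page, page_size);

    munmap(page, page_size);
    return result;
}
```

Calling `try_lock_one_page(4096)` inside the restricted chroot and comparing the result with `EPERM` should show whether the environment hits this issue.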

@omajid
Member Author

omajid commented Jun 25, 2018

cc @tmds @alucryd

@omajid
Member Author

omajid commented Jun 25, 2018

See dotnet/source-build#285 (comment) and rpm-software-management/mock#186 for some examples of builds that are hitting this.

@janvorli
Member

The mlock call is necessary for the FlushProcessWriteBuffers PAL function to behave correctly, and that function is crucial for ensuring reliable runtime suspension for GC. See https://github.com/dotnet/coreclr/blob/e6ebea25bea93eb4ec07cbd5003545c4805886a8/src/pal/src/thread/process.cpp#L3095-L3098 for a description of the reason.
On Linux 4.3 and higher, there is a sys_membarrier syscall that we could use as an alternative mechanism to implement FlushProcessWriteBuffers. Issue #4501 is tracking that. @sdmaclea tried to implement it and tested it on ARM64. He found that the performance was really bad: the running time of our ~11000 coreclr tests was about 50% longer. However, no testing was done on other hardware, so it is not clear whether the performance issue is ARM64-specific or a general problem.
Interestingly enough, I've just discovered the following article describing performance issues with sys_membarrier: https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/. The reason is that the syscall internally waits until all running threads on the system have gone through a context switch, which can take tens of milliseconds. The good news mentioned in the article is that starting with Linux 4.14, there is a new flag that can be passed to sys_membarrier that makes it use IPIs to implement the memory barrier semantics, which is much faster. So we should give it a try.
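
To make the idea concrete, here is a rough C sketch of what a membarrier-based flush could look like. It assumes the Linux 4.14 flag mentioned in the article is the `MEMBARRIER_CMD_PRIVATE_EXPEDITED` command and that a process has to register once before using it. The `sketch_` and `SKETCH_` names are hypothetical, nothing here is taken from the PAL, and the command values are copied from the kernel's `<linux/membarrier.h>` so that the example also compiles against older headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values copied from <linux/membarrier.h>; renamed with a hypothetical
 * SKETCH_ prefix so the example builds against pre-4.14 headers too. */
enum {
    SKETCH_CMD_QUERY = 0,
    SKETCH_CMD_PRIVATE_EXPEDITED = 1 << 3,
    SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED = 1 << 4
};

/* Thin wrapper over the raw syscall; glibc did not wrap it in 2018. */
static long sketch_membarrier(int cmd)
{
#ifdef SYS_membarrier
    return syscall(SYS_membarrier, cmd, 0);
#else
    (void)cmd;
    errno = ENOSYS;
    return -1;
#endif
}

/* Call once at startup. Returns 1 if the expedited private barrier can
 * be used, 0 if the caller has to fall back to another mechanism. */
int sketch_membarrier_init(void)
{
    long supported = sketch_membarrier(SKETCH_CMD_QUERY);
    if (supported < 0)
        return 0; /* syscall missing or blocked */
    if (!(supported & SKETCH_CMD_PRIVATE_EXPEDITED) ||
        !(supported & SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED))
        return 0; /* kernel too old for the expedited command */
    return sketch_membarrier(SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED) == 0;
}

/* Stand-in for FlushProcessWriteBuffers: issues a memory barrier on all
 * running threads of this process. Returns 0 on success, -1 on failure. */
int sketch_flush_process_write_buffers(void)
{
    return sketch_membarrier(SKETCH_CMD_PRIVATE_EXPEDITED) == 0 ? 0 : -1;
}
```

When `sketch_membarrier_init()` returns 0, the runtime would still need the existing mlock-based path, so this would not remove the EPERM failure on older kernels.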

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 16, 2020