fix: starter: overwrite glibc's internal tid cache on clone() #2837

Draft: wants to merge 2 commits into main

@dtrudg (Member) commented Apr 19, 2024

Adapted from: opencontainers/runc#4247

Execution of a container using a PID namespace can fail on certain versions of glibc when Singularity is built with Go 1.22.

This happens because Go 1.22 makes pthread calls using pthread_self(), and since glibc 2.25 the TID cached in the thread structure that pthread_self() returns is no longer updated in the child on clone().

Fixes #2677


Original runc explanation:

Since glibc 2.25, the thread-local cache of the current TID is no longer updated in the child when calling clone(2). This results in very unfortunate behaviour when Go does pthread calls using pthread_self(), which has the wrong TID stored.

The "simple" solution is to forcefully overwrite this cached value. Unfortunately (and unsurprisingly), the layout of "struct pthread" is strictly private and can change without warning.

Luckily, glibc (currently) uses CLONE_CHILD_CLEARTID for all forks (with the child_tid set to the cached &PTHREAD_SELF->tid), meaning that as long as runc is using glibc, when "runc init" is spawned the child process will have a pointer directly to the cached value we want to change. With CONFIG_CHECKPOINT_RESTORE=y kernels on Linux 3.5 and later, we can simply use prctl(PR_GET_TID_ADDRESS). For older kernels we need to memory scan the TLS structure (pthread_self() returns a pointer to the start of the structure so we can "just" scan it for a field containing the current TID and assume that it is the correct field).
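
The lookup-and-overwrite idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR or from runc: the helper names (raw_gettid, tid_addr_from_prctl, tid_addr_from_scan, find_tid_address, overwrite_cached_tid) and the 1 KiB scan bound are hypothetical. It also assumes glibc, where pthread_self() returns a pointer to the start of the thread structure, and it assumes the scan runs while the cached TID is still correct (for example, before clone), since a stale cache would not match the real TID.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Older headers may lack this constant (added in Linux 3.5). */
#ifndef PR_GET_TID_ADDRESS
#define PR_GET_TID_ADDRESS 40
#endif

/* Assumption: how far into the thread structure we are willing to scan. */
#define TLS_SCAN_BYTES 1024

/* Ask the kernel for the real TID, bypassing any value cached by libc. */
static pid_t raw_gettid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/*
 * Preferred path: the kernel reports the clear_child_tid address, which
 * glibc (currently) points at its cached TID field. Returns NULL if the
 * kernel does not support the request (e.g. no CONFIG_CHECKPOINT_RESTORE).
 */
static pid_t *tid_addr_from_prctl(void)
{
	int *addr = NULL;

	if (prctl(PR_GET_TID_ADDRESS, &addr, 0, 0, 0) < 0)
		return NULL;
	return (pid_t *)addr;
}

/*
 * Fallback: scan the start of the thread structure for a field that holds
 * `tid` and assume the first match is the cache. glibc-specific and fragile.
 */
static pid_t *tid_addr_from_scan(pid_t tid)
{
	char *base = (char *)pthread_self();

	for (size_t off = 0; off + sizeof(pid_t) <= TLS_SCAN_BYTES; off += sizeof(pid_t)) {
		pid_t value;

		memcpy(&value, base + off, sizeof(value));
		if (value == tid)
			return (pid_t *)(base + off);
	}
	return NULL;
}

/* Locate the cached TID field, trying prctl first and the scan second. */
static pid_t *find_tid_address(void)
{
	pid_t *addr = tid_addr_from_prctl();

	if (addr == NULL)
		addr = tid_addr_from_scan(raw_gettid());
	return addr;
}

/* Overwrite the cached TID with the value the kernel reports. */
static void overwrite_cached_tid(pid_t *tid_addr)
{
	assert(tid_addr != NULL);
	*tid_addr = raw_gettid();
}
```

In this sketch a caller would obtain the address with find_tid_address() and then, in the child, call overwrite_cached_tid() on it before any pthread call is made. If no address can be found, nothing is written.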

Obviously this is all very horrific, and if you are reading this in the future, it almost certainly has caused some horrific bug that I did not foresee. Sorry about that. As far as I can tell, there is no other workable solution that doesn't also depend on the CLONE_CHILD_CLEARTID behaviour of glibc in some way. We cannot "just" do a re-exec after clone(2) for security reasons.

Fixes opencontainers/runc#4233

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

@dtrudg dtrudg self-assigned this Apr 19, 2024
@dtrudg dtrudg force-pushed the issue-2677 branch 2 times, most recently from bea304e to 5ed2747 Compare April 19, 2024 11:18
Successfully merging this pull request may close these issues.

runc doesn't work with go1.22
SIGSEV (signal 11 error) with Go 1.22.0 and Ubuntu 20.04 / Debian 10