Unexpected chdir invoked on container init and start #2772

Closed
Mossaka opened this issue Apr 25, 2024 · 9 comments · Fixed by #2780

@Mossaka
Contributor

Mossaka commented Apr 25, 2024

While investigating a performance issue, I observed that the working directory /run/containerd/io.containerd.runtime.v2.task/<namespace>/<containerid>/ becomes inaccessible or gets deleted after executing the shim::wait() call in the runwasi shim process. This deletion prevents the shim process from reading the address file to delete the shim socket. (e.g. ref code)

Logs

I ran bpftrace on the unlink and unlinkat syscalls for those paths and found that the youki inner process unlinks the bundle path before containerd calls delete-shim (that is, before process 2611761 is started).

Process started: /usr/local/bin/containerd-shim-wasmtime-v1 PID: 2611672
Process started: /usr/local/bin/containerd-shim-wasmtime-v1 PID: 2611681
PID 2611707 (youki:[2:INIT]): File unlink in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 2611681 (client_handler): File unlinkat in target directory: /run/containerd/wasmtime/default/testwasm
Process started: /usr/local/bin/containerd-shim-wasmtime-v1 PID: 2611761
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/.testwasm..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/.testwasm..
PID 569984 (containerd): File unlinkat in target directory: .testwasm
PID 569984 (containerd): File unlinkat in target directory: .testwasm
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 2611658 (ctr): File unlinkat in target directory: testwasm-stderr
PID 2611658 (ctr): File unlinkat in target directory: testwasm-stdout
PID 2611658 (ctr): File unlinkat in target directory: testwasm-stdin

Specifically, this caught my attention: PID 2611707 (youki:[2:INIT]): File unlink in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..

Question:

I am raising this issue to try to understand why youki does that. This might be the reason why the shim process is not able to delete the socket address after the ttrpc server shuts down.

FYI: @utam0k @jprendes

@utam0k utam0k self-assigned this Apr 28, 2024
@utam0k
Member

utam0k commented Apr 28, 2024

I want to confirm a few things before investigating:
a. Could the executor or post hook you passed to libcontainer be calling unlink?
b. Is there a minimal set of steps to reproduce this?
c. Could you share the syscalls made before and after the unlink to help us understand?

@Mossaka
Contributor Author

Mossaka commented Apr 29, 2024

I will try to reproduce this in youki and get back to you later.

@Mossaka
Contributor Author

Mossaka commented May 1, 2024

Okay, I spent more time tracing the root cause. After runwasi handles the Create request, the shim process's current directory has been set to the container root_directory (e.g. /run/youki/<ns>/<id>) by youki here. After the Delete request, youki cleans up the container path, so the shim process no longer has a valid current directory.
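The sequence described above can be reproduced in isolation. The following is a minimal sketch, not youki's or runwasi's actual code: `library_create` and `library_delete` are hypothetical stand-ins for a library that changes the working directory on create and removes that directory on delete. It assumes Linux, where a process is allowed to remove its own working directory.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a library call that, as a side effect,
/// changes the caller's working directory to the container directory.
fn library_create(container_root: &Path) -> io::Result<()> {
    fs::create_dir_all(container_root)?;
    env::set_current_dir(container_root)
}

/// Hypothetical stand-in for a library call that cleans up the
/// container directory.
fn library_delete(container_root: &Path) -> io::Result<()> {
    fs::remove_dir_all(container_root)
}

/// Runs create followed by delete, then asks the OS for the current
/// working directory and returns whatever that query produced.
fn cwd_after_create_and_delete() -> io::Result<PathBuf> {
    let original = env::current_dir()?;
    // Assumed scratch location; the name is arbitrary.
    let root = env::temp_dir().join(format!("cwd-demo-{}", std::process::id()));

    library_create(&root)?;
    library_delete(&root)?;

    // The process is still "in" the removed directory at this point.
    let result = env::current_dir();

    // Restore the original directory so the rest of the process is unaffected.
    env::set_current_dir(original)?;
    result
}
```

On Linux, the query made after the delete is expected to fail, because the directory the process is in no longer exists. Any relative path the process opens at that point would be affected in the same way.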

Question: why does youki chdir to the container directory at init and container_start?

@utam0k
Member

utam0k commented May 1, 2024

@Furisto Hi, Thomas. I'd like to know about your comment https://github.com/containers/youki/pull/143/files#r673503679. Is this assuming that console_socket was a relative path?
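If the concern is a relative console_socket path, one alternative to changing the process-wide working directory is to resolve the path against a known base directory. This is only an illustrative sketch: `resolve_socket_path` is a hypothetical helper, not something from youki. It relies on the documented behavior of `Path::join`, which returns the argument unchanged when it is already absolute.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `socket` against `base` without calling
/// chdir. An absolute `socket` is returned as-is, because `Path::join`
/// replaces the base when its argument is absolute.
fn resolve_socket_path(base: &Path, socket: &Path) -> PathBuf {
    base.join(socket)
}
```

For example, with a base of /run/bundle, a relative console.sock resolves to /run/bundle/console.sock, while /tmp/console.sock is left untouched.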

@utam0k
Member

utam0k commented May 1, 2024

Is there a problem here?

let csocketfd = match socket::connect(
    csocketfd.as_raw_fd(),
    &socket::UnixAddr::new(socket_name).map_err(|err| TTYError::InvalidSocketName {
        source: err,
        socket_name: socket_name.to_string(),
    })?,

@Mossaka Mossaka changed the title from "Unexpected deletion of bundle path in runwasi shims form libcontainer" to "Unexpected chdir invoked on container init and start" May 1, 2024
@utam0k
Member

utam0k commented May 3, 2024

Sorry, but I've created another PR to fix it.
#2780

@utam0k utam0k closed this as completed May 3, 2024
@utam0k utam0k reopened this May 3, 2024
@YJDoc2
Collaborator

YJDoc2 commented May 13, 2024

Hey @Mossaka, the related PR will be released soon, but I have a question about this issue.
You mentioned that:

... youki at here. And after the Delete request, youki cleans up the container path, and so the shim process doesn't have a current directory anymore.
Question: why does youki chdir to the container directory at init and container_start?

I'm not sure why youki setting cwd in the container init process would have any issues with shim? The start, run and delete youki processes (created using youki create, youki start and youki delete, respectively) run independently of each other, so the only likely cause of the removed directory would be the delete process, right? What would be the issue with the start and run processes (and, by extension, init)?

@Mossaka
Contributor Author

Mossaka commented May 17, 2024

I'm not sure why youki setting cwd in the container init process would have any issues with shim?

The shim process's working directory was changed by the youki APIs and never changed back (in fact, that directory was deleted in youki delete). The shim uses youki as a library, not as a CLI. That's why it doesn't matter whether chdir is invoked in the create, start, or delete step: once it's invoked, the shim process's global state is changed.
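To illustrate the "global state" point, here is a hedged sketch of how library code that does need a different working directory could at least restore the caller's directory afterwards. `CwdGuard` and `with_cwd` are hypothetical names for illustration; this is not a description of what the fix in #2780 does.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical RAII guard: remembers the working directory at
/// construction and restores it when dropped.
struct CwdGuard {
    original: PathBuf,
}

impl CwdGuard {
    /// Saves the current directory, then changes into `dir`.
    fn enter(dir: &Path) -> io::Result<Self> {
        let original = env::current_dir()?;
        env::set_current_dir(dir)?;
        Ok(CwdGuard { original })
    }
}

impl Drop for CwdGuard {
    fn drop(&mut self) {
        // Best effort only: an error cannot be propagated from drop.
        let _ = env::set_current_dir(&self.original);
    }
}

/// Runs `f` with `dir` as the working directory, then restores the
/// previous working directory before returning.
fn with_cwd<T>(dir: &Path, f: impl FnOnce() -> T) -> io::Result<T> {
    let _guard = CwdGuard::enter(dir)?;
    Ok(f())
}
```

Because the working directory is shared by every thread in a process, other threads would still observe the temporary change while the guard is alive. Avoiding chdir in library code altogether sidesteps that problem.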

@YJDoc2
Collaborator

YJDoc2 commented May 19, 2024

Hey, thanks for the explanation, you are completely right. I had missed the library invocation aspect completely!
