ExecSync did not return according to the timeout set in the request #10094
Comments
Sorry for missing the comment in the closed #9568. @tallclair @rtheis I would like to explain the shim IO handler in more detail first. The following chart shows the stdout dataflow of the setns process.
However, the setns process can have child or grandchild processes. In Currently, I don't have a better solution to clean up all the processes created by By default, there is no timeout to drain IO. If a user wants to fail fast after a timeout, we can set it to Side note: The setns process and the init container should take responsibility for cleaning up the orphaned processes #10002 (comment)
@fuweid thank you. That works as you've noted. I'm concerned about changing the value, since you all left the default value set to effectively disable the timeout. I'd like a reasonable timeout but don't want data loss either. What is your recommendation?
I think it is the same problem. Test function: the exec runs past the timeout, and the test reports a failure. In the delete function, if the shim returns no error but the IO is still in use by a process forked by runc, io.wait will wait for a long time.
Hi @jokemanfire ,
Yes, the rust-shim opens the fifo directly. Without a sub-cgroup for each exec process, I don't have a good solution to handle this on the CRI side. Ping @abel-von @Burning1020 for help on the rust-shim issue, since they are the maintainers of rust-shim.
Hi @fuweid @Burning1020 @abel-von |
@jokemanfire I have tried to solve this problem in two directions, but both failed:
So basically, I think there is no practical solution for rust-shimv2. As you mentioned, adding a timeout to io.wait might work.
When the container holds the fifo directly, the write side of the fifo must be held by the processes created by the exec-init process.
Just my two cents. Correctness is the first priority. Performance data would make the case more convincing.
How about trying copy-io first? The implementation may not be the same as the go-shim's copy-io, @Burning1020. I will try this and resolve the issues that differ from the Go runtime implementation.
I will also share performance test data to show whether it is worth it. I think changing process.io.wait is more valuable. @fuweid
Sorry, I didn't follow you here. The write side of the fifo is held by processes in the container instead of the shim, while CRI-containerd holds the read side of the fifo. Even if you have a timeout policy, the write side of the fifo is still open. How does CRI-containerd get the closed event? If you already have a POC, please file a pull request on the rust-shim side, thanks.
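The point about the write side can be illustrated with a plain pipe: a reader sees EOF only after every holder of the write end has closed it. The sketch below is an assumption-laden stand-in, not shim code. It uses `os.Pipe` instead of a named fifo and `syscall.Dup` to imitate a forked child inheriting the write end, so it is Unix-only; `sharedWriteEndDemo` and `finishedWithin` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
	"time"
)

// finishedWithin reports whether done was closed within the wait window.
func finishedWithin(done <-chan struct{}, wait time.Duration) bool {
	select {
	case <-done:
		return true
	case <-time.After(wait):
		return false
	}
}

// sharedWriteEndDemo reports whether the reader saw EOF (1) after only the
// original writer closed and (2) after the duplicated writer closed as well.
func sharedWriteEndDemo() (eofAfterFirst, eofAfterAll bool, err error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, false, err
	}
	defer r.Close()

	// Duplicate the write end, as a forked child would inherit it.
	dupFd, err := syscall.Dup(int(w.Fd()))
	if err != nil {
		w.Close()
		return false, false, err
	}
	childW := os.NewFile(uintptr(dupFd), "child-write-end")

	done := make(chan struct{})
	go func() {
		io.Copy(io.Discard, r) // returns only on EOF or error
		close(done)
	}()

	w.Close() // the exec process itself exits
	eofAfterFirst = finishedWithin(done, 100*time.Millisecond)

	childW.Close() // the last holder of the write side exits
	eofAfterAll = finishedWithin(done, 2*time.Second)
	return eofAfterFirst, eofAfterAll, nil
}

func main() {
	first, all, err := sharedWriteEndDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println(first, all) // prints: false true
}
```

A timeout on the read side changes how long the reader is willing to wait, but it does not produce the EOF; only closing the last write end does.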
Thanks for your patience. I know that the process is started by runc and holds the write side of the fifo (such like
@jokemanfire Sorry for missing this comment. I have an article, in Chinese, that describes how the fifo is handled: https://fuweid.com/post/2022-embedshim-kernel-is-my-sidecar/ (Google translation to English should work :joy:). Hope it can help. Neither containerd nor containerd-shim waits for it forever. When we send the delete call to the shim, containerd-shim waits 2 seconds for it and then closes it. Then containerd gets the notification. REF: https://github.com/containerd/containerd/pull/3691/files
Thank you very much, this is very useful to me.
I have noticed that code: the shim closes the pipe (correct me if I have misunderstood), but this does not apply to shims that use the FIFO directly. 0.0
Description
While both kubernetes/kubernetes#123931 and #9568 have been closed, the problem still exists.
Steps to reproduce the issue
See kubernetes/kubernetes#123931 (comment) for the recreate steps.
Describe the results you received and expected
Exec timeout is honored.
What version of containerd are you using?
1.7.15
Any other relevant information
No response
Show configuration if it is related to CRI plugin.
No response