Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Req: make wasi_socket_working more configurable instead of hard coding retry times #263

Open
Mossaka opened this issue Apr 23, 2024 · 6 comments

Comments

@Mossaka
Copy link
Member

Mossaka commented Apr 23, 2024

I was looking into an issue regarding the slow startup time of runwasi shims, and by inspecting the traces, I found that the wait_socket_working(&address, 5, 200) call always took a second to finish if the address is there but for some reason not able to establish a connection to the ttrpc client correctly.

code ref: https://github.com/containerd/rust-extensions/blob/main/crates/shim/src/synchronous/mod.rs#L471-L476

I would appreciate if you could clarify the motivation behind call to wait_socket_working. Why coulnd't we proceed to remove the socket if it's already in use and then call start_listener immediately? Also, what's the motivation behind the one second wait time, which has a significant impact on the startup time of the shims?

Here is a modified code that doesn't wait for sockets:

#[allow(clippy::let_unit_value)]
let _listener = match start_listener(&address) {
    Ok(l) => l,
    Err(e) => {
        if e.kind() != std::io::ErrorKind::AddrInUse {
            return Err(Error::IoError {
                context: "".to_string(),
                err: e,
            });
        };
        if socket_exist(&address)? {
            remove_socket(&address)?;
        }
        start_listener(&address).map_err(io_error!(e, ""))?
    }
};
@Mossaka
Copy link
Member Author

Mossaka commented Apr 23, 2024

Does the shim need to handle the deletion of the shim socket address (e.g. /run/containerd/<hash>)? I am not seeing where in the runc-shim delete the socket address.

@mxpv
Copy link
Member

mxpv commented Apr 24, 2024

Yeah, it looks like it just exhausts all retries (which is 1 second) and then removes socket. IMO we can remove it.

pinging @abel-von and @Burning1020 . Any chance you have more context here?

@abel-von
Copy link
Contributor

I think this is for support of multi containers in one pod, containerd will call shim start multiple times, We should have to return the address directly rather than removing the socket and listen again.
I don't know why can't you establish the connection?

@Mossaka
Copy link
Member Author

Mossaka commented Apr 30, 2024

@abel-von that makes sense, but I think the retry timeout is unusually long - a whole second! What are the reasons for trying 200 times for 5ms each? Can we reduce the number of retries?

@abel-von
Copy link
Contributor

abel-von commented May 9, 2024

@Mossaka maybe we can make it a configurable value, but I'd like to know why is the connection not established everytime?

@Mossaka
Copy link
Member Author

Mossaka commented May 16, 2024

maybe we can make it a configurable value

I like that, and changed this issue's title to reflect the request.

but I'd like to know why is the connection not established

There is a bug in runwasi that prevents it from deleting the socket address after ttrpc server closes. Thus it hangs over connection to the socket address.

@Mossaka Mossaka changed the title Perf: shim::spawn takes more than one second to finish Req: make wasi_socket_working more configurable instead of hard coding retry times May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants