
Ideas - Feel free to post ideas. #10
Open
utam0k opened this issue May 17, 2021 · 92 comments

Comments

@utam0k
Member

utam0k commented May 17, 2021

Feel free to post ideas.

@utam0k utam0k pinned this issue May 17, 2021
@YJDoc2
Collaborator

YJDoc2 commented May 17, 2021

Hey, this seems like a really cool project, especially as I am interested in both Rust and OS-related topics.
I was going through the code, and even though it is quite understandable, it would be great if there were a high-level guide or even comments giving an overview. I saw that the design and implementation section is TBD in the README, and even though I don't have much knowledge of container runtimes, I would like to help write docs or a guide, as it would help me understand the project as well. Do you have anything specific in mind for such a guide, or could you open an issue about it so we can discuss it there? Thanks!

@utam0k
Member Author
Member Author

utam0k commented May 17, 2021

@YJDoc2
Thanks for the comment!
I don't have any concrete ideas about this yet. If you have any good ones, I'd love to hear them.

I saw that the design and implementation section is TBD in the README, and even though I don't have much knowledge of container runtimes, I would like to help write docs or a guide, as it would help me understand the project as well.

I think it might be a good idea to start with commenting on the code.

@YJDoc2
Collaborator

YJDoc2 commented May 17, 2021

I think that as the implementation grows, the documentation will become complicated, so rather than adding it to the README, having a dedicated 'thing' would be better. One option could be a repo wiki like this: https://github.com/dthain/basekernel/wiki
Another option I can think of is an mdBook like the Rust language uses, perhaps hosted on GitHub Pages for this repo.

I will try to start commenting and make a PR. Can you give me any reference links for this? Also, in case of any doubts, I'll message in this thread or on Twitter, if you're fine with that.

@utam0k
Member Author
Member Author

utam0k commented May 17, 2021

Sounds nice!

I will try to start commenting and make a PR.

You would use https://docs.rs, right?

Of course! I welcome questions from you on Twitter and elsewhere.

Also, in case of any doubts, I'll message in this thread or on Twitter, if you're fine with that.

@YJDoc2
Collaborator

YJDoc2 commented May 17, 2021

Hey, as far as I know, docs.rs automatically builds and hosts the HTML documentation from the doc comments in the source when crates are uploaded to crates.io.
I was talking about adding in-source comments to explain structures, fields, functions, etc., using doc comments according to the conventions in
https://doc.rust-lang.org/book/ch14-02-publishing-to-crates-io.html
https://github.com/rust-lang/rfcs/blob/master/text/1574-more-api-documentation-conventions.md#appendix-a-full-conventions-text
If you had something else in mind, let me know, because I haven't worked specifically with docs.rs before, even though I have written documentation comments for some of my projects.
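
For illustration, this is the style of in-source doc comment being discussed; the struct here is made up for the example, not actual youki code:

```rust
/// A handle to a single cgroup controller (hypothetical example).
pub struct Controller {
    /// Controller name as it appears under /sys/fs/cgroup, e.g. "memory".
    name: String,
}

impl Controller {
    /// Creates a handle for the named controller.
    ///
    /// rustdoc renders comments like these, and docs.rs builds and hosts
    /// them automatically once the crate is published to crates.io.
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string() }
    }
}
```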

@utam0k
Member Author
Member Author

utam0k commented May 18, 2021

@YJDoc2
I believe docs.rs also automatically generates the descriptions for structs, functions, etc.
https://docs.rs/futures/0.3.15/futures/io/struct.AllowStdIo.html
This is the code whose comments generate that document:
https://docs.rs/futures-util/0.3.15/src/futures_util/io/allow_std.rs.html#43

At any rate, the current situation is that there are no comments at all, so adding them would be very meaningful and appreciated.

@utam0k
Member Author
Member Author

utam0k commented May 19, 2021

I created utam0k#14

@tsturzl
Collaborator

tsturzl commented May 20, 2021

I was thinking it might be possible to use async/await to handle cgroup controller configuration concurrently. This might be a fairly minor performance improvement, since these IO operations are relatively small, but it might be worth looking into.

@utam0k
Member Author
Member Author

utam0k commented May 22, 2021

@tsturzl
That's a very good idea. I was actually wondering if I could use something like that.
According to railcar's article, it seems that creating a cgroup waits on a kernel lock.
I'm thinking that using async/await for cgroups might actually improve performance a bit.
Are you interested in this?

@tsturzl
Collaborator

tsturzl commented May 22, 2021

@utam0k I've been reading through some of the runc source while writing the memory controller. It seems like the order of writes to certain files within a controller matters in some situations. I believe it might have to do with validation between two values in the kernel. See: https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/fs/memory.go#L89

I'd be happy to take a look into it - maybe once the cgroup controllers are all implemented, so I don't disrupt anyone's work. I think a good starting point might be to just ensure the order of writes within each controller. That way each controller can be configured concurrently, while writes within a controller still happen in a fixed order to avoid validation issues. A sketch of the idea is below.
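
A minimal sketch of that idea, assuming tokio; the paths and values are hypothetical, not youki's actual API. Writes within a controller stay ordered, while the controllers themselves run concurrently:

```rust
use tokio::fs;

// Hypothetical limits; real values come from the OCI spec resources.
async fn apply_memory(cg: &str) -> std::io::Result<()> {
    // Order matters within the controller: runc writes the memory limit
    // before the swap limit to satisfy kernel validation.
    fs::write(format!("{cg}/memory.limit_in_bytes"), "1073741824").await?;
    fs::write(format!("{cg}/memory.memsw.limit_in_bytes"), "2147483648").await
}

async fn apply_cpu(cg: &str) -> std::io::Result<()> {
    fs::write(format!("{cg}/cpu.shares"), "512").await
}

async fn apply_all(cg: &str) -> std::io::Result<()> {
    // Controllers are independent, so configure them concurrently.
    let (mem, cpu) = tokio::join!(apply_memory(cg), apply_cpu(cg));
    mem.and(cpu)
}
```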

@utam0k
Member Author
Member Author

utam0k commented May 22, 2021

@tsturzl
Your ability to read the runc code so quickly is amazing! I'm still getting used to it and can only read what I need, little by little.
The use of async/await is a great advantage of youki being implemented in Rust, and I would love to incorporate it.
Can you create an issue about it? I'll assign it to you.
And when it's done, I'd love to invite you to become a member of this repository and work with you.

@stappersg

IPv6 support. See also #24

@tsturzl
Collaborator

tsturzl commented May 25, 2021

Should Youki come up with some central objectives or longer-term design goals? Perhaps it's too soon to tell, but some easy initial objectives could be an emphasis on safety and speed. Eventually it might be nice to have a goal beyond just a workable runtime, since it seems like we're surprisingly close to having a totally functional runtime. Something to think about and discuss, maybe?

@utam0k
Member Author
Member Author

utam0k commented May 26, 2021

@tsturzl
I think it's a great time to think about some big final goals.
When I first started making it, my thinking was much the same as in the railcar release blog. So basically, I think there is a strong case for implementing a runtime in Rust, just as there is for using Rust in the Linux kernel. It's a big advantage that Rust can get past the language-level difficulties of Go and C, the languages of the current typical implementations.
https://blogs.oracle.com/developers/building-a-container-runtime-in-rust

Also, I would like to take on the challenge of performance.
If you have any other good goals, I would love to hear them. I think youki right now is a pretty good playground for trying things out, as it is not yet at a practical stage.
Of course, I think it is also possible to offer it as a crate, like crun.
If you have any interesting challenges, I'd love to hear about them.

@utam0k
Member Author
Member Author

utam0k commented May 26, 2021

As one major policy direction, I would like to consider a style where resources that can be prepared in advance are prepared ahead of time, instead of at container-creation time.
In particular, I would like to explore whether this can be done with cgroups.
For example, we could provide a sub-command for pre-provisioning, or prepare some resources at initial creation, and then use the pre-prepared resources whenever they exist.

@tsturzl
Collaborator

tsturzl commented May 26, 2021

@utam0k I saw this in the blog post for railcar, but I'm really curious how this is done. I wonder if updating a cgroup is less costly than creating a new one; if so, we could always keep a cgroup on standby, ready to have the last portion of configuration applied at the next container launch. Or perhaps some notion of caching resources for reuse. It'll be interesting to profile some of these approaches.

@utam0k
Member Author
Member Author

utam0k commented May 26, 2021

@tsturzl For example, how about starting to create cgroups asynchronously at the beginning of the create command as a first step?

@tsturzl
Collaborator

tsturzl commented May 26, 2021

@utam0k it could be interesting to make all of youki's IO operations async; then we could kick off cgroups and the rest of the startup concurrently. Async runtimes like tokio can do m:n threading, where blocking operations are pushed off to another thread in a thread pool. The question for something like that is whether thread startup defeats the purpose entirely, since the pool won't exist for long or see much reuse of threads, but I like the idea of not blocking any of the work whenever possible (see the sketch below).
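
A small sketch of that m:n idea with tokio's blocking pool (illustrative only; the path and the surrounding structure are made up):

```rust
// Push a blocking cgroup write onto tokio's blocking thread pool so the
// async startup path is never stalled by file IO.
async fn setup_container() -> std::io::Result<()> {
    let cgroups = tokio::task::spawn_blocking(|| {
        std::fs::write("/sys/fs/cgroup/memory/demo/cgroup.procs", "0")
    });
    // ...kick off the rest of the container startup here, concurrently...
    cgroups.await.expect("blocking task panicked")
}
```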

@utam0k
Member Author
Member Author

utam0k commented May 26, 2021

@tsturzl
For now, I think it would be better to apply async/await to the current processing order first, and then separate the steps of creating and applying cgroups.
I think the project in this issue will be an interesting development. If we can achieve it, I think it will be a great feature of youki.
utam0k#17

@utam0k
Member Author
Member Author

utam0k commented Jun 11, 2021

@tsturzl
Hi! I came across the clone3() system call. It is a very interesting feature for cgroups, and I would love to use it.
I'd love to hear your opinions and level of interest.
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.7-clone3-new-cgroup
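
For reference, a rough sketch of what using clone3 with CLONE_INTO_CGROUP (the 5.7 feature the article describes) could look like via a raw syscall with the libc crate; the struct layout follows clone3(2), and error handling is elided:

```rust
use std::os::unix::io::RawFd;

// Mirrors struct clone_args from clone3(2).
#[repr(C)]
#[derive(Default)]
struct CloneArgs {
    flags: u64,
    pidfd: u64,
    child_tid: u64,
    parent_tid: u64,
    exit_signal: u64,
    stack: u64,
    stack_size: u64,
    tls: u64,
    set_tid: u64,
    set_tid_size: u64,
    cgroup: u64, // fd of the target cgroup directory (cgroup v2 only)
}

const CLONE_INTO_CGROUP: u64 = 0x200000000;

/// Forks the child directly into the cgroup referred to by `cgroup_fd`,
/// instead of creating it first and moving it afterwards.
/// Returns the child's pid in the parent, 0 in the child, -1 on error.
unsafe fn fork_into_cgroup(cgroup_fd: RawFd) -> libc::c_long {
    let mut args = CloneArgs::default();
    args.flags = CLONE_INTO_CGROUP;
    args.exit_signal = libc::SIGCHLD as u64;
    args.cgroup = cgroup_fd as u64;
    libc::syscall(
        libc::SYS_clone3,
        &mut args as *mut CloneArgs,
        std::mem::size_of::<CloneArgs>(),
    )
}
```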

@tsturzl
Collaborator

tsturzl commented Jun 11, 2021

@utam0k This seems really useful! I'm under the impression that youki currently forks itself twice: once essentially to create the namespaces, and once more to act as the init process and handle some of the container startup. I'm not fully read up on how this is all done, but it seems like this could save us from forking the child process, leaving only the init-process fork. Am I correct on this?

If we were to implement this, would we want to support older kernel versions and select the features at build time? Between this and the async file operations I've discussed, we are starting to look towards newer kernel features a lot. It almost begs the question of whether we should consider Youki bleeding edge and simply require a modern kernel to run it, or whether we should build out support for old kernels in addition to these new features. It might make sense to focus on supporting the latest kernel features: it may take a while for Youki to see any kind of adoption, and by the time that happens it might be commonplace for people to run kernel versions that support Youki. Development could be slowed by trying to keep things backwards compatible with old kernel versions.

@tsturzl
Collaborator

tsturzl commented Jun 11, 2021

Frankly, the more I look at supporting async file operations, the more io_uring seems to be the better option. Currently mio supports epoll, but epoll alone doesn't provide the feature set needed for async file operations, so mio, and thus tokio, doesn't support any means of doing async file IO. Linux actually has two different AIO implementations: one referred to as POSIX AIO, which is apparently not well implemented, and libaio, which is Linux-specific. I believe the latter supports some notion of passing a function pointer into the kernel as a callback. A project implements this for mio: https://github.com/asomers/mio-aio
That project is pretty small though, and the tokio community seems to be building up to supporting io_uring. They already have a low-level crate for this, and there is a proposal for using it as an optional API for async IO in the future.
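
To make that concrete, a minimal write through the low-level io-uring crate mentioned above might look like this (a sketch; real code needs proper error handling and must keep the buffer alive until completion):

```rust
use std::os::unix::io::AsRawFd;
use io_uring::{opcode, types, IoUring};

fn uring_write(file: &std::fs::File, buf: &[u8]) -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;
    let entry = opcode::Write::new(types::Fd(file.as_raw_fd()), buf.as_ptr(), buf.len() as u32)
        .build()
        .user_data(1);
    // Safety: `buf` outlives the submission; we wait for completion below.
    unsafe { ring.submission().push(&entry).expect("submission queue full") };
    ring.submit_and_wait(1)?;
    let cqe = ring.completion().next().expect("missing completion");
    assert!(cqe.result() >= 0, "write failed: {}", cqe.result());
    Ok(())
}
```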

So newer kernel features really do seem useful to us. They would also put us ahead of both runc and crun in terms of efficiency. Maybe this is a discussion worth having: what kernel versions do we want to support?

@tsturzl
Collaborator

tsturzl commented Jun 11, 2021

Getting back to the topic though: we use the nix crate to handle our forks, so perhaps we could make a PR to the nix crate to support clone3 there? The support already seems to be there in libc, since libc is pretty much raw bindings to the system's libraries.

@utam0k
Member Author
Member Author

utam0k commented Jun 12, 2021

@tsturzl
Fortunately, there is no demand yet to use youki with older kernels, so I want to target newer Linux kernels as much as possible. Let's leave the old ones to runc. How about targeting the widely used Ubuntu 20.04 as our baseline? That ships Linux kernel 5.4.
io_uring has been around since 5.1, so let's use it aggressively.
clone3 is rather cutting-edge, since it arrived in 5.7, but it seems to have a lot of advantages for youki, so I'd like to find a way to support it alongside the current fork style.

I believe that actively using the latest kernel features is one of youki's distinguishing features and could be written in the README.

I would like to contribute to nix as much as possible.

@utam0k
Member Author
Member Author

utam0k commented Jun 12, 2021

@Furisto, please let me know if you have any thoughts on this.

@tsturzl
Collaborator

tsturzl commented Jun 12, 2021

@utam0k I was actually going to suggest the same! Just track the kernel version of the latest Ubuntu LTS. I think tracking kernel improvements would be useful and a good angle for youki.

@tsturzl
Collaborator

tsturzl commented Jun 12, 2021

@utam0k I've been hacking around with io_uring tonight, and while I think I have a pretty clear path forward, working on it while we're trying to get cgroups finished up is going to result in a lot of conflicts that I'll probably spend more time than I'd like resolving. I think my effort right now is best spent pushing cgroups v2 forward. I think @Furisto did a great job laying the groundwork, but it hasn't garnered much attention from contributors yet.

@utam0k
Member Author
Member Author

utam0k commented Jun 12, 2021

@tsturzl That's great!
Please talk to him about working on it together and give it a try.
Either way, I'm excited about this feature and can't wait to see it.

@utam0k
Member Author
Member Author

utam0k commented Jan 6, 2022

@darleybarreto Thanks for telling me! I'll check it later ;)

@timchenxiaoyu

We need a Rust shim for memory friendliness!

@titaneric

Hi, is it possible to run youki by calling gRPC directly?

I am not sure whether that is supposed to be provided by youki or by containerd.

As far as I know, containerd provides the OCI commands to run containers, and youki is a container runtime implementing OCI.

However, I notice that youki also provides a crate to run containers, which is here.

I want the most native way to run a container directly, instead of calling docker run, as fast as possible.

Rust Docker client libraries like shiplift and bollard are good, but I am looking for something more advanced.

Could you give me some advice?

@yihuaf
Collaborator

yihuaf commented Mar 14, 2022

Hi, is it possible to run the youki by calling gRPC directly?

Can you elaborate on what you mean? Are you looking for something similar to the docker backend? One possibility is to use the CRI interface with cri-o or containerd. The CRI APIs are meant for kubelet consumption, but they are a well-defined gRPC interface.

If you are looking to just launch containers using the OCI interface, you can build something on top of youki. Youki supports the OCI spec and doesn't provide gRPC out of the box. It is a low-level component compared to kubelet and docker.

I want to have the most native way to run the container directly instead of calling docker run, as fast as possible.

Again, can you elaborate here? What do you mean by more native and fast? Can you explain your use cases?

I am looking for more advanced one.

Again, OCI compared to docker is not necessarily more advanced. It is a lower level of abstraction.

@tsturzl
Collaborator

tsturzl commented Mar 14, 2022

@yihuaf We have yet to do a full cri-o or containerd evaluation. I'm supposed to be doing the containerd evaluation, but a lot of things have been changing to support CI testing for the different high-level runtimes. Last I heard, CRI-O is mostly working.

@titaneric

Thank you, guys. I did not expect to receive this feedback so quickly. Clearly, I was confused about the meaning of OCI and CRI here.

Let me rephrase my question. I want a Rust library to run a container launched by an OCI-compatible runtime, instead of calling the docker or nerdctl commands directly.

My use case is that I want to rewrite the backend of the Rust Playground myself, and I want to minimize the burden of running the container that executes the potentially evil code. Container runtimes such as youki (fast start-up) and gVisor (safe sandbox) are on my candidate list.

Any comments are helpful.

@yihuaf
Collaborator

yihuaf commented Mar 15, 2022

The first question is to understand what isolation level is good enough for the "evil" code. Do we need VM-level isolation, or is hardened container-level isolation good enough? If you want VM-level isolation, I would suggest you take a look at firecracker. You would need to decide on the trade-off between startup time and security.

Assuming we want to use containers, the next question is whether to work at the OCI level, with runc or youki. OCI does not take care of downloading images, unpacking images into an OCI runtime bundle, lifecycle management, and a few other things. There is currently a gap between what youki offers (OCI) and what docker or CRI offers, and I believe there is currently no Rust solution in this space. Currently, I have a closed-source project using skopeo + umoci + runc (or youki) and some Rust glue code to close the gap, and maybe that is good enough for your use case.

Now, if you decide that the OCI level is the right one, then using youki as a library or as a CLI is fine. Note that you will have to take care of creating the OCI bundle in your own code. You can have a Rust container image ready on the host as a base, send the Rust code to the host, use a union fs to create a new rootfs with the code, create an OCI bundle config, and call youki with it. A rough sketch of this flow is below.
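
As a sketch of that glue, using the tools named above (the image and container names are illustrative):

```sh
# Fetch an image and unpack it into an OCI runtime bundle.
skopeo copy docker://docker.io/library/rust:latest oci:rust-oci:latest
sudo umoci unpack --image rust-oci:latest bundle
# Copy the user's code into bundle/rootfs, adjust bundle/config.json, then:
sudo youki create -b bundle playground && sudo youki start playground
```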

With that being said, since you mentioned gRPC, I suspect your use case is at a higher level of abstraction than OCI. Without knowing more of your requirements, I would start with docker or CRI (containerd, cri-o).

Side note: there is a lack of Rust alternatives at the higher levels of the container ecosystem. A lot of people working on this project would love to contribute there. For example, there is currently no good library to manipulate container images the way skopeo and umoci do. Personally, I wish I had more time to work on some of these ideas.

@titaneric

Again, thanks for the thorough explanation and for sharing ideas.

I would like to give firecracker a try (VM-level safety, fast start-up, and a REST API), with reference to the great post by jvns.

I'm still new to Rust, containers, and VMs, but I am willing to take part in the Rust community and container ecosystem. Looking forward to contributing when I am ready.

Thank you all, really.

@utam0k
Member Author
Member Author

utam0k commented Mar 16, 2022

@timchenxiaoyu @yihuaf @tsturzl Thanks for the great conversation! I've read it all.

@zamazan4ik

zamazan4ik commented Sep 24, 2023

Here I am posting an idea for optimizing Youki with Profile-Guided Optimization (PGO). Recently I started evaluating PGO across multiple software domains; all my current results are available here: https://github.com/zamazan4ik/awesome-pgo . For Youki, I did some quick benchmarks on my local Linux machine and want to share the actual performance numbers.

Test environment

  • Fedora 38
  • Linux kernel 6.4.15-200.fc38.x86_64
  • AMD Ryzen 9 5900x
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Rust rustc 1.72.0 (5680fa18f 2023-08-23)
  • Youki version: latest commit (646c1034f78454904cc3e1ccec2cd8dc270ab3fd commit) in the main branch

Benchmark

As a benchmark, I use the workload suggested in the README: sudo ./youki create -b tutorial a && sudo ./youki start a && sudo ./youki delete -f a

youki_release is built with just youki-release. The PGO-optimized build is done with cargo-pgo (cargo pgo build, then run the benchmark with the instrumented Youki, then cargo pgo optimize build). As the training workload, I use the benchmark itself; see the sketch below.
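
In other words (restating the cargo-pgo steps just described), the workflow was roughly:

```sh
cargo pgo build             # instrumented build
# training run: the benchmark itself, repeated
sudo ./youki_instrumented create -b tutorial a \
  && sudo ./youki_instrumented start a \
  && sudo ./youki_instrumented delete -f a
cargo pgo optimize build    # rebuild using the collected profiles
```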

Results

The results are presented in hyperfine format. All benchmarks were run multiple times, in different orders, etc. - the results are reproducible.

sudo hyperfine --prepare 'sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --warmup 100 --min-runs 500 'sudo ./youki_release create -b tutorial a && sudo ./youki_release start a && sudo ./youki_release delete -f a' 'sudo ./youki_optimized create -b tutorial a && sudo ./youki_optimized start a && sudo ./youki_optimized delete -f a'
Benchmark 1: sudo ./youki_release create -b tutorial a && sudo ./youki_release start a && sudo ./youki_release delete -f a
  Time (mean ± σ):      78.6 ms ±   3.7 ms    [User: 11.2 ms, System: 43.9 ms]
  Range (min … max):    70.9 ms …  97.8 ms    500 runs

Benchmark 2: sudo ./youki_optimized create -b tutorial a && sudo ./youki_optimized start a && sudo ./youki_optimized delete -f a
  Time (mean ± σ):      77.4 ms ±   3.6 ms    [User: 10.9 ms, System: 44.1 ms]
  Range (min … max):    70.6 ms …  90.0 ms    500 runs

Summary
  sudo ./youki_optimized create -b tutorial a && sudo ./youki_optimized start a && sudo ./youki_optimized delete -f a ran
    1.02 ± 0.07 times faster than sudo ./youki_release create -b tutorial a && sudo ./youki_release start a && sudo ./youki_release delete -f a

Just for reference, I also share the results for Instrumentation mode:

LLVM_PROFILE_FILE=/home/zamazan4ik/open_source/youki/target/pgo-profiles/youki_%m_%p.profraw sudo hyperfine --prepare 'sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --warmup 10 --min-runs 100 'sudo ./youki_instrumented create -b tutorial a && sudo ./youki_instrumented start a && sudo ./youki_instrumented delete -f a'
Benchmark 1: sudo ./youki_instrumented create -b tutorial a && sudo ./youki_instrumented start a && sudo ./youki_instrumented delete -f a
  Time (mean ± σ):     161.1 ms ±   3.3 ms    [User: 20.3 ms, System: 116.8 ms]
  Range (min … max):   154.8 ms … 170.7 ms    100 runs

According to the tests, PGO helps achieve somewhat better performance (1-2%). Not a great win, but not bad for "just" a compiler option. At scale, even 1% is a good thing to achieve.

Further steps

If you think it's worth it, we can perform more robust PGO benchmarks for Youki and then document the results in the project, so other people will be able to optimize Youki for their own workloads.

@utam0k
Member Author
Member Author

utam0k commented Sep 25, 2023

@zamazan4ik I am interested in PGO. First of all, may I ask you to create an issue about using PGO?

If you think it's worth it, we can perform more robust PGO benchmarks for Youki and then document the results in the project, so other people will be able to optimize Youki for their own workloads.

It sounds great to me 💯 Personally, I want to learn PGO. Let's give it a try!

@zamazan4ik

Sure! Here it is: #2386

@the8472

the8472 commented Dec 27, 2023

More of a question: does running youki with terminal: false + detach have the same properties as runc's detached pass-through mode, in that it passes file descriptors 0-2 directly to the child and no shim process remains between the parent and the containerized child?
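
(For context, the terminal: false in question is this OCI config.json fragment; whether the parent's fds 0-2 are then passed straight through is exactly what is being asked:)

```json
{
  "process": {
    "terminal": false
  }
}
```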

@utam0k
Member Author
Member Author

utam0k commented Dec 31, 2023

More of a question: does running youki with terminal: false + detach have the same properties as runc's detached pass-through mode, in that it passes file descriptors 0-2 directly to the child and no shim process remains between the parent and the containerized child?

Thanks for your question. I couldn't quite understand what you mean by shim process. Do you mean the double fork, or containerd-shim?

@the8472

the8472 commented Jan 1, 2024

If process A uses youki to spawn containerized process B, then anything sitting between A and B in the process tree would be a shim process, be it conmon, containerd-shim, or anything else. Reparenting to a daemon would also be undesirable.

@utam0k
Member Author
Member Author

utam0k commented Jan 1, 2024

@the8472 As far as I know, youki doesn't have this option. May I ask you to create an issue and implement it?

@gleicon

gleicon commented Jan 9, 2024

Not so much an idea, but I wanted to run my GPU workloads using youki. It's so practical, and it seems it's just missing a way to share or access GPUs. Do you know if anyone has managed to do it? I can help with coding if there are some directions!

@YJDoc2
Collaborator

YJDoc2 commented Jan 9, 2024

Not so much an idea, but I wanted to run my GPU workloads using youki. It's so practical, and it seems it's just missing a way to share or access GPUs. Do you know if anyone has managed to do it? I can help with coding if there are some directions!

Hey, while I investigate this further, can I ask you to check something? If my understanding is correct, nvidia GPU support (specifically nvidia) is done via container prestart and similar hooks, and does not require any special functionality from the runtime at all. Can you check whether you can run a GPU workload (simply listing the GPU / getting GPU stats would suffice) on your machine using docker+runc/crun, and then try the same with youki? If no special runtime facility is required, both should work similarly.

Also, am I misunderstanding your question? Do you mean running GPU workloads directly with youki, without something like docker?

@gleicon

gleicon commented Jan 9, 2024

I mean running the GPU workload and ditching docker for good, for instance running llama.cpp in a simple container. I will try to run the tests you proposed. I see that nvidia has a framework and that https://github.com/Arc-Compute/LibVF.IO/ abstracts a good part of that for other GPUs. I think you are right: you grant the capability to the namespace/container at setup time, and then through MMIO or some other magic channel, libraries can "see" the GPU. Does that make sense?

@YJDoc2
Collaborator

YJDoc2 commented Jan 9, 2024

you grant the capability to the namespace/container at setup time, and then through MMIO or some other magic channel, libraries can "see" the GPU. Does that make sense?

Basically, yes. IIUC, the main issue with using an nvidia GPU like any other device is that because the nvidia drivers are non-GPL/proprietary, they are not registered like other devices in the kernel. The driver then does some "stuff" to make the GPU appear as a device. When it comes to the runtime, though, this 'not registering properly' causes issues in mounting that device into the container. I saw a couple of implementation problems with directly mounting the GPU /dev/... files into the container in runc's issues and PRs. All in all, it seems complicated.

Unless there is a strong request for youki-native GPU support, I don't think we will be doing anything soon. Another major hurdle is that there are no good/supported emulators for validating our code, which means the developer and the reviewer must both have GPUs to develop and test it. It also is not testable in CI. We would also want a definite consumer for such a feature, similar to how our current wasm support is used: runwasi uses some of youki's libraries for its purposes, so our wasm support gets exercised by them. Unless someone actually wants native GPU support, this feature risks stagnation and unknown breakage.

On a more personal note, I do think the suggestion is quite attractive. Having such support directly at the runtime level could solve some issues I can think of with container/GPU interaction, and would also create a much more seamless experience. That said, the concerns above still stand, and this certainly does not seem a simple issue to tackle.

@gleicon

gleicon commented Jan 10, 2024

Understood. I am still figuring out how to execute the tests you proposed. I'm thinking that with local models such as whisper/llama.cpp and many others, having a way to package, coordinate, and share resources would be valuable. I did some of that with qemu, but as you said, depending on the GPU, that's not feasible. I'm curious to see how https://modal.com/ and other providers are using Rust to do GPU containers. That could be huge.

@utam0k
Member Author
Member Author

utam0k commented Jan 10, 2024

@gleicon Probably nvidia-container-toolkit, provided by Nvidia, is what you want.
https://github.com/NVIDIA/nvidia-container-toolkit

@YJDoc2
Collaborator

YJDoc2 commented Jan 10, 2024

@gleicon
Hey, so looking more into this, I think youki (and other runtimes) already have "full" support for GPUs. Even right now, you can manually add the GPU devices in /dev/nvidia* via either docker or the config.json file, and they will get mounted in the container, where they can be accessed as GPU devices. Looking at some issues on runc, opencontainers/runc#3671 and especially opencontainers/runc#3708, the core problem is that because the nvidia drivers are proprietary, they do not register with the kernel like "normal" drivers. The files created in /dev are created by the driver and not by the kernel, so some events can cause changes to them that break the mounting of the GPU devices in the container. The nvidia toolkit fixes this by creating and monitoring symlinks to the actual device files and auto-updating the symlinks if anything changes. That way we can mount those symlinks as devices in the container and nothing breaks.

Also, a quick way to validate that the GPU is correctly accessible in the container is the nvidia-smi program, which lists the GPUs on the system. On a side note, the container image itself will need the nvidia GPU drivers, apart from mounting the devices; an example device entry is sketched below.
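
For example, a config.json fragment along these lines exposes the devices; the major/minor numbers shown are the usual Nvidia ones, but verify with ls -l /dev/nvidia* on your host:

```json
{
  "linux": {
    "devices": [
      { "path": "/dev/nvidia0",   "type": "c", "major": 195, "minor": 0,   "fileMode": 438 },
      { "path": "/dev/nvidiactl", "type": "c", "major": 195, "minor": 255, "fileMode": 438 }
    ]
  }
}
```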

@gleicon

gleicon commented Jan 12, 2024

There is a debate on GPU passthrough using virtio and the IOMMU that I'm no expert in, but testing (mainly Docker and Firecracker) shows a series of limitations. Firecracker is an outlier, as they state that their IOMMU approach was built from the ground up with goals that would conflict with enabling GPUs (per their issues and discussions). I don't have an Nvidia GPU at hand - my main machines are Apple Silicon and ATI. Nvidia drivers and frameworks work like LibVF.IO, IIUC. I am trying to do a clean setup again and compare a container running whisper.cpp across them. Just seeing /dev/gpu doesn't seem to do the magic in my current setup.

@utam0k
Member Author
Member Author

utam0k commented Jan 14, 2024

@gleicon I think what you mentioned should be the responsibility of a high-level runtime, not an OCI runtime. What do you expect from youki?

@gleicon

gleicon commented Jan 15, 2024

I expect something like this lxc tutorial (lxc is where I test namespace basics for better understanding), for all GPUs: https://ubuntu.com/tutorials/gpu-data-processing-inside-lxd#6-add-your-gpu-to-the-container - at some point you have to attach or present the GPU to the container. What I am aiming at by using youki, or a simpler/leaner runtime, besides using Rust, is to run local LLMs sharing a GPU in a simpler way. It may be because I don't have an Nvidia GPU, but support is not uniform. Thanks, I have to do more of my homework and find an Nvidia setup!

@utam0k
Member Author
Member Author

utam0k commented Jan 16, 2024

@gleicon I always welcome learning. Come back here when you find good ideas 😍
