Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On some systems, mounting cgroup v1 by setup_emulated_subsystem may missing some subsystems #2743

Open
JCKeep opened this issue Apr 5, 2024 · 9 comments
Assignees

Comments

@JCKeep
Copy link

JCKeep commented Apr 5, 2024

I'm writing an oci runtime using some youki's source code by learning, and find some bugs.

On my system, /proc/self/cgroup file be like:

# cat /proc/self/cgroup
11:perf_event:/
10:freezer:/
9:memory:/
8:pids:/
7:devices:/
6:cpuacct,cpu:/
5:blkio:/
4:cpuset:/
3:hugetlb:/
2:net_prio,net_cls:/
1:name=systemd:/user.slice/user-0.slice/session-264.scope

the cpuacct,cpu and net_prio,net_cls was unsorted, so in function mount_cgroup_v1:

        let process_cgroups: HashMap<String, String> = Process::myself()?
            .cgroups()?
            .into_iter()
            .map(|c| {
                let hierarchy = c.hierarchy;
                // When youki itself is running inside a container, the cgroup path
                // will include the path of pid-1, which needs to be stripped before
                // mounting.
                let root_pathname = root_cgroups
                    .iter()
                    .find(|c| c.hierarchy == hierarchy)
                    .map(|c| c.pathname.as_ref())
                    .unwrap_or("");
                let path = c
                    .pathname
                    .strip_prefix(root_pathname)
                    .unwrap_or(&c.pathname);
                (c.controllers.join(","), path.to_owned())
            })
            .collect();

process_cgroups will record cpuacct,cpu and net_prio,net_cls, but system comount in name cpu,cpuacct and net_cls,net_prio, so when using setup_emulated_subsystem it find no subsystem to bind.

// in function setup_emulated_subsystem, it will search `cpu,cpuacct` and `net_cls,net_prio`,
// but process_cgroups store `cpuacct,cpu` and `net_prio,net_cls`
if let Some(proc_path) = process_cgroups.get(named_hierarchy.as_ref()) {

So when I test, it will not mount some cgroupv1 subsystems and not generate error.

I fixed it on my system like this, mybe this will work for other system?

        let process_cgroups: HashMap<String, String> = Process::myself()?
            .cgroups()?
            .into_iter()
            .map(|c| {
                let hierarchy = c.hierarchy;
                // When youki itself is running inside a container, the cgroup path
                // will include the path of pid-1, which needs to be stripped before
                // mounting.
                let root_pathname = root_cgroups
                    .iter()
                    .find(|c| c.hierarchy == hierarchy)
                    .map(|c| c.pathname.as_ref())
                    .unwrap_or("");
                let path = c
                    .pathname
                    .strip_prefix(root_pathname)
                    .unwrap_or(&c.pathname);
                
                // some system the controllers may unsorted
                (c.controllers.iter().sorted().join(","), path.to_owned())
            })
            .collect();
@utam0k utam0k self-assigned this Apr 7, 2024
@utam0k
Copy link
Member

utam0k commented Apr 7, 2024

Thanks for your report, @JCKeep

the cpuacct,cpu and net_prio,net_cls was unsorted,

What does this mean? May I ask you to give me a more specific example or something?

@JCKeep
Copy link
Author

JCKeep commented Apr 7, 2024

@utam0k youki scan /proc/self/cgroup to collect the cgroups, when call setup_emulated_subsystem to bind mount cgroup subsystems, it will result in errors but not generate error.

for example, it may not mount net_cls,net_prio subsystem, because my /proc/self/cgroup is

[root@iZbp16aq5lgolei74n44yqZ ~]# cat /proc/self/cgroup
11:perf_event:/
10:freezer:/
9:memory:/
8:pids:/
7:devices:/
6:cpuacct,cpu:/
5:blkio:/
4:cpuset:/
3:hugetlb:/
2:net_prio,net_cls:/
1:name=systemd:/user.slice/user-0.slice/session-558.scope

in setup_emulated_subsystem, it will search process_cgroups by name "net_cls,net_prio", but it store "net_prio,net_cls"

#[cfg(feature = "v1")]
    fn setup_emulated_subsystem(
        &self,
        cgroup_mount: &SpecMount,
        options: &MountOptions,
        subsystem_name: &str,
        named: bool,
        host_mount: &Path,
        process_cgroups: &HashMap<String, String>,
    ) -> Result<()> {
        tracing::debug!("Mounting (emulated) {:?} cgroup subsystem", subsystem_name);
        let named_hierarchy: Cow<str> = if named {
            format!("name={subsystem_name}").into()
        } else {
            subsystem_name.into()
        };

        // we search "net_cls,net_prio", but it store "net_prio,net_cls"
        if let Some(proc_path) = process_cgroups.get(named_hierarchy.as_ref()) {
        ....
        } else {
            tracing::warn!("Could not mount {:?} cgroup subsystem", subsystem_name);
        }

图片

@utam0k
Copy link
Member

utam0k commented Apr 7, 2024

I understood this problem. I have a question. How does runc handle this case? Do you know?

@JCKeep
Copy link
Author

JCKeep commented Apr 7, 2024

@utam0k runc works
图片

@utam0k
Copy link
Member

utam0k commented Apr 7, 2024

@JCKeep We should fix it, but I wonder how runc treats it.

@JCKeep
Copy link
Author

JCKeep commented Apr 7, 2024

@utam0k runc use the mount options of /proc/self/mountinfo to get the cgroup subsystem mounts.
On my system, /proc/self/mountinfo like:
图片
the subsystem listed in /proc/self/cgroup will match cgroup mount options.

such as:
图片

@utam0k
Copy link
Member

utam0k commented Apr 7, 2024

@JCKeep Which code?

@JCKeep
Copy link
Author

JCKeep commented Apr 7, 2024

@utam0k you may have a look at function mountCgroupV1 and getCgroupMountsHelper

@utam0k
Copy link
Member

utam0k commented Apr 8, 2024

Why don't you align to runc's implementation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants