runc differences given the same config.json #2756

Open
jeromegn opened this issue Apr 11, 2024 · 2 comments

@jeromegn

I've been troubleshooting some issues with exec'ing into a running youki container. When I ran the same test with runc for comparison, it worked as I expected.

Here's the setup procedure I have for my test:

$ mkdir mycontainer
$ cd mycontainer
$ mkdir rootfs
$ docker export $(docker create debian:bookworm-slim) | tar -C rootfs -xvf -

Spec config.json:
{
	"ociVersion": "1.0.2-dev",
	"process": {
		"terminal": false,
		"user": {
			"uid": 0,
			"gid": 0
		},
		"args": [
			"sleep", "100000"
		],
		"env": [
			"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
			"TERM=xterm"
		],
		"cwd": "/",
		"capabilities": {
			"bounding": [
				"CAP_AUDIT_WRITE",
				"CAP_CHOWN",
				"CAP_DAC_OVERRIDE",
				"CAP_FOWNER",
				"CAP_FSETID",
				"CAP_KILL",
				"CAP_MKNOD",
				"CAP_NET_BIND_SERVICE",
				"CAP_NET_RAW",
				"CAP_SETFCAP",
				"CAP_SETGID",
				"CAP_SETPCAP",
				"CAP_SETUID",
				"CAP_SYS_ADMIN",
				"CAP_SYS_CHROOT"
			],
			"effective": [
				"CAP_AUDIT_WRITE",
				"CAP_CHOWN",
				"CAP_DAC_OVERRIDE",
				"CAP_FOWNER",
				"CAP_FSETID",
				"CAP_KILL",
				"CAP_MKNOD",
				"CAP_NET_BIND_SERVICE",
				"CAP_NET_RAW",
				"CAP_SETFCAP",
				"CAP_SETGID",
				"CAP_SETPCAP",
				"CAP_SETUID",
				"CAP_SYS_ADMIN",
				"CAP_SYS_CHROOT"
			],
			"permitted": [
				"CAP_AUDIT_WRITE",
				"CAP_CHOWN",
				"CAP_DAC_OVERRIDE",
				"CAP_FOWNER",
				"CAP_FSETID",
				"CAP_KILL",
				"CAP_MKNOD",
				"CAP_NET_BIND_SERVICE",
				"CAP_NET_RAW",
				"CAP_SETFCAP",
				"CAP_SETGID",
				"CAP_SETPCAP",
				"CAP_SETUID",
				"CAP_SYS_ADMIN",
				"CAP_SYS_CHROOT"
			],
			"ambient": [
				"CAP_AUDIT_WRITE",
				"CAP_CHOWN",
				"CAP_DAC_OVERRIDE",
				"CAP_FOWNER",
				"CAP_FSETID",
				"CAP_KILL",
				"CAP_MKNOD",
				"CAP_NET_BIND_SERVICE",
				"CAP_NET_RAW",
				"CAP_SETFCAP",
				"CAP_SETGID",
				"CAP_SETPCAP",
				"CAP_SETUID",
				"CAP_SYS_ADMIN",
				"CAP_SYS_CHROOT"
			]
		},
		"rlimits": [
			{
				"type": "RLIMIT_NOFILE",
				"hard": 1024,
				"soft": 1024
			}
		],
		"noNewPrivileges": true
	},
	"root": {
		"path": "rootfs",
		"readonly": false
	},
	"hostname": "runc",
	"mounts": [
		{
			"destination": "/proc",
			"type": "proc",
			"source": "proc"
		},
		{
			"destination": "/dev",
			"type": "tmpfs",
			"source": "tmpfs",
			"options": [
				"nosuid",
				"strictatime",
				"mode=755",
				"size=65536k"
			]
		},
		{
			"destination": "/dev/pts",
			"type": "devpts",
			"source": "devpts",
			"options": [
				"nosuid",
				"noexec",
				"newinstance",
				"ptmxmode=0666",
				"mode=0620",
				"gid=5"
			]
		},
		{
			"destination": "/dev/shm",
			"type": "tmpfs",
			"source": "shm",
			"options": [
				"nosuid",
				"noexec",
				"nodev",
				"mode=1777",
				"size=65536k"
			]
		},
		{
			"destination": "/dev/mqueue",
			"type": "mqueue",
			"source": "mqueue",
			"options": [
				"nosuid",
				"noexec",
				"nodev"
			]
		},
		{
			"destination": "/sys",
			"source": "/sys",
			"options": [
				"rbind",
				"nosuid",
				"noexec",
				"nodev",
				"ro"
			]
		},
		{
			"destination": "/sys/fs/cgroup",
			"type": "cgroup",
			"source": "cgroup",
			"options": [
				"nosuid",
				"noexec",
				"nodev",
				"relatime",
				"ro"
			]
		}
	],
	"linux": {
		"resources": {
			"devices": [
				{
					"allow": false,
					"access": "rwm"
				}
			]
		},
		
		"namespaces": [
			{
				"type": "pid"
			},
			{
				"type": "ipc"
			},
			{
				"type": "uts"
			},
			{
				"type": "mount"
			},
			{
				"type": "cgroup"
			}
		],
		"maskedPaths": [
			"/proc/acpi",
			"/proc/asound",
			"/proc/kcore",
			"/proc/keys",
			"/proc/latency_stats",
			"/proc/timer_list",
			"/proc/timer_stats",
			"/proc/sched_debug",
			"/sys/firmware",
			"/proc/scsi"
		],
		"readonlyPaths": [
			"/proc/bus",
			"/proc/fs",
			"/proc/irq",
			"/proc/sys",
			"/proc/sysrq-trigger"
		]
	}
}
Running with runc:
$ sudo runc --debug create runctest
DEBU[0000] nsexec[458717]: => nsexec container setup    
DEBU[0000] nsexec-0[458717]: ~> nsexec stage-0          
DEBU[0000] nsexec-0[458717]: spawn stage-1              
DEBU[0000] nsexec-0[458717]: -> stage-1 synchronisation loop 
DEBU[0000] nsexec-1[458720]: ~> nsexec stage-1          
DEBU[0000] nsexec-1[458720]: unshare remaining namespaces (except cgroupns) 
DEBU[0000] nsexec-1[458720]: spawn stage-2              
DEBU[0000] nsexec-1[458720]: request stage-0 to forward stage-2 pid (458721) 
DEBU[0000] nsexec-0[458717]: stage-1 requested pid to be forwarded 
DEBU[0000] nsexec-0[458717]: forward stage-1 (458720) and stage-2 (458721) pids to runc 
DEBU[0000] nsexec-1[458720]: signal completion to stage-0 
DEBU[0000] nsexec-1[458720]: <~ nsexec stage-1          
DEBU[0000] nsexec-2[1]: ~> nsexec stage-2               
DEBU[0000] nsexec-0[458717]: stage-1 complete           
DEBU[0000] nsexec-0[458717]: <- stage-1 synchronisation loop 
DEBU[0000] nsexec-0[458717]: -> stage-2 synchronisation loop 
DEBU[0000] nsexec-0[458717]: signalling stage-2 to run  
DEBU[0000] nsexec-2[1]: unshare cgroup namespace        
DEBU[0000] nsexec-2[1]: signal completion to stage-0    
DEBU[0000] nsexec-2[1]: <= nsexec container setup       
DEBU[0000] nsexec-2[1]: booting up go runtime ...       
DEBU[0000] nsexec-0[458717]: stage-2 complete           
DEBU[0000] nsexec-0[458717]: <- stage-2 synchronisation loop 
DEBU[0000] nsexec-0[458717]: <~ nsexec stage-0          
DEBU[0000] child process in init()                      
DEBU[0000] init: closing the pipe to signal completion  

$ sudo runc --debug start runctest

$ sudo runc --debug exec -t runctest /bin/bash
DEBU[0000] nsexec[458594]: => nsexec container setup    
DEBU[0000] nsexec[458594]: set process as non-dumpable  
DEBU[0000] nsexec-0[458594]: ~> nsexec stage-0          
DEBU[0000] nsexec-0[458594]: spawn stage-1              
DEBU[0000] nsexec-0[458594]: -> stage-1 synchronisation loop 
DEBU[0000] nsexec-1[458597]: ~> nsexec stage-1          
DEBU[0000] nsexec-1[458597]: setns(0x8000000) into ipc namespace (with path /proc/458113/ns/ipc) 
DEBU[0000] nsexec-1[458597]: setns(0x4000000) into uts namespace (with path /proc/458113/ns/uts) 
DEBU[0000] nsexec-1[458597]: setns(0x20000000) into pid namespace (with path /proc/458113/ns/pid) 
DEBU[0000] nsexec-1[458597]: setns(0x20000) into mnt namespace (with path /proc/458113/ns/mnt) 
DEBU[0000] nsexec-1[458597]: setns(0x2000000) into cgroup namespace (with path /proc/458113/ns/cgroup) 
DEBU[0000] nsexec-1[458597]: unshare remaining namespaces (except cgroupns) 
DEBU[0000] nsexec-1[458597]: spawn stage-2              
DEBU[0000] nsexec-1[458597]: request stage-0 to forward stage-2 pid (458598) 
DEBU[0000] nsexec-0[458594]: stage-1 requested pid to be forwarded 
DEBU[0000] nsexec-0[458594]: forward stage-1 (458597) and stage-2 (458598) pids to runc 
DEBU[0000] nsexec-1[458597]: signal completion to stage-0 
DEBU[0000] nsexec-1[458597]: <~ nsexec stage-1          
DEBU[0000] nsexec-0[458594]: stage-1 complete           
DEBU[0000] nsexec-0[458594]: <- stage-1 synchronisation loop 
DEBU[0000] nsexec-0[458594]: -> stage-2 synchronisation loop 
DEBU[0000] nsexec-0[458594]: signalling stage-2 to run  
DEBU[0000] nsexec-2[17]: ~> nsexec stage-2              
DEBU[0000] nsexec-2[17]: signal completion to stage-0   
DEBU[0000] nsexec-2[17]: <= nsexec container setup      
DEBU[0000] nsexec-2[17]: booting up go runtime ...      
DEBU[0000] nsexec-0[458594]: stage-2 complete           
DEBU[0000] nsexec-0[458594]: <- stage-2 synchronisation loop 
DEBU[0000] nsexec-0[458594]: <~ nsexec stage-0          
DEBU[0000] child process in init()                      
DEBU[0000] setns_init: about to exec                    
DEBU[0000]signals.go:102 main.(*signalHandler).forward() sending signal to process urgent I/O condition 
root@runc:/# apt update
Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Ign:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Ign:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:3 http://security.ubuntu.com/ubuntu jammy-security InRelease         
Ign:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease           
Ign:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
0% [Working]^C
root@runc:/# 
exit
Running with youki:
$ sudo ../youki/target/debug/youki create -b mycontainer youkitest
DEBUG youki: started by user 0 with ArgsOs { inner: ["../youki/target/debug/youki", "create", "-b", "mycontainer", "youkitest"] }
DEBUG libcontainer::user_ns: this container does NOT create a new user namespace
DEBUG libcontainer::container::init_builder: container directory will be "/run/youki/youkitest"
DEBUG libcontainer::container::container: Save container status: Container { state: State { oci_version: "v1.0.2", id: "youkitest", status: Creating, pid: None, bundle: "/home/jerome/src/github.com/superfly/init/mycontainer", annotations: Some({}), created: None, creator: None, use_systemd: false, clean_up_intel_rdt_subdirectory: None }, root: "/run/youki/youkitest" } in "/run/youki/youkitest"
DEBUG libcontainer::user_ns: this container does NOT create a new user namespace
DEBUG libcontainer::notify_socket: create notify listener socket_path="/run/youki/youkitest/notify.sock"
DEBUG libcontainer::notify_socket: the cwd to create the notify socket cwd="/run/youki/youkitest"
 INFO libcgroups::common: cgroup manager V2 will be used
 WARN libcgroups::v2::util: Controller rdma is not yet implemented.
 WARN libcgroups::v2::util: Controller misc is not yet implemented.
DEBUG libcgroups::v2::hugetlb: Apply hugetlb cgroup v2 config
DEBUG libcgroups::v2::io: Apply io cgroup v2 config
DEBUG libcgroups::v2::pids: Apply pids cgroup v2 config
 WARN libcgroups::v2::util: Controller rdma is not yet implemented.
 WARN libcgroups::v2::util: Controller misc is not yet implemented.
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Pid, path: None }
DEBUG libcontainer::process::channel: sending init pid (Pid(457403))
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Uts, path: None }
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Ipc, path: None }
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Cgroup, path: None }
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Mount, path: None }
DEBUG libcontainer::rootfs::rootfs: prepare rootfs rootfs="/home/jerome/src/github.com/superfly/init/mycontainer/rootfs"
DEBUG libcontainer::rootfs::rootfs: mount root fs "/home/jerome/src/github.com/superfly/init/mycontainer/rootfs"
DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/proc", typ: Some("proc"), source: Some("proc"), options: None }
DEBUG libcontainer::rootfs::mount: mounting with options: MountOptionConfig { flags: MsFlags(0x0), data: "", rec_attr: None }
DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/dev", typ: Some("tmpfs"), source: Some("tmpfs"), options: Some(["nosuid", "strictatime", "mode=755", "size=65536k"]) }
DEBUG libcontainer::rootfs::mount: mounting with options: MountOptionConfig { flags: MsFlags(MS_NOSUID), data: "mode=755,size=65536k", rec_attr: None }
DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/dev/pts", typ: Some("devpts"), source: Some("devpts"), options: Some(["nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"]) }
DEBUG libcontainer::rootfs::mount: mounting with options: MountOptionConfig { flags: MsFlags(MS_NOSUID | MS_NOEXEC), data: "newinstance,ptmxmode=0666,mode=0620,gid=5", rec_attr: None }
DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/dev/shm", typ: Some("tmpfs"), source: Some("shm"), options: Some(["nosuid", "noexec", "nodev", "mode=1777", "size=65536k"]) }
DEBUG libcontainer::rootfs::mount: mounting with options: MountOptionConfig { flags: MsFlags(MS_NOSUID | MS_NODEV | MS_NOEXEC), data: "mode=1777,size=65536k", rec_attr: None }
DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/dev/mqueue", typ: Some("mqueue"), source: Some("mqueue"), options: Some(["nosuid", "noexec", "nodev"]) }
DEBUG libcontainer::rootfs::mount: mounting with options: MountOptionConfig { flags: MsFlags(MS_NOSUID | MS_NODEV | MS_NOEXEC), data: "", rec_attr: None }
DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/sys", typ: None, source: Some("/sys"), options: Some(["rbind", "nosuid", "noexec", "nodev", "ro"]) }
DEBUG libcontainer::rootfs::mount: mounting with options: MountOptionConfig { flags: MsFlags(MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC | MS_BIND | MS_REC), data: "", rec_attr: None }
DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/sys/fs/cgroup", typ: Some("cgroup"), source: Some("cgroup"), options: Some(["nosuid", "noexec", "nodev", "relatime", "ro"]) }
DEBUG libcontainer::rootfs::mount: Mounting cgroup v2 filesystem
DEBUG libcontainer::rootfs::mount: Mount { destination: "/sys/fs/cgroup", typ: Some("cgroup2"), source: Some("cgroup"), options: Some([]) }
DEBUG libcontainer::rootfs::mount: mounting with options: MountOptionConfig { flags: MsFlags(MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC), data: "", rec_attr: None }
ERROR libcontainer::rootfs::mount: mount of "/sys/fs/cgroup" failed. EBUSY: Device or resource busy
DEBUG libcontainer::rootfs::mount: Mount { destination: "/sys/fs/cgroup", typ: Some("bind"), source: Some("/sys/fs/cgroup/"), options: Some([]) }
DEBUG libcontainer::rootfs::mount: mounting with options: MountOptionConfig { flags: MsFlags(MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC | MS_BIND), data: "", rec_attr: None }
DEBUG libcontainer::process::container_init_process: readonly path "/proc/bus" mounted
DEBUG libcontainer::process::container_init_process: readonly path "/proc/fs" mounted
DEBUG libcontainer::process::container_init_process: readonly path "/proc/irq" mounted
DEBUG libcontainer::process::container_init_process: readonly path "/proc/sys" mounted
DEBUG libcontainer::process::container_init_process: readonly path "/proc/sysrq-trigger" mounted
DEBUG libcontainer::capabilities: reset all caps
DEBUG libcontainer::capabilities: dropping bounding capabilities to Some({Setpcap, Mknod, SysChroot, Setuid, AuditWrite, Setgid, Setfcap, SysAdmin, NetRaw, Chown, Fsetid, DacOverride, NetBindService, Kill, Fowner})
ERROR libcontainer::capabilities: failed to set ambient capabilities: failed to set capabilities: caps error: PR_CAP_AMBIENT_RAISE failure: Operation not permitted (os error 1)
DEBUG libcontainer::workload::default: found executable in executor executable="/usr/bin/sleep"
DEBUG libcontainer::process::container_main_process: init pid is Pid(457403)
DEBUG libcontainer::container::container: Save container status: Container { state: State { oci_version: "v1.0.2", id: "youkitest", status: Created, pid: Some(457403), bundle: "/home/jerome/src/github.com/superfly/init/mycontainer", annotations: None, created: Some(2024-04-11T00:11:46.035245621Z), creator: Some(0), use_systemd: false, clean_up_intel_rdt_subdirectory: Some(false) }, root: "/run/youki/youkitest" } in "/run/youki/youkitest"

$ sudo ../youki/target/debug/youki start youkitest
DEBUG youki: started by user 0 with ArgsOs { inner: ["../youki/target/debug/youki", "start", "youkitest"] }
DEBUG libcontainer::notify_socket: notify container start
DEBUG libcontainer::notify_socket: notify finished
DEBUG libcontainer::container::container: Save container status: Container { state: State { oci_version: "v1.0.2", id: "youkitest", status: Running, pid: Some(457403), bundle: "/home/jerome/src/github.com/superfly/init/mycontainer", annotations: None, created: Some(2024-04-11T00:11:46.035245621Z), creator: Some(0), use_systemd: false, clean_up_intel_rdt_subdirectory: Some(false) }, root: "/run/youki/youkitest" } in "/run/youki/youkitest"

$ sudo ../youki/target/debug/youki exec -t youkitest /bin/bash
DEBUG youki: started by user 0 with ArgsOs { inner: ["../youki/target/debug/youki", "exec", "-t", "youkitest", "/bin/bash"] }
DEBUG libcontainer::user_ns: this container does NOT create a new user namespace
DEBUG libcontainer::user_ns: this container does NOT create a new user namespace
DEBUG libcontainer::notify_socket: create notify listener socket_path="/run/youki/youkitest/tenant-notify-5929bea.sock"
DEBUG libcontainer::notify_socket: the cwd to create the notify socket cwd="/run/youki/youkitest"
 INFO libcgroups::common: cgroup manager V2 will be used
 WARN libcgroups::v2::util: Controller rdma is not yet implemented.
 WARN libcgroups::v2::util: Controller misc is not yet implemented.
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Pid, path: Some("/proc/457403/ns/pid") }
DEBUG libcontainer::process::channel: sending init pid (Pid(457525))
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Uts, path: Some("/proc/457403/ns/uts") }
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Ipc, path: Some("/proc/457403/ns/ipc") }
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Network, path: Some("/proc/457403/ns/net") }
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Cgroup, path: Some("/proc/457403/ns/cgroup") }
DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Mount, path: Some("/proc/457403/ns/mnt") }
DEBUG libcontainer::process::container_init_process: readonly path "/proc/bus" mounted
DEBUG libcontainer::process::container_init_process: readonly path "/proc/fs" mounted
DEBUG libcontainer::process::container_init_process: readonly path "/proc/irq" mounted
DEBUG libcontainer::process::container_init_process: readonly path "/proc/sys" mounted
DEBUG libcontainer::process::container_init_process: readonly path "/proc/sysrq-trigger" mounted
DEBUG libcontainer::capabilities: reset all caps
DEBUG libcontainer::capabilities: dropping bounding capabilities to Some({NetBindService, AuditWrite, Kill})
DEBUG libcontainer::process::container_main_process: init pid is Pid(457525)
DEBUG libcontainer::notify_socket: notify container start
DEBUG libcontainer::notify_socket: notify finished
DEBUG libcontainer::notify_socket: received: start container
DEBUG libcontainer::workload::default: executing workload with default handler
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash: /root/.bashrc: Permission denied
root@runc:/# apt update
Reading package lists... Done
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)

This is similar to the issue I was troubleshooting, but different:

root@runc:/# apt update
Reading package lists... Done
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)

Any idea what might be causing this? I would expect both runtimes to behave more or less the same given the same config.json, or at least not to error. I'm going to dig deeper into the differences.
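
One difference that is visible in the youki logs above: `youki create` logs a 15-entry bounding set, while `youki exec` logs "dropping bounding capabilities to Some({NetBindService, AuditWrite, Kill})". The sketch below only diffs those two logged sets. `missing_caps` is a hypothetical helper written for this comparison, not part of youki or runc, and whether the missing capabilities are what causes the "Permission denied" errors is not established here.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the capabilities that appear in `configured`
/// but not in `applied`, preserving the order of `configured`.
fn missing_caps<'a>(configured: &[&'a str], applied: &[&'a str]) -> Vec<&'a str> {
    let applied: HashSet<&str> = applied.iter().copied().collect();
    let missing: Vec<&'a str> = configured
        .iter()
        .copied()
        .filter(|cap| !applied.contains(cap))
        .collect();
    missing
}

fn main() {
    // Bounding set from config.json, spelled the way youki prints it in its logs.
    let configured = [
        "AuditWrite", "Chown", "DacOverride", "Fowner", "Fsetid", "Kill", "Mknod",
        "NetBindService", "NetRaw", "Setfcap", "Setgid", "Setpcap", "Setuid",
        "SysAdmin", "SysChroot",
    ];
    // Bounding set copied from the `youki exec` log line.
    let applied = ["NetBindService", "AuditWrite", "Kill"];

    for cap in missing_caps(&configured, &applied) {
        println!("in config.json but not in exec bounding set: {cap}");
    }
}
```

Pasting in the sets from any other pair of log lines (for example the `create` and `exec` runs) gives the same kind of comparison.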

One thing I noticed from the logs is that youki is trying to set up a network namespace when there is none defined in the config.json, but only when using youki exec:

DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Network, path: Some("/proc/457403/ns/net") }
@jeromegn
Author

One thing I noticed from the logs is that youki is trying to set up a network namespace when there is none defined in the config.json, but only when using youki exec:

DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Network, path: Some("/proc/457403/ns/net") }

I've fixed this locally. The TenantBuilder tries to use all of the process's namespaces, regardless of which namespaces were configured in the spec.
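
A minimal sketch of the idea, assuming the fix amounts to filtering the running process's namespaces by the types listed in config.json. This is not the actual local patch and not youki's API: `NsKind` and `namespaces_to_join` are hypothetical names, and the paths are copied from the `youki exec` log above.

```rust
use std::collections::HashSet;

/// Hypothetical namespace kinds, mirroring the `type` values seen in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum NsKind {
    Pid,
    Ipc,
    Uts,
    Mount,
    Cgroup,
    Network,
}

/// Hypothetical helper: keep only the namespaces of the running process
/// whose kind is also listed in the spec.
fn namespaces_to_join(init: &[(NsKind, String)], spec: &[NsKind]) -> Vec<(NsKind, String)> {
    let wanted: HashSet<NsKind> = spec.iter().copied().collect();
    let joined: Vec<(NsKind, String)> = init
        .iter()
        .filter(|(kind, _)| wanted.contains(kind))
        .cloned()
        .collect();
    joined
}

fn main() {
    // Namespaces that `youki exec` tried to enter, as shown in the log.
    let init = vec![
        (NsKind::Pid, "/proc/457403/ns/pid".to_string()),
        (NsKind::Uts, "/proc/457403/ns/uts".to_string()),
        (NsKind::Ipc, "/proc/457403/ns/ipc".to_string()),
        (NsKind::Network, "/proc/457403/ns/net".to_string()),
        (NsKind::Cgroup, "/proc/457403/ns/cgroup".to_string()),
        (NsKind::Mount, "/proc/457403/ns/mnt".to_string()),
    ];
    // Namespaces listed under linux.namespaces in config.json.
    let spec = [NsKind::Pid, NsKind::Ipc, NsKind::Uts, NsKind::Mount, NsKind::Cgroup];

    for (kind, path) in namespaces_to_join(&init, &spec) {
        println!("{kind:?}: {path}");
    }
}
```

With the inputs above, the network entry is dropped, which matches the set of namespaces that `runc exec` entered in its debug log (ipc, uts, pid, mnt, cgroup).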

@utam0k
Member

utam0k commented Apr 13, 2024

Is this the code you are referring to?

let init_process = procfs::process::Process::new(container_pid.as_raw())?;
let ns = self.get_namespaces(init_process.namespaces()?.0)?;

Would you be able to contribute your fix?
