virtcontainers

virtcontainers is a Go library that can be used to build hardware-virtualized container runtimes.

Background

The few existing VM-based container runtimes (Clear Containers, runv, rkt's kvm stage 1) all share the same hardware virtualization semantics but use different code bases to implement them. virtcontainers's goal is to factor this code out into a common Go library.

Ideally, VM-based container runtime implementations would become translation layers from the runtime specification they implement (e.g. the OCI runtime-spec or the Kubernetes CRI) to the virtcontainers API.

Out of scope

Implementing a container runtime tool is out of scope for this project. Any tools or executables in this repository are only provided for demonstration or testing purposes.

virtcontainers and CRI

virtcontainers's API is loosely inspired by the Kubernetes CRI because we believe it provides the right level of abstraction for containerized pods. However, despite the API similarities between the two projects, the goal of virtcontainers is not to build a CRI implementation, but to provide a generic, runtime-specification-agnostic, hardware-virtualized containers library that other projects could leverage to implement CRI themselves.

Design

Pods

The virtcontainers execution unit is a pod, i.e. virtcontainers users start pods in which containers run.

virtcontainers creates a pod by starting a virtual machine and setting the pod up within that environment. Starting a pod means launching all of its containers within the VM's pod runtime environment.

Hypervisors

The virtcontainers package relies on hypervisors to start and stop the virtual machines in which pods run. A hypervisor is defined by an implementation of the Hypervisor interface, and the default implementation is the QEMU one.
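To illustrate how hypervisors can be swapped behind a Go interface, here is a minimal sketch. The two-method `Hypervisor` interface and the `mockHypervisor` type are hypothetical simplifications, not the actual virtcontainers definitions; a QEMU-backed implementation would satisfy the same interface by launching and stopping a real VM.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypervisor is a hypothetical, trimmed-down stand-in for the interface the
// text describes. The real virtcontainers interface has more methods and
// different signatures.
type Hypervisor interface {
	// StartVM boots the virtual machine that will host the given pod.
	StartVM(podID string) error
	// StopVM shuts that virtual machine down.
	StopVM(podID string) error
}

// mockHypervisor tracks which pods have a "running" VM without launching
// anything, which is handy for unit tests.
type mockHypervisor struct {
	running map[string]bool
}

func newMockHypervisor() *mockHypervisor {
	return &mockHypervisor{running: make(map[string]bool)}
}

func (m *mockHypervisor) StartVM(podID string) error {
	if m.running[podID] {
		return fmt.Errorf("VM for pod %q is already running", podID)
	}
	m.running[podID] = true
	return nil
}

func (m *mockHypervisor) StopVM(podID string) error {
	if !m.running[podID] {
		return errors.New("VM is not running")
	}
	delete(m.running, podID)
	return nil
}

// Compile-time check that mockHypervisor satisfies Hypervisor.
var _ Hypervisor = (*mockHypervisor)(nil)

func main() {
	var h Hypervisor = newMockHypervisor()
	fmt.Println(h.StartVM("pod1")) // <nil>
	fmt.Println(h.StartVM("pod1")) // VM for pod "pod1" is already running
	fmt.Println(h.StopVM("pod1"))  // <nil>
}
```

Because callers only depend on the interface, a test double like this can stand in for QEMU without changing any pod code.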

Agents

During the lifecycle of a container, the runtime running on the host needs to interact with the virtual machine guest OS in order to start new commands to be executed as part of a given container workload, set new networking routes or interfaces, fetch a container's standard output or standard error, and so on. There are many existing and potential solutions to this problem, and virtcontainers abstracts them through the Agent interface.
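A minimal sketch of the same idea for agents, assuming a hypothetical two-method `Agent` interface (`Exec` and `Output`) rather than the real virtcontainers one. `echoAgent` is an in-memory fake that only understands `echo`, and `runAndCollect` depends solely on the interface, so any agent implementation could be plugged in.

```go
package main

import (
	"fmt"
	"strings"
)

// Agent is a hypothetical, trimmed-down version of the interface described
// above; method names and signatures are illustrative only.
type Agent interface {
	// Exec starts cmd inside the given container in the guest.
	Exec(containerID string, cmd []string) error
	// Output returns what the container has written to stdout so far.
	Output(containerID string) (string, error)
}

// echoAgent is a fake agent that "runs" only the echo command in memory,
// standing in for a real guest-side agent.
type echoAgent struct {
	out map[string]*strings.Builder
}

func newEchoAgent() *echoAgent {
	return &echoAgent{out: make(map[string]*strings.Builder)}
}

func (a *echoAgent) Exec(containerID string, cmd []string) error {
	if len(cmd) == 0 || cmd[0] != "echo" {
		return fmt.Errorf("unsupported command %v", cmd)
	}
	b, ok := a.out[containerID]
	if !ok {
		b = &strings.Builder{}
		a.out[containerID] = b
	}
	b.WriteString(strings.Join(cmd[1:], " ") + "\n")
	return nil
}

func (a *echoAgent) Output(containerID string) (string, error) {
	b, ok := a.out[containerID]
	if !ok {
		return "", fmt.Errorf("no output for container %q", containerID)
	}
	return b.String(), nil
}

// runAndCollect only depends on the Agent interface.
func runAndCollect(a Agent, containerID string, cmd []string) (string, error) {
	if err := a.Exec(containerID, cmd); err != nil {
		return "", err
	}
	return a.Output(containerID)
}

func main() {
	out, err := runAndCollect(newEchoAgent(), "c1", []string{"echo", "hello", "world"})
	fmt.Printf("%q %v\n", out, err) // "hello world\n" <nil>
}
```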

API

The high-level virtcontainers API is as follows:

Pod API

CreatePod(podConfig PodConfig) creates a Pod. The virtual machine is started and the Pod is prepared.
DeletePod(podID string) deletes a Pod. The virtual machine is shut down and all information related to the Pod is removed. The function will fail if the Pod is running. In that case StopPod() has to be called first.
StartPod(podID string) starts an already created Pod. The Pod and all its containers are started.
RunPod(podConfig PodConfig) creates and starts a Pod. This performs CreatePod() + StartPod().
StopPod(podID string) stops an already running Pod. The Pod and all its containers are stopped.
StatusPod(podID string) returns a detailed Pod status.
ListPod() lists all Pods on the host. It returns a detailed status for every Pod.
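The lifecycle rules above (RunPod is CreatePod followed by StartPod, and DeletePod fails on a running Pod) can be captured in a small in-memory model. This is a hypothetical sketch: `podStore` and its three states are illustrative assumptions, it launches no VM, and it is not how virtcontainers stores pod state.

```go
package main

import "fmt"

// podState is a simplified, hypothetical set of pod states.
type podState int

const (
	podCreated podState = iota // VM started, containers not yet running
	podRunning
	podStopped
)

// podStore models the documented rules in memory only.
type podStore struct {
	pods map[string]podState
}

func newPodStore() *podStore { return &podStore{pods: make(map[string]podState)} }

func (s *podStore) CreatePod(id string) error {
	if _, ok := s.pods[id]; ok {
		return fmt.Errorf("pod %s already exists", id)
	}
	s.pods[id] = podCreated
	return nil
}

func (s *podStore) StartPod(id string) error {
	st, ok := s.pods[id]
	if !ok {
		return fmt.Errorf("pod %s not found", id)
	}
	if st == podRunning {
		return fmt.Errorf("pod %s is already running", id)
	}
	s.pods[id] = podRunning
	return nil
}

func (s *podStore) StopPod(id string) error {
	if st, ok := s.pods[id]; !ok || st != podRunning {
		return fmt.Errorf("pod %s is not running", id)
	}
	s.pods[id] = podStopped
	return nil
}

// RunPod is CreatePod followed by StartPod, as documented.
func (s *podStore) RunPod(id string) error {
	if err := s.CreatePod(id); err != nil {
		return err
	}
	return s.StartPod(id)
}

// DeletePod fails on a running pod; StopPod has to be called first.
func (s *podStore) DeletePod(id string) error {
	st, ok := s.pods[id]
	if !ok {
		return fmt.Errorf("pod %s not found", id)
	}
	if st == podRunning {
		return fmt.Errorf("pod %s is running, call StopPod first", id)
	}
	delete(s.pods, id)
	return nil
}

func main() {
	s := newPodStore()
	fmt.Println(s.RunPod("p1"))    // <nil>
	fmt.Println(s.DeletePod("p1")) // pod p1 is running, call StopPod first
	fmt.Println(s.StopPod("p1"))   // <nil>
	fmt.Println(s.DeletePod("p1")) // <nil>
}
```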

Container API

CreateContainer(podID string, containerConfig ContainerConfig) creates a Container on an existing Pod.
DeleteContainer(podID, containerID string) deletes a Container from a Pod. If the Container is running it has to be stopped first.
StartContainer(podID, containerID string) starts an already created Container. The Pod has to be running.
StopContainer(podID, containerID string) stops an already running Container.
EnterContainer(podID, containerID string, cmd Cmd) enters an already running Container and runs a given command.
StatusContainer(podID, containerID string) returns a detailed Container status.
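Likewise, here is a hypothetical in-memory sketch of the Container API preconditions: StartContainer requires a running Pod, and DeleteContainer refuses a running Container. The `pod` type and the container states are assumptions for illustration only; in particular, this model only lets a Container be started from the created state.

```go
package main

import (
	"errors"
	"fmt"
)

type ctrState int

const (
	ctrCreated ctrState = iota
	ctrRunning
	ctrStopped
)

// pod is a hypothetical stand-in used only to illustrate the documented
// preconditions; it is not the virtcontainers Pod type.
type pod struct {
	running    bool
	containers map[string]ctrState
}

func newPod(running bool) *pod {
	return &pod{running: running, containers: make(map[string]ctrState)}
}

func (p *pod) CreateContainer(id string) error {
	if _, ok := p.containers[id]; ok {
		return fmt.Errorf("container %s already exists", id)
	}
	p.containers[id] = ctrCreated
	return nil
}

// StartContainer only starts a created container, and only in a running pod.
func (p *pod) StartContainer(id string) error {
	if !p.running {
		return errors.New("pod is not running")
	}
	if st, ok := p.containers[id]; !ok || st != ctrCreated {
		return fmt.Errorf("container %s is not in the created state", id)
	}
	p.containers[id] = ctrRunning
	return nil
}

func (p *pod) StopContainer(id string) error {
	if st, ok := p.containers[id]; !ok || st != ctrRunning {
		return fmt.Errorf("container %s is not running", id)
	}
	p.containers[id] = ctrStopped
	return nil
}

// DeleteContainer refuses to remove a running container.
func (p *pod) DeleteContainer(id string) error {
	st, ok := p.containers[id]
	if !ok {
		return fmt.Errorf("container %s not found", id)
	}
	if st == ctrRunning {
		return fmt.Errorf("container %s is running, stop it first", id)
	}
	delete(p.containers, id)
	return nil
}

func main() {
	p := newPod(false)
	fmt.Println(p.CreateContainer("c1")) // <nil>
	fmt.Println(p.StartContainer("c1"))  // pod is not running
	p.running = true
	fmt.Println(p.StartContainer("c1"))  // <nil>
	fmt.Println(p.DeleteContainer("c1")) // container c1 is running, stop it first
}
```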

An example tool using the virtcontainers API is provided in the hack/virtc package.

Networking

virtcontainers implements two different ways of setting up a pod's network:

CNM

CNM lifecycle

1. RequestPool
2. CreateNetwork
3. RequestAddress
4. CreateEndPoint
5. CreateContainer
6. Create config.json
7. Create PID and network namespace
8. ProcessExternalKey
9. JoinEndPoint
10. LaunchContainer
11. Launch
12. Run container

Runtime network setup with CNM

1. Read config.json
2. Create the network namespace
3. Call the prestart hook (from inside the netns)
4. Scan the network interfaces inside the netns and get the name of the interface created by the prestart hook
5. Create a bridge and a TAP interface, and link them together with the previously created network interface
6. Start the VM inside the netns and start the container

Drawbacks of CNM

There are three drawbacks to using CNM instead of CNI:

The way we call into it is not very explicit: we have to re-exec the dockerd binary so that it can accept parameters and execute the prestart hook related to network setup.
The network namespace is designated implicitly: instead of explicitly giving the netns to dockerd, we give it the PID of our runtime so that it can find the netns from this PID. This means we have to make sure we are in the right netns when calling the hook; otherwise the veth pair will be created in the wrong netns.
No results are returned by the hook: we have to scan the network interfaces to discover which one has been created inside the netns. This adds latency because it forces us to scan the network in the CreatePod path, which is critical for starting the VM as quickly as possible.

CNI

Runtime network setup with CNI

1. Create the network namespace
2. Get the CNI plugin information
3. Start the plugin (passing the previously created netns) to add a network described in the /etc/cni/net.d/ directory. The CNI plugin creates the cni0 network interface and a veth pair between the host and the created netns, and links cni0 to the veth pair before exiting.
4. Create a bridge and a TAP interface, and link them together with the previously created network interface
5. Start the VM inside the netns and start the container

GabyCT/virtcontainers
