A first look at Kata Containers
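Kata Containers is a container project hosted by the OpenStack Foundation but independent of the OpenStack project itself. It is a runtime that takes ordinary container images and runs them as containers inside ultra-lightweight virtual machines. Kata Containers merges Intel's Clear Containers and Hyper.sh's runV, supports multiple hardware platforms (x86-64, ARM, and others), and complies with the OCI (Open Container Initiative) specifications. The project consists of several components: runtime, agent, proxy, shim, and kernel. At the time of writing, the runtimes had not yet been merged, so Clear Containers and runV still existed as separate projects.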
Kata Containers architecture
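In essence, Kata Containers runs containers inside a virtual machine, using a runc-based implementation inside the VM. It uses a hypervisor, qemu-lite (an optimized QEMU), to boot a lightweight VM from a kernel and a guest image (rootfs or initrd) that already contains kata-agent; the rootfs image is mapped into the guest VM through an NVDIMM device. kata-agent then creates the namespaces and resources for the container. The guest VM is the actual sandbox and is fully isolated from the host kernel.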
How Kata Containers works is shown in the figure. Its main components are:
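- kata-runtime: implements the OCI interface. It can be connected to the kubelet through CRI-O to serve as the Kubernetes runtime, or to Docker Engine through containerd, and it creates the VM in which a container/pod runs.
- kata-proxy: there is one kata-proxy process per sandbox (pod VM), and it handles communication with kata-agent. When the guest VM boots, kata-agent starts inside it, and the two communicate over a QEMU virtio serial channel.
- kata-agent: a process running inside the guest VM. It is built mainly on libcontainer and reuses most of runc's code to create the container's namespaces (mount, UTS, IPC, and PID).
- kata-shim: the interface to the container's standard input and output; commands such as exec are implemented through kata-shim.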
Kubernetes and Kata Containers
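Kata Containers is a container runtime from the hypervisor-based container camp and supports the OCI standard. For Kubernetes to create Kata-based pods, it needs a CRI shim, that is, a service that implements the CRI (Container Runtime Interface). CRI-O, a Kubernetes incubator project, provides the CRI and can talk to any OCI-compliant container runtime. The workflow between Kubernetes and Kata Containers is as follows: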
+---------------+                 +---------------+         +-------------+          +-----------+
|    kubelet    |  cri protobuf   |     cri-o     |         |  container  |<-------->| container |
|  grpc client  |<--------------->|  grpc server  |<------->|   runtime   |          +-----------+
+---------------+                 +---------------+         |             |          +-----------+
                                                            |             |<-------->| container |
                                                            +-------------+          +-----------+
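- Kubernetes has the kubelet start a pod on a node; the kubelet calls CRI-O over gRPC to start the pod.
- CRI-O uses containers/image to pull the image from the image registry.
- CRI-O uses containers/storage to unpack the image into a root filesystem.
- CRI-O creates an OCI runtime spec file from the kubelet API request.
- CRI-O calls the container runtime (runc or Kata Containers) to create the container.
- Each container is monitored by its own conmon process, which handles the container's logs and exit code.
- Pod networking is set up by calling CNI plugins directly.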
CRI-O architecture diagram
Installing the Kata Containers runtime
Environment
OS: CentOS Linux release 7.4.1708 (Core)
Docker: 1.12.6
etcd: 3.2.11
Go: 1.9.4
Kubernetes: 1.10.5
step 1: fetch the source code, then build and install kata-runtime, kata-proxy, and kata-shim
go get -d -u github.com/kata-containers/runtime github.com/kata-containers/proxy github.com/kata-containers/shim
cd $GOPATH/src/github.com/kata-containers/runtime
make && make install
cd ${GOPATH}/src/github.com/kata-containers/proxy
make && make install
cd ${GOPATH}/src/github.com/kata-containers/shim
make && make install
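step 2: run kata-check
kata-check verifies that the environment meets the requirements of Kata Containers; in particular, the host must support hardware virtualization. The WARN line about the kvm_intel nested parameter is not fatal: the last lines of the output report that the system can run and create Kata Containers.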
# kata-runtime kata-check
INFO[0000] CPU property found description="Intel Architecture CPU" name=GenuineIntel pid=156730 source=runtime type=attribute
INFO[0000] CPU property found description="Virtualization support" name=vmx pid=156730 source=runtime type=flag
INFO[0000] CPU property found description="64Bit CPU" name=lm pid=156730 source=runtime type=flag
INFO[0000] CPU property found description=SSE4.1 name=sse4_1 pid=156730 source=runtime type=flag
INFO[0000] kernel property found description="Host kernel accelerator for virtio network" name=vhost_net pid=156730 source=runtime type=module
INFO[0000] kernel property found description="Kernel-based Virtual Machine" name=kvm pid=156730 source=runtime type=module
INFO[0000] kernel property found description="Intel KVM" name=kvm_intel pid=156730 source=runtime type=module
WARN[0000] kernel module parameter has unexpected value description="Intel KVM" expected=Y name=kvm_intel parameter=nested pid=156730 source=runtime type=module value=N
INFO[0000] Kernel property value correct description="Intel KVM" expected=Y name=kvm_intel parameter=unrestricted_guest pid=156730 source=runtime type=module value=Y
INFO[0000] kernel property found description="Host kernel accelerator for virtio" name=vhost pid=156730 source=runtime type=module
INFO[0000] System is capable of running Kata Containers name=kata-runtime pid=156730 source=runtime
INFO[0000] device available check-type=full device=/dev/kvm name=kata-runtime pid=156730 source=runtime
INFO[0000] feature available check-type=full feature=create-vm name=kata-runtime pid=156730 source=runtime
INFO[0000] System can currently create Kata Containers name=kata-runtime pid=156730 source=runtime
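kata-check inspects CPU flags, kernel modules, module parameters, and /dev/kvm. A much simplified Go sketch of two of those checks follows. It assumes the x86 flag names vmx (Intel VT-x) and svm (AMD-V) in /proc/cpuinfo and is not kata-runtime's implementation; the module and parameter checks are left out, and `hasVirtFlag` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasVirtFlag reports whether any "flags" line in /proc/cpuinfo-style text
// lists vmx (Intel VT-x) or svm (AMD-V).
func hasVirtFlag(cpuinfo string) bool {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "flags") {
			continue
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		for _, f := range strings.Fields(parts[1]) {
			if f == "vmx" || f == "svm" {
				return true
			}
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	fmt.Println("hardware virtualization flag (vmx/svm):", hasVirtFlag(string(data)))
	// kata-check also verifies that the KVM device node exists.
	if _, err := os.Stat("/dev/kvm"); err != nil {
		fmt.Println("/dev/kvm not available:", err)
	} else {
		fmt.Println("/dev/kvm available")
	}
}
```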
step 3: install qemu-lite
$ source /etc/os-release
$ sudo yum -y install yum-utils
$ sudo -E VERSION_ID=$VERSION_ID yum-config-manager --add-repo "http://download.opensuse.org/repositories/home:/katacontainers:/release/CentOS_${VERSION_ID}/home:katacontainers:release.repo"
$ sudo yum -y install qemu-lite
step 4: prepare the Kata Containers images
- initrd image
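initrd (initial RAM disk) is a RAM disk that the boot loader loads into memory at boot time. It is a small compressed root filesystem containing the driver modules, executables, and scripts needed during early boot. When the system starts, the boot loader reads the initrd file into memory and passes its start address to the kernel. The kernel decompresses the initrd, mounts it as the root directory, and runs its init program (/init for an initramfs, /linuxrc for a legacy initrd image). From that script you can start the udevd shipped in the initrd, which automatically loads device drivers and creates the necessary device nodes under /dev. Once udevd has loaded the disk drivers, the real root filesystem can be mounted and switched to.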
go get github.com/kata-containers/agent github.com/kata-containers/osbuilder
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
image-builder-osbuilder latest 092d50027bf2 40 minutes ago 456.3 MB
centos-rootfs-osbuilder latest 27375c3d3491 About an hour ago 798.9 MB
- rootfs image
- Run the rootfs generation script. When it finishes, a rootfs directory named rootfs-Centos is created:
cd /root/.golang/src/github.com/kata-containers/osbuilder/rootfs-builder
export USE_DOCKER=true
./rootfs.sh centos
- Run the rootfs image build script:
cd /root/.golang/src/github.com/kata-containers/osbuilder/image-builder
./image_builder.sh /root/.golang/src/github.com/kata-containers/osbuilder/rootfs-builder/rootfs-Centos
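- Run the initrd image build script
Image build details
- rootfs generation
Run the script with the DEBUG environment variable set (export DEBUG=true) to see more details. The purpose of rootfs.sh is to generate the root filesystem of a distribution.
When USE_DOCKER=true is set, the script actually builds a Docker image named centos-rootfs-osbuilder, creates a container from that image, runs rootfs.sh inside the container, and exports the root filesystem.
The Dockerfile used for the centos-rootfs-osbuilder image is as follows: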
From centos:7
RUN yum -y update && yum install -y git make gcc coreutils
# This will install the proper golang to build Kata components
RUN cd /tmp ; curl -OL https://storage.googleapis.com/golang/go1.9.2.linux-amd64.tar.gz
RUN tar -C /usr/ -xzf /tmp/go1.9.2.linux-amd64.tar.gz
ENV GOROOT=/usr/go
ENV PATH=$PATH:$GOROOT/bin:$GOPATH/bin
Create the image builder container and start building the rootfs:
docker run --rm --runtime runc --env https_proxy= --env http_proxy= \
  --env AGENT_VERSION=master --env ROOTFS_DIR=/rootfs --env GO_AGENT_PKG=github.com/kata-containers/agent \
  --env AGENT_BIN=kata-agent --env AGENT_INIT=no --env GOPATH=/root/.golang --env KERNEL_MODULES_DIR= \
  --env EXTRA_PKGS= --env OSBUILDER_VERSION=unknown \
  -v /root/.golang/src/github.com/kata-containers/osbuilder/rootfs-builder:/osbuilder \
  -v /root/.golang/src/github.com/kata-containers/osbuilder/rootfs-builder/rootfs-Centos:/rootfs \
  -v /root/.golang/src/github.com/kata-containers/osbuilder/rootfs-builder/../scripts:/scripts \
  -v /root/.golang/src/github.com/kata-containers/osbuilder/rootfs-builder/rootfs-Centos:/root/.golang/src/github.com/kata-containers/osbuilder/rootfs-builder/rootfs-Centos \
  -v /root/.golang:/root/.golang centos-rootfs-osbuilder bash /osbuilder/rootfs.sh centos
- rootfs image build
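Building the rootfs image essentially means creating a raw image, partitioning it, and copying the rootfs directory into the partition; in the script, the root partition uses ext4: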
qemu-img create -q -f raw "${IMAGE}" "${IMG_SIZE}M"
parted "${IMAGE}" --script "mklabel gpt" \
"mkpart ${FS_TYPE} 1M -1M"
### ......
cp -a "${ROOTFS}"/* ${MOUNT_DIR}
docker run --rm --runtime runc --privileged --env IMG_SIZE= --env AGENT_INIT=no \
  -v /dev:/dev -v /root/.golang/src/github.com/kata-containers/osbuilder/image-builder:/osbuilder \
  -v /root/.golang/src/github.com/kata-containers/osbuilder/image-builder/../scripts:/scripts \
  -v /root/.golang/src/github.com/kata-containers/osbuilder/rootfs-builder/rootfs-Centos:/rootfs \
  -v /root/.golang/src/github.com/kata-containers/osbuilder/image-builder:/image image-builder-osbuilder \
  bash /osbuilder/image_builder.sh -o /image/kata-containers.img /rootfs
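- initrd image generation
Generating the initrd image is straightforward. It mainly comes down to the following command, which packs the rootfs into an initrd image as a gzip-compressed cpio archive in newc format: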
( cd "${ROOTFS}" && find . | cpio -H newc -o | gzip -9 ) > "${IMAGE_DIR}"/"${IMAGE_NAME}"
Guest Kernel image
TODO
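Installing CRI-O
Following the CRI-O documentation, pick the CRI-O release that matches the Kubernetes version; here that is 1.10.
step 1: install the dependencies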
yum install -y \
btrfs-progs-devel \
device-mapper-devel \
git \
glib2-devel \
glibc-devel \
glibc-static \
go \
golang-github-cpuguy83-go-md2man \
gpgme-devel \
libassuan-devel \
libgpg-error-devel \
libseccomp-devel \
libselinux-devel \
ostree-devel \
pkgconfig \
runc \
skopeo-containers
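step 2: download the source, switch to the release branch, then build and install
git clone https://github.com/kubernetes-incubator/cri-o $GOPATH/src/github.com/kubernetes-incubator/cri-o
cd $GOPATH/src/github.com/kubernetes-incubator/cri-o
git checkout -b release-1.10 remotes/origin/release-1.10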
make install.tools
make BUILDTAGS=""
make install
make install.config
step 3: configure CNI networking
go get -u -d github.com/containernetworking/plugins
cd $GOPATH/src/github.com/containernetworking/plugins
./build.sh
mkdir -p /opt/cni/bin
cp bin/* /opt/cni/bin
## add the network configuration files
mkdir -p /etc/cni/net.d
cp $GOPATH/src/github.com/kubernetes-incubator/cri-o/contrib/cni/* /etc/cni/net.d
## create the cni0 bridge
brctl addbr cni0
step 4: edit /etc/crio/crio.conf
[crio.runtime]
manage_network_ns_lifecycle = true
runtime = "/usr/bin/runc"
runtime_untrusted_workload = "/usr/local/bin/kata-runtime"
default_workload_trust = "untrusted"
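These crio.conf keys decide which binary runs each pod. The Go sketch below is a simplified, hypothetical model of that choice, not CRI-O's code. It assumes that privileged pods always use the trusted `runtime`, that a pod explicitly marked untrusted always uses `runtime_untrusted_workload`, and that every other pod follows `default_workload_trust`. How a pod gets marked (for example through an annotation) depends on the CRI-O version, so check the documentation of the release you run.

```go
package main

import "fmt"

// RuntimeConfig mirrors the three crio.conf keys shown above.
type RuntimeConfig struct {
	Runtime          string // runtime
	UntrustedRuntime string // runtime_untrusted_workload
	DefaultTrust     string // default_workload_trust: "trusted" or "untrusted"
}

// Workload describes a pod sandbox; both fields are assumptions of this model.
type Workload struct {
	MarkedUntrusted bool // the pod explicitly asked for the untrusted runtime
	Privileged      bool // privileged pods need host access that a VM cannot give
}

// selectRuntime returns the runtime binary path for a workload.
func selectRuntime(cfg RuntimeConfig, w Workload) string {
	switch {
	case cfg.UntrustedRuntime == "":
		return cfg.Runtime // no untrusted runtime configured
	case w.Privileged:
		return cfg.Runtime
	case w.MarkedUntrusted:
		return cfg.UntrustedRuntime
	case cfg.DefaultTrust == "untrusted":
		return cfg.UntrustedRuntime
	default:
		return cfg.Runtime
	}
}

func main() {
	cfg := RuntimeConfig{
		Runtime:          "/usr/bin/runc",
		UntrustedRuntime: "/usr/local/bin/kata-runtime",
		DefaultTrust:     "untrusted",
	}
	fmt.Println("plain pod:     ", selectRuntime(cfg, Workload{}))
	fmt.Println("privileged pod:", selectRuntime(cfg, Workload{Privileged: true}))
}
```

Under this model, with default_workload_trust = "untrusted", every non-privileged pod runs under kata-runtime.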
step 5: start CRI-O
make install.systemd
systemctl start crio
step 6: check the conmon processes
conmon is a process started by CRI-O. The CRI-O log shows that whenever CRI-O receives a container creation request, it runs the conmon command:
running conmon: /usr/local/libexec/crio/conmon args=[-c 783731ce2309dbfcb435a2ff47abf768d11916e886dc7a0c9b1f2f9d9fbeea9f -u 783731ce2309dbfcb435a2ff47abf768d11916e886dc7a0c9b1f2f9d9fbeea9f -r /usr/local/bin/kata-runtime -b /var/run/containers/storage/overlay-containers/783731ce2309dbfcb435a2ff47abf768d11916e886dc7a0c9b1f2f9d9fbeea9f/userdata -p /var/run/containers/storage/overlay-containers/783731ce2309dbfcb435a2ff47abf768d11916e886dc7a0c9b1f2f9d9fbeea9f/userdata/pidfile -l /var/log/pods/8b67a313-8a39-11e8-909c-246e96275bc0/783731ce2309dbfcb435a2ff47abf768d11916e886dc7a0c9b1f2f9d9fbeea9f.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio]
running conmon: /usr/local/libexec/crio/conmon args=[-c 56d5fdeac760e903757db90da588cae7bb7a764baf4c2b2d49110114ba6d2baa -u 56d5fdeac760e903757db90da588cae7bb7a764baf4c2b2d49110114ba6d2baa -r /usr/local/bin/kata-runtime -b /var/run/containers/storage/overlay-containers/56d5fdeac760e903757db90da588cae7bb7a764baf4c2b2d49110114ba6d2baa/userdata -p /var/run/containers/storage/overlay-containers/56d5fdeac760e903757db90da588cae7bb7a764baf4c2b2d49110114ba6d2baa/userdata/pidfile -l /var/log/pods/8b67a313-8a39-11e8-909c-246e96275bc0/nginx/0.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio]
The process tree under conmon (lines truncated by pstree) shows that the first conmon of a sandbox owns kata-proxy, kata-shim, and the qemu-lite VM, while the conmon of a second container in the same sandbox only has a kata-shim that connects to the same proxy.sock:
conmon -c 2ba098d0682c9f1623f52f18ea5320087ab9b252ee22c83b3fdf9ea45d789322 -u 2ba098d0682c9f1623f52f18ea5320087ab9b252ee22c83b3fdf9ea45d789322
|-kata-proxy -listen-socket unix:///run/vc/sbs/2ba098d0682c9f1623f52f18ea5320087ab9b252ee22c83b3fdf9ea45d789322/proxy.sock -mux-socket/run
| `-7*[{kata-proxy}]
|-kata-shim -agent unix:///run/vc/sbs/2ba098d0682c9f1623f52f18ea5320087ab9b252ee22c83b3fdf9ea45d789322/proxy.sock -container2ba098d0682c9f1
| `-8*[{kata-shim}]
|-qemu-lite-syste -name sandbox-2ba098d0682c9f1623f52f18ea5320087ab9b252ee22c83b3fdf9ea45d789322 -uuid 83236965-4bac-4f32-a385-12c3c90c3d11 -machinepc,
| `-2*[{qemu-lite-syste}]
`-{conmon}
conmon -c 45c1ee0637fdc33324edeb63f3b8eeaffed1e683cc1cfe9d32a45d178fbb658e -u 45c1ee0637fdc33324edeb63f3b8eeaffed1e683cc1cfe9d32a45d178fbb658e
|-kata-shim -agent unix:///run/vc/sbs/2ba098d0682c9f1623f52f18ea5320087ab9b252ee22c83b3fdf9ea45d789322/proxy.sock -container45c1ee0637fdc33
| `-9*[{kata-shim}]
`-{conmon}
step 7: set the Kubernetes environment variables and restart the Kubernetes cluster
CGROUP_DRIVER=systemd \
CONTAINER_RUNTIME=remote \
CONTAINER_RUNTIME_ENDPOINT='unix:///var/run/crio/crio.sock --runtime-request-timeout=15m' \
./hack/local-up-cluster.sh
step 8: check the status of the Kubernetes components
cluster/kubectl.sh get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}
step 9: create a test pod
cat > nginx-untrusted.yaml <<EON
apiVersion: v1
kind: Pod
metadata:
  name: nginx-untrusted
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
spec:
  containers:
  - name: nginx
    image: nginx
EON
cluster/kubectl.sh apply -f nginx-untrusted.yaml
Inspect the pod:
cluster/kubectl.sh describe pod nginx-untrusted
Using the IP address shown in the pod output, the nginx service is reachable:
curl -i -XGET http://192.168.223.97:80
# kata-runtime list
ID PID STATUS BUNDLE CREATED OWNER
edff5f14efc36145ef29853064fafbb2d1e60c7127e5e62160457d7ebf362a6b 38346 running /run/containers/storage/overlay-containers/edff5f14efc36145ef29853064fafbb2d1e60c7127e5e62160457d7ebf362a6b/userdata 2018-07-24T08:32:40.246491549Z #0
0b530a50d5a353ef9f61e18b16c412fadf0fe98f82766f9f9678add7ede49d25 38543 running /run/containers/storage/overlay-containers/0b530a50d5a353ef9f61e18b16c412fadf0fe98f82766f9f9678add7ede49d25/userdata 2018-07-24T08:33:21.832878116Z #0
# df
overlay 241963588 33358740 208604848 14% /var/lib/containers/storage/overlay/1953579948b2a51cc93709ee96770496599e54f5f2cde525cc7138861a294495/merged
overlay 241963588 33358740 208604848 14% /var/lib/containers/storage/overlay/93e320ca6218832f69984481a9ae945a2508b73ed4e3a17a69d0ee2a1aa54564/merged
runc
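runc is the OCI-compliant container runtime contributed by Docker. It is essentially a thin layer on top of libcontainer that adds OCI support, plus a CLI that runs containers from an OCI runtime spec.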
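Installing Kubernetes locally
Besides the official minikube and kubeadm tools, the Kubernetes source tree also provides a simple script for bringing up a local cluster:
go get -d -u k8s.io/kubernetes
cd $GOPATH/src/k8s.io/kubernetes
bash -x hack/local-up-cluster.sh
# view Kubernetes resources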
cluster/kubectl.sh get pods
cluster/kubectl.sh get services
cluster/kubectl.sh run my-nginx --image=nginx --replicas=1 --port=80
| | Alpine | CentOS | ClearLinux | EulerOS | Fedora |
|---|---|---|---|---|---|
| ARM64 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |
| PPC64le | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | |
| x86_64 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
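Troubleshooting
Docker fails with the following error because the CentOS docker package installs runc as docker-runc-current; creating a docker-runc symlink fixes it: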
/usr/bin/docker-current: Error response from daemon: shim error: docker-runc not installed on system.
[root@bm48 ~]# locate docker-runc
/usr/libexec/docker/docker-runc-current
[root@bm48 ~]# ln -s /usr/libexec/docker/docker-runc-current /usr/libexec/docker/docker-runc
Troubleshooting
If oci-register-machine is installed on the system, it must be disabled; otherwise, when a Kata container exits, systemd-machined causes the Docker service to shut down:
Jul 19 03:40:54 bm48 oci-register-machine[184480]: 2018/07/19 03:40:54 Register machine: poststop fa9f842ed3407606d01b7bb490473455be6bda49f6d478699976c61e83f37028 184468
Jul 19 03:40:54 bm48 systemd-machined[131968]: Machine fa9f842ed3407606d01b7bb490473455 terminated.
Jul 19 03:40:54 bm48 dockerd-current[183954]: time="2018-07-19T03:40:54.397828183-04:00" level=info msg="Processing signal 'terminated'"
Jul 19 03:40:54 bm48 dockerd-current[183954]: time="2018-07-19T03:40:54.397987798-04:00" level=debug msg="starting clean shutdown of all containers..."
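When oci-register-machine is enabled and a Kata container is created with Docker, a machine record is registered under /var/run/systemd/machines, and the unit associated with that record is the Docker service. The root cause has yet to be analyzed.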
# ls -al /var/run/systemd/machines/
total 24
drwxr-xr-x. 2 root root 280 Jul 19 04:25 .
drwxr-xr-x. 18 root root 440 Jul 19 03:39 ..
-rw-r--r--. 1 root root 229 Jul 19 04:25 e04f4fe715665c32998b13250dab4b4a
lrwxrwxrwx. 1 root root 32 Jul 19 04:25 unit:docker.service -> e04f4fe715665c32998b13250dab4b4a
Disable it in /etc/oci-register-machine.conf:
# Disable oci-register-machine by setting the disabled field to true
disabled : true
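Kata Containers guest
Kata Containers guest VM
How application files are mounted: the relevant QEMU arguments are shown below. The chardev socket kata.sock is the VM's channel to kata-agent, and the virtio-9p device shares the host directory /run/kata-containers/shared/sandboxes/<sandbox ID> with the guest under the mount tag kataShared.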
-chardev socket,id=charch0,path=/run/vc/sbs/2ed4a3afed3c3d3269ca230d87da940bcdb85a6f239fab015b2710b83253dc02/kata.sock,server,nowait
-device virtio-9p-pci,fsdev=extra-9p-kataShared,mount_tag=kataShared -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/2ed4a3afed3c3d3269ca230d87da940bcdb85a6f239fab015b2710b83253dc02,security_model=none
QEMU NVDIMM
Generic QEMU arguments for attaching a file as an NVDIMM device:
-machine pc,nvdimm
-m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
-object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE
-device nvdimm,id=nvdimm1,memdev=mem1
The corresponding arguments in the QEMU command line of a Kata Containers VM:
-machine pc,accel=kvm,kernel_irqchip,nvdimm
-m 2048M,slots=2,maxmem=129554M
-device nvdimm,id=nv0,memdev=mem0
-object memory-backend-file,id=mem0,mem-path=/usr/share/kata-containers/kata-containers.img,size=536870912
-append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 \
i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 \
pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 debug \
systemd.show_status=true systemd.log_level=debug panic=1 initcall_debug nr_cpus=48 \
ip=::::::70694528ccaafd1e6c0cc593ae05a44536497c7aa381974566b49937e41dae39::off:: init=/usr/lib/systemd/systemd \
systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \
agent.log=debug agent.sandbox=70694528ccaafd1e6c0cc593ae05a44536497c7aa381974566b49937e41dae39
Troubleshooting:
- iptables "invalid mask 64" error: fixed by rebuilding the CNI plugins.
- CRI-O is configured with cgroup_manager = cgroupfs, so the Kubernetes cgroup driver must also be set to cgroupfs.