# Experiments with nested containers

... in the repository directory on an Ubuntu 16.04 host.


## Run ctnr container inside privileged docker container
```
docker run -ti --rm --privileged \
  -v "$(pwd)/dist/bin/ctnr:/bin/ctnr" \
  -v "$(pwd)/image-policy-example.json:/etc/containers/policy.json" \
  alpine:3.7
> ctnr run -t --network=host docker://alpine:3.7
```


## Run ctnr container inside unprivileged user's privileged ctnr container
```
dist/bin/ctnr run -t --privileged \
  -v "$(pwd)/dist/bin/ctnr:/bin/ctnr" \
  -v "$(pwd)/image-policy-example.json:/etc/containers/policy.json" \
  --image-policy=image-policy-example.json \
  docker://alpine:3.7
> ctnr run -t --rootless --network=host docker://alpine:3.7
```


## Not working: Run ctnr container inside unprivileged docker container
```
docker run -ti --rm \
  -v "$(pwd)/dist/bin/ctnr:/bin/ctnr" \
  -v "$(pwd)/image-policy-example.json:/etc/containers/policy.json" \
  alpine:3.7
> ctnr run -ti --rootless --network=host docker://alpine:3.7
```
Error: Cannot change the process namespace ("running exec setns process for init caused \"exit status 34\"")

=> seccomp denies setns

Adding a custom seccomp profile solves this problem but...
(TODO: use docker-default AppArmor profile without `deny mount`, see https://github.com/moby/moby/blob/master/profiles/apparmor/template.go)
```
docker run -ti --rm --user=`id -u`:`id -g` \
  --security-opt apparmor=unconfined \
  --security-opt seccomp="$(pwd)/seccomp-container.json" \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -v "$HOME/.ctnr:/.ctnr" \
  -v "$(pwd)/dist/bin:/usr/local/bin" \
  -v "$(pwd)/image-policy-example.json:/etc/containers/policy.json" \
  debian:9 /bin/bash
$ ctnr --state-dir /tmp/ctnr run --verbose -ti -b test --update --rootless --no-new-keyring --no-pivot docker://alpine:3.8
```
Error: run process: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"proc\\\" to rootfs \\\"/.ctnr/bundles/test/rootfs\\\" at \\\"/proc\\\" caused \\\"operation not permitted\\\"\""

=> proc cannot be mounted

=> See https://github.com/opencontainers/runc/issues/1658


## How to analyze container problems
- Run parent container with `CAP_SYS_PTRACE` capability and child container with
  `strace -ff` to debug system calls
- Run moby's `check-config` script _(requires kernel config to be mounted)_:
  ```
  apk update && apk add bash
  wget -O /bin/chcfg https://raw.githubusercontent.com/moby/moby/master/contrib/check-config.sh
  chmod +x /bin/chcfg && chcfg
  ```


## Known errors and workarounds to run a container in another container

_Workarounds you do not want to do_
_(also see https://github.com/opencontainers/runc/issues/1456)_

- "running exec setns process for init caused \"exit status 34\""
  -> inner container: add `--rootless` option (if that has no effect: add setns syscall to list of `SCMP_ACT_ALLOW` calls (TODO: which syscall exactly?))
  -> {root} (outer container: add `--seccomp=unconfined` option)
  -> add `--cap-add=SYS_ADMIN` to rootless outer container and `--rootless` to inner
- "mkdir /sys/fs/cgroup/cpuset/05dh[...]: permission denied"
  -> inner container: add `--rootless` option
  -> {ctnr} outer container: add `--mount-cgroup=rw` option
- "could not create session key: operation not permitted"
  -> inner container: enable `--no-new-keyring` option
  -> outer container: allow corresponding syscall in seccomp profile (dirty: set `--seccomp=unconfined`)
- "pivot_root operation not permitted"
  -> inner container: enable `--no-pivot` option
  -> outer container: seccomp: add `pivot_root` syscall to the list of `SCMP_ACT_ALLOW` calls

*Note regarding cgroups*:
The cgroup hierarchy can be mounted into a container using `--mount-cgroups=rw`.
Currently this is a security vulnerability since all cgroups are mounted writable.
When using kernel >=4.6 it is possible to make only the process' own cgroups writable
(see https://github.com/opencontainers/runc/issues/225).


## Summary so far
Containers can be run in privileged containers, but nesting them in unprivileged containers is still problematic.
Docker's sane seccomp and AppArmor default profiles deny syscalls that are required to run a container.
The seccomp profile denies `setns` and a few other syscalls. The AppArmor profile denies `mount`.
Unfortunately the nested container still doesn't run when AppArmor is disabled (or, better, replaced with a custom profile that allows mount)
and a custom seccomp profile is provided, because `/proc` cannot be mounted while docker masks parts of it
(see https://github.com/opencontainers/runc/issues/1658,
https://lists.linuxfoundation.org/pipermail/containers/2018-April/038864.html
and https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1533642.html).
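
To see which of the restrictions summarized above apply inside a given outer container, it can help to look at what the kernel reports for the current process. The following is a minimal sketch, not part of ctnr or docker: the function names are hypothetical helpers, and it assumes a Linux `/proc` that exposes the `Seccomp:` field in `/proc/self/status` and the AppArmor label in `/proc/self/attr/current`.

```shell
#!/bin/sh
# Sketch: report seccomp and AppArmor confinement of the current process.
# All function names below are hypothetical helpers, not ctnr or docker commands.

# Translate the numeric "Seccomp:" value from /proc/<pid>/status into a name.
seccomp_mode_name() {
	case "$1" in
		0) echo disabled ;;
		1) echo strict ;;
		2) echo filter ;;
		*) echo unknown ;;
	esac
}

# Print the numeric seccomp mode found in a status file (default: own process).
seccomp_mode() {
	awk '/^Seccomp:/ { print $2 }' "${1:-/proc/self/status}"
}

# Print the AppArmor label, or "unavailable" if the file cannot be read.
apparmor_profile() {
	cat "${1:-/proc/self/attr/current}" 2>/dev/null || echo unavailable
}

# Print both values for the calling process.
report_confinement() {
	echo "seccomp:  $(seccomp_mode_name "$(seccomp_mode)")"
	echo "apparmor: $(apparmor_profile)"
}
```

Sourcing this file in the outer container and calling `report_confinement` before starting the inner `ctnr run` shows whether a seccomp filter and an AppArmor profile are in effect. It does not tell which individual syscalls are denied; for that, `strace -ff` as described above is still needed.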