# What is gVisor?

gVisor is an application kernel, written in Go, that implements a substantial
portion of the [Linux system call interface][linux]. It provides an additional
layer of isolation between running applications and the host operating system.

gVisor includes an [Open Container Initiative (OCI)][oci] runtime called `runsc`
that makes it easy to work with existing container tooling. The `runsc` runtime
integrates with Docker and Kubernetes, making it simple to run sandboxed
containers.

gVisor can be used with Docker, Kubernetes, or directly using `runsc`. Use the
links below to see detailed instructions for each of them:

* [Docker](./user_guide/quick_start/docker.md): The quickest and easiest way
  to get started.
* [Kubernetes](./user_guide/quick_start/kubernetes.md): Isolate Pods in your
  K8s cluster with gVisor.
* [OCI Quick Start](./user_guide/quick_start/oci.md): Expert mode. Customize
  gVisor for your environment.

## What does gVisor do?

gVisor provides a virtualized environment in order to sandbox containers. The
system interfaces normally implemented by the host kernel are moved into a
distinct, per-sandbox application kernel in order to minimize the risk of a
container escape exploit. gVisor does not introduce large fixed overheads,
however, and still retains a process-like model with respect to resource
utilization.

## How is this different?

Two other approaches are commonly taken to provide stronger isolation than
native containers.

**Machine-level virtualization**, such as [KVM][kvm] and [Xen][xen], exposes
virtualized hardware to a guest kernel via a Virtual Machine Monitor (VMM). This
virtualized hardware is generally enlightened (paravirtualized) and additional
mechanisms can be used to improve the visibility between the guest and host
(e.g.
balloon drivers, paravirtualized spinlocks). Running containers in
distinct virtual machines can provide great isolation, compatibility and
performance (though nested virtualization may bring challenges in this area),
but for containers it often requires additional proxies and agents, and may
require a larger resource footprint and slower start-up times.

![Machine-level virtualization](Machine-Virtualization.png "Machine-level virtualization")

**Rule-based execution**, such as [seccomp][seccomp], [SELinux][selinux] and
[AppArmor][apparmor], allows the specification of a fine-grained security policy
for an application or container. These schemes typically rely on hooks
implemented inside the host kernel to enforce the rules. If the surface can be
made small enough, then this is an excellent way to sandbox applications and
maintain native performance. However, in practice it can be extremely difficult
(if not impossible) to reliably define a policy for arbitrary, previously
unknown applications, making this approach challenging to apply universally.

![Rule-based execution](Rule-Based-Execution.png "Rule-based execution")

Rule-based execution is often combined with additional layers for
defense-in-depth.

**gVisor** provides a third isolation mechanism, distinct from those above.

gVisor intercepts application system calls and acts as the guest kernel, without
the need for translation through virtualized hardware. gVisor may be thought of
as either a merged guest kernel and VMM, or as seccomp on steroids. This
architecture allows it to provide a flexible resource footprint (i.e. one based
on threads and memory mappings, not fixed guest physical resources) while also
lowering the fixed costs of virtualization. However, this comes at the price of
reduced application compatibility and higher per-system call overhead.

![gVisor](Layers.png "gVisor")

On top of this, gVisor employs rule-based execution to provide defense-in-depth
(details below).

gVisor's approach is similar to [User Mode Linux (UML)][uml], although UML
virtualizes hardware internally and thus provides a fixed resource footprint.

Each of the above approaches may excel in distinct scenarios. For example,
machine-level virtualization will face challenges achieving high density, while
gVisor may provide poor performance for system call heavy workloads.

## Why Go?

gVisor is written in [Go][golang] in order to avoid security pitfalls that can
plague kernels. With Go, there are strong types, built-in bounds checks, no
uninitialized variables, no use-after-free, no stack overflow, and a built-in
race detector. However, the use of Go has its challenges, and the runtime often
introduces performance overhead.

## What are the different components?

A gVisor sandbox consists of multiple processes. These processes collectively
comprise an environment in which one or more containers can be run.

Each sandbox has its own isolated instance of:

* The **Sentry**, which is a kernel that runs the containers and intercepts
  and responds to system calls made by the application.

Each container running in the sandbox has its own isolated instance of:

* A **Gofer** which provides file system access to the containers.

![gVisor architecture diagram](Sentry-Gofer.png "gVisor architecture diagram")

## What is runsc?

The entrypoint to running a sandboxed container is the `runsc` executable.
`runsc` implements the [Open Container Initiative (OCI)][oci] runtime
specification, which is used by Docker and Kubernetes. This means that OCI
compatible _filesystem bundles_ can be run by `runsc`.
Filesystem bundles are
comprised of a `config.json` file containing container configuration, and a root
filesystem for the container. Please see the [OCI runtime spec][runtime-spec]
for more information on filesystem bundles. `runsc` implements multiple commands
that perform various functions such as starting, stopping, listing, and querying
the status of containers.

### Sentry {#sentry}

The Sentry is the largest component of gVisor. It can be thought of as an
application kernel. The Sentry implements all the kernel functionality needed by
the application, including: system calls, signal delivery, memory management and
page faulting logic, the threading model, and more.

When the application makes a system call, the
[Platform](./architecture_guide/platforms.md) redirects the call to the Sentry,
which will do the necessary work to service it. It is important to note that the
Sentry does not pass system calls through to the host kernel. As a userspace
application, the Sentry will make some host system calls to support its
operation, but it does not allow the application to directly control the system
calls it makes. For example, the Sentry is not able to open files directly; file
system operations that extend beyond the sandbox (not internal `/proc` files,
pipes, etc.) are sent to the Gofer, described below.

### Gofer {#gofer}

The Gofer is a standard host process which is started with each container and
communicates with the Sentry via the [9P protocol][9p] over a socket or shared
memory channel. The Sentry process is started in a restricted seccomp container
without access to file system resources. The Gofer mediates all access to
these resources, providing an additional level of isolation.

### Application {#application}

The application is a normal Linux binary provided to gVisor in an OCI runtime
bundle.
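
To make the bundle layout a little more concrete, here is a minimal Go sketch
that decodes a few `config.json` fields (`ociVersion`, `process.args`,
`process.cwd`, `root.path`, `root.readonly`) defined by the OCI runtime spec.
The `Spec` type, the `ParseSpec` helper and the sample document are assumptions
made for this example. This is not how `runsc` itself loads a bundle, and a
real runtime would use the complete spec definitions rather than this subset.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Spec holds only the handful of OCI runtime-spec fields this sketch reads.
// It is a hypothetical subset, not the full specification.
type Spec struct {
	Version string `json:"ociVersion"`
	Process struct {
		Args []string `json:"args"`
		Cwd  string   `json:"cwd"`
	} `json:"process"`
	Root struct {
		Path     string `json:"path"`
		Readonly bool   `json:"readonly"`
	} `json:"root"`
}

// ParseSpec decodes the contents of a bundle's config.json into a Spec.
func ParseSpec(data []byte) (*Spec, error) {
	var s Spec
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("parsing config.json: %w", err)
	}
	return &s, nil
}

// sampleConfig is a made-up, minimal config.json used for illustration.
const sampleConfig = `{
  "ociVersion": "1.0.0",
  "process": {"args": ["/bin/echo", "hello"], "cwd": "/"},
  "root": {"path": "rootfs", "readonly": true}
}`

func main() {
	spec, err := ParseSpec([]byte(sampleConfig))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// root.path is interpreted relative to the bundle directory.
	fmt.Println("root filesystem:", spec.Root.Path)
	fmt.Println("command:", strings.Join(spec.Process.Args, " "))
}
```

In a real bundle the same JSON would sit in `config.json` next to the root
filesystem directory that `root.path` names.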
gVisor aims to provide an environment equivalent to Linux v4.4, so
applications should be able to run unmodified. However, gVisor does not
presently implement every system call, `/proc` file, or `/sys` file so some
incompatibilities may occur. See [Compatibility](./user_guide/compatibility.md)
for more information.

[9p]: https://en.wikipedia.org/wiki/9P_(protocol)
[apparmor]: https://wiki.ubuntu.com/AppArmor
[golang]: https://golang.org
[kvm]: https://www.linux-kvm.org
[linux]: https://en.wikipedia.org/wiki/Linux_kernel_interfaces
[oci]: https://www.opencontainers.org
[runtime-spec]: https://github.com/opencontainers/runtime-spec
[seccomp]: https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt
[selinux]: https://selinuxproject.org
[uml]: http://user-mode-linux.sourceforge.net/
[xen]: https://www.xenproject.org