# Production guide

gVisor adds additional layers of defense to your containers, but comes with
some performance overhead. This page discusses **best practices** for how and
where to integrate sandboxing into your production stack, so that you can take
full advantage of gVisor's security benefits while minimizing overhead.

[TOC]

## The role of sandboxing in your production stack {#role}

At its core, gVisor sandboxes your containers, isolating them from the host's
Linux kernel and from each other. This is relevant to your production stack for
the following use cases:

*   Hardening **externally-reachable endpoints**, such as user-facing load
    balancers, web servers, public API endpoints, etc.
*   Providing **defense-in-depth protection** for critical workloads handling
    sensitive information and/or under security compliance requirements, e.g.
    payment processing, sensitive data analysis pipelines.
*   Safely **operating a multi-tenancy environment** with security isolation,
    such as when operating an app platform for multiple third-party customers.
*   Providing additional features to your container stack, such as **intrusion
    detection** and **checkpoint save/restore**.
*   Safely **running untrusted code**, such as when running
    third-party/user-provided code, or for software forensics. **Note**: This
    guide is not appropriate for that use case, and instead focuses on how to
    run an existing **trusted** stack with gVisor.

While gVisor is able to sandbox **any** application, it should generally not be
used to sandbox **every** application.

## Attack surface reduction {#attack-surface}

Because sandboxing comes with some performance overhead, you should first
investigate overhead-free ways to **reduce your external attack surface** as
much as possible before introducing sandboxing into your production stack.

Consider running non-user-facing workloads in a separate virtual network, and
only running the user-facing entry points into this network in a sandbox. You
can also rely on network security with a service mesh like [Istio] to prevent
network traffic from reaching sensitive endpoints, though note that such
solutions have [their own performance overhead][Istio overhead].

If using a Cloud provider, consider **using your provider's hosted application
solutions** rather than rolling your own (sandboxed or not). This
simultaneously reduces your ops burden, the overall attack surface you are
personally responsible for, and most likely your overall Cloud provider
charges. For example, if using Google Cloud, consider:

*   [Cloud Spanner](https://cloud.google.com/spanner) as a database
*   [Cloud Load Balancing](https://cloud.google.com/load-balancing) as a load
    balancer
*   [Cloud Storage](https://cloud.google.com/storage) for static file serving

These parts of your stack coincide with where sandboxing performance overhead
is most prevalent, so keeping them outside of your sandboxing perimeter
provides significant benefits.

## Security/performance trade-off {#security-vs-performance}

Once you've reduced your attack surface and have identified the components of
your production stack that may benefit from sandboxing, you still need to
determine whether the security benefits sandboxing provides are worth the
performance overhead.

gVisor protects your workload by intercepting system calls and emulating them
in userspace. This shields the host Linux kernel and the sandboxed application
from each other, **protecting against most Linux CVEs** and **container escape
vulnerabilities**, and making **remote privilege-escalation attacks** less
impactful. See [Security Model] for more details.

On the other hand, sandboxing has a **performance penalty**. This overhead is
multi-faceted and depends heavily on the behavior of the workload being
sandboxed. As a general guideline, **I/O-heavy** (*e.g. databases*) and
**network-heavy** (*e.g. load balancers*) workloads will see degraded
performance, whereas **CPU-bound** workloads (*e.g. API servers, non-static web
servers, data pipelines*) will see minimal or no overhead. See
[Performance Guide] for more details and data.

Ultimately, whether to run each workload in a sandbox comes down to a
**balancing decision** between:

*   Sensitivity of your critical data
*   Applicable compliance obligations and regulations
*   Your organization's budget and PR risk tolerance
*   Your application's performance requirements
*   Overall security diligence

The following diagram summarizes the security/performance trade-off for various
approaches to adding sandboxing to a typical stack.

![Sandboxing tradeoff](sandboxing-tradeoffs.png "Sandboxing security/performance tradeoffs.")

## Configuring gVisor for optimal performance {#configure-for-performance}

Once you've identified the workloads that absolutely need sandboxing, it is
worth spending some time configuring gVisor for optimal performance.

### Choosing a platform {#configure-platform}

gVisor supports multiple low-level implementations called Platforms (see
[Platform architecture] for a detailed overview). Picking the right platform
for your environment is the **highest-impact performance decision**.

[GKE Sandbox] uses an optimized, custom platform which provides good
performance with no tuning required.

When using gVisor outside of GKE Sandbox, we recommend **running gVisor on
bare-metal machines** (not VMs). In such a setup, use the KVM platform for best
performance.

If you absolutely must run gVisor in a virtual machine, we recommend using the
`systrap` platform. This platform has the most flexibility, but its performance
will lag behind that of KVM.
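
For example, when using Docker on a bare-metal host, one way to select the KVM
platform is to register a dedicated `runsc` runtime in Docker's `daemon.json`.
The following is a minimal sketch, assuming `runsc` is installed at
`/usr/local/bin/runsc` and that no `/etc/docker/daemon.json` exists yet; the
runtime name `runsc-kvm` is illustrative.

```shell
# Sketch: register a Docker runtime that runs gVisor with the KVM platform
# (bare-metal hosts only). This overwrites /etc/docker/daemon.json; merge the
# "runtimes" entry by hand if you already have one.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "runsc-kvm": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--platform=kvm"]
    }
  }
}
EOF
sudo systemctl restart docker

# Quick check: a sandboxed container reports gVisor's kernel message buffer.
docker run --rm --runtime=runsc-kvm alpine dmesg | head -n 1
```

Defining several differently-flagged `runsc` runtimes side by side is a
convenient way to apply different gVisor configurations to different workloads
on the same host.
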
<a class="button" href="/docs/user_guide/platforms/">Configure Platform
»</a>

### Optimizing I/O performance {#configure-io}

**File I/O is typically the most impacted performance characteristic** of a
gVisor-sandboxed workload. Because gVisor is a general-purpose sandbox, its
default configuration must support all possible I/O interaction patterns.
However, you can configure gVisor to use more aggressive caching policies where
it makes sense.
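
As a concrete illustration, the sketch below extends the hypothetical Docker
configuration from the platform example with a second runtime that opts into a
memory-backed overlay for the root filesystem. The flags shown
(`--overlay2=root:memory` and `--file-access=exclusive`, the latter being the
default) are taken from recent `runsc` releases; exact flag names and accepted
values vary by version, so verify them against the filesystem guide linked
below before relying on this.

```shell
# Sketch: a second runsc runtime with more aggressive filesystem settings.
# --overlay2=root:memory keeps root-filesystem writes in memory, and
# --file-access=exclusive (the default) allows caching of file contents and
# metadata, assuming no process outside the sandbox modifies the container's
# files while it runs.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "runsc-kvm": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--platform=kvm"]
    },
    "runsc-kvm-tuned": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--platform=kvm", "--overlay2=root:memory", "--file-access=exclusive"]
    }
  }
}
EOF
sudo systemctl restart docker
docker run --rm --runtime=runsc-kvm-tuned alpine sh -c 'dd if=/dev/zero of=/tmp/out bs=1M count=64'
```
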
<a class="button" href="/docs/user_guide/filesystem/">Configure Filesystem
»</a>

### Optimizing network performance {#configure-network}

**Networking is typically the second most-impacted performance characteristic**
of a gVisor-sandboxed workload. gVisor implements its own network stack, which
is optimized for security over performance. If your application is semi-trusted
and network performance is paramount, you can optionally enable Network
Passthrough to use the host's (Linux's) network stack rather than gVisor's own
(a minimal configuration sketch appears at the end of this page).

<a class="button" href="/docs/user_guide/networking/">Configure Networking
»</a>

[Istio]: https://istio.io/
[Istio overhead]: https://istio.io/latest/docs/ops/deployment/performance-and-scalability/
[Security Model]: /docs/architecture_guide/security/
[Performance Guide]: /docs/architecture_guide/performance/
[Platform architecture]: /docs/architecture_guide/platforms/
[Spectre]: https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)
[Denial-of-Service attacks]: https://httpd.apache.org/docs/trunk/misc/security_tips.html
[GKE Sandbox]: https://cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods
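
As a final sketch for the networking section above: with the standalone `runsc`
runtime, Network Passthrough corresponds to the `--network=host` flag. The
runtime entry below is illustrative and follows the same hypothetical
`daemon.json` layout as the earlier examples; weigh the isolation you give up
against the networking guide before enabling it.

```shell
# Sketch: one more runsc runtime entry for /etc/docker/daemon.json (alongside
# the earlier ones) that uses the host's Linux network stack instead of
# gVisor's netstack, trading some isolation for network performance:
#
#   "runsc-hostnet": {
#     "path": "/usr/local/bin/runsc",
#     "runtimeArgs": ["--network=host"]
#   }
#
# After restarting Docker, run the network-heavy workload with that runtime:
docker run --rm --runtime=runsc-hostnet alpine wget -qO- http://example.com >/dev/null
```
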