gvisor.dev/gvisor@v0.0.0-20240520182842-f9d4d51c7e0f/g3doc/user_guide/production.md

# Production guide

gVisor adds additional layers of defense to your containers, but comes with some
performance overhead. This page discusses **best practices** for how and where to
integrate sandboxing within your production stack, to take full advantage of
gVisor's security benefits while minimizing overhead.

[TOC]

## The role of sandboxing in your production stack {#role}

At its core, gVisor sandboxes your containers, isolating them from the host's
Linux kernel and from each other. This is relevant to your production stack for
the following use-cases:

*   Hardening **externally-reachable endpoints**, such as user-facing load
    balancers, web servers, public API endpoints, etc.
*   Providing **defense-in-depth protection** for critical workloads handling
    sensitive information and/or under security compliance requirements, e.g.
    payment processing, sensitive data analysis pipelines.
*   Safely **operating a multi-tenancy environment** with security isolation,
    such as when operating an app platform for multiple third-party customers.
*   Providing additional features to your container stack, such as **intrusion
    detection** and **checkpoint save/restore**.
*   Safely **running untrusted code**, such as when running
    third-party/user-provided code, or for software forensics. **Note**: This
    guide is not appropriate for this use-case, and will instead focus on how to
    run an existing **trusted** stack with gVisor.

While gVisor is able to sandbox **any** application, it should generally not be
used to sandbox **every** application.

## Attack surface reduction {#attack-surface}

Because sandboxing comes with some performance overhead, you should first
investigate ways to **reduce your outside attack surface** (without such
overhead) as much as possible, prior to introducing sandboxing into your
production stack at all.

Consider running non-user-facing workloads in a separate virtual network, and
only run the user-facing entry points into this network in a sandbox. You can
also rely on network security with a service mesh like [Istio] to prevent
network traffic from arriving at sensitive endpoints, though note that such
solutions have [their own performance overhead][Istio overhead].

If using a Cloud provider, consider **using your provider's hosted application
solutions**, rather than rolling your own solution (sandboxed or not). This
simultaneously reduces your ops burden, the overall attack surface you are
personally responsible for, and most likely your overall Cloud provider charges.
For example, if using Google Cloud, consider:

*   [Cloud Spanner](https://cloud.google.com/spanner) as a database
*   [Cloud Load Balancing](https://cloud.google.com/load-balancing) as a load
    balancer
*   [Cloud Storage](https://cloud.google.com/storage) for static file serving

These parts of your stack coincide with where sandboxing performance overhead is
most prevalent, so keeping them outside of your sandboxed perimeter provides
significant benefits.

## Security/performance trade-off {#security-vs-performance}

Once you've reduced your attack surface and have identified the components of
your production stack that may benefit from sandboxing, you still need to
determine whether the security benefits sandboxing provides are worth the
performance overhead.

gVisor protects your workload by intercepting system calls and emulating them in
userspace. This shields the host Linux kernel and the sandboxed application from
each other, **protecting against most Linux CVEs**, **container escape
vulnerabilities**, and making **remote privilege-escalation attacks** less
impactful. See [Security Model] for more details.

On the other hand, sandboxing has a **performance penalty**. This overhead is
multi-faceted and depends heavily on the behavior of the workload being
sandboxed. As a general guideline, **I/O-heavy** (*e.g. databases*) and
**network-heavy** (*e.g. load balancers*) workloads will see degraded
performance, whereas **CPU-bound** workloads (*e.g. API servers, non-static web
servers, data pipelines*) will see minimal or no overhead. See
[Performance Guide] for more details and data.

Ultimately, the decision of whether to run each workload in a sandbox or not
comes down to a **balancing decision** between:

*   Sensitivity of your critical data
*   Applicable compliance obligations and regulations
*   Your organization's budget and PR risk tolerance
*   Your application's performance requirements
*   Overall security diligence

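As a rough illustration of this balancing decision, the following Python sketch
triages workloads using the overhead guideline above. The `Workload` fields, the
`triage` function, and its three outcomes are hypothetical names chosen for this
example and are not part of gVisor. It only covers exposure, data sensitivity,
and workload profile; budget, compliance, and risk tolerance still need human
judgment, so treat it as a starting point rather than a definitive policy.

```python
from dataclasses import dataclass
from enum import Enum


class Profile(Enum):
    IO_HEAVY = "io-heavy"            # e.g. databases
    NETWORK_HEAVY = "network-heavy"  # e.g. load balancers
    CPU_BOUND = "cpu-bound"          # e.g. API servers, data pipelines


# Profiles that the guideline above expects to see degraded performance.
DEGRADED_PROFILES = {Profile.IO_HEAVY, Profile.NETWORK_HEAVY}


@dataclass
class Workload:
    name: str
    profile: Profile
    externally_reachable: bool
    handles_sensitive_data: bool


def triage(workload: Workload) -> str:
    """Return 'skip', 'sandbox', or 'evaluate' for a single workload."""
    needs_protection = (
        workload.externally_reachable or workload.handles_sensitive_data
    )
    if not needs_protection:
        return "skip"       # little security benefit to weigh against overhead
    if workload.profile in DEGRADED_PROFILES:
        return "evaluate"   # benefit is real, but measure the overhead first
    return "sandbox"        # benefit with minimal expected overhead


if __name__ == "__main__":
    # Hypothetical stack, for illustration only.
    stack = [
        Workload("api-server", Profile.CPU_BOUND, True, False),
        Workload("payments-db", Profile.IO_HEAVY, False, True),
        Workload("batch-report", Profile.CPU_BOUND, False, False),
    ]
    for w in stack:
        print(f"{w.name}: {triage(w)}")
```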
The following diagram summarizes the security/performance tradeoff for various
approaches to adding sandboxing to a typical stack.

![Sandboxing tradeoff](sandboxing-tradeoffs.png "Sandboxing security/performance tradeoffs.")

## Configuring gVisor for optimal performance {#configure-for-performance}

Once you've identified the workloads that absolutely need sandboxing, it is
worth spending some time to configure gVisor for optimal performance.

### Choosing a platform {#configure-platform}

gVisor supports multiple low-level implementations called Platforms (see
[Platform architecture] for a detailed overview). Picking the right platform for
your environment is the **highest-impact performance decision**.

[GKE Sandbox] uses an optimized, custom platform which will provide good
performance with no tuning required.

When using gVisor outside of GKE Sandbox, we recommend **running gVisor on
bare-metal machines** (not VMs). In such a setup, use the KVM platform for best
performance.

If you absolutely must run gVisor in a virtual machine, we recommend using the
`systrap` platform. This platform has the most flexibility, but its performance
will lag behind that of KVM.

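The platform guidance above could be automated along these lines. This is a
minimal sketch, assuming Docker is your container runtime and that `runsc` is
installed at `/usr/local/bin/runsc` (adjust the path for your install).
`choose_platform` and `docker_runtime_entry` are hypothetical helpers, and
whether the host is bare metal is passed in by the caller rather than detected.
The output is a `runtimes` entry for Docker's `daemon.json` that passes
`--platform` to `runsc`.

```python
import json
import os


def choose_platform(on_bare_metal: bool, kvm_device: str = "/dev/kvm") -> str:
    """Pick a runsc platform name following the guidance above."""
    # KVM is only recommended on bare metal, and only usable when the
    # KVM device node is readable and writable by the current user.
    if on_bare_metal and os.access(kvm_device, os.R_OK | os.W_OK):
        return "kvm"
    # Inside a VM (or without usable KVM), fall back to systrap.
    return "systrap"


def docker_runtime_entry(platform: str,
                         runsc_path: str = "/usr/local/bin/runsc") -> dict:
    """Build the `runtimes` section of Docker's daemon.json for runsc."""
    return {
        "runtimes": {
            "runsc": {
                "path": runsc_path,  # assumed install location
                "runtimeArgs": [f"--platform={platform}"],
            }
        }
    }


if __name__ == "__main__":
    platform = choose_platform(on_bare_metal=False)
    print(json.dumps(docker_runtime_entry(platform), indent=4))
```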
<a class="button" href="/docs/user_guide/platforms/">Configure Platform
&raquo;</a>

### Optimizing I/O performance {#configure-io}

**File I/O is typically the most impacted performance characteristic** of a
gVisor-sandboxed workload. Because gVisor is a general-purpose sandbox, its
default configuration must support all possible I/O interaction patterns.
However, you can configure gVisor to use more aggressive caching policies where
it makes sense.

<a class="button" href="/docs/user_guide/filesystem/">Configure Filesystem
&raquo;</a>

### Optimizing network performance {#configure-network}

**Networking is typically the second most-impacted performance characteristic**
of a gVisor-sandboxed workload. gVisor implements its own network stack, which
is optimized for security over performance. If your application is semi-trusted
and network performance is paramount, you can optionally enable Network
Passthrough to use the host's (Linux's) network stack, rather than gVisor's own.

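A minimal sketch of how that choice might be encoded, assuming runsc's
`--network` flag, where `host` selects passthrough and the default (`sandbox`)
keeps gVisor's own network stack. The `network_args` helper and its parameters
are hypothetical; the returned list is meant to be appended to whatever
`runtimeArgs` you already pass to `runsc`. Both conditions must hold before
passthrough is enabled, because it trades away some isolation for speed.

```python
from typing import List


def network_args(semi_trusted: bool, network_critical: bool) -> List[str]:
    """Return extra runsc arguments for the networking mode."""
    # Only opt into the host network stack when the workload is at least
    # semi-trusted AND network performance is paramount.
    if semi_trusted and network_critical:
        return ["--network=host"]
    # Otherwise add nothing, so runsc keeps its default sandboxed stack.
    return []


if __name__ == "__main__":
    base_args = ["--platform=systrap"]  # example of pre-existing arguments
    print(base_args + network_args(semi_trusted=True, network_critical=True))
```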
<a class="button" href="/docs/user_guide/networking/">Configure Networking
&raquo;</a>

[Istio]: https://istio.io/
[Istio overhead]: https://istio.io/latest/docs/ops/deployment/performance-and-scalability/
[Security Model]: /docs/architecture_guide/security/
[Performance Guide]: /docs/architecture_guide/performance/
[Platform architecture]: /docs/architecture_guide/platforms/
[Spectre]: https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)
[Denial-of-Service attacks]: https://httpd.apache.org/docs/trunk/misc/security_tips.html
[GKE Sandbox]: https://cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods