# gVisor Networking Security

In our
[first blog post](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/),
we covered some secure design principles and how they guided the architecture of
gVisor as a whole. In this post, we will cover how these principles guided the
networking architecture of gVisor, and the tradeoffs involved. In particular, we
will cover how these principles culminated in two networking modes, how they
work, and the properties of each.

## gVisor's security architecture in the context of networking

Linux networking is complicated. The TCP protocol is over 40 years old, and has
been repeatedly extended over the years to keep up with the rapid pace of
network infrastructure improvements, all while maintaining compatibility. On top
of that, Linux networking has a fairly large API surface. Linux supports
[over 150 options](https://github.com/google/gvisor/blob/960f6a975b7e44c0efe8fd38c66b02017c4fe137/pkg/sentry/strace/socket.go#L476-L644)
for the most common socket types alone. In fact, the net subsystem is one of the
largest and fastest growing in Linux at approximately 1.1 million lines of code.
For comparison, that is several times the size of the entire gVisor codebase.

At the same time, networking is increasingly important. The cloud era is
arguably about making everything a network service, and in order to make that
work, the interconnect performance is critical. Adding networking support to
gVisor was difficult, not just due to the inherent complexity, but also because
it has the potential to significantly weaken gVisor's security model.

As outlined in the previous blog post, gVisor's
[secure design principles](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/#design-principles)
are:

1. Defense in Depth: each component of the software stack trusts each other
   component as little as possible.
1. Least Privilege: each software component has only the permissions it needs
   to function, and no more.
1. Attack Surface Reduction: limit the surface area of the host exposed to the
   sandbox.
1. Secure by Default: the default choice for a user should be safe.

gVisor manifests these principles as a multi-layered system. An application
running in the sandbox interacts with the Sentry, a userspace kernel, which
mediates all interactions with the Host OS and beyond. The Sentry is written in
pure Go with minimal unsafe code, making it less vulnerable to buffer overflows
and related memory bugs that can lead to a variety of compromises including code
injection. It emulates Linux using only a minimal and audited set of Host OS
syscalls that limit the Host OS's attack surface exposed to the Sentry itself.
The syscall restrictions are enforced by running the Sentry with seccomp
filters, which enforce that the Sentry can only use the expected set of
syscalls. The Sentry runs as an unprivileged user and in namespaces, which,
along with the seccomp filters, ensure that the Sentry is run with the Least
Privilege required.

gVisor's multi-layered design provides Defense in Depth. The Sentry, which does
not trust the application because it may attack the Sentry and try to bypass it,
is the first layer. The sandbox that the Sentry runs in is the second layer. If
the Sentry were compromised, the attacker would still be in a highly restrictive
sandbox which they must also break out of in order to compromise the Host OS.

To enable networking functionality while preserving gVisor's security
properties, we implemented a
[userspace network stack](https://github.com/google/gvisor/tree/master/pkg/tcpip)
in the Sentry, which we creatively named Netstack.
Netstack is also written in
Go, not only to avoid unsafe code in the network stack itself, but also to avoid
a complicated and unsafe Foreign Function Interface. Having its own integrated
network stack allows the Sentry to implement networking operations using up to
three Host OS syscalls to read and write packets. These syscalls allow a very
minimal set of operations which are already allowed (either through the same or
a similar syscall). Moreover, because packets typically come from off-host (e.g.
the internet), the Host OS's packet processing code has received a lot of
scrutiny, hopefully resulting in a high degree of hardening.

![Figure 1](/assets/images/2020-04-02-networking-security-figure1.png "Network and gVisor.")

## Writing a network stack

Netstack was written from scratch specifically for gVisor. Because Netstack was
designed and implemented to be modular, flexible and self-contained, there are
now several more projects using Netstack in creative and exciting ways. As we
discussed, a custom network stack has enabled a variety of security-related
goals which would not have been possible any other way. This came at a cost
though. Network stacks are complex and writing a new one comes with many
challenges, mostly related to application compatibility and performance.

Compatibility issues typically come in two forms: missing features, and features
with behavior that differs from Linux (usually due to bugs). Both of these are
inevitable in an implementation of a complex system spanning many quickly
evolving and ambiguous standards. However, we have invested heavily in this
area, and the vast majority of applications have no issues using Netstack.
For
example,
[we now support setting 34 different socket options](https://github.com/google/gvisor/blob/815df2959a76e4a19f5882e40402b9bbca9e70be/pkg/sentry/socket/netstack/netstack.go#L830-L1764)
versus
[only 7 in our initial git commit](https://github.com/google/gvisor/blob/d02b74a5dcfed4bfc8f2f8e545bca4d2afabb296/pkg/sentry/socket/epsocket/epsocket.go#L445-L702).
We are continuing to make good progress in this area.

Performance issues typically come from TCP behavior and packet processing speed.
To improve our TCP behavior, we are working on implementing the full set of TCP
RFCs. There are many RFCs which are significant to performance (e.g.
[RACK](https://tools.ietf.org/id/draft-ietf-tcpm-rack-03.html) and
[BBR](https://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congestion-control-00))
that we have yet to implement. This mostly affects TCP performance with
non-ideal network conditions (e.g. cross continent connections). Faster packet
processing mostly improves TCP performance when network conditions are very good
(e.g. within a datacenter). Our primary strategy here is to reduce interactions
with the Go runtime, specifically the garbage collector (GC) and scheduler. We
are currently optimizing buffer management to reduce the amount of garbage,
which will lower the GC cost. To reduce scheduler interactions, we are
re-architecting the TCP implementation to use fewer goroutines. Performance
today is good enough for most applications and we are making steady
improvements. For example, since May of 2019, we have improved the Netstack
runsc
[iperf3 download benchmark](https://github.com/google/gvisor/tree/master/test/benchmarks/network)
score by roughly 15% and upload score by around 10,000X. Current numbers are
about 17 Gbps download and about 8 Gbps upload versus about 42 Gbps and 43 Gbps
for native (Linux) respectively.
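
The buffer-management goal mentioned above, producing less garbage so the GC has
less work to do, is commonly pursued in Go by recycling buffers through
`sync.Pool`. The sketch below illustrates that general technique only; it is not
taken from Netstack, and `packetBufSize`, `bufPool`, and `processPacket` are
hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// packetBufSize is a hypothetical fixed buffer size, picked to comfortably
// hold a 1500-byte Ethernet frame. It is not a Netstack constant.
const packetBufSize = 2048

// bufPool hands out reusable packet buffers. Storing *[]byte rather than
// []byte avoids allocating a new slice header each time a buffer is returned.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, packetBufSize)
		return &b
	},
}

// processPacket copies the payload into a pooled buffer, computes a simple
// byte sum over it as a stand-in for real protocol work, and then returns
// the buffer to the pool instead of leaving it for the GC.
func processPacket(payload []byte) (n int, sum uint32) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)

	n = copy(*bp, payload) // payloads larger than packetBufSize are truncated
	for _, c := range (*bp)[:n] {
		sum += uint32(c)
	}
	return n, sum
}

func main() {
	n, sum := processPacket([]byte("hello"))
	fmt.Println(n, sum) // prints "5 532"
}
```

Because buffers are reused across calls, steady-state packet processing
allocates little new memory. The runtime may still discard pooled buffers at
any time, so a pool behaves like a cache and not like a fixed reservation.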

## An alternative

We also offer an alternative network mode: passthrough. This name can be
misleading as syscalls are never passed through from the app to the Host OS.
Instead, the passthrough mode implements networking in gVisor using the Host
OS's network stack. (This mode is called
[hostinet](https://github.com/google/gvisor/tree/master/pkg/sentry/socket/hostinet)
in the codebase.) Passthrough mode can improve performance for some use cases as
the Host OS's network stack has had an enormous number of person-years poured
into making it highly performant. However, there is a rather large downside to
using passthrough mode: it weakens gVisor's security model by increasing the
Host OS's Attack Surface. This is because using the Host OS's network stack
requires the Sentry to use the Host OS's
[Berkeley socket interface](https://en.wikipedia.org/wiki/Berkeley_sockets). The
Berkeley socket interface is a much larger API surface than the packet interface
that our network stack uses. When passthrough mode is in use, the Sentry is
allowed to use
[15 additional syscalls](https://github.com/google/gvisor/blob/b1576e533223e98ebe4bd1b82b04e3dcda8c4bf1/runsc/boot/filter/config.go#L312-L517).
Further, this set of syscalls includes some that allow the Sentry to create file
descriptors, something that
[we don't normally allow](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/#sentry-host-os-interface)
as it opens up classes of file-based attacks.

There are some networking features that we can't implement on top of syscalls
that we feel are safe (most notably those behind
[ioctl](http://man7.org/linux/man-pages/man2/ioctl.2.html)) and therefore are
not supported. Because of this, we actually support fewer networking features in
passthrough mode than we do in Netstack, reducing application compatibility.
That's right: using our networking stack provides better overall application
compatibility than using our passthrough mode.

That said, gVisor with passthrough networking still provides a high level of
isolation. Applications cannot specify host syscall arguments directly, and the
Sentry's seccomp policy restricts its syscall use significantly more than a
general purpose seccomp policy.

## Secure by Default

The goal of the Secure by Default principle is to make it easy to securely
sandbox containers. Of course, disabling network access entirely is the most
secure option, but that is not practical for most applications. To make gVisor
Secure by Default, we have made Netstack the default networking mode in gVisor
as we believe that it provides significantly better isolation. For this reason
we strongly caution users against changing the default unless Netstack flat out
won't work for them. The passthrough mode option is still provided, but we want
users to make an informed decision when selecting it.

Another way in which gVisor makes it easy to securely sandbox containers is by
allowing applications to run unmodified, with no special configuration needed.
In order to do this, gVisor needs to support all of the features and syscalls
that applications use. Neither seccomp nor gVisor's passthrough mode can do this
as applications commonly use syscalls which are too dangerous to be included in
a secure policy. Even if this dream isn't fully realized today, gVisor's
architecture with Netstack makes this possible.

## Give Netstack a Try

If you haven't already, try running a workload in gVisor with Netstack. You can
find instructions on how to get started in our
[Quick Start](/docs/user_guide/quick_start/docker/). We want to hear about both
your successes and any issues you encounter.
We welcome your contributions,
whether that be verbal feedback or code contributions, via our
[Gitter channel](https://gitter.im/gvisor/community),
[email list](https://groups.google.com/forum/#!forum/gvisor-users),
[issue tracker](https://gvisor.dev/issue/new), and
[GitHub repository](https://github.com/google/gvisor). Feel free to express
interest in an [open issue](https://gvisor.dev/issue/), or reach out if you
aren't sure where to start.