# gVisor RACK

gVisor has implemented the [RACK](https://datatracker.ietf.org/doc/html/rfc8985)
(Recent ACKnowledgement) TCP loss-detection algorithm in our network stack,
which improves throughput in the presence of packet loss and reordering.

TCP is a connection-oriented protocol that detects and recovers from loss by
retransmitting packets. [RACK](https://datatracker.ietf.org/doc/html/rfc8985) is
one of the recent loss-detection methods implemented in Linux and BSD. It helps
identify packet loss quickly and accurately in the presence of packet reordering
and tail losses.

## Background

The TCP congestion window limits the number of unacknowledged packets that can
be in flight at any time. When packet loss is identified, the congestion window
is reduced depending on the type of loss. The sender recovers from the loss once
all the packets sent before the congestion window was reduced are acknowledged.
If the connection identifies a loss falsely, it enters loss recovery
unnecessarily and sends fewer packets as a result.

Packet loss is identified mainly in two ways:

1. Three duplicate acknowledgments, which result in either
   [Fast](https://datatracker.ietf.org/doc/html/rfc2001#section-4) or
   [SACK](https://datatracker.ietf.org/doc/html/rfc6675) recovery. The
   congestion window is reduced depending on the congestion control algorithm.
   For example, the
   [Reno](https://en.wikipedia.org/wiki/TCP_congestion_control#TCP_Tahoe_and_Reno)
   algorithm halves it.
2. RTO (Retransmission Timeout), which results in Timeout recovery. The
   congestion window is reduced to one
   [MSS](https://en.wikipedia.org/wiki/Maximum_segment_size).
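
The two reactions in the list above can be sketched as follows. This is a simplified Go model, not gVisor's code: the names `onTripleDupAck` and `onRTO` and the 1460-byte MSS are hypothetical. The Reno case uses the current window in place of the flight size, applies the two-segment floor from RFC 5681, and leaves out the temporary window inflation during fast recovery.

```go
package main

import "fmt"

// mss is a hypothetical maximum segment size in bytes, for illustration only.
const mss = 1460

// onTripleDupAck models Reno's reaction to three duplicate ACKs: the window is
// halved, but never below two segments (the RFC 5681 floor for ssthresh).
func onTripleDupAck(cwnd int) int {
	half := cwnd / 2
	if half < 2*mss {
		half = 2 * mss
	}
	return half
}

// onRTO models timeout recovery: the window collapses to a single MSS.
func onRTO() int {
	return mss
}

func main() {
	cwnd := 20 * mss
	fmt.Println("after three duplicate ACKs:", onTripleDupAck(cwnd)/mss, "segments")
	fmt.Println("after an RTO:", onRTO()/mss, "segment")
}
```

Starting from a 20-segment window, this prints a 10-segment window after the duplicate ACKs and a 1-segment window after the RTO, which is why an RTO is the more expensive of the two.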

Both of these cases reduce the congestion window, with RTO being the more
expensive of the two. Most existing algorithms do not detect packet reordering,
so reordered packets are incorrectly identified as lost, which triggers
unnecessary loss recovery. Furthermore, the loss of packets at the end of a
sequence (known as "tail loss") triggers an RTO and slows down future
transmissions unnecessarily. RACK helps identify loss accurately in all these
scenarios and avoid entering RTO.

## Implementation of RACK

Implementation of RACK requires support for:

1. Per-packet transmission timestamps: RACK detects loss based on the
   transmission time of each packet and the time at which its ACK is received.
2. SACK and the ability to detect DSACK: Selective Acknowledgement (SACK) and
   Duplicate SACK (DSACK) are used to adjust the time window after which a
   packet can be marked as lost.

### Packet Reordering

Packet reordering commonly occurs when different packets take different paths
through a network. The diagram below shows the transmission of four packets
that are reordered in transit, and the resulting TCP behavior with and without
RACK.

![Figure 1](/assets/images/2021-08-31-rack-figure1.png "Packet reordering.")

In the above example, the sender sees three duplicate acknowledgments. Without
RACK, this is falsely identified as packet loss, and the congestion window is
reduced after entering Fast/SACK recovery.
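
To see why three duplicate acknowledgments can appear without any loss, consider a receiver that sends a cumulative ACK naming the next packet it expects. The Go sketch below uses a hypothetical arrival order in which packet 2 of 5 is only delayed (not necessarily the order shown in Figure 1); the helper names are illustrative.

```go
package main

import "fmt"

// cumulativeAcks returns the cumulative ACK sent after each arrival. Packets
// are numbered from 1, and each ACK names the next packet the receiver expects.
func cumulativeAcks(arrivals []int) []int {
	received := map[int]bool{}
	next := 1
	acks := make([]int, 0, len(arrivals))
	for _, p := range arrivals {
		received[p] = true
		for received[next] {
			next++
		}
		acks = append(acks, next)
	}
	return acks
}

// dupAcks counts ACKs that repeat the ACK immediately before them.
func dupAcks(acks []int) int {
	n := 0
	for i := 1; i < len(acks); i++ {
		if acks[i] == acks[i-1] {
			n++
		}
	}
	return n
}

func main() {
	// Packet 2 is delayed and arrives last; nothing is actually lost.
	acks := cumulativeAcks([]int{1, 3, 4, 5, 2})
	fmt.Println(acks, dupAcks(acks)) // [2 2 2 2 6] 3
}
```

A sender that treats the third duplicate ACK as a loss signal retransmits packet 2 and shrinks its window, even though the original packet arrives one ACK later.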

To detect packet reordering, RACK uses a reorder window, bounded between
[[RTT](https://en.wikipedia.org/wiki/Round-trip_delay)/4, RTT]. The reorder
timer for a packet is set to expire _RTT + reorder\_window_ after that packet
was sent. A packet is marked as lost when packets sent after it have been
acknowledged using SACK and its reorder timer has expired. The reorder window
is increased when a DSACK is received, because a DSACK indicates a higher
degree of reordering.

### Tail Loss

Tail loss occurs when packets are lost at the end of a data transmission. The
diagram below shows an example of tail loss in which the last three packets are
lost, and how it is handled with and without RACK.

![Figure 2](/assets/images/2021-08-31-rack-figure2.png "Tail loss figure 2.")

For tail losses, RACK uses a Tail Loss Probe (TLP), which relies on a timer for
the last packet sent. The TLP timer is set to _2 \* RTT_, after which a probe is
sent. The probe gives the connection one more chance to detect a loss by
eliciting ACK feedback, avoiding an RTO. In the above example, the loss is
recovered without entering RTO.

TLP also helps in cases where all the packets were received but the ACK was
lost. The diagram below shows how the ACK received for the probe packet avoids
the RTO.

![Figure 3](/assets/images/2021-08-31-rack-figure3.png "Tail loss figure 3.")

If some packets were lost, the ACK for the probe packet carries SACK blocks,
which are used to detect and retransmit the lost packets.
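
How SACK blocks on the probe's ACK reveal the missing packets can be sketched in Go as follows, using hypothetical 100-byte segments and hypothetical names. This simplified model treats every sent segment that is neither cumulatively acknowledged nor covered by a SACK block, and that ends at or below the highest SACKed byte, as a retransmission candidate; RACK itself would still apply its time-based check to these segments.

```go
package main

import "fmt"

// span is a byte range [start, end) of a sent segment or a SACK block.
type span struct{ start, end uint32 }

// holes returns sent segments that are not cumulatively acknowledged, not
// covered by any SACK block, and end at or below the highest SACKed byte.
func holes(sent []span, cumAck uint32, sacks []span) []span {
	var highest uint32
	for _, b := range sacks {
		if b.end > highest {
			highest = b.end
		}
	}
	var out []span
	for _, s := range sent {
		if s.end <= cumAck || s.end > highest {
			continue
		}
		covered := false
		for _, b := range sacks {
			if b.start <= s.start && s.end <= b.end {
				covered = true
				break
			}
		}
		if !covered {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	sent := []span{{0, 100}, {100, 200}, {200, 300}, {300, 400}, {400, 500}}
	// Tail loss: the last three segments were lost and the probe retransmitted
	// the last one. Its ACK still acknowledges only up to byte 200 but SACKs 400-500.
	fmt.Println(holes(sent, 200, []span{{400, 500}})) // [{200 300} {300 400}]
	// ACK loss only: everything arrived, so the probe's ACK covers all data.
	fmt.Println(len(holes(sent, 500, nil))) // 0
}
```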

In gVisor, we support [NewReno](https://datatracker.ietf.org/doc/html/rfc6582)
and SACK loss recovery methods. We recently
[added support for RACK](https://github.com/google/gvisor/issues/5243), and it
is the default when SACK is enabled. After enabling RACK, our internal
benchmarks with reordering and tail losses, as well as data from internal users
inside Google, have shown a ~50% reduction in the number of RTOs.

While RACK has improved one aspect of TCP performance by reducing timeouts in
the presence of reordering and tail losses, we plan to implement undoing of
congestion window reductions and
[BBRv2](https://datatracker.ietf.org/doc/html/draft-cardwell-iccrg-bbr-congestion-control)
(once an RFC is available) in gVisor to further improve TCP performance in less
ideal network conditions.

If you haven’t already, try gVisor. The instructions to get started are in our
[Quick Start](https://gvisor.dev/docs/user_guide/quick_start/docker/). You can
also get involved with the gVisor community via our
[Gitter channel](https://gitter.im/gvisor/community),
[email list](https://groups.google.com/forum/#!forum/gvisor-users),
[issue tracker](https://gvisor.dev/issue/new), and
[GitHub repository](https://github.com/google/gvisor).