# gVisor RACK

gVisor has implemented the [RACK](https://datatracker.ietf.org/doc/html/rfc8985)
(Recent ACKnowledgement) TCP loss-detection algorithm in our network stack,
which improves throughput in the presence of packet loss and reordering.

TCP is a connection-oriented protocol that detects and recovers from loss by
retransmitting packets. [RACK](https://datatracker.ietf.org/doc/html/rfc8985) is
a recent loss-detection method, also implemented in Linux and BSD, that
identifies packet loss quickly and accurately in the presence of packet
reordering and tail losses.

## Background

The TCP congestion window limits the number of unacknowledged packets that can
be in flight at any time. When packet loss is identified, the congestion window
is reduced depending on the type of loss. The sender recovers from the loss
once all the packets sent before the congestion window was reduced are
acknowledged. If the connection identifies a loss falsely, it enters loss
recovery unnecessarily and sends fewer packets as a result.

Packet loss is identified mainly in two ways:

1.  Three duplicate acknowledgments, which result in either
    [Fast](https://datatracker.ietf.org/doc/html/rfc2001#section-4) or
    [SACK](https://datatracker.ietf.org/doc/html/rfc6675) recovery. The
    congestion window is reduced depending on the congestion control
    algorithm. For example, the
    [Reno](https://en.wikipedia.org/wiki/TCP_congestion_control#TCP_Tahoe_and_Reno)
    algorithm halves it.
2.  RTO (Retransmission Timeout), which results in Timeout recovery. The
    congestion window is reduced to one
    [MSS](https://en.wikipedia.org/wiki/Maximum_segment_size).
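
The two reductions listed above can be sketched in Go as follows. This is a
minimal illustration, not gVisor's code: the names `lossKind`, `dupAcks`, `rto`,
and `reduceCwnd` are hypothetical, the window is counted in packets, and the
floor of two packets after halving is an assumption borrowed from Reno's
minimum slow-start threshold.

```go
package main

import "fmt"

// lossKind names the two detection paths described in the list above.
// The type and its values are hypothetical, chosen for this sketch.
type lossKind int

const (
	dupAcks lossKind = iota // three duplicate ACKs: Fast/SACK recovery
	rto                     // retransmission timeout: Timeout recovery
)

// reduceCwnd returns the congestion window (in packets) after a loss,
// assuming Reno-style halving for duplicate ACKs and a reset to one
// packet for an RTO.
func reduceCwnd(cwnd int, kind lossKind) int {
	switch kind {
	case dupAcks:
		// Halve the window, but never go below two packets (assumed floor).
		if half := cwnd / 2; half > 2 {
			return half
		}
		return 2
	default:
		// An RTO collapses the window to a single packet.
		return 1
	}
}

func main() {
	fmt.Println(reduceCwnd(40, dupAcks)) // 20
	fmt.Println(reduceCwnd(40, rto))     // 1
}
```

The gap between the two results is why avoiding an unnecessary RTO matters more
than avoiding an unnecessary Fast/SACK recovery.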

Both of these cases reduce the congestion window, with RTO being the more
expensive. Most existing algorithms do not detect packet reordering, so
reordered packets get incorrectly identified as lost, which can result in an
RTO. Furthermore, the loss of packets or ACKs at the end of a transmission
(known as "tail loss") also triggers an RTO and slows down future transmissions
unnecessarily. RACK helps us identify loss accurately in all these scenarios
and avoid entering RTO.

## Implementation of RACK

Implementation of RACK requires support for:

1.  Per-packet transmission timestamps: RACK detects loss based on the
    transmission time of each packet and the time at which its ACK was
    received.
2.  SACK and the ability to detect DSACK: Selective Acknowledgement and
    Duplicate SACK are used to adjust the timer window after which a packet can
    be marked as lost.

### Packet Reordering

Packet reordering commonly occurs when different packets take different paths
through a network. The diagram below shows the transmission of four packets
that are reordered in transit, and the resulting TCP behavior with and without
RACK.

![Figure 1](/assets/images/2021-08-31-rack-figure1.png "Packet reordering.")

In the above example, the sender sees three duplicate acknowledgments. Without
RACK, this is falsely identified as packet loss, and the congestion window is
reduced after entering Fast/SACK recovery.

To detect packet reordering, RACK uses a reorder window, bounded to the range
[[RTT](https://en.wikipedia.org/wiki/Round-trip_delay)/4, RTT]. The reorder
timer is set to expire after _RTT+reorder\_window_. A packet is marked as lost
when the packets following it have been acknowledged using SACK and the reorder
timer expires. The reorder window is increased when a DSACK is received (which
indicates that there is a higher degree of reordering).

### Tail Loss

Tail loss occurs when packets are lost at the end of a data transmission. The
diagram below shows an example of tail loss in which the last three packets are
lost, and how it is handled with and without RACK.

![Figure 2](/assets/images/2021-08-31-rack-figure2.png "Tail loss figure 2.")

For tail losses, RACK uses a Tail Loss Probe (TLP), which relies on a timer for
the last packet sent. The TLP timer is set to _2 \* RTT_, after which a probe is
sent. The probe packet gives the connection one more chance to detect a loss by
triggering ACK feedback, avoiding an RTO. In the above example, the loss is
recovered without entering RTO.

TLP also helps in cases where the ACK was lost but all the packets were
received by the receiver. The diagram below shows that the ACK received for the
probe packet avoids the RTO.

![Figure 3](/assets/images/2021-08-31-rack-figure3.png "Tail loss figure 3.")

If some packets were lost, the ACK for the probe packet carries SACK blocks,
which are used to detect and retransmit the lost packets.

gVisor supports the
[NewReno](https://datatracker.ietf.org/doc/html/rfc6582) and SACK loss recovery
methods. We recently
[added support for RACK](https://github.com/google/gvisor/issues/5243),
and it is the default when SACK is enabled. After enabling RACK, our internal
benchmarks in the presence of reordering and tail losses, along with data
collected from internal users at Google, have shown a ~50% reduction in the
number of RTOs.

While RACK has improved one aspect of TCP performance by reducing timeouts in
the presence of reordering and tail losses, in gVisor we also plan to implement
the undoing of congestion window reductions and
[BBRv2](https://datatracker.ietf.org/doc/html/draft-cardwell-iccrg-bbr-congestion-control)
(once there is an RFC available) to further improve TCP performance in less
ideal network conditions.

If you haven’t already, try gVisor. The instructions to get started are in our
[Quick Start](https://gvisor.dev/docs/user_guide/quick_start/docker/). You can
also get involved with the gVisor community via our
[Gitter channel](https://gitter.im/gvisor/community),
[email list](https://groups.google.com/forum/#!forum/gvisor-users),
[issue tracker](https://gvisor.dev/issue/new), and
[GitHub repository](https://github.com/google/gvisor).