sigs.k8s.io/gateway-api@v1.0.0/geps/gep-1742.md

# GEP-1742: HTTPRoute Timeouts

* Issue: [#1742](https://github.com/kubernetes-sigs/gateway-api/issues/1742)
* Status: Experimental

(See status definitions [here](overview.md#status).)

## TLDR

Design a way for Gateway API objects to configure timeouts for different types
of connections.

## Goals

- Create a method to configure some timeouts.
- Timeout configuration must be applicable to most, if not all, Gateway API implementations.

## Non-Goals

- A standard API for every possible timeout that implementations may support.

## Introduction

In discussions about Gateway API objects, particularly HTTPRoute, we've described
timeout configuration many times in the past as "too hard", because it is difficult
to find the common ground necessary for a more generic configuration. This GEP
intends first to make that process less difficult, and then to find common timeouts
that we can build into Gateway API.

For this initial round, we'll focus on Layer 7 HTTP traffic, while acknowledging
that Layer 4 connections have their own interesting timeouts as well.

The following sections review all the implementations, then document what
timeouts are _available_ for the various data planes.

### Background on implementations

Most implementations that handle HTTPRoute objects use a proxy as the data plane
implementation; the proxy actually forwards flows as directed by the Gateway API
configuration.

The following table reviews all the listed implementations of Gateway API at the
time of writing, with the data plane each one uses for Layer 7, based on the
information that could be found online. If there are errors here, or if an
implementation doesn't support Layer 7, please feel free to correct them.

| Implementation                 | Data Plane                |
|--------------------------------|---------------------------|
| Acnodal EPIC                   | Envoy                     |
| Apache APISIX                  | Nginx                     |
| BIG-IP Kubernetes Gateway      | F5 BIG-IP                 |
| Cilium                         | Envoy                     |
| Contour                        | Envoy                     |
| Emissary Ingress               | Envoy                     |
| Envoy Gateway                  | Envoy                     |
| Flomesh Service Mesh           | Pipy                      |
| Gloo Edge                      | Envoy                     |
| Google Kubernetes Engine (GKE) | Similar to Envoy Timeouts |
| HAProxy Ingress                | HAProxy                   |
| HashiCorp Consul               | Envoy                     |
| Istio                          | Envoy                     |
| Kong                           | Nginx                     |
| Kuma                           | Envoy                     |
| Litespeed                      | Litespeed WebADC          |
| NGINX Gateway Fabric           | Nginx                     |
| Traefik                        | Traefik                   |

### Flow diagrams with available timeouts

The following flow diagrams are based on the basic diagram below, with all the
timeouts I could find included.

In general, timeouts are recorded with the setting name (or similar) that the
data plane uses for them, and are correct as far as I've been able to parse the
documentation.

Idle timeouts are marked as such.

```mermaid
sequenceDiagram
    participant C as Client
    participant P as Proxy
    participant U as Upstream
    C->>P: Connection Started
    C->>P: Starts sending Request
    C->>P: Finishes Headers
    C->>P: Finishes request
    P->>U: Connection Started
    P->>U: Starts sending Request
    P->>U: Finishes Headers
    P->>U: Finishes request
    U->>P: Starts Response
    U->>P: Finishes Headers
    U->>P: Finishes Response
    P->>C: Starts Response
    P->>C: Finishes Headers
    P->>C: Finishes Response
    Note right of P: Repeat if connection sharing
    U->>C: Connection ended
```

#### Envoy Timeouts

For Envoy, some timeouts are configurable at the HTTP Connection Manager level
(very, very roughly equivalent to a Listener), the Route level (equivalent to an
HTTPRoute), the Cluster level (usually close to the Service), or some combination
of these. They are noted in the diagram below with a `CM`, `R`, or `Cluster`
prefix respectively.

```mermaid
sequenceDiagram
    participant C as Client
    participant P as Envoy
    participant U as Upstream
    C->>P: Connection Started
    activate P
    Note left of P: transport_socket_connect_timeout for TLS
    deactivate P
    C->>P: Starts sending Request
    activate C
    activate P
    activate P
    C->>P: Finishes Headers
    note left of P: CM request_headers_timeout
    C->>P: Finishes request
    deactivate P
    activate U
    note left of U: Cluster connect_timeout
    deactivate U
    P->>U: Connection Started
    activate U
    note right of U: CM idle_timeout<br />CM max_connection_duration
    P->>U: Starts sending Request
    P->>U: Finishes Headers
    note left of P: CM request_timeout
    P->>U: Finishes request
    deactivate P
    activate U
    U->>P: Starts Response
    U->>P: Finishes Headers
    note right of U: R timeout<br/>R per_try_timeout<br/>R per_try_idle_timeout
    U->>P: Finishes Response
    deactivate U
    P->>C: Starts Response
    P->>C: Finishes Headers
    P->>C: Finishes Response
    Note left of C: CM stream_idle_timeout<br />R idle_timeout<br />CM,R max_stream_duration<br/>TCP proxy idle_timeout<br />TCP protocol idle_timeout
    deactivate C
    Note right of P: Repeat if connection sharing
    U->>C: Connection ended
    deactivate U
```

#### Nginx timeouts

Nginx allows gRPC and general HTTP timeouts to be set separately, although their
purposes seem to be roughly equivalent.

```mermaid
sequenceDiagram
    participant C as Client
    participant P as Nginx
    participant U as Upstream
    C->>P: Connection Started
    activate P
    C->>P: Starts sending Request
    C->>P: Finishes Headers
    Note right of P: client_header_timeout
    deactivate P
    activate P
    C->>P: Finishes request
    deactivate P
    Note right of P: client_body_timeout
    activate U
    note left of U: proxy_connect_timeout<br/>grpc_connect_timeout
    deactivate U
    P->>U: Connection Started
    activate U
    activate U
    P->>U: Starts sending Request
    P->>U: Finishes Headers
    P->>U: Finishes request
    Note right of U: (between write operations)<br/>proxy_send_timeout<br/>grpc_send_timeout
    deactivate U
    activate U
    U->>P: Starts Response
    U->>P: Finishes Headers
    Note right of U: (between read operations)<br/>proxy_read_timeout<br/>grpc_read_timeout
    U->>P: Finishes Response
    deactivate U
    activate P
    P->>C: Starts Response
    P->>C: Finishes Headers
    P->>C: Finishes Response
    deactivate P
    Note left of P: send_timeout (only between two successive write operations)
    Note left of C: Repeat if connection is shared until server's keepalive_timeout is hit
    Note right of U: upstream's keepalive_timeout (if keepalive enabled)
    U->>C: Connection ended
    deactivate U
```

#### HAProxy timeouts

```mermaid
sequenceDiagram
    participant C as Client
    participant P as Proxy
    participant U as Upstream

    C->>P: Connection Started
    activate U
    activate C
    activate P
    note left of P: timeout client (idle)
    C->>P: Starts sending Request
    C->>P: Finishes Headers
    C->>P: Finishes request
    note left of C: timeout http-request
    deactivate C
    activate C
    note left of C: timeout client-fin
    deactivate C
    deactivate P
    activate U
    note left of U: timeout queue<br/>(wait for available server)
    deactivate U

    P->>U: Connection Started
    activate U
    P->>U: Starts sending Request
    activate U
    P->>U: Finishes Headers
    P->>U: Finishes request

    note right of U: timeout connect
    deactivate U
    note left of U: timeout server<br/>(idle timeout)
    deactivate U
    activate U
    note left of U: timeout server-fin
    deactivate U
    U->>P: Starts Response
    U->>P: Finishes Headers
    U->>P: Finishes Response
    P->>C: Starts Response
    P->>C: Finishes Headers
    P->>C: Finishes Response
    activate C
    note left of C: timeout http-keep-alive
    deactivate C
    Note right of P: Repeat if connection sharing
    Note right of U: timeout tunnel<br/>(for upgraded connections)
    deactivate U
    U->>C: Connection ended
```

#### Traefik timeouts

```mermaid
sequenceDiagram
    participant C as Client
    participant P as Proxy
    participant U as Upstream
    C->>P: Connection Started
    activate U
    C->>P: Starts sending Request
    activate P
    C->>P: Finishes Headers
    Note right of P: respondingTimeouts<br/>readTimeout
    C->>P: Finishes request
    deactivate P
    P->>U: Connection Started
    activate U
    Note right of U: forwardingTimeouts<br/>dialTimeout
    deactivate U
    P->>U: Starts sending Request
    P->>U: Finishes Headers
    P->>U: Finishes request
    U->>P: Starts Response
    activate U
    note right of U: forwardingTimeouts<br/>responseHeaderTimeout
    U->>P: Finishes Headers
    deactivate U
    U->>P: Finishes Response
    P->>C: Starts Response
    activate P
    P->>C: Finishes Headers
    Note right of P: respondingTimeouts<br/>writeTimeout
    P->>C: Finishes Response
    deactivate P
    Note right of P: Repeat if connection sharing
    Note right of U: respondingTimeouts<br/>idleTimeout<br/>Keepalive connections only
    deactivate U
    U->>C: Connection ended
```

#### F5 BIG-IP Timeouts

Could not find any HTTP-specific timeouts. PRs welcome. 😊

#### Pipy Timeouts

Could not find any HTTP-specific timeouts. PRs welcome. 😊

#### Litespeed WebADC Timeouts

Could not find any HTTP-specific timeouts. PRs welcome. 😊

## API

The above diagrams show that there are many different kinds of configurable timeouts
supported by Gateway implementations: connect, idle, request, upstream, and downstream.
Although there may be an opportunity to specify a common API for more of them in the
future, this GEP will focus on the L7 timeouts in HTTPRoutes that are most valuable
to clients.

From the above analysis, it appears that most implementations are capable of
supporting the configuration of simple client downstream request timeouts on HTTPRoute
rules. This is a relatively small addition that would benefit many users.

Some implementations support configuring a timeout for individual backend requests,
separate from the overall client request timeout. This is particularly useful if a
client HTTP request to a gateway can result in more than one call from the gateway
to the destination backend service, for example, if automatic retries are supported.
Adding support for this would also benefit many users.

### Timeout values

There are 2 kinds of timeouts that can be configured in an `HTTPRouteRule`:

1. `timeouts.request` is the timeout for the Gateway API implementation to send a
    response to a client HTTP request. Whether the gateway starts the timeout before
    or after the entire client request stream has been received is implementation-dependent.
    This field is optional `Extended` support.

1. `timeouts.backendRequest` is a timeout for a single request from the gateway to a backend.
    This field is optional `Extended` support. It is typically used in conjunction with retry
    configuration, if supported by an implementation.
    Note that retry configuration will be the subject of a separate GEP (GEP-1731).

```mermaid
sequenceDiagram
    participant C as Client
    participant P as Proxy
    participant U as Upstream
    C->>P: Connection Started
    note left of P: timeouts.request start time (min)
    C->>P: Starts sending Request
    C->>P: Finishes Headers
    C->>P: Finishes request
    note left of P: timeouts.request start time (max)
    P->>U: Connection Started
    note right of P: timeouts.backendRequest start time
    P->>U: Starts sending Request
    P->>U: Finishes Headers
    P->>U: Finishes request
    U->>P: Starts Response
    U->>P: Finishes Headers
    note right of P: timeouts.backendRequest end time
    note left of P: timeouts.request end time
    U->>P: Finishes Response
    note right of P: Repeat if retry
    P->>C: Starts Response
    P->>C: Finishes Headers
    P->>C: Finishes Response
    Note right of P: Repeat if connection sharing
    U->>C: Connection ended
```

Both timeout fields are [GEP-2257 Duration] values. A zero-valued timeout
("0s") MUST be interpreted as disabling the timeout; a non-zero-valued timeout
MUST be >= 1ms.

[GEP-2257 Duration]: /geps/gep-2257/

### Go

```go
type HTTPRouteRule struct {
	// Timeouts defines the timeouts that can be configured for an HTTP request.
	//
	// Support: Extended
	//
	// +optional
	// <gateway:experimental>
	Timeouts *HTTPRouteTimeouts `json:"timeouts,omitempty"`

	// ...
}

// HTTPRouteTimeouts defines timeouts that can be configured for an HTTPRoute.
// Timeout values are represented with Gateway API Duration formatting.
// Specifying a zero value such as "0s" is interpreted as no timeout.
//
// +kubebuilder:validation:XValidation:message="backendRequest timeout cannot be longer than request timeout",rule="!(has(self.request) && has(self.backendRequest) && duration(self.request) != duration('0s') && duration(self.backendRequest) > duration(self.request))"
type HTTPRouteTimeouts struct {
	// Request specifies the maximum duration for a gateway to respond to an HTTP request.
	// If the gateway has not been able to respond before this deadline is met, the gateway
	// MUST return a timeout error.
	//
	// For example, setting the `rules.timeouts.request` field to the value `10s` in an
	// `HTTPRoute` will cause a timeout if a client request is taking longer than 10 seconds
	// to complete.
	//
	// This timeout is intended to cover as close to the whole request-response transaction
	// as possible, although an implementation MAY choose to start the timeout after the entire
	// request stream has been received instead of immediately after the transaction is
	// initiated by the client.
	//
	// When this field is unspecified, request timeout behavior is implementation-specific.
	//
	// Support: Extended
	//
	// +optional
	Request *Duration `json:"request,omitempty"`

	// BackendRequest specifies a timeout for an individual request from the gateway
	// to a backend. This covers the time from when the request first starts being
	// sent from the gateway to when the full response has been received from the backend.
	//
	// An entire client HTTP transaction with a gateway, covered by the Request timeout,
	// may result in more than one call from the gateway to the destination backend,
	// for example, if automatic retries are supported.
	//
	// Because the Request timeout encompasses the BackendRequest timeout, the value of
	// BackendRequest must be <= the value of Request timeout.
	//
	// Support: Extended
	//
	// +optional
	BackendRequest *Duration `json:"backendRequest,omitempty"`
}

// Duration is a string value representing a duration in time. The format is as specified
// in GEP-2257, a strict subset of the syntax parsed by Go's time.ParseDuration.
//
// +kubebuilder:validation:Pattern=`^([0-9]{1,5}(h|m|s|ms)){1,4}$`
type Duration string
```

### YAML

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: timeout-example
spec:
  ...
  rules:
  - backendRefs:
    - name: some-service
      port: 8080
    timeouts:
      request: 10s
      backendRequest: 2s
```

### Conformance Details

Gateway implementations can indicate support for the optional behavior in this GEP using
the following feature names:

- `HTTPRouteRequestTimeout`: supports `rules.timeouts.request` in an `HTTPRoute`.
- `HTTPRouteBackendTimeout`: supports `rules.timeouts.backendRequest` in an `HTTPRoute`.

## Alternatives

Timeouts could be configured using policy attachments or in objects other than `HTTPRouteRule`.

### Policy Attachment

Instead of configuring timeouts directly on an API object, they could be configured using policy
attachments. The advantage of this approach would be that timeout policies could be configured not
only for an `HTTPRouteRule`, but also added or overridden at a finer (e.g., `HTTPBackendRef`) or
coarser (e.g., `HTTPRoute`) level of granularity.

The downside, however, is the complexity introduced for the most common use case: adding a simple
timeout for an HTTP request. For this simple case, setting a single field in the route rule seems
much better than needing to create a policy resource.

In the future, we could consider using policy attachments to configure less common kinds of
timeouts that may be needed, but it would probably be better to instead extend the proposed API
to support those timeouts as well.

The default values of the proposed timeout fields could also be overridden
using policy attachments in the future. For example, a policy attachment could be used to set the
default value of `rules.timeouts.request` for all rules in an `HTTPRoute` or all routes attached to a `Gateway`.

### Other API Objects

The new timeouts field could be added to a different API struct, instead of `HTTPRouteRule`.

Putting it on an `HTTPBackendRef`, for example, would allow users to set different timeouts for different
backends. This is a feature that we believe has not been requested by existing proxy or service mesh
clients, and it is also not implementable using the available timeouts of most proxies.

Another alternative is to move the timeouts configuration up a level in the API, to `HTTPRoute`. This
would be convenient when a user wants the same timeout on all rules, but would be overly restrictive.
Using policy attachments to override the default timeout value for all rules, as described in the
previous section, is likely a better way to handle timeout configuration above the route rule level.

## References

(Add any additional document links. Again, we should try to avoid
too much content not in version control to avoid broken links.)