/*
Copyright 2019 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package queueset implements a technique called "fair queuing for
// server requests".  One QueueSet is a set of queues operating
// according to this technique.
//
// Fair queuing for server requests is inspired by the fair queuing
// technique from the world of networking.  You can find a good paper
// on that at https://dl.acm.org/citation.cfm?doid=75247.75248 or
// http://people.csail.mit.edu/imcgraw/links/research/pubs/networks/WFQ.pdf
// and there is an implementation outline in the Wikipedia article at
// https://en.wikipedia.org/wiki/Fair_queuing .
//
// Fair queuing for server requests differs from traditional fair
// queuing in three ways: (1) we are dispatching application layer
// requests to a server rather than transmitting packets on a network
// link, (2) multiple requests can be executing at once, and (3) the
// service time (execution duration) is not known until the execution
// completes.
//
// The first two differences can easily be handled by a straightforward
// adaptation of the concept called "R(t)" in the original paper and
// "virtual time" in the implementation outline. In that
// implementation outline, the notation now() is used to mean reading
// the virtual clock. In the original paper’s terms, "R(t)" is the
// number of "rounds" that have been completed at real time t ---
// where a round consists of virtually transmitting one bit from every
// non-empty queue in the router (regardless of which queue holds the
// packet that is really being transmitted at the moment); in this
// conception, a packet is considered to be "in" its queue until the
// packet’s transmission is finished. For our problem, we can define a
// round to be giving one nanosecond of CPU to every non-empty queue
// in the apiserver (where emptiness is judged based on both queued
// and executing requests from that queue), and define R(t) = (server
// start time) + (1 ns) * (number of rounds since server start). Let
// us write NEQ(t) for the number of non-empty queues in the
// apiserver at time t.  Let us also write C for the concurrency
// limit.  In the original paper, the partial derivative of R(t) with
// respect to t is
//
//	1 / NEQ(t) .
//
// To generalize from transmitting one packet at a time to executing C
// requests at a time, that derivative becomes
//
//	C / NEQ(t) .
//
// However, sometimes there are fewer than C requests available to
// execute.  For a given queue "q", let us also write "reqs(q, t)" for
// the number of requests of that queue that are executing at time
// t.  The total number of requests executing is sum[over q]
// reqs(q, t) and if that is less than C then virtual time is not
// advancing as fast as it would if all C seats were occupied; in this
// case the numerator of the quotient in that derivative should be
// adjusted proportionally.  Putting it all together for fair queuing
// for server requests: at a particular time t, the partial derivative
// of R(t) with respect to t is
//
//	min( C, sum[over q] reqs(q, t) ) / NEQ(t) .
//
// In terms of the implementation outline, this is the rate at which
// virtual time is advancing at time t (in virtual nanoseconds per
// real nanosecond). Where the networking implementation outline adds
// packet size to a virtual time, in our version this corresponds to
// adding a service time (i.e., duration) to virtual time.
//
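// A minimal sketch of the rate formula above, in Go. The names
// virtualTimeRate and advance are hypothetical and are not this
// package's API. Returning 0 when NEQ(t) is 0 is an assumption of the
// sketch, since the text leaves that case undefined, and virtual time
// is held as float64 nanoseconds purely for illustration.

```go
package main

import "fmt"

// virtualTimeRate returns min(C, sum[over q] reqs(q, t)) / NEQ(t), in
// virtual nanoseconds per real nanosecond.
//
// c is the concurrency limit C, executing[i] is reqs(q_i, t) for each
// queue, and nonEmpty is NEQ(t). Returning 0 when nonEmpty is 0 is an
// assumption of this sketch.
func virtualTimeRate(c int, executing []int, nonEmpty int) float64 {
	if nonEmpty == 0 {
		return 0
	}
	total := 0
	for _, n := range executing {
		total += n
	}
	if total > c {
		total = c
	}
	return float64(total) / float64(nonEmpty)
}

// advance moves the virtual clock forward across a real interval of
// realElapsedNs nanoseconds during which the rate stays constant.
func advance(virtualNow, rate, realElapsedNs float64) float64 {
	return virtualNow + rate*realElapsedNs
}

func main() {
	// C = 4 and two non-empty queues, each with one request executing:
	// only 2 seats are busy, so the numerator is 2 rather than C.
	underused := virtualTimeRate(4, []int{1, 1}, 2)

	// C = 4, all seats busy, and a third queue that only has queued
	// requests: the numerator is C and NEQ(t) is 3.
	saturated := virtualTimeRate(4, []int{2, 2, 0}, 3)

	fmt.Println(underused, saturated)
	fmt.Println(advance(0, underused, 1e7)) // 10 ms of real time
}
```

// The rate only changes when a request is dispatched or finishes, or
// when a queue becomes empty or non-empty, so a caller of this sketch
// could call advance once per such event with the rate that held since
// the previous one.
//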
// The third difference is handled by modifying the algorithm to
// dispatch based on an initial guess at the request’s service time
// (duration) and then make the corresponding adjustments once the
// request’s actual service time is known. This is similar, although
// not exactly isomorphic, to the original paper’s adjustment by
// `$\delta$` for the sake of promptness.
//
// For implementation simplicity (see below), let us use the same
// initial service time guess for every request; call that duration
// G. A good choice might be the service time limit (1
// minute). Different guesses will give slightly different dynamics,
// but any positive number can be used for G without ruining the
// long-term behavior.
//
// As in ordinary fair queuing, there is a bound on divergence from
// the ideal. In plain fair queuing the bound is one packet; in our
// version it is C requests.
//
// To support efficiently making the necessary adjustments once a
// request’s actual service time is known, the virtual finish time of
// a request and the last virtual finish time of a queue are not
// represented directly but instead computed from queue length,
// request position in the queue, and an alternate state variable that
// holds the queue’s virtual start time. While the queue is empty and
// has no requests executing: the value of its virtual start time
// variable is ignored and its last virtual finish time is considered
// to be in the virtual past. When a request arrives at an empty queue
// with no requests executing, the queue’s virtual start time is set
// to the current virtual time. The virtual finish time of request
// number J in the queue (counting from J=1 for the head) is J * G +
// (queue’s virtual start time). While the queue is non-empty: the
// last virtual finish time of the queue is the virtual finish time of
// the last request in the queue. While the queue is empty and has a
// request executing: the last virtual finish time is the queue’s
// virtual start time. When a request is dequeued for service the
// queue’s virtual start time is advanced by G. When a request
// finishes being served, and the actual service time was S, the
// queue’s virtual start time is decremented by G - S.
package queueset // import "k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset"
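
// The bookkeeping described at the end of the package comment can be
// sketched as a standalone program. This is an illustration only, not
// this package's implementation: the type queue and its methods
// enqueue, dispatch, finish, finishTime and lastFinishTime are
// hypothetical names, virtual time is held as float64 seconds for
// readability, and G is fixed at 60 seconds following the comment's
// suggestion of the 1 minute service time limit.

```go
package main

import "fmt"

// G is the initial service time guess, in seconds (assumed value).
const G = 60.0

// queue keeps only the state the comment calls for: the virtual start
// time plus counts of waiting and executing requests.
type queue struct {
	virtualStart float64
	waiting      int
	executing    int
}

// enqueue adds a request. If the queue was empty with nothing
// executing, its virtual start time is reset to the current virtual time.
func (q *queue) enqueue(virtualNow float64) {
	if q.waiting == 0 && q.executing == 0 {
		q.virtualStart = virtualNow
	}
	q.waiting++
}

// finishTime is the virtual finish time of waiting request number j,
// counting from j=1 for the head: j*G + virtual start time.
func (q *queue) finishTime(j int) float64 {
	return float64(j)*G + q.virtualStart
}

// lastFinishTime is the finish time of the last waiting request, or the
// virtual start time when nothing is waiting. For a fully idle queue the
// value is meaningless and should be treated as in the virtual past.
func (q *queue) lastFinishTime() float64 {
	return q.finishTime(q.waiting)
}

// dispatch moves the head request into execution and advances the
// virtual start time by G.
func (q *queue) dispatch() {
	q.waiting--
	q.executing++
	q.virtualStart += G
}

// finish records a completed request whose actual service time was s,
// decrementing the virtual start time by G - s.
func (q *queue) finish(s float64) {
	q.executing--
	q.virtualStart -= G - s
}

func main() {
	q := &queue{}
	q.enqueue(100) // idle queue: virtual start time becomes 100
	q.enqueue(100) // queue already active: virtual start time unchanged
	fmt.Println(q.finishTime(1), q.finishTime(2))

	q.dispatch() // the old second request is now the head
	fmt.Println(q.finishTime(1))

	q.finish(5) // actual service time was 5 s, not the guessed 60 s
	fmt.Println(q.finishTime(1), q.lastFinishTime())
}
```

// Because finish times are derived from the virtual start time, one
// subtraction in finish corrects the finish time of every request
// still waiting in that queue, which is the efficiency the comment
// refers to.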