k8s.io/apiserver@v0.31.1/pkg/util/flowcontrol/fairqueuing/queueset/doc.go (about) 1 /* 2 Copyright 2019 The Kubernetes Authors. 3 4 Licensed under the Apache License, Version 2.0 (the "License"); 5 you may not use this file except in compliance with the License. 6 You may obtain a copy of the License at 7 8 http://www.apache.org/licenses/LICENSE-2.0 9 10 Unless required by applicable law or agreed to in writing, software 11 distributed under the License is distributed on an "AS IS" BASIS, 12 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 See the License for the specific language governing permissions and 14 limitations under the License. 15 */ 16 17 // Package queueset implements a technique called "fair queuing for 18 // server requests". One QueueSet is a set of queues operating 19 // according to this technique. 20 // 21 // Fair queuing for server requests is inspired by the fair queuing 22 // technique from the world of networking. You can find a good paper 23 // on that at https://dl.acm.org/citation.cfm?doid=75247.75248 or 24 // http://people.csail.mit.edu/imcgraw/links/research/pubs/networks/WFQ.pdf 25 // and there is an implementation outline in the Wikipedia article at 26 // https://en.wikipedia.org/wiki/Fair_queuing . 27 // 28 // Fair queuing for server requests differs from traditional fair 29 // queuing in three ways: (1) we are dispatching application layer 30 // requests to a server rather than transmitting packets on a network 31 // link, (2) multiple requests can be executing at once, and (3) the 32 // service time (execution duration) is not known until the execution 33 // completes. 34 // 35 // The first two differences can easily be handled by straightforward 36 // adaptation of the concept called "R(t)" in the original paper and 37 // "virtual time" in the implementation outline. In that 38 // implementation outline, the notation now() is used to mean reading 39 // the virtual clock. In the original paper’s terms, "R(t)" is the 40 // number of "rounds" that have been completed at real time t --- 41 // where a round consists of virtually transmitting one bit from every 42 // non-empty queue in the router (regardless of which queue holds the 43 // packet that is really being transmitted at the moment); in this 44 // conception, a packet is considered to be "in" its queue until the 45 // packet’s transmission is finished. For our problem, we can define a 46 // round to be giving one nanosecond of CPU to every non-empty queue 47 // in the apiserver (where emptiness is judged based on both queued 48 // and executing requests from that queue), and define R(t) = (server 49 // start time) + (1 ns) * (number of rounds since server start). Let 50 // us write NEQ(t) for that number of non-empty queues in the 51 // apiserver at time t. Let us also write C for the concurrency 52 // limit. In the original paper, the partial derivative of R(t) with 53 // respect to t is 54 // 55 // 1 / NEQ(t) . 56 // 57 // To generalize from transmitting one packet at a time to executing C 58 // requests at a time, that derivative becomes 59 // 60 // C / NEQ(t) . 61 // 62 // However, sometimes there are fewer than C requests available to 63 // execute. For a given queue "q", let us also write "reqs(q, t)" for 64 // the number of requests of that queue that are executing at that 65 // time. The total number of requests executing is sum[over q] 66 // reqs(q, t) and if that is less than C then virtual time is not 67 // advancing as fast as it would if all C seats were occupied; in this 68 // case the numerator of the quotient in that derivative should be 69 // adjusted proportionally. Putting it all together for fair queing 70 // for server requests: at a particular time t, the partial derivative 71 // of R(t) with respect to t is 72 // 73 // min( C, sum[over q] reqs(q, t) ) / NEQ(t) . 74 // 75 // In terms of the implementation outline, this is the rate at which 76 // virtual time is advancing at time t (in virtual nanoseconds per 77 // real nanosecond). Where the networking implementation outline adds 78 // packet size to a virtual time, in our version this corresponds to 79 // adding a service time (i.e., duration) to virtual time. 80 // 81 // The third difference is handled by modifying the algorithm to 82 // dispatch based on an initial guess at the request’s service time 83 // (duration) and then make the corresponding adjustments once the 84 // request’s actual service time is known. This is similar, although 85 // not exactly isomorphic, to the original paper’s adjustment by 86 // `$\delta$` for the sake of promptness. 87 // 88 // For implementation simplicity (see below), let us use the same 89 // initial service time guess for every request; call that duration 90 // G. A good choice might be the service time limit (1 91 // minute). Different guesses will give slightly different dynamics, 92 // but any positive number can be used for G without ruining the 93 // long-term behavior. 94 // 95 // As in ordinary fair queuing, there is a bound on divergence from 96 // the ideal. In plain fair queuing the bound is one packet; in our 97 // version it is C requests. 98 // 99 // To support efficiently making the necessary adjustments once a 100 // request’s actual service time is known, the virtual finish time of 101 // a request and the last virtual finish time of a queue are not 102 // represented directly but instead computed from queue length, 103 // request position in the queue, and an alternate state variable that 104 // holds the queue’s virtual start time. While the queue is empty and 105 // has no requests executing: the value of its virtual start time 106 // variable is ignored and its last virtual finish time is considered 107 // to be in the virtual past. When a request arrives to an empty queue 108 // with no requests executing, the queue’s virtual start time is set 109 // to the current virtual time. The virtual finish time of request 110 // number J in the queue (counting from J=1 for the head) is J * G + 111 // (queue's virtual start time). While the queue is non-empty: the 112 // last virtual finish time of the queue is the virtual finish time of 113 // the last request in the queue. While the queue is empty and has a 114 // request executing: the last virtual finish time is the queue’s 115 // virtual start time. When a request is dequeued for service the 116 // queue’s virtual start time is advanced by G. When a request 117 // finishes being served, and the actual service time was S, the 118 // queue’s virtual start time is decremented by G - S. 119 package queueset // import "k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/queueset"