# circuitbreaker

## A brief introduction to circuit breaker
### What circuit breaker does
When making RPC calls, downstream services inevitably fail.

If the upstream keeps calling a downstream service that has failed, it hinders the recovery of the downstream and wastes the resources of the upstream.

One way to solve this problem is to set some dynamic switches and manually shut off calls to the downstream when it fails.

A better solution is to use a circuit breaker, which automates this process.

Here is a more detailed [introduction to circuit breaker](https://msdn.microsoft.com/zh-cn/library/dn589784.aspx).

One of the better-known circuit breakers is Hystrix, and here is its [design document](https://github.com/Netflix/Hystrix/wiki).

### circuit breaker ideas
The idea of a circuit breaker is simple: restrict access to the downstream based on the success or failure of the RPC.

There are usually three states: CLOSED, OPEN, and HALFOPEN.

When RPCs are succeeding normally, the state is CLOSED.

When RPC failures increase enough to trip the circuit breaker, it goes to OPEN.

After a certain cooling time in OPEN, the circuit breaker becomes HALFOPEN.

In HALFOPEN, the circuit breaker lets a limited number of probe requests through to the downstream, and then becomes CLOSED or OPEN depending on the results.

In summary, the three state transitions are roughly as follows:

<pre>
 [CLOSED] -->- tripped ----> [OPEN]<-------+
    ^                          |           ^
    |                          v           |
    +                          |      detect fail
    |                          |           |
    |                    cooling timeout   |
    ^                          |           ^
    |                          v           |
    +--- detect succeed --<-[HALFOPEN]-->--+
</pre>

## Use of this package

### Basic usage
This package divides the results of RPC calls into three categories: Succeed, Fail, and Timeout, and maintains a count of each within a certain time window.

Before each RPC, you
should call IsAllowed() to decide whether to initiate the RPC.

After the call completes, call Succeed(), Fail(), or Timeout() to report the result.

The package also limits the number of concurrent calls, so you must also call Done() after each RPC.

Here is an example:
<pre>
var p Panel

func init() {
    var err error
    p, err = NewPanel(nil, Options{
        CoolingTimeout: time.Minute,
        DetectTimeout:  time.Minute,
        ShouldTrip:     ThresholdTripFunc(100),
    })
    if err != nil {
        panic(err)
    }
}

// DoRPC wraps a single RPC with the circuit breaker.
// doRPC, IsFailErr and IsTimeout stand in for your own RPC call and error classification.
func DoRPC() error {
    key := "remote::rpc::method"
    if !p.IsAllowed(key) {
        return errors.New("not allowed by circuitbreaker")
    }

    err := doRPC()
    if err == nil {
        p.Succeed(key)
    } else if IsFailErr(err) {
        p.Fail(key)
    } else if IsTimeout(err) {
        p.Timeout(key)
    }
    return err
}

func main() {
    ...
    for ... {
        DoRPC()
    }
    p.Close()
}
</pre>

### circuit breaker Trigger strategies
This package provides three basic circuit breaker triggering strategies:
+ Number of consecutive failures reaches threshold (ConsecutiveTripFunc)
+ Failure count reaches threshold (ThresholdTripFunc)
+ Failure rate reaches threshold (RateTripFunc)

You can also write your own triggering strategy by implementing a TripFunc function.

The circuit breaker calls TripFunc on every Fail or Timeout to decide whether to trip.

### Circuit breaker cooling strategy
After entering the OPEN state, the circuit breaker cools down for a period of time. The default is 10 seconds, and it is configurable (CoolingTimeout).

During this period, all IsAllowed() calls return false.

After cooling, the circuit breaker enters HALFOPEN.

### Half-open strategy
During HALFOPEN, the circuit breaker lets one request through every "while", and after a "number" of consecutive successful requests,
the circuit breaker becomes CLOSED; if any of them fails, it becomes OPEN again.

This is a process of gradually probing the downstream and opening up access to it.

Both the "while" (DetectTimeout) and the "number" (DEFAULT_HALFOPEN_SUCCESSES) above are configurable.

### Concurrency control
The circuit breaker also performs concurrency control, configured by the MaxConcurrency parameter.

IsAllowed returns false when the maximum concurrency is reached.

### Statistics
##### Default parameter
The circuit breaker counts successes, failures, and timeouts within a time window. The default window size is 10 seconds.

The time window is set by two parameters, but usually you don't need to change them.

##### statistics method
The time window is divided into buckets, and each bucket records data for a fixed period of time.

For example, to count 10 seconds of data, you can divide the 10 seconds into 100 buckets, each holding 100ms of data.

BucketTime and BucketNums in Options correspond to the time period covered by each bucket and the number of buckets, respectively.

Setting BucketTime to 100ms and BucketNums to 100 gives a 10 second time window.

##### Jitter
As time advances, the oldest bucket in the window expires, and each time a bucket expires, jitter occurs.

As an example:
+ You divide 10 seconds into 10 buckets: bucket 0 corresponds to the time [0S, 1S), bucket 1 corresponds to the time [1S, 2S), ...
, bucket 9 corresponds to [9S, 10S);
+ At 10.1S, if Succ is executed once, the following operations occur in the circuitbreaker:
+ (1) Bucket 0 is detected as expired and is discarded; (2) A new bucket 10 is created, corresponding to [10S, 11S); (3) The Succ is placed in bucket 10;
+ At 10.2S, you call Successes() to query the number of successes in the window, and you get the count for [1S, 10.2S) instead of [0.2S, 10.2S).

With bucket counting, such jitter is unavoidable. A compromise is to increase the number of buckets to reduce its impact.

With 2000 buckets, jitter affects the overall data by at most 1/2000.

In this package, the default number of buckets is 100 and the bucket time is 100ms, for an overall window of 10 seconds.

Several technical solutions to avoid this problem were considered, but they all introduced more problems. If you have a good idea, please open an issue or PR.