.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _gs_envoy_circuit_breaker:

*******************
L7 Circuit Breaking
*******************

Cilium Service Mesh defines a ``CiliumClusterwideEnvoyConfig`` CRD which allows users
to set the configuration of the Envoy component built into Cilium agents.

Circuit breaking is an important pattern for creating resilient microservice applications.
Circuit breaking allows you to write applications that limit the impact of failures, latency spikes,
and other undesirable effects of network peculiarities.

In this example, you will configure circuit breaking rules with ``CiliumClusterwideEnvoyConfig`` and then
test the configuration by intentionally “tripping” the circuit breaker.

Deploy Test Applications
========================

.. parsed-literal::

    $ kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/servicemesh/envoy/test-application-proxy-circuit-breaker.yaml

The test workloads consist of:

- One client Deployment, ``fortio-deploy``
- One Service, ``echo-service``

View information about these Pods:

.. code-block:: shell-session

    $ kubectl get pods --show-labels -o wide
    NAME                             READY   STATUS    RESTARTS   AGE     IP           NODE                   NOMINATED NODE   READINESS GATES   LABELS
    echo-service-59557f5857-xh84s    2/2     Running   0          7m37s   10.0.0.125   cilium-control-plane   <none>           <none>            kind=echo,name=echo-service,other=echo,pod-template-hash=59557f5857
    fortio-deploy-687945c6dc-6qnh4   1/1     Running   0          7m37s   10.0.0.109   cilium-control-plane   <none>           <none>            app=fortio,pod-template-hash=687945c6dc


Configuring Envoy Circuit Breaker
=================================

Apply the ``envoy-circuit-breaker.yaml`` file, which defines a ``CiliumClusterwideEnvoyConfig``.

.. parsed-literal::

    $ kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/servicemesh/envoy/envoy-circuit-breaker.yaml

.. include:: warning.rst

Verify that the ``CiliumClusterwideEnvoyConfig`` was created correctly:

.. code-block:: shell-session

    $ kubectl get ccec envoy-circuit-breaker -oyaml
    apiVersion: cilium.io/v2
    kind: CiliumClusterwideEnvoyConfig
    ...
      resources:
      - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
        name: "default/echo-service"
        connect_timeout: 5s
        lb_policy: ROUND_ROBIN
        type: EDS
        circuit_breakers:
          thresholds:
          - priority: "DEFAULT"
            max_requests: 2
            max_pending_requests: 1
        outlier_detection:
          split_external_local_origin_errors: true
          consecutive_local_origin_failure: 2
      services:
      - name: echo-service
        namespace: default

In the ``CiliumClusterwideEnvoyConfig`` settings, you specified ``max_pending_requests: 1`` and ``max_requests: 2``.
These limits mean that if you issue more than one connection and request concurrently,
you will see some failures as Envoy opens the circuit for further requests and connections.
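
The thresholds in this example are deliberately low so that the circuit breaker is easy to trip.
If you want to experiment with different limits, you can inspect and modify the applied resource in place.
This is a minimal sketch, assuming you keep the resource name ``envoy-circuit-breaker`` used above:

.. code-block:: shell-session

    # Show the circuit breaker section of the applied configuration.
    $ kubectl get ccec envoy-circuit-breaker -o yaml | grep -A 5 circuit_breakers

    # Adjust max_requests / max_pending_requests; once saved, Cilium propagates
    # the updated configuration to the Envoy proxy.
    $ kubectl edit ccec envoy-circuit-breaker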

Tripping Envoy Circuit Breaker
==============================

Set an environment variable with the name of the fortio Pod:

.. code-block:: shell-session

    $ export FORTIO_POD=$(kubectl get pods -l app=fortio -o 'jsonpath={.items[0].metadata.name}')

Use the following command to call the Service with two concurrent connections (``-c 2`` flag) and send 20 requests (``-n 20`` flag):

.. code-block:: shell-session

    $ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 http://echo-service:8080

Output::

    $ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 http://echo-service:8080
    {"ts":1692767216.838976,"level":"info","file":"scli.go","line":107,"msg":"Starting Φορτίο 1.57.3 h1:kdPlBiws3cFsLcssZxCt2opFmHj14C3yPBokFhMWzmg= go1.20.6 amd64 linux"}
    Fortio 1.57.3 running at 0 queries per second, 4->4 procs, for 20 calls: http://echo-service:8080
    {"ts":1692767216.839520,"level":"info","file":"httprunner.go","line":100,"msg":"Starting http test","run":"0","url":"http://echo-service:8080","threads":"2","qps":"-1.0","warmup":"parallel","conn-reuse":""}
    Starting at max qps with 2 thread(s) [gomax 4] for exactly 20 calls (10 per thread + 0)
    {"ts":1692767216.842149,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"1","run":"0"}
    {"ts":1692767216.854289,"level":"info","file":"periodic.go","line":832,"msg":"T001 ended after 13.462339ms : 10 calls. qps=742.8129688310479"}
    {"ts":1692767216.854985,"level":"info","file":"periodic.go","line":832,"msg":"T000 ended after 14.158587ms : 10 calls. qps=706.2851681456631"}
    Ended after 14.197088ms : 20 calls. qps=1408.7
    {"ts":1692767216.855035,"level":"info","file":"periodic.go","line":564,"msg":"Run ended","run":"0","elapsed":"14.197088ms","calls":"20","qps":"1408.739595049351"}
    Aggregated Function Time : count 20 avg 0.0013703978 +/- 0.000461 min 0.00092124 max 0.002696039 sum 0.027407957
    # range, mid point, percentile, count
    >= 0.00092124 <= 0.001 , 0.00096062 , 10.00, 2
    > 0.001 <= 0.002 , 0.0015 , 90.00, 16
    > 0.002 <= 0.00269604 , 0.00234802 , 100.00, 2
    # target 50% 0.0015
    # target 75% 0.0018125
    # target 90% 0.002
    # target 99% 0.00262644
    # target 99.9% 0.00268908
    Error cases : count 1 avg 0.00133143 +/- 0 min 0.00133143 max 0.00133143 sum 0.00133143
    # range, mid point, percentile, count
    >= 0.00133143 <= 0.00133143 , 0.00133143 , 100.00, 1
    # target 50% 0.00133143
    # target 75% 0.00133143
    # target 90% 0.00133143
    # target 99% 0.00133143
    # target 99.9% 0.00133143
    # Socket and IP used for each connection:
    [0] 1 socket used, resolved to 10.96.182.43:8080, connection timing : count 1 avg 0.000426815 +/- 0 min 0.000426815 max 0.000426815 sum 0.000426815
    [1] 2 socket used, resolved to 10.96.182.43:8080, connection timing : count 2 avg 0.0004071275 +/- 0.0001215 min 0.000285596 max 0.000528659 sum 0.000814255
    Connection time histogram (s) : count 3 avg 0.00041369 +/- 9.966e-05 min 0.000285596 max 0.000528659 sum 0.00124107
    # range, mid point, percentile, count
    >= 0.000285596 <= 0.000528659 , 0.000407128 , 100.00, 3
    # target 50% 0.000346362
    # target 75% 0.00043751
    # target 90% 0.0004922
    # target 99% 0.000525013
    # target 99.9% 0.000528294
    Sockets used: 3 (for perfect keepalive, would be 2)
    Uniform: false, Jitter: false, Catchup allowed: true
    IP addresses distribution:
    10.96.182.43:8080: 3
    Code 200 : 19 (95.0 %)
    Code 503 : 1 (5.0 %)
    Response Header Sizes : count 20 avg 370.5 +/- 85 min 0 max 390 sum 7410
    Response Body/Total Sizes : count 20 avg 2340.15 +/- 465.7 min 310 max 2447 sum 46803
    All done 20 calls (plus 0 warmup) 1.370 ms avg, 1408.7 qps

From the above output, you can see that some requests received a 503 response code:
those requests were rejected because the circuit breaker opened.
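
If you only want the response-code summary of a run, you can filter the fortio output.
This is a minimal sketch; fortio's per-request log lines go to stderr, so both streams are merged before filtering:

.. code-block:: shell-session

    $ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 http://echo-service:8080 2>&1 | grep "^Code"
    Code 200 : 19 (95.0 %)
    Code 503 : 1 (5.0 %)

The exact counts vary from run to run, but with two concurrent connections only a small fraction of requests should be rejected.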

Bring the number of concurrent connections up to 4.

Output::

    $ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 4 -qps 0 -n 20 http://echo-service:8080
    {"ts":1692767495.818546,"level":"info","file":"scli.go","line":107,"msg":"Starting Φορτίο 1.57.3 h1:kdPlBiws3cFsLcssZxCt2opFmHj14C3yPBokFhMWzmg= go1.20.6 amd64 linux"}
    Fortio 1.57.3 running at 0 queries per second, 4->4 procs, for 20 calls: http://echo-service:8080
    {"ts":1692767495.819105,"level":"info","file":"httprunner.go","line":100,"msg":"Starting http test","run":"0","url":"http://echo-service:8080","threads":"4","qps":"-1.0","warmup":"parallel","conn-reuse":""}
    Starting at max qps with 4 thread(s) [gomax 4] for exactly 20 calls (5 per thread + 0)
    {"ts":1692767495.822424,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"3","run":"0"}
    {"ts":1692767495.822428,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.822603,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"1","run":"0"}
    {"ts":1692767495.823855,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.825250,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"1","run":"0"}
    {"ts":1692767495.825285,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.827282,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.827514,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"2","run":"0"}
    {"ts":1692767495.829886,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.830156,"level":"info","file":"periodic.go","line":832,"msg":"T000 ended after 9.136284ms : 5 calls. qps=547.268451812575"}
    {"ts":1692767495.830326,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"2","run":"0"}
    {"ts":1692767495.831175,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"3","run":"0"}
    {"ts":1692767495.832826,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"3","run":"0"}
    {"ts":1692767495.834028,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"3","run":"0"}
    {"ts":1692767495.834116,"level":"info","file":"periodic.go","line":832,"msg":"T003 ended after 13.09904ms : 5 calls. qps=381.7073617608619"}
    {"ts":1692767495.834865,"level":"info","file":"periodic.go","line":832,"msg":"T001 ended after 13.846811ms : 5 calls. qps=361.09397318992796"}
    {"ts":1692767495.835370,"level":"info","file":"periodic.go","line":832,"msg":"T002 ended after 14.352324ms : 5 calls. qps=348.3756358900482"}
    Ended after 14.386516ms : 20 calls. qps=1390.2
    {"ts":1692767495.835489,"level":"info","file":"periodic.go","line":564,"msg":"Run ended","run":"0","elapsed":"14.386516ms","calls":"20","qps":"1390.1906479650806"}
    Aggregated Function Time : count 20 avg 0.0024801033 +/- 0.001782 min 0.000721482 max 0.008055527 sum 0.049602066
    # range, mid point, percentile, count
    >= 0.000721482 <= 0.001 , 0.000860741 , 10.00, 2
    > 0.001 <= 0.002 , 0.0015 , 45.00, 7
    > 0.002 <= 0.003 , 0.0025 , 80.00, 7
    > 0.003 <= 0.004 , 0.0035 , 85.00, 1
    > 0.005 <= 0.006 , 0.0055 , 95.00, 2
    > 0.008 <= 0.00805553 , 0.00802776 , 100.00, 1
    # target 50% 0.00214286
    # target 75% 0.00285714
    # target 90% 0.0055
    # target 99% 0.00804442
    # target 99.9% 0.00805442
    Error cases : count 13 avg 0.0016602806 +/- 0.0006006 min 0.000721482 max 0.00281812 sum 0.021583648
    # range, mid point, percentile, count
    >= 0.000721482 <= 0.001 , 0.000860741 , 15.38, 2
    > 0.001 <= 0.002 , 0.0015 , 61.54, 6
    > 0.002 <= 0.00281812 , 0.00240906 , 100.00, 5
    # target 50% 0.00175
    # target 75% 0.00228634
    # target 90% 0.00260541
    # target 99% 0.00279685
    # target 99.9% 0.00281599
    # Socket and IP used for each connection:
    [0] 5 socket used, resolved to 10.96.182.43:8080, connection timing : count 5 avg 0.0003044688 +/- 0.0001472 min 0.000120654 max 0.00053878 sum 0.001522344
    [1] 3 socket used, resolved to 10.96.182.43:8080, connection timing : count 3 avg 0.00041437933 +/- 9.571e-05 min 0.000330279 max 0.000548277 sum 0.001243138
    [2] 3 socket used, resolved to 10.96.182.43:8080, connection timing : count 3 avg 0.00041114067 +/- 0.0001352 min 0.000306734 max 0.00060203 sum 0.001233422
    [3] 4 socket used, resolved to 10.96.182.43:8080, connection timing : count 4 avg 0.00038631225 +/- 0.0002447 min 0.000175125 max 0.00080311 sum 0.001545249
    Connection time histogram (s) : count 15 avg 0.0003696102 +/- 0.0001758 min 0.000120654 max 0.00080311 sum 0.005544153
    # range, mid point, percentile, count
    >= 0.000120654 <= 0.00080311 , 0.000461882 , 100.00, 15
    # target 50% 0.000437509
    # target 75% 0.000620309
    # target 90% 0.00072999
    # target 99% 0.000795798
    # target 99.9% 0.000802379
    Sockets used: 15 (for perfect keepalive, would be 4)
    Uniform: false, Jitter: false, Catchup allowed: true
    IP addresses distribution:
    10.96.182.43:8080: 15
    Code 200 : 7 (35.0 %)
    Code 503 : 13 (65.0 %)
    Response Header Sizes : count 20 avg 136.5 +/- 186 min 0 max 390 sum 2730
    Response Body/Total Sizes : count 20 avg 1026.9 +/- 1042 min 241 max 2447 sum 20538
    All done 20 calls (plus 0 warmup) 2.480 ms avg, 1390.2 qps

Now you can see the expected circuit breaking behavior:
only 35% of the requests succeeded, and the rest were short-circuited by the circuit breaker.

.. parsed-literal::

    Code 200 : 7 (35.0 %)
    Code 503 : 13 (65.0 %)
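
To see how the failure rate grows with concurrency, you can sweep the number of connections and keep
only the response-code summary of each run. This is a sketch that reuses the fortio flags from the
previous commands:

.. code-block:: shell-session

    $ for c in 1 2 4 8; do \
        echo "== ${c} concurrent connections =="; \
        kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c "$c" -qps 0 -n 20 http://echo-service:8080 2>&1 | grep "^Code"; \
      done

With the thresholds used in this example, higher concurrency should produce a larger share of 503 responses.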

Cleaning up
===========

Remove the rules.

.. parsed-literal::

    $ kubectl delete -f \ |SCM_WEB|\/examples/kubernetes/servicemesh/envoy/envoy-circuit-breaker.yaml

Remove the test application.

.. parsed-literal::

    $ kubectl delete -f \ |SCM_WEB|\/examples/kubernetes/servicemesh/envoy/test-application-proxy-circuit-breaker.yaml
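
Optionally, verify that everything was removed. This is a sketch; once deletion has completed, the first
command reports that the resource is not found and the second returns no Pods:

.. code-block:: shell-session

    $ kubectl get ccec envoy-circuit-breaker
    $ kubectl get pods -l app=fortio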