.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _gs_envoy_circuit_breaker:

*******************
L7 Circuit Breaking
*******************

Cilium Service Mesh defines a ``CiliumClusterwideEnvoyConfig`` CRD that lets users
configure the Envoy component built into Cilium agents.

Circuit breaking is an important pattern for creating resilient microservice applications.
It allows you to write applications that limit the impact of failures, latency spikes,
and other undesirable effects of network peculiarities.

In this example, you will configure circuit breaking rules with ``CiliumClusterwideEnvoyConfig``
and then test the configuration by intentionally "tripping" the circuit breaker.

Deploy Test Applications
========================

.. parsed-literal::

    $ kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/servicemesh/envoy/test-application-proxy-circuit-breaker.yaml

The test workloads consist of:

- One client Deployment, ``fortio-deploy``
- One Service, ``echo-service``

View information about these Pods:

.. code-block:: shell-session

    $ kubectl get pods --show-labels -o wide
    NAME                             READY   STATUS    RESTARTS   AGE     IP           NODE                       NOMINATED NODE   READINESS GATES   LABELS
    echo-service-59557f5857-xh84s    2/2     Running   0          7m37s   10.0.0.125   cilium-control-plane   <none>           <none>            kind=echo,name=echo-service,other=echo,pod-template-hash=59557f5857
    fortio-deploy-687945c6dc-6qnh4   1/1     Running   0          7m37s   10.0.0.109   cilium-control-plane   <none>           <none>            app=fortio,pod-template-hash=687945c6dc

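
Before continuing, it can help to wait until both Pods report ``Ready``.
The following is a minimal sketch and not part of the original example: the
``wait_for_test_pods`` function name and the 120-second timeout are assumptions,
and the label selectors are taken from the ``kubectl get pods`` output above.

.. code-block:: bash

    # Hypothetical helper: block until the fortio and echo Pods are Ready.
    # The label selectors come from the Pod listing above; the timeout is an assumed value.
    wait_for_test_pods() {
        kubectl wait --for=condition=Ready pod -l app=fortio --timeout=120s &&
        kubectl wait --for=condition=Ready pod -l kind=echo --timeout=120s
    }

Calling ``wait_for_test_pods`` after applying the manifest returns a non-zero status
if either Pod is not ready within the timeout.
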
Configuring Envoy Circuit Breaker
=================================

Apply the ``envoy-circuit-breaker.yaml`` file, which defines a ``CiliumClusterwideEnvoyConfig``.

.. parsed-literal::

    $ kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/servicemesh/envoy/envoy-circuit-breaker.yaml

.. include:: warning.rst

Verify the ``CiliumClusterwideEnvoyConfig`` was created correctly.

.. code-block:: shell-session

    $ kubectl get ccec envoy-circuit-breaker -oyaml
    apiVersion: cilium.io/v2
    kind: CiliumClusterwideEnvoyConfig
    ...
    resources:
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/echo-service"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: EDS
      circuit_breakers:
        thresholds:
        - priority: "DEFAULT"
          max_requests: 2
          max_pending_requests: 1
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2
    services:
    - name: echo-service
      namespace: default

The ``CiliumClusterwideEnvoyConfig`` sets ``max_requests: 2`` and ``max_pending_requests: 1``.
With these thresholds, Envoy allows at most two active requests and one pending request
to the ``default/echo-service`` cluster at a time. When traffic exceeds these limits,
Envoy opens the circuit and rejects the excess requests, so some requests fail.

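
To check the configured limits without reading the whole resource, you can query
only the thresholds. This is a sketch under stated assumptions: it assumes the Envoy
resources live under ``spec.resources`` (the listing above elides that part with ``...``),
and ``show_thresholds`` is a hypothetical helper name. Adjust the path if your output differs.

.. code-block:: bash

    # Hypothetical helper: print only the circuit breaker thresholds of the
    # "default/echo-service" cluster. The .spec.resources path is an assumption,
    # because the listing above elides the lines between "kind" and "resources".
    show_thresholds() {
        kubectl get ccec envoy-circuit-breaker \
            -o jsonpath='{.spec.resources[?(@.name=="default/echo-service")].circuit_breakers.thresholds}'
    }

The filter selects the cluster by name rather than by position, so it does not depend
on the order of entries in the ``resources`` list.
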
Tripping Envoy Circuit Breaker
==============================

Set an environment variable to the name of the fortio Pod:

.. code-block:: shell-session

    $ export FORTIO_POD=$(kubectl get pods -l app=fortio -o 'jsonpath={.items[0].metadata.name}')

Use the following command to call the Service with two concurrent connections (``-c 2``) and send 20 requests (``-n 20``):

.. code-block:: shell-session

    $ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 http://echo-service:8080

Output::

    $ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 http://echo-service:8080
    {"ts":1692767216.838976,"level":"info","file":"scli.go","line":107,"msg":"Starting Φορτίο 1.57.3 h1:kdPlBiws3cFsLcssZxCt2opFmHj14C3yPBokFhMWzmg= go1.20.6 amd64 linux"}
    Fortio 1.57.3 running at 0 queries per second, 4->4 procs, for 20 calls: http://echo-service:8080
    {"ts":1692767216.839520,"level":"info","file":"httprunner.go","line":100,"msg":"Starting http test","run":"0","url":"http://echo-service:8080","threads":"2","qps":"-1.0","warmup":"parallel","conn-reuse":""}
    Starting at max qps with 2 thread(s) [gomax 4] for exactly 20 calls (10 per thread + 0)
    {"ts":1692767216.842149,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"1","run":"0"}
    {"ts":1692767216.854289,"level":"info","file":"periodic.go","line":832,"msg":"T001 ended after 13.462339ms : 10 calls. qps=742.8129688310479"}
    {"ts":1692767216.854985,"level":"info","file":"periodic.go","line":832,"msg":"T000 ended after 14.158587ms : 10 calls. qps=706.2851681456631"}
    Ended after 14.197088ms : 20 calls. qps=1408.7
    {"ts":1692767216.855035,"level":"info","file":"periodic.go","line":564,"msg":"Run ended","run":"0","elapsed":"14.197088ms","calls":"20","qps":"1408.739595049351"}
    Aggregated Function Time : count 20 avg 0.0013703978 +/- 0.000461 min 0.00092124 max 0.002696039 sum 0.027407957
    # range, mid point, percentile, count
    >= 0.00092124 <= 0.001 , 0.00096062 , 10.00, 2
    > 0.001 <= 0.002 , 0.0015 , 90.00, 16
    > 0.002 <= 0.00269604 , 0.00234802 , 100.00, 2
    # target 50% 0.0015
    # target 75% 0.0018125
    # target 90% 0.002
    # target 99% 0.00262644
    # target 99.9% 0.00268908
    Error cases : count 1 avg 0.00133143 +/- 0 min 0.00133143 max 0.00133143 sum 0.00133143
    # range, mid point, percentile, count
    >= 0.00133143 <= 0.00133143 , 0.00133143 , 100.00, 1
    # target 50% 0.00133143
    # target 75% 0.00133143
    # target 90% 0.00133143
    # target 99% 0.00133143
    # target 99.9% 0.00133143
    # Socket and IP used for each connection:
    [0]   1 socket used, resolved to 10.96.182.43:8080, connection timing : count 1 avg 0.000426815 +/- 0 min 0.000426815 max 0.000426815 sum 0.000426815
    [1]   2 socket used, resolved to 10.96.182.43:8080, connection timing : count 2 avg 0.0004071275 +/- 0.0001215 min 0.000285596 max 0.000528659 sum 0.000814255
    Connection time histogram (s) : count 3 avg 0.00041369 +/- 9.966e-05 min 0.000285596 max 0.000528659 sum 0.00124107
    # range, mid point, percentile, count
    >= 0.000285596 <= 0.000528659 , 0.000407128 , 100.00, 3
    # target 50% 0.000346362
    # target 75% 0.00043751
    # target 90% 0.0004922
    # target 99% 0.000525013
    # target 99.9% 0.000528294
    Sockets used: 3 (for perfect keepalive, would be 2)
    Uniform: false, Jitter: false, Catchup allowed: true
    IP addresses distribution:
    10.96.182.43:8080: 3
    Code 200 : 19 (95.0 %)
    Code 503 : 1 (5.0 %)
    Response Header Sizes : count 20 avg 370.5 +/- 85 min 0 max 390 sum 7410
    Response Body/Total Sizes : count 20 avg 2340.15 +/- 465.7 min 310 max 2447 sum 46803
    All done 20 calls (plus 0 warmup) 1.370 ms avg, 1408.7 qps

The output shows that one request (5%) received a 503 response,
which indicates that the circuit breaker was triggered.

Increase the number of concurrent connections to 4 with ``-c 4``.

Output::

    $ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 4 -qps 0 -n 20 http://echo-service:8080
    {"ts":1692767495.818546,"level":"info","file":"scli.go","line":107,"msg":"Starting Φορτίο 1.57.3 h1:kdPlBiws3cFsLcssZxCt2opFmHj14C3yPBokFhMWzmg= go1.20.6 amd64 linux"}
    Fortio 1.57.3 running at 0 queries per second, 4->4 procs, for 20 calls: http://echo-service:8080
    {"ts":1692767495.819105,"level":"info","file":"httprunner.go","line":100,"msg":"Starting http test","run":"0","url":"http://echo-service:8080","threads":"4","qps":"-1.0","warmup":"parallel","conn-reuse":""}
    Starting at max qps with 4 thread(s) [gomax 4] for exactly 20 calls (5 per thread + 0)
    {"ts":1692767495.822424,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"3","run":"0"}
    {"ts":1692767495.822428,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.822603,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"1","run":"0"}
    {"ts":1692767495.823855,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.825250,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"1","run":"0"}
    {"ts":1692767495.825285,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.827282,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.827514,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"2","run":"0"}
    {"ts":1692767495.829886,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
    {"ts":1692767495.830156,"level":"info","file":"periodic.go","line":832,"msg":"T000 ended after 9.136284ms : 5 calls. qps=547.268451812575"}
    {"ts":1692767495.830326,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"2","run":"0"}
    {"ts":1692767495.831175,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"3","run":"0"}
    {"ts":1692767495.832826,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"3","run":"0"}
    {"ts":1692767495.834028,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"3","run":"0"}
    {"ts":1692767495.834116,"level":"info","file":"periodic.go","line":832,"msg":"T003 ended after 13.09904ms : 5 calls. qps=381.7073617608619"}
    {"ts":1692767495.834865,"level":"info","file":"periodic.go","line":832,"msg":"T001 ended after 13.846811ms : 5 calls. qps=361.09397318992796"}
    {"ts":1692767495.835370,"level":"info","file":"periodic.go","line":832,"msg":"T002 ended after 14.352324ms : 5 calls. qps=348.3756358900482"}
    Ended after 14.386516ms : 20 calls. qps=1390.2
    {"ts":1692767495.835489,"level":"info","file":"periodic.go","line":564,"msg":"Run ended","run":"0","elapsed":"14.386516ms","calls":"20","qps":"1390.1906479650806"}
    Aggregated Function Time : count 20 avg 0.0024801033 +/- 0.001782 min 0.000721482 max 0.008055527 sum 0.049602066
    # range, mid point, percentile, count
    >= 0.000721482 <= 0.001 , 0.000860741 , 10.00, 2
    > 0.001 <= 0.002 , 0.0015 , 45.00, 7
    > 0.002 <= 0.003 , 0.0025 , 80.00, 7
    > 0.003 <= 0.004 , 0.0035 , 85.00, 1
    > 0.005 <= 0.006 , 0.0055 , 95.00, 2
    > 0.008 <= 0.00805553 , 0.00802776 , 100.00, 1
    # target 50% 0.00214286
    # target 75% 0.00285714
    # target 90% 0.0055
    # target 99% 0.00804442
    # target 99.9% 0.00805442
    Error cases : count 13 avg 0.0016602806 +/- 0.0006006 min 0.000721482 max 0.00281812 sum 0.021583648
    # range, mid point, percentile, count
    >= 0.000721482 <= 0.001 , 0.000860741 , 15.38, 2
    > 0.001 <= 0.002 , 0.0015 , 61.54, 6
    > 0.002 <= 0.00281812 , 0.00240906 , 100.00, 5
    # target 50% 0.00175
    # target 75% 0.00228634
    # target 90% 0.00260541
    # target 99% 0.00279685
    # target 99.9% 0.00281599
    # Socket and IP used for each connection:
    [0]   5 socket used, resolved to 10.96.182.43:8080, connection timing : count 5 avg 0.0003044688 +/- 0.0001472 min 0.000120654 max 0.00053878 sum 0.001522344
    [1]   3 socket used, resolved to 10.96.182.43:8080, connection timing : count 3 avg 0.00041437933 +/- 9.571e-05 min 0.000330279 max 0.000548277 sum 0.001243138
    [2]   3 socket used, resolved to 10.96.182.43:8080, connection timing : count 3 avg 0.00041114067 +/- 0.0001352 min 0.000306734 max 0.00060203 sum 0.001233422
    [3]   4 socket used, resolved to 10.96.182.43:8080, connection timing : count 4 avg 0.00038631225 +/- 0.0002447 min 0.000175125 max 0.00080311 sum 0.001545249
    Connection time histogram (s) : count 15 avg 0.0003696102 +/- 0.0001758 min 0.000120654 max 0.00080311 sum 0.005544153
    # range, mid point, percentile, count
    >= 0.000120654 <= 0.00080311 , 0.000461882 , 100.00, 15
    # target 50% 0.000437509
    # target 75% 0.000620309
    # target 90% 0.00072999
    # target 99% 0.000795798
    # target 99.9% 0.000802379
    Sockets used: 15 (for perfect keepalive, would be 4)
    Uniform: false, Jitter: false, Catchup allowed: true
    IP addresses distribution:
    10.96.182.43:8080: 15
    Code 200 : 7 (35.0 %)
    Code 503 : 13 (65.0 %)
    Response Header Sizes : count 20 avg 136.5 +/- 186 min 0 max 390 sum 2730
    Response Body/Total Sizes : count 20 avg 1026.9 +/- 1042 min 241 max 2447 sum 20538
    All done 20 calls (plus 0 warmup) 2.480 ms avg, 1390.2 qps

Now you can see the expected circuit breaking behavior.
Only 35% of the requests succeeded; the rest were rejected by the circuit breaker.

.. parsed-literal::

    Code 200 : 7 (35.0 %)
    Code 503 : 13 (65.0 %)

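
To compare runs without scrolling through the full fortio output, you can keep only
the status-code summary lines. The sketch below is an illustration rather than part of
the original example: ``summarize_codes`` and ``sweep_concurrency`` are hypothetical
helper names, and the fortio command reuses only the flags shown above.

.. code-block:: bash

    # Hypothetical helper: keep only the per-status-code summary lines
    # (for example "Code 503 : 13 (65.0 %)") from fortio output on stdin.
    summarize_codes() {
        grep -E '^[[:space:]]*Code [0-9]+ :'
    }

    # Hypothetical helper: repeat the 20-request load test once per concurrency
    # level given as an argument and print the summary lines for each run.
    sweep_concurrency() {
        for c in "$@"; do
            echo "== -c $c =="
            kubectl exec "$FORTIO_POD" -c fortio -- \
                /usr/bin/fortio load -c "$c" -qps 0 -n 20 http://echo-service:8080 2>&1 |
                summarize_codes
        done
    }

For example, ``sweep_concurrency 1 2 4`` runs the test at three concurrency levels.
The split between 200 and 503 responses varies from run to run, so treat the
percentages shown in this guide as indicative.
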
Cleaning up
===========

Remove the rules.

.. parsed-literal::

    $ kubectl delete -f \ |SCM_WEB|\/examples/kubernetes/servicemesh/envoy/envoy-circuit-breaker.yaml

Remove the test application.

.. parsed-literal::

    $ kubectl delete -f \ |SCM_WEB|\/examples/kubernetes/servicemesh/envoy/test-application-proxy-circuit-breaker.yaml
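
To confirm that the rules are gone, you can check that the resource no longer exists.
This is a minimal sketch; ``verify_cleanup`` is a hypothetical helper name, and the
resource name is the one used in the verification step earlier in this guide.

.. code-block:: bash

    # Hypothetical helper: succeed only if the circuit breaker config is gone.
    # With --ignore-not-found, kubectl prints nothing when the object does not exist.
    verify_cleanup() {
        remaining="$(kubectl get ccec envoy-circuit-breaker --ignore-not-found -o name)"
        if [ -n "$remaining" ]; then
            echo "still present: $remaining" >&2
            return 1
        fi
        echo "envoy-circuit-breaker removed"
    }
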