.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _bgp_control_plane_v1:

BGP Peering Policy (Legacy)
###########################

.. warning::
   ``CiliumBGPPeeringPolicy`` will be discontinued in the future. Consider
   using the new :ref:`BGP APIs <bgp_control_plane_v2>` to configure the BGP Control Plane.

Configure Peering
-----------------

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512

All BGP peering topology information is carried in a ``CiliumBGPPeeringPolicy``
CRD. A ``CiliumBGPPeeringPolicy`` can be applied to one or more nodes based on
its ``nodeSelector`` field. Only a single ``CiliumBGPPeeringPolicy`` can be
applied to a node. If multiple policies match a node, Cilium clears all BGP
sessions until only one policy matches the node.

.. warning::

   Applying another policy over an existing one clears the BGP session and
   causes immediate connectivity disruption. It is strongly recommended to test
   the policy in a staging environment before applying it to production.

Each ``CiliumBGPPeeringPolicy`` defines one or more ``virtualRouters``. A
virtual router defines a BGP router instance, uniquely identified by its
``localASN``. Each virtual router can have multiple ``neighbors`` defined. A
neighbor defines a BGP peer, uniquely identified by its ``peerAddress``
and ``peerASN``. When ``localASN`` and ``peerASN`` are the same, iBGP peering
is used.
When ``localASN`` and ``peerASN`` are different, eBGP peering is used.

Specifying Router ID (IPv6 single-stack only)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When Cilium is running on an IPv4 single-stack or a dual-stack cluster, the BGP
Router ID is automatically derived from the IPv4 address assigned to the node.
When Cilium is running on an IPv6 single-stack cluster, the BGP Router ID must
be configured manually. You can do this by setting the following annotation on
the Kubernetes Node resource:

.. code-block:: shell-session

    $ kubectl annotate node <node-name> cilium.io/bgp-virtual-router.64512="router-id=10.0.0.2"

Currently, you must set the annotation for each Node. In the future, automatic
assignment of the Router ID may be supported. Follow `#30333
<https://github.com/cilium/cilium/issues/30333/>`_ for updates.


Validating Peering Status
^^^^^^^^^^^^^^^^^^^^^^^^^

Once the ``CiliumBGPPeeringPolicy`` is applied, you can check the BGP peering
status using the Cilium CLI:

.. code-block:: shell-session

    $ cilium bgp peers
    Node    Local AS   Peer AS   Peer Address   Session State   Uptime   Family         Received   Advertised
    node0   64512      64512     10.0.0.1       established     10s      ipv4/unicast   0          0
                                                                         ipv6/unicast   0          0


Node Annotations
----------------

A ``CiliumBGPPeeringPolicy`` can apply to multiple nodes. When a
``CiliumBGPPeeringPolicy`` applies to one or more nodes, each node
instantiates one or more BGP routers as defined in ``virtualRouters``. However,
you sometimes need fine-grained control over the configuration of an
instantiated virtual router. You can do this by applying a Kubernetes
annotation to the Kubernetes Node resource.

A single annotation is used to specify a set of configuration attributes
to apply to a particular virtual router instantiated on a particular
host.
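
For illustration, the following is a hypothetical fragment of a Node resource as it might appear in ``kubectl get node <node-name> -o yaml`` output after annotating it. It assumes a virtual router with local ASN ``64512``. It combines the ``router-id`` and ``local-port`` keys described below into a single comma-separated value. The node name and values are placeholders.

.. code-block:: yaml

    apiVersion: v1
    kind: Node
    metadata:
      name: node0   # placeholder node name
      annotations:
        # One annotation per virtual router, keyed by its local ASN (64512 here).
        # Multiple key=value options are separated by commas.
        cilium.io/bgp-virtual-router.64512: "router-id=10.0.0.2,local-port=179"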

The syntax of the annotation is as follows:

::

    cilium.io/bgp-virtual-router.{asn}="key=value,..."

Replace the ``{asn}`` portion with the local ASN of the virtual router to which
you want to apply these configuration attributes. You can specify multiple
key/value pairs by separating them with commas. When a key is defined more than
once with different values, the value of the last occurrence is used.

Overriding Router ID
^^^^^^^^^^^^^^^^^^^^

When Cilium is running on an IPv4 single-stack or a dual-stack cluster, the BGP
Control Plane can use the IPv4 address assigned to the node as the BGP Router
ID. This works because the Router ID is 32 bits long and the uniqueness of the
node's IPv4 address makes the Router ID unique. IPv6 addresses, which are 128
bits long, cannot be used this way. Thus, when running in an IPv6 single-stack
cluster, or when automatic assignment of the Router ID is not desired, the
administrator needs to define the Router ID manually. You can do this by
setting the ``router-id`` key in the annotation.

.. code-block:: shell-session

    $ kubectl annotate node <node-name> cilium.io/bgp-virtual-router.{asn}="router-id=10.0.0.2"


Listening on the Local Port
^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, the BGP Control Plane instantiates each virtual router without a
listening port. This means the BGP router can only initiate connections to the
configured peers and cannot accept incoming connections. This is the default
behavior because the BGP Control Plane is designed to function in environments
where another BGP router (such as BIRD) is running on the same node. When
incoming connections must be accepted, use the ``local-port`` key to specify
the listening port.

.. code-block:: shell-session

    $ kubectl annotate node <node-name> cilium.io/bgp-virtual-router.{asn}="local-port=179"

Advertising PodCIDRs
--------------------

The BGP Control Plane can advertise the PodCIDR prefixes of the nodes selected
by the ``CiliumBGPPeeringPolicy`` to the BGP peers. This allows the BGP peers to
reach the Pods directly without involving load balancers or NAT. There are two
ways to advertise PodCIDRs, depending on the IPAM mode.

Kubernetes and ClusterPool IPAM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When :ref:`Kubernetes <k8s_hostscope>` or :ref:`ClusterPool
<ipam_crd_cluster_pool>` IPAM is used, set the
``virtualRouters[*].exportPodCIDR`` field to ``true``.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        exportPodCIDR: true   # <-- enable PodCIDR advertisement
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512

With this configuration, the BGP speaker on each node advertises the
PodCIDR prefixes assigned to the local node.

.. _bgp_control_plane_multipool_ipam:

MultiPool IPAM
^^^^^^^^^^^^^^

When :ref:`MultiPool IPAM <ipam_crd_multi_pool>` is used, specify the
``virtualRouters[*].podIPPoolSelector`` field. The ``.podIPPoolSelector`` field
is a label selector that selects the allocated CIDRs of ``CiliumPodIPPool``
resources matching the specified ``.matchLabels`` or ``.matchExpressions``.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        podIPPoolSelector:   # <-- select CiliumPodIPPool to advertise
          matchLabels:
            environment: production
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512

This advertises the PodCIDR prefixes allocated from the selected
``CiliumPodIPPool`` resources. The CIDR must be allocated to a ``CiliumNode``
that matches the ``.nodeSelector`` for the virtual router to announce the
PodCIDR as a BGP route.

If you wish to announce all ``CiliumPodIPPool`` CIDRs within the cluster, you
can use a ``NotIn`` match expression with a dummy key and value:

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        podIPPoolSelector:
          matchExpressions:
          - {key: somekey, operator: NotIn, values: ['never-used-value']}
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512

There are two special-purpose selector fields that match ``CiliumPodIPPool``
resources based on their ``name`` and/or ``namespace`` metadata instead of
labels:

=============================== =======================
Selector                        Matched field
=============================== =======================
io.cilium.podippool.namespace   ``.metadata.namespace``
io.cilium.podippool.name        ``.metadata.name``
=============================== =======================

For additional details regarding ``CiliumPodIPPool`` resources, see the
:ref:`ipam_crd_multi_pool` section.

Other IPAM Types
^^^^^^^^^^^^^^^^

When using other IPAM types, the BGP Control Plane does not support advertising
PodCIDRs, and setting ``virtualRouters[*].exportPodCIDR`` has no effect.

Advertising Service Virtual IPs
-------------------------------

In Kubernetes, a Service can have multiple virtual IP addresses,
such as ``.spec.clusterIP``, ``.spec.clusterIPs``, ``.status.loadBalancer.ingress[*].ip``
and ``.spec.externalIPs``.
The BGP Control Plane can advertise the virtual IP addresses of the Service to BGP peers.
This allows users to access the Service directly from outside the cluster.

To advertise the virtual IPs, specify the ``virtualRouters[*].serviceSelector`` field
and the ``virtualRouters[*].serviceAdvertisements`` field. The ``.serviceAdvertisements``
field defaults to ``LoadBalancerIP``. You can also set it to a list of the address
types to advertise, chosen from ``LoadBalancerIP``, ``ClusterIP`` and ``ExternalIP``.

When ``virtualRouters[*].serviceAdvertisements`` includes ``ClusterIP``, the BGP
Control Plane only considers the Service's ``.spec.internalTrafficPolicy`` and
ignores its ``.spec.externalTrafficPolicy``.
For ``ExternalIP`` and ``LoadBalancerIP``, it only considers the Service's
``.spec.externalTrafficPolicy`` and ignores its ``.spec.internalTrafficPolicy``.

The ``.serviceSelector`` field is a label selector that selects Services matching
the specified ``.matchLabels`` or ``.matchExpressions``.

When your upstream router supports Equal Cost Multi Path (ECMP), you can use
this feature to load balance traffic to the Service across multiple nodes by
advertising the same ingress IPs from multiple nodes.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        serviceSelector:         # <-- select Services to advertise
          matchLabels:
            app: foo
        serviceAdvertisements:   # <-- specify the service types to advertise
        - LoadBalancerIP         # <-- default
        - ClusterIP              # <-- optional
        - ExternalIP             # <-- optional
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512


.. warning::

   Many routers have a limit on the number of ECMP paths they can hold in their
   routing table (`Juniper
   <https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/maximum-ecmp-edit-chassis.html>`__).
   When advertising the Service VIPs from many nodes, you may exceed this
   limit. We recommend checking the limit with your network administrator
   before using this feature.

Advertising ExternalIP Services
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you wish to use this together with the ``kubeProxyReplacement`` feature
(see the :ref:`kubeproxy-free` docs), make sure that ExternalIP support is enabled.

If you only wish to advertise the ``.spec.externalIPs`` of a Service,
set the ``virtualRouters[*].serviceAdvertisements`` field to ``ExternalIP``.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        serviceSelector:         # <-- select Services to advertise
          matchLabels:
            app: foo
        serviceAdvertisements:   # <-- specify the service types to advertise
        - ExternalIP
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512


Advertising ClusterIP Services
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you wish to use this together with the ``kubeProxyReplacement`` feature
(see the :ref:`kubeproxy-free` docs), you need to enable specific BPF parameters.
See the :ref:`External Access To ClusterIP Services <external_access_to_clusterip_services>`
section for how to enable them.

If you only wish to advertise the ``.spec.clusterIP`` and ``.spec.clusterIPs`` of a Service,
set the ``virtualRouters[*].serviceAdvertisements`` field to ``ClusterIP``.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        serviceSelector:         # <-- select Services to advertise
          matchLabels:
            app: foo
        serviceAdvertisements:   # <-- specify the service types to advertise
        - ClusterIP
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512

Additionally, when the ``.spec.clusterIP`` or ``.spec.clusterIPs`` of the Service
contains ``None`` (as for headless Services), that value is ignored and not advertised.


Advertising Load Balancer Services
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You must first allocate ingress IPs to advertise them. By default, Kubernetes
doesn't provide a way to assign ingress IPs to a Service.
The cluster administrator is responsible for deploying a controller that
assigns ingress IPs. Cilium supports assigning ingress IPs with the
:ref:`Load Balancer IPAM <lb_ipam>` feature.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        serviceSelector:
          matchLabels:
            app: foo
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512

This advertises the ingress IPs of all Services matching the ``.serviceSelector``.

If you wish to announce all Services within the cluster, you can use a ``NotIn``
match expression with a dummy key and value:

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        serviceSelector:
          matchExpressions:
          - {key: somekey, operator: NotIn, values: ['never-used-value']}
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512

A few special-purpose selector fields match on other metadata, such as
``.metadata.name`` or ``.metadata.namespace``, instead of labels:

=============================== =======================
Selector                        Matched field
=============================== =======================
io.kubernetes.service.namespace ``.metadata.namespace``
io.kubernetes.service.name      ``.metadata.name``
=============================== =======================

Load Balancer Class
~~~~~~~~~~~~~~~~~~~

Cilium supports the `loadBalancerClass
<https://kubernetes.io/docs/concepts/services-networking/service/#load-balancer-class>`__.
When the load balancer class is set to ``io.cilium/bgp-control-plane`` or is unspecified,
Cilium announces the ingress IPs of the Service.
Otherwise, Cilium does not announce the ingress IPs of the Service.

externalTrafficPolicy
~~~~~~~~~~~~~~~~~~~~~

When the Service has ``externalTrafficPolicy: Cluster``, the BGP Control Plane
unconditionally advertises the ingress IPs of the selected Service. When the
Service has ``externalTrafficPolicy: Local``, the BGP Control Plane keeps track
of the endpoints for the Service on the local node and stops advertising the
ingress IPs when there is no local endpoint.

Validating Advertised Routes
----------------------------

Get all available IPv4 unicast routes:

.. code-block:: shell-session

    $ cilium bgp routes available ipv4 unicast
    Node    VRouter   Prefix        NextHop   Age      Attrs
    node0   64512     10.1.0.0/24   0.0.0.0   17m42s   [{Origin: i} {Nexthop: 0.0.0.0}]

Get all available IPv4 unicast routes for a specific virtual router:

.. code-block:: shell-session

    $ cilium bgp routes available ipv4 unicast vrouter 64512
    Node    VRouter   Prefix        NextHop   Age      Attrs
    node0   64512     10.1.0.0/24   0.0.0.0   17m42s   [{Origin: i} {Nexthop: 0.0.0.0}]

Get the IPv4 unicast routes advertised to a specific peer:

.. code-block:: shell-session

    $ cilium bgp routes advertised ipv4 unicast peer 10.0.0.1
    Node    VRouter   Prefix        NextHop    Age      Attrs
    node0   64512     10.1.0.0/24   10.0.0.2   17m42s   [{Origin: i} {AsPath: } {Nexthop: 10.0.0.2} {LocalPref: 100}]


Neighbor Options
----------------

Each entry in ``virtualRouters`` can contain multiple ``neighbors``. You can
specify various BGP peering options for each neighbor. This section describes
the available options and their use cases.

.. warning::

   Changing the configuration of an existing neighbor can reset the BGP peering
   connection. This causes route flaps and transient packet loss while the
   session is re-established and the peers exchange their routes.
   To prevent packet loss, we recommend configuring
   :ref:`BGP Graceful Restart <bgp_control_plane_graceful_restart>`.

Peer Port
^^^^^^^^^

By default, the BGP Control Plane uses port 179 for BGP peering. When the neighbor is
running on a non-standard port, you can specify the port number with the ``peerPort``
field.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512
          peerPort: 1179   # <-- specify the peer port

Timers
^^^^^^

The BGP Control Plane supports modifying the following BGP timer parameters. For
a more detailed description of each timer parameter, refer to `RFC-4271
<https://www.rfc-editor.org/rfc/rfc4271.html>`__.

================= ============================ =================
Name              Field                        Default (seconds)
================= ============================ =================
ConnectRetryTimer ``connectRetryTimeSeconds``  120
HoldTimer         ``holdTimeSeconds``          90
KeepaliveTimer    ``keepAliveTimeSeconds``     30
================= ============================ =================

In the datacenter networks where Kubernetes clusters are deployed, it is
generally recommended to set the ``HoldTimer`` and ``KeepaliveTimer`` to lower
values for faster failure detection. For example, you can set the minimum
possible values ``holdTimeSeconds=9`` and ``keepAliveTimeSeconds=3``.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512
          connectRetryTimeSeconds: 90   # <-- specify the ConnectRetryTimer
          holdTimeSeconds: 9            # <-- specify the HoldTimer
          keepAliveTimeSeconds: 3       # <-- specify the KeepaliveTimer

eBGP Multihop
^^^^^^^^^^^^^

By default, the IP TTL of BGP packets is set to 1 for eBGP. We generally
recommend leaving the TTL unchanged, but in some cases you may need to change
it. For example, when the BGP peer is a Route Server located in a different
subnet, you may need to set the TTL value to more than 1.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 65000        # <-- eBGP peer (ASN differs from localASN)
          eBGPMultihopTTL: 4    # <-- specify the TTL value

MD5 Passwords
^^^^^^^^^^^^^

By configuring ``authSecretRef`` for a neighbor, you can enable an
`RFC-2385`_ TCP MD5 password on the session with this BGP peer.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512
          authSecretRef: "bgp-password"   # <-- specify the secret name

``authSecretRef`` should reference the name of a Secret in the BGP secrets
namespace (``kube-system`` by default when using the Helm chart). The
Secret should contain a key named ``password``.
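
As a sketch, you could also create the Secret referenced by ``authSecretRef: "bgp-password"`` in the example above declaratively. This assumes the default ``kube-system`` secrets namespace and uses a placeholder password. ``Opaque`` is the generic Kubernetes Secret type, and ``stringData`` lets Kubernetes handle the base64 encoding.

.. code-block:: yaml

    apiVersion: v1
    kind: Secret
    metadata:
      name: bgp-password           # must match the neighbor's authSecretRef
      namespace: kube-system       # BGP secrets namespace (Helm chart default)
    type: Opaque
    stringData:
      password: my-secret-password # placeholder value stored under the "password" key

Apply it with ``kubectl apply -f <file>`` like any other manifest.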

BGP secrets are limited to a configured namespace to keep the permissions
needed on each Cilium Agent instance to a minimum. By default, the Helm chart
configures Cilium to be able to read from this namespace.

An example of creating a secret is:

.. code-block:: shell-session

    $ kubectl create secret generic -n kube-system --type=string secretname --from-literal=password=my-secret-password

If you wish to change the namespace, set the
``bgpControlPlane.secretNamespace.name`` Helm chart value. To have the
namespace created automatically, set the
``bgpControlPlane.secretNamespace.create`` Helm chart value to ``true``.

Because TCP MD5 passwords sign the header of the packet, they cannot be used if
Cilium translates the address of the session. In other words, the Cilium
Agent's pod IP address must be the address the BGP peer sees.

If the password is incorrect, or the header is otherwise changed, the TCP
connection will not succeed. This appears as ``dial: i/o timeout`` in the
Cilium Agent's logs rather than as a more specific error message.

.. _RFC-2385 : https://www.rfc-editor.org/rfc/rfc2385.html

If a ``CiliumBGPPeeringPolicy`` is deployed with an ``authSecretRef`` that Cilium
cannot find, the BGP session uses an empty password and the agent logs an error
such as in the following example::

    level=error msg="Failed to fetch secret \"secretname\": not found (will continue with empty password)" component=manager.fetchPeerPassword subsys=bgp-control-plane

.. _bgp_control_plane_graceful_restart:

Graceful Restart
^^^^^^^^^^^^^^^^

The Cilium BGP Control Plane can be configured to act as a graceful restart
``Restarting Speaker``. When you enable graceful restart, the BGP session
restarts and the "graceful restart" capability is advertised in the BGP OPEN
message.

If the Cilium Agent restarts, the peering BGP router does not immediately
withdraw the routes received from the Cilium BGP Control Plane. The datapath
continues to forward traffic while the Agent restarts, so there is no traffic
disruption.

Configure graceful restart on a per-neighbor basis, as follows:

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512
          gracefulRestart:
            enabled: true             # <-- enable graceful restart
            restartTimeSeconds: 120   # <-- set RestartTime

.. warning::

   When enabled, the graceful restart capability is advertised for the IPv4 and
   IPv6 address families by default. Since v1.15, there is a known issue where
   Cilium takes a long time (approximately 300s) to resume route advertisement
   after a graceful restart. This happens when Cilium advertises both the IPv4
   and IPv6 address families but the remote peer advertises only one of them.
   You can work around this issue by aligning the address families advertised
   by Cilium and the remote peer with the
   `families field <bgp-control-plane-address-families_>`_. You can track
   `#30367 <https://github.com/cilium/cilium/issues/30367/>`_ for updates.

Optionally, you can use the ``RestartTime`` parameter (the ``restartTimeSeconds``
field). ``RestartTime`` is the time advertised to the peer within which the
Cilium BGP Control Plane is expected to re-establish the BGP session after a
restart. When ``RestartTime`` expires, the peer removes the routes previously
advertised by the Cilium BGP Control Plane.

When the Cilium Agent restarts, it closes the BGP TCP socket, which emits a
TCP FIN packet. On receiving this TCP FIN, the peer changes its BGP state to
``Idle`` and starts its ``RestartTime`` timer.
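
The following is a sketch of the workaround described in the warning above. It assumes a remote peer that negotiates only IPv4 unicast. It enables graceful restart with the default ``RestartTime`` and restricts the advertised address families with the ``families`` field so that they match the peer.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512
          gracefulRestart:
            enabled: true
            restartTimeSeconds: 120   # default RestartTime
          families:                   # assumption: the peer supports only IPv4 unicast
          - afi: ipv4
            safi: unicast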

The Cilium Agent boot-up time varies depending on the deployment. If you use
``RestartTime``, set it to a duration greater than the time the Cilium Agent
takes to boot up.

The default value of ``RestartTime`` is 120 seconds. More details on graceful
restart and ``RestartTime`` can be found in `RFC-4724`_ and `RFC-8538`_.

.. _RFC-4724 : https://www.rfc-editor.org/rfc/rfc4724.html
.. _RFC-8538 : https://www.rfc-editor.org/rfc/rfc8538.html

Advertised Path Attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^

BGP advertisements can be extended with additional BGP Path Attributes: BGP
Communities (`RFC-1997`_) or Local Preference. These Path Attributes can be
configured selectively for each BGP peer and advertisement type.

The following code block shows an example ``AdvertisedPathAttributes``
configuration for a BGP neighbor. It adds a BGP community attribute with the
value ``64512:100`` to all Service announcements from the matching
``CiliumLoadBalancerIPPool``. For all PodCIDR announcements, it sets the Local
Preference value to ``150`` and adds the BGP community ``64512:150``:

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512
          advertisedPathAttributes:
          - selectorType: CiliumLoadBalancerIPPool   # <-- select CiliumLoadBalancerIPPool and add BGP community 64512:100
            selector:
              matchLabels:
                environment: production
            communities:
              standard:
              - 64512:100
          - selectorType: PodCIDR   # <-- select PodCIDR and add local preference 150 and BGP community 64512:150
            localPreference: 150
            communities:
              standard:
              - 64512:150

.. note::
   The Local Preference Path Attribute is sent only to ``iBGP`` peers, not to ``eBGP`` peers.
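
Because Local Preference is not sent to eBGP peers, attributes intended for an eBGP neighbor are typically expressed with communities instead. Standard communities are optional transitive attributes (RFC 1997) and are not limited to iBGP sessions. The following sketch assumes a hypothetical eBGP peer in ASN ``65000``, which differs from the ``localASN`` of ``64512``. It tags only the PodCIDR announcements sent to that peer, using the standard community ``64512:150``.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 65000              # hypothetical eBGP peer (ASN differs from localASN)
          advertisedPathAttributes:
          - selectorType: PodCIDR     # no selector: applies to all PodCIDR announcements
            communities:
              standard:
              - 64512:150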

Each ``AdvertisedPathAttributes`` configuration item consists of two parts:

- ``SelectorType`` together with ``Selector`` defines which BGP advertisements are extended with additional Path Attributes.
- ``Communities`` and/or ``LocalPreference`` define the additional Path Attributes applied to the selected routes.

``SelectorType`` has three possible values, which define the object type to which the ``Selector`` applies:

- ``PodCIDR``: matches ``CiliumNode`` custom resources
  (Path Attributes apply to routes announced for the PodCIDRs of the selected ``CiliumNode`` objects).
- ``CiliumLoadBalancerIPPool``: matches ``CiliumLoadBalancerIPPool`` custom resources
  (Path Attributes apply to routes announced for the selected ``CiliumLoadBalancerIPPool`` objects).
- ``CiliumPodIPPool``: matches ``CiliumPodIPPool`` custom resources
  (Path Attributes apply to routes announced for the allocated prefixes of the selected ``CiliumPodIPPool`` objects).

Two types of additional Path Attributes can be advertised with the routes: ``Communities`` and ``LocalPreference``.

``Communities`` defines a set of community values advertised in the supported BGP Communities Path Attributes.
The values can be of three types:

- ``Standard``: represents a value of the "standard" 32-bit BGP Communities Attribute (`RFC-1997`_)
  as a 4-byte decimal number or as two 2-byte decimal numbers separated by a colon (for example, ``64512:100``).
- ``WellKnown``: represents a value of the "standard" 32-bit BGP Communities Attribute (`RFC-1997`_)
  as a well-known string alias to its numeric value.
  The allowed values and their mappings to numeric values are:

  =============================== ================= =================
  Well-Known Value                Hexadecimal Value 16-bit Pair Value
  =============================== ================= =================
  ``internet``                    ``0x00000000``    ``0:0``
  ``planned-shut``                ``0xffff0000``    ``65535:0``
  ``accept-own``                  ``0xffff0001``    ``65535:1``
  ``route-filter-translated-v4``  ``0xffff0002``    ``65535:2``
  ``route-filter-v4``             ``0xffff0003``    ``65535:3``
  ``route-filter-translated-v6``  ``0xffff0004``    ``65535:4``
  ``route-filter-v6``             ``0xffff0005``    ``65535:5``
  ``llgr-stale``                  ``0xffff0006``    ``65535:6``
  ``no-llgr``                     ``0xffff0007``    ``65535:7``
  ``blackhole``                   ``0xffff029a``    ``65535:666``
  ``no-export``                   ``0xffffff01``    ``65535:65281``
  ``no-advertise``                ``0xffffff02``    ``65535:65282``
  ``no-export-subconfed``         ``0xffffff03``    ``65535:65283``
  ``no-peer``                     ``0xffffff04``    ``65535:65284``
  =============================== ================= =================

- ``Large``: represents a value of the BGP Large Communities Attribute (`RFC-8092`_)
  as three 4-byte decimal numbers separated by colons (for example, ``64512:100:50``).

.. _RFC-1997 : https://www.rfc-editor.org/rfc/rfc1997.html
.. _RFC-8092 : https://www.rfc-editor.org/rfc/rfc8092.html

``LocalPreference`` defines the preference value advertised in the BGP Local Preference Path Attribute.
Because Local Preference is only valid for ``iBGP`` peers, this value is ignored for ``eBGP`` peers
(no Local Preference Path Attribute is advertised).

Once configured, you can verify the additional Path Attributes advertised with the routes for a peer
using the ``cilium bgp routes`` Cilium CLI command, for example:

.. code-block:: shell-session

    $ cilium bgp routes advertised ipv4 unicast peer 10.0.0.1

    VRouter   Prefix               NextHop    Age     Attrs
    64512     10.1.0.0/24          10.0.0.2   3m31s   [{Origin: i} {LocalPref: 150} {Nexthop: 10.0.0.2}]
    64512     192.168.100.190/32   10.0.0.2   3m32s   [{Origin: i} {LocalPref: 100} {Communities: 64512:100} {Nexthop: 10.0.0.2}]

.. _bgp-control-plane-address-families:

Address Families
^^^^^^^^^^^^^^^^

By default, the BGP Control Plane advertises the IPv4 Unicast and IPv6 Unicast
Multiprotocol Extensions Capability (`RFC-4760`_). If graceful restart is
enabled, it also advertises the Graceful Restart address families
(`RFC-4724`_). If you wish to change the default behavior and advertise only
specific address families, use the ``families`` field. The ``families`` field
is a list of AFI (Address Family Identifier) and SAFI (Subsequent Address
Family Identifier) pairs. The only options currently supported are
``{afi: ipv4, safi: unicast}`` and ``{afi: ipv6, safi: unicast}``.

The following example shows how to advertise only the IPv4 Unicast address family:

.. _RFC-4760 : https://www.rfc-editor.org/rfc/rfc4760.html

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: '10.0.0.1/32'
          peerASN: 64512
          families:
          - afi: ipv4
            safi: unicast
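
Similarly, you could limit a neighbor reached over IPv6 to the IPv6 Unicast address family. The peer address ``fd00:10::1/128`` below is a hypothetical placeholder. In an IPv6 single-stack cluster, the Router ID must also be set with the node annotation described earlier.

.. code-block:: yaml

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: rack0
    spec:
      nodeSelector:
        matchLabels:
          rack: rack0
      virtualRouters:
      - localASN: 64512
        neighbors:
        - peerAddress: 'fd00:10::1/128'   # hypothetical IPv6 peer address
          peerASN: 64512
          families:                       # advertise only IPv6 Unicast to this peer
          - afi: ipv6
            safi: unicast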