github.com/cilium/cilium@v1.16.2/Documentation/network/bgp-control-plane/bgp-control-plane-v1.rst

     7  .. _bgp_control_plane_v1:
     8  
     9  BGP Peering Policy (Legacy)
    10  ###########################
    11  
    12  .. warning::
    13      ``CiliumBGPPeeringPolicy`` will be discontinued in the future. Consider
    14      using the new :ref:`BGP APIs <bgp_control_plane_v2>` to configure the BGP Control Plane.
    15  
    16  Configure Peering
    17  -----------------
    18  
    19  .. code-block:: yaml
    20  
    21     apiVersion: "cilium.io/v2alpha1"
    22     kind: CiliumBGPPeeringPolicy
    23     metadata:
    24       name: rack0
    25     spec:
    26       nodeSelector:
    27         matchLabels:
    28           rack: rack0
    29       virtualRouters:
    30       - localASN: 64512
    31         neighbors:
    32         - peerAddress: '10.0.0.1/32'
    33           peerASN: 64512
    34  
    35  All BGP peering topology information is carried in a ``CiliumBGPPeeringPolicy``
    36  CRD. A ``CiliumBGPPeeringPolicy`` can be applied to one or more nodes based on
    37  its ``nodeSelector`` field. Only a single ``CiliumBGPPeeringPolicy`` can be
    38  applied to a node. If multiple policies match a node, Cilium clears all BGP
    39  sessions until only one policy matches the node.
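
        One way to avoid overlapping policies is to give each policy a
        ``nodeSelector`` that matches a disjoint set of nodes. The following is a
        minimal sketch rather than a recommended layout: the ``rack`` label values,
        the policy names and the peer addresses are assumptions made for
        illustration.

        .. code-block:: yaml

           # A node carries a single value for the "rack" label, so at most
           # one of these two policies can select any given node.
           apiVersion: "cilium.io/v2alpha1"
           kind: CiliumBGPPeeringPolicy
           metadata:
             name: rack0
           spec:
             nodeSelector:
               matchLabels:
                 rack: rack0 # <-- selects only nodes labeled rack=rack0
             virtualRouters:
             - localASN: 64512
               neighbors:
               - peerAddress: '10.0.0.1/32'
                 peerASN: 64512
           ---
           apiVersion: "cilium.io/v2alpha1"
           kind: CiliumBGPPeeringPolicy
           metadata:
             name: rack1
           spec:
             nodeSelector:
               matchLabels:
                 rack: rack1 # <-- selects only nodes labeled rack=rack1
             virtualRouters:
             - localASN: 64512
               neighbors:
               - peerAddress: '10.0.1.1/32' # placeholder peer for rack1
                 peerASN: 64512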
    40  
    41  .. warning::
    42  
    43     Applying another policy over an existing one clears the BGP session and
    44     causes immediate connectivity disruption. It is strongly recommended to
    45     test the policy in a staging environment before applying it to
    46     production.
    47  
    48  Each ``CiliumBGPPeeringPolicy`` defines one or more ``virtualRouters``. The
    49  virtual router defines a BGP router instance which is uniquely identified by
    50  its ``localASN``. Each virtual router can have multiple ``neighbors`` defined.
    51  The neighbor defines a BGP neighbor uniquely identified by its ``peerAddress``
    52  and ``peerASN``. When ``localASN`` and ``peerASN`` are the same, iBGP peering
    53  is used. When ``localASN`` and ``peerASN`` are different, eBGP peering is used.
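
        To illustrate the distinction, the following sketch defines one virtual
        router with two neighbors. The second peer address (``192.0.2.1``) and the
        ASN ``65000`` are placeholder values chosen for this example only.

        .. code-block:: yaml

           apiVersion: "cilium.io/v2alpha1"
           kind: CiliumBGPPeeringPolicy
           metadata:
             name: rack0
           spec:
             nodeSelector:
               matchLabels:
                 rack: rack0
             virtualRouters:
             - localASN: 64512
               neighbors:
               - peerAddress: '10.0.0.1/32'
                 peerASN: 64512 # <-- same as localASN: iBGP peering
               - peerAddress: '192.0.2.1/32'
                 peerASN: 65000 # <-- differs from localASN: eBGP peering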
    54  
    55  Specifying Router ID (IPv6 single-stack only)
    56  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    57  
    58  When Cilium is running on an IPv4 single-stack or a dual-stack cluster, the BGP
    59  Router ID is automatically derived from the IPv4 address assigned to the node.
    60  When Cilium is running on an IPv6 single-stack cluster, the BGP Router ID must
    61  be configured manually. This can be done by setting the following annotation on
    62  the Kubernetes Node resource:
    63  
    64  .. code-block:: shell-session
    65  
    66     $ kubectl annotate node <node-name> cilium.io/bgp-virtual-router.64512="router-id=10.0.0.2"
    67  
    68  Currently, you must set the annotation for each Node. In the future, automatic
    69  assignment of the Router ID may be supported. Follow `#30333
    70  <https://github.com/cilium/cilium/issues/30333/>`_ for updates.
    71  
    72  
    73  Validating Peering Status
    74  ^^^^^^^^^^^^^^^^^^^^^^^^^
    75  
    76  Once the ``CiliumBGPPeeringPolicy`` is applied, you can check the BGP peering
    77  status with the following Cilium CLI command:
    78  
    79  .. code-block:: shell-session
    80  
    81     $ cilium bgp peers
    82     Node                              Local AS   Peer AS   Peer Address     Session State   Uptime   Family         Received   Advertised
    83     node0                             64512      64512     10.0.0.1         established     10s      ipv4/unicast   0          0
    84                                                                                                      ipv6/unicast   0          0
    85  
    86  
    87  Node Annotations
    88  ----------------
    89  
    90  A ``CiliumBGPPeeringPolicy`` can apply to multiple nodes. When a
    91  ``CiliumBGPPeeringPolicy`` applies to one or more nodes, each node
    92  instantiates one or more BGP routers as defined in ``virtualRouters``. However,
    93  you sometimes need fine-grained control over the configuration of an individual
    94  virtual router instance. This can be accomplished by applying a
    95  Kubernetes annotation to Kubernetes Node resources.
    96  
    97  A single annotation is used to specify a set of configuration attributes
    98  to apply to a particular virtual router instantiated on a particular
    99  host.
   100  
   101  The syntax of the annotation is as follows:
   102  
   103  ::
   104  
   105         cilium.io/bgp-virtual-router.{asn}="key=value,..."
   106  
   107  The ``{asn}`` portion should be replaced by the virtual router's local ASN you
   108  wish to apply these configuration attributes to. Multiple option key/value
   109  pairs can be specified by separating them with a comma. When duplicate keys are
   110  defined with different values, the last key's value will be used.
   111  
   112  Overriding Router ID
   113  ^^^^^^^^^^^^^^^^^^^^
   114  
   115  When Cilium is running on an IPv4 single-stack or a dual-stack cluster, the BGP
   116  Control Plane can use the IPv4 address assigned to the node as the BGP Router
   117  ID. The Router ID is 32 bits long, so a unique IPv4 address yields a unique
   118  Router ID, which is not the case for a 128-bit IPv6 address. Thus, when
   119  running on an IPv6 single-stack cluster, or when the automatic assignment of the
   120  Router ID is not desired, the administrator needs to define it manually. This
   121  can be accomplished by setting the ``router-id`` key in the annotation.
   122  
   123  .. code-block:: shell-session
   124  
   125     $ kubectl annotate node <node-name> cilium.io/bgp-virtual-router.{asn}="router-id=10.0.0.2"
   126  
   127  
   128  Listening on the Local Port
   129  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   130  
   131  By default, the BGP Control Plane instantiates each virtual router without a
   132  listening port. This means the BGP router can only initiate connections to the
   133  configured peers, but cannot accept incoming connections. This is the default
   134  behavior because the BGP Control Plane is designed to function in environments
   135  where another BGP router (such as ``Bird``) is running on the same node. When
   136  it is required to accept incoming connections, the ``local-port`` key can be
   137  used to specify the listening port.
   138  
   139  .. code-block:: shell-session
   140  
   141     $ kubectl annotate node <node-name> cilium.io/bgp-virtual-router.{asn}="local-port=179"
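
        Because multiple key/value pairs can be combined in one annotation, the
        ``router-id`` and ``local-port`` keys can be set together. As a sketch, the
        resulting annotation on the Node resource would look roughly like the
        following fragment, where the node name ``node0`` and the ASN ``64512`` are
        example values:

        .. code-block:: yaml

           apiVersion: v1
           kind: Node
           metadata:
             name: node0
             annotations:
               # One annotation per virtual router, keyed by its local ASN
               cilium.io/bgp-virtual-router.64512: "router-id=10.0.0.2,local-port=179"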
   142  
   143  Advertising PodCIDRs
   144  --------------------
   145  
   146  BGP Control Plane can advertise PodCIDR prefixes of the nodes selected by the
   147  ``CiliumBGPPeeringPolicy`` to the BGP peers. This allows the BGP peers to reach
   148  the Pods directly without involving load balancers or NAT. There are two ways
   149  to advertise PodCIDRs depending on the IPAM mode setting.
   150  
   151  Kubernetes and ClusterPool IPAM
   152  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   153  
   154  When :ref:`Kubernetes <k8s_hostscope>` or :ref:`ClusterPool
   155  <ipam_crd_cluster_pool>` IPAM is used, set the
   156  ``virtualRouters[*].exportPodCIDR`` field to true.
   157  
   158  .. code-block:: yaml
   159  
   160     apiVersion: "cilium.io/v2alpha1"
   161     kind: CiliumBGPPeeringPolicy
   162     metadata:
   163       name: rack0
   164     spec:
   165       nodeSelector:
   166         matchLabels:
   167           rack: rack0
   168       virtualRouters:
   169       - localASN: 64512
   170         exportPodCIDR: true # <-- enable PodCIDR advertisement
   171         neighbors:
   172         - peerAddress: '10.0.0.1/32'
   173           peerASN: 64512
   174  
   175  With this configuration, the BGP speaker on each node advertises the
   176  PodCIDR prefixes assigned to the local node.
   177  
   178  .. _bgp_control_plane_multipool_ipam:
   179  
   180  MultiPool IPAM
   181  ^^^^^^^^^^^^^^
   182  
   183  When :ref:`MultiPool IPAM <ipam_crd_multi_pool>` is used, specify the
   184  ``virtualRouters[*].podIPPoolSelector`` field. The ``.podIPPoolSelector`` field
   185  is a label selector that selects allocated CIDRs of ``CiliumPodIPPool``
   186  matching the specified ``.matchLabels`` or ``.matchExpressions``.
   187  
   188  .. code-block:: yaml
   189  
   190     apiVersion: "cilium.io/v2alpha1"
   191     kind: CiliumBGPPeeringPolicy
   192     metadata:
   193       name: rack0
   194     spec:
   195       nodeSelector:
   196         matchLabels:
   197           rack: rack0
   198       virtualRouters:
   199       - localASN: 64512
   200         podIPPoolSelector: # <-- select CiliumPodIPPool to advertise
   201           matchLabels:
   202             environment: production
   203         neighbors:
   204         - peerAddress: '10.0.0.1/32'
   205           peerASN: 64512
   206  
   207  This advertises the PodCIDR prefixes allocated from the selected
   208  CiliumPodIPPools. Note that the CIDR must be allocated to a ``CiliumNode`` that
   209  matches the ``.nodeSelector`` for the virtual router to announce the PodCIDR as
   210  a BGP route.
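
        For the selector in the example above to match, a ``CiliumPodIPPool`` has
        to carry the ``environment: production`` label. The following is a sketch
        of such a pool; the pool name, the CIDR and the mask size are hypothetical
        values, and the authoritative schema is described in the
        :ref:`ipam_crd_multi_pool` section.

        .. code-block:: yaml

           apiVersion: "cilium.io/v2alpha1"
           kind: CiliumPodIPPool
           metadata:
             name: production-pool       # hypothetical pool name
             labels:
               environment: production   # <-- matched by podIPPoolSelector
           spec:
             ipv4:
               cidrs:
               - 10.100.0.0/16           # hypothetical pool CIDR
               maskSize: 24              # size of the per-node allocations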
   211  
   212  To announce all ``CiliumPodIPPool`` CIDRs within the cluster, use a ``NotIn`` match expression
   213  with a dummy key and value, which matches every pool that does not carry that label value:
   214  
   215  .. code-block:: yaml
   216  
   217     apiVersion: "cilium.io/v2alpha1"
   218     kind: CiliumBGPPeeringPolicy
   219     spec:
   220       nodeSelector:
   221         matchLabels:
   222           rack: rack0
   223       virtualRouters:
   224       - localASN: 64512
   225         podIPPoolSelector:
   226           matchExpressions:
   227           - {key: somekey, operator: NotIn, values: ['never-used-value']}
   228         neighbors:
   229         - peerAddress: '10.0.0.1/32'
   230           peerASN: 64512
   231  
   232  There are two special purpose selector fields that match CiliumPodIPPools based on ``name`` and/or
   233  ``namespace`` metadata instead of labels:
   234  
   235  =============================== ===================
   236  Selector                        Field
   237  ------------------------------- -------------------
   238  io.cilium.podippool.namespace   ``.meta.namespace``
   239  io.cilium.podippool.name        ``.meta.name``
   240  =============================== ===================
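
        As a sketch of how these special keys might be used, the following
        fragment selects a single pool by its name. It assumes the keys are used
        in the same way as ordinary label keys; the pool name ``production-pool``
        is hypothetical.

        .. code-block:: yaml

           apiVersion: "cilium.io/v2alpha1"
           kind: CiliumBGPPeeringPolicy
           metadata:
             name: rack0
           spec:
             nodeSelector:
               matchLabels:
                 rack: rack0
             virtualRouters:
             - localASN: 64512
               podIPPoolSelector:
                 matchExpressions:
                 # Match on the pool's name instead of on one of its labels
                 - {key: io.cilium.podippool.name, operator: In, values: ['production-pool']}
               neighbors:
               - peerAddress: '10.0.0.1/32'
                 peerASN: 64512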
   241  
   242  For additional details regarding CiliumPodIPPools, see the :ref:`ipam_crd_multi_pool` section.
   243  
   244  Other IPAM Types
   245  ^^^^^^^^^^^^^^^^
   246  
   247  When using other IPAM types, the BGP Control Plane does not support advertising
   248  PodCIDRs, and specifying ``virtualRouters[*].exportPodCIDR`` has no
   249  effect.
   250  
   251  Advertising Service Virtual IPs
   252  -------------------------------
   253  
   254  In Kubernetes, a Service can have multiple virtual IP addresses,
   255  such as ``.spec.clusterIP``, ``.spec.clusterIPs``, ``.status.loadBalancer.ingress[*].ip``
   256  and ``.spec.externalIPs``.
   257  The BGP Control Plane can advertise the virtual IP addresses of the Service to BGP peers.
   258  This allows users to directly access the Service from outside the cluster.
   259  
   260  To advertise the virtual IPs, specify the ``virtualRouters[*].serviceSelector`` field
   261  and, optionally, the ``virtualRouters[*].serviceAdvertisements`` field. When
   262  ``.serviceAdvertisements`` is not set, it defaults to ``LoadBalancerIP``. Set it explicitly
   263  to choose which address types to advertise: ``LoadBalancerIP``,
   264  ``ClusterIP`` and ``ExternalIP``.
   265  
   266  It is worth noting that when you configure ``virtualRouters[*].serviceAdvertisements`` as ``ClusterIP``,
   267  the BGP Control Plane only considers the configuration of the service's ``.spec.internalTrafficPolicy`` and ignores
   268  the configuration of ``.spec.externalTrafficPolicy``.
   269  For ``ExternalIP`` and ``LoadBalancerIP``, it only considers the configuration of
   270  the service's ``.spec.externalTrafficPolicy`` and ignores the configuration of ``.spec.internalTrafficPolicy``.
   271  
   272  The ``.serviceSelector`` field is a label selector that selects Services matching
   273  the specified ``.matchLabels`` or ``.matchExpressions``.
   274  
   275  When your upstream router supports Equal Cost Multi Path (ECMP), you can use
   276  this feature to load balance traffic to the Service across multiple nodes by
   277  advertising the same ingress IPs from multiple nodes.
   278  
   279  .. code-block:: yaml
   280  
   281     apiVersion: "cilium.io/v2alpha1"
   282     kind: CiliumBGPPeeringPolicy
   283     metadata:
   284       name: rack0
   285     spec:
   286       nodeSelector:
   287         matchLabels:
   288           rack: rack0
   289       virtualRouters:
   290       - localASN: 64512
   291         serviceSelector: # <-- select Services to advertise
   292           matchLabels:
   293             app: foo
   294         serviceAdvertisements: # <-- specify the service types to advertise
   295         - LoadBalancerIP # <-- default
   296         - ClusterIP      # <-- options
   297         - ExternalIP     # <-- options
   298         neighbors:
   299         - peerAddress: '10.0.0.1/32'
   300           peerASN: 64512
   301  
   302  
   303  .. warning::
   304  
   305     Many routers have a limit on the number of ECMP paths they can hold in their
   306     routing table (`Juniper
   307     <https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/maximum-ecmp-edit-chassis.html>`__).
   308     When advertising the Service VIPs from many nodes, you may exceed this
   309     limit. We recommend checking the limit with your network administrator
   310     before using this feature.
   311  
   312  Advertising ExternalIP Services
   313  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   314  
   315  If you wish to use this together with the ``kubeProxyReplacement`` feature (see :ref:`kubeproxy-free` docs),
   316  make sure that ExternalIP support is enabled.
   317  
   318  If you only wish to advertise the ``.spec.externalIPs`` of a Service,
   319  you can specify the ``virtualRouters[*].serviceAdvertisements`` field as ``ExternalIP``.
   320  
   321  .. code-block:: yaml
   322  
   323     apiVersion: "cilium.io/v2alpha1"
   324     kind: CiliumBGPPeeringPolicy
   325     metadata:
   326       name: rack0
   327     spec:
   328       nodeSelector:
   329         matchLabels:
   330           rack: rack0
   331       virtualRouters:
   332       - localASN: 64512
   333         serviceSelector: # <-- select Services to advertise
   334           matchLabels:
   335             app: foo
   336         serviceAdvertisements: # <-- specify the service types to advertise
   337         - ExternalIP
   338         neighbors:
   339         - peerAddress: '10.0.0.1/32'
   340           peerASN: 64512
   341  
   342  
   343  Advertising ClusterIP Services
   344  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   345  
   346  If you wish to use this together with the ``kubeProxyReplacement`` feature (see :ref:`kubeproxy-free` docs),
   347  specific BPF parameters need to be enabled.
   348  See the :ref:`External Access To ClusterIP Services <external_access_to_clusterip_services>` section for how to enable them.
   349  
   350  If you only wish to advertise the ``.spec.clusterIP`` and ``.spec.clusterIPs`` of a Service,
   351  you can specify the ``virtualRouters[*].serviceAdvertisements`` field as ``ClusterIP``.
   352  
   353  .. code-block:: yaml
   354  
   355     apiVersion: "cilium.io/v2alpha1"
   356     kind: CiliumBGPPeeringPolicy
   357     metadata:
   358       name: rack0
   359     spec:
   360       nodeSelector:
   361         matchLabels:
   362           rack: rack0
   363       virtualRouters:
   364       - localASN: 64512
   365         serviceSelector: # <-- select Services to advertise
   366           matchLabels:
   367             app: foo
   368         serviceAdvertisements: # <-- specify the service types to advertise
   369         - ClusterIP
   370         neighbors:
   371         - peerAddress: '10.0.0.1/32'
   372           peerASN: 64512
   373  
   374  Additionally, when the ``.spec.clusterIP`` or ``.spec.clusterIPs`` of the Service contains ``None``,
   375  this IP address will be ignored and will not be advertised.
   376  
   377  
   378  
   379  Advertising Load Balancer Services
   380  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   381  
   382  You must first allocate ingress IPs to advertise them. By default, Kubernetes
   383  doesn't provide a way to assign ingress IPs to a Service. The cluster
   384  administrator is responsible for preparing a controller that assigns ingress
   385  IPs. Cilium supports assigning ingress IPs with the :ref:`Load Balancer IPAM
   386  <lb_ipam>` feature.
   387  
   388  .. code-block:: yaml
   389  
   390     apiVersion: "cilium.io/v2alpha1"
   391     kind: CiliumBGPPeeringPolicy
   392     spec:
   393       nodeSelector:
   394         matchLabels:
   395           rack: rack0
   396       virtualRouters:
   397       - localASN: 64512
   398         serviceSelector:
   399           matchLabels:
   400             app: foo
   401         neighbors:
   402         - peerAddress: '10.0.0.1/32'
   403           peerASN: 64512
   404  
   405  This advertises the ingress IPs of all Services matching the ``.serviceSelector``.
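
        For reference, a Service that the selector above would match could look
        like the following sketch. The Service name and ports are hypothetical.
        Note that ``.serviceSelector`` matches the labels of the Service itself,
        not the Pod selector under ``.spec.selector``.

        .. code-block:: yaml

           apiVersion: v1
           kind: Service
           metadata:
             name: foo
             labels:
               app: foo # <-- matched by the serviceSelector above
           spec:
             type: LoadBalancer
             selector:
               app: foo # selects the backend Pods, unrelated to serviceSelector
             ports:
             - port: 80
               targetPort: 8080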
   406  
   407  To announce all Services within the cluster, use a ``NotIn`` match expression
   408  with a dummy key and value, which matches every Service that does not carry that label value:
   409  
   410  .. code-block:: yaml
   411  
   412     apiVersion: "cilium.io/v2alpha1"
   413     kind: CiliumBGPPeeringPolicy
   414     spec:
   415       nodeSelector:
   416         matchLabels:
   417           rack: rack0
   418       virtualRouters:
   419       - localASN: 64512
   420         serviceSelector:
   421           matchExpressions:
   422           - {key: somekey, operator: NotIn, values: ['never-used-value']}
   423         neighbors:
   424         - peerAddress: '10.0.0.1/32'
   425           peerASN: 64512
   426  
   427  There are a few special purpose selector fields which don't match on labels but
   428  instead on other metadata like ``.meta.name`` or ``.meta.namespace``.
   429  
   430  =============================== ===================
   431  Selector                        Field
   432  ------------------------------- -------------------
   433  io.kubernetes.service.namespace ``.meta.namespace``
   434  io.kubernetes.service.name      ``.meta.name``
   435  =============================== ===================
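
        As a sketch of how these special keys might be used, the following policy
        selects every Service in one namespace. It assumes the keys are used in
        the same way as ordinary label keys; the namespace ``tenant-a`` is
        hypothetical.

        .. code-block:: yaml

           apiVersion: "cilium.io/v2alpha1"
           kind: CiliumBGPPeeringPolicy
           metadata:
             name: rack0
           spec:
             nodeSelector:
               matchLabels:
                 rack: rack0
             virtualRouters:
             - localASN: 64512
               serviceSelector:
                 matchExpressions:
                 # Match on the Service's namespace instead of on one of its labels
                 - {key: io.kubernetes.service.namespace, operator: In, values: ['tenant-a']}
               neighbors:
               - peerAddress: '10.0.0.1/32'
                 peerASN: 64512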
   436  
   437  Load Balancer Class
   438  ~~~~~~~~~~~~~~~~~~~
   439  
   440  Cilium supports the `loadBalancerClass
   441  <https://kubernetes.io/docs/concepts/services-networking/service/#load-balancer-class>`__.
   442  When the load balancer class is set to ``io.cilium/bgp-control-plane`` or unspecified,
   443  Cilium will announce the ingress IPs of the Service. Otherwise, Cilium will not announce
   444  the ingress IPs of the Service.
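
        A minimal sketch of a Service that sets the class explicitly is shown
        below; the Service name, label and port are hypothetical.

        .. code-block:: yaml

           apiVersion: v1
           kind: Service
           metadata:
             name: foo
             labels:
               app: foo
           spec:
             type: LoadBalancer
             loadBalancerClass: io.cilium/bgp-control-plane # <-- announced by Cilium
             selector:
               app: foo
             ports:
             - port: 80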
   445  
   446  externalTrafficPolicy
   447  ~~~~~~~~~~~~~~~~~~~~~
   448  
   449  When the Service has ``externalTrafficPolicy: Cluster``, BGP Control Plane
   450  unconditionally advertises the ingress IPs of the selected Service. When the
   451  Service has ``externalTrafficPolicy: Local``, BGP Control Plane keeps track of
   452  the endpoints for the service on the local node and stops advertisement when
   453  there's no local endpoint.
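
        For example, with a Service like the following sketch (name, label and
        port are hypothetical), a node would be expected to advertise the ingress
        IPs only while it hosts at least one endpoint of the Service.

        .. code-block:: yaml

           apiVersion: v1
           kind: Service
           metadata:
             name: foo
             labels:
               app: foo
           spec:
             type: LoadBalancer
             externalTrafficPolicy: Local # <-- advertise only from nodes with local endpoints
             selector:
               app: foo
             ports:
             - port: 80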
   454  
   455  Validating Advertised Routes
   456  ----------------------------
   457  
   458  Get all IPv4 unicast routes available:
   459  
   460  .. code-block:: shell-session
   461  
   462     $ cilium bgp routes available ipv4 unicast
   463     Node                              VRouter   Prefix        NextHop   Age    Attrs
   464     node0                             64512     10.1.0.0/24   0.0.0.0   17m42s [{Origin: i} {Nexthop: 0.0.0.0}]
   465  
   466  Get all IPv4 unicast routes available for a specific vrouter:
   467  
   468  .. code-block:: shell-session
   469  
   470     $ cilium bgp routes available ipv4 unicast vrouter 64512
   471     Node                              VRouter   Prefix        NextHop   Age    Attrs
   472     node0                             64512     10.1.0.0/24   0.0.0.0   17m42s [{Origin: i} {Nexthop: 0.0.0.0}]
   473  
   474  Get IPv4 unicast routes advertised to a specific peer:
   475  
   476  .. code-block:: shell-session
   477  
   478     $ cilium bgp routes advertised ipv4 unicast peer 10.0.0.1
   479     Node                              VRouter   Prefix        NextHop   Age    Attrs
   480     node0                             64512     10.1.0.0/24   10.0.0.2  17m42s [{Origin: i} {AsPath: } {Nexthop: 10.0.0.2} {LocalPref: 100}]
   481  
   482  
   483  Neighbor Options
   484  ----------------
   485  
   486  Each ``virtualRouters`` entry can contain multiple ``neighbors``. You can specify
   487  various BGP peering options for each neighbor. This section describes the
   488  available options and use cases.
   489  
   490  .. warning::
   491  
   492     Changing an existing neighbor configuration can reset the existing BGP
   493     peering connection, which results in route flaps and transient packet loss while
   494     the session reestablishes and peers exchange their routes. To prevent packet loss,
   495     it is recommended to configure BGP Graceful Restart.
   496  
   497  Peer Port
   498  ^^^^^^^^^
   499  
   500  By default, the BGP Control Plane uses port 179 for BGP peering. When the neighbor is
   501  running on a non-standard port, you can specify the port number with the ``peerPort``
   502  field.
   503  
   504  .. code-block:: yaml
   505  
   506     apiVersion: "cilium.io/v2alpha1"
   507     kind: CiliumBGPPeeringPolicy
   508     spec:
   509       nodeSelector:
   510         matchLabels:
   511           rack: rack0
   512       virtualRouters:
   513       - localASN: 64512
   514         neighbors:
   515         - peerAddress: '10.0.0.1/32'
   516           peerASN: 64512
   517           peerPort: 1179 # <-- specify the peer port
   518  
   519  Timers
   520  ^^^^^^
   521  
   522  BGP Control Plane supports modifying the following BGP timer parameters. For
   523  a more detailed description of each timer parameter, please refer to `RFC4271
   524  <https://www.rfc-editor.org/rfc/rfc4271.html>`__.
   525  
   526  ================= ============================ ==========
   527  Name              Field                        Default
   528  ----------------- ---------------------------- ----------
   529  ConnectRetryTimer ``connectRetryTimeSeconds``  120
   530  HoldTimer         ``holdTimeSeconds``          90
   531  KeepaliveTimer    ``keepAliveTimeSeconds``     30
   532  ================= ============================ ==========
   533  
   534  In datacenter networks in which Kubernetes clusters are deployed, it is generally
   535  recommended to set the ``HoldTimer`` and ``KeepaliveTimer`` to lower values
   536  for faster failure detection. For example, you can set the minimum
   537  possible values ``holdTimeSeconds=9`` and ``keepAliveTimeSeconds=3``.
   538  
   539  .. code-block:: yaml
   540  
   541     apiVersion: "cilium.io/v2alpha1"
   542     kind: CiliumBGPPeeringPolicy
   543     spec:
   544       nodeSelector:
   545         matchLabels:
   546           rack: rack0
   547       virtualRouters:
   548       - localASN: 64512
   549         neighbors:
   550         - peerAddress: '10.0.0.1/32'
   551           peerASN: 64512
   552           connectRetryTimeSeconds: 90 # <-- specify the ConnectRetryTimer
   553           holdTimeSeconds: 9          # <-- specify the HoldTimer
   554           keepAliveTimeSeconds: 3     # <-- specify the KeepaliveTimer
   555  
   556  eBGP Multihop
   557  ^^^^^^^^^^^^^
   558  
   559  By default, the IP TTL of BGP packets is set to 1 for eBGP sessions. Generally,
   560  changing the TTL is discouraged, but in some cases you may need to do so. For
   561  example, when the BGP peer is a Route Server located in a different subnet,
   562  you may need to set the TTL value to more than 1.
   563  
   564  .. code-block:: yaml
   565  
   566     apiVersion: "cilium.io/v2alpha1"
   567     kind: CiliumBGPPeeringPolicy
   568     spec:
   569       nodeSelector:
   570         matchLabels:
   571           rack: rack0
   572       virtualRouters:
   573       - localASN: 64512
   574         neighbors:
   575         - peerAddress: '10.0.0.1/32'
   576           peerASN: 64512
   577           eBGPMultihopTTL: 4 # <-- specify the TTL value
   578  
   579  MD5 Passwords
   580  ^^^^^^^^^^^^^
   581  
   582  By configuring ``authSecretRef`` for a neighbor, you can specify that a
   583  `RFC-2385`_ TCP MD5 password should be used on the session with this BGP
   584  peer.
   585  
   586  .. code-block:: yaml
   587  
   588     apiVersion: "cilium.io/v2alpha1"
   589     kind: CiliumBGPPeeringPolicy
   590     metadata:
   591       name: rack0
   592     spec:
   593       nodeSelector:
   594         matchLabels:
   595           rack: rack0
   596       virtualRouters:
   597       - localASN: 64512
   598         neighbors:
   599         - peerAddress: '10.0.0.1/32'
   600           peerASN: 64512
   601           authSecretRef: "bgp-password" # <-- specify the secret name
   602  
   603  ``authSecretRef`` should reference the name of a secret in the BGP secrets
   604  namespace (if using the Helm chart this is ``kube-system`` by default). The
   605  secret should contain a key with a name of ``password``.
   606  
   607  BGP secrets are limited to a configured namespace to keep the permissions
   608  needed on each Cilium Agent instance to a minimum. The Helm chart will
   609  configure Cilium to be able to read from it by default.
   610  
   611  An example of creating a secret is:
   612  
   613  .. code-block:: shell-session
   614  
   615     $ kubectl create secret generic -n kube-system --type=string secretname --from-literal=password=my-secret-password
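
        As a declarative alternative, a Secret can also be written as a manifest.
        The following sketch uses the name ``bgp-password`` so that it matches the
        ``authSecretRef`` in the example policy above; the password value is a
        placeholder, and the namespace assumes the Helm chart default.

        .. code-block:: yaml

           apiVersion: v1
           kind: Secret
           metadata:
             name: bgp-password     # <-- referenced by authSecretRef
             namespace: kube-system # BGP secrets namespace (Helm chart default)
           type: Opaque
           stringData:
             password: my-secret-password # <-- the key must be named "password"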
   616  
   617  If you wish to change the namespace, you can set the
   618  ``bgpControlPlane.secretNamespace.name`` Helm chart value. To have the
   619  namespace created automatically, you can set the
   620  ``bgpControlPlane.secretNamespace.create`` Helm chart value to ``true``.
   621  
   622  Because TCP MD5 passwords sign the header of the packet, they cannot be used if
   623  the session is address translated by Cilium (that is, the Cilium Agent's pod
   624  IP address must be the address the BGP peer sees).
   625  
   626  If the password is incorrect, or the header is otherwise changed, the TCP
   627  connection will not succeed. This will appear as ``dial: i/o timeout`` in the
   628  Cilium Agent's logs rather than a more specific error message.
   629  
   630  .. _RFC-2385 : https://www.rfc-editor.org/rfc/rfc2385.html
   631  
   632  If a ``CiliumBGPPeeringPolicy`` is deployed with an ``authSecretRef`` that Cilium cannot find, the BGP session will use an empty password and the agent will log an error such as in the following example::
   633  
   634     level=error msg="Failed to fetch secret \"secretname\": not found (will continue with empty password)" component=manager.fetchPeerPassword subsys=bgp-control-plane
   635  
   636  .. _bgp_control_plane_graceful_restart:
   637  
   638  Graceful Restart
   639  ^^^^^^^^^^^^^^^^
   640  The Cilium BGP Control Plane can be configured to act as a graceful restart
   641  ``Restarting Speaker``. When you enable graceful restart, the BGP session will restart
   642  and the "graceful restart" capability will be advertised in the BGP OPEN message.
   643  
   644  In the event of a Cilium Agent restart, the peering BGP router does not withdraw
   645  routes received from the Cilium BGP control plane immediately. The datapath
   646  continues to forward traffic during Agent restart, so there is no traffic
   647  disruption.
   648  
   649  Configure graceful restart on a per-neighbor basis, as follows:
   650  
   651  .. code-block:: yaml
   652  
   653     apiVersion: "cilium.io/v2alpha1"
   654     kind: CiliumBGPPeeringPolicy
   655     metadata:
   656       name: rack0
   657     spec:
   658       nodeSelector:
   659         matchLabels:
   660           rack: rack0
   661       virtualRouters:
   662       - localASN: 64512
   663         neighbors:
   664         - peerAddress: '10.0.0.1/32'
   665           peerASN: 64512
   666           gracefulRestart:
   667             enabled: true           # <-- enable graceful restart
   668             restartTimeSeconds: 120 # <-- set RestartTime
   669  
   670  .. warning::
   671  
   672     When enabled, graceful restart capability is advertised for IPv4 and IPv6
   673     address families by default. Since v1.15, there is a known issue where Cilium
   674     takes a long time (approximately 300s) to restart route advertisement after
   675     graceful restart when Cilium advertises both IPv4 and IPv6 address families,
   676     but a remote peer advertises only one of them. You can work around this
   677     issue by aligning the address families advertised by Cilium and the remote peer with
   678     the `families field <bgp-control-plane-address-families_>`_. You can track
   679     `#30367 <https://github.com/cilium/cilium/issues/30367/>`_ for updates.
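
        As a sketch of the workaround described in the warning above, the
        following neighbor advertises only the IPv4 unicast address family. It
        assumes that the remote peer advertises IPv4 unicast only, and that the
        ``families`` field takes ``afi`` and ``safi`` entries as described in the
        address families section referenced above.

        .. code-block:: yaml

           apiVersion: "cilium.io/v2alpha1"
           kind: CiliumBGPPeeringPolicy
           metadata:
             name: rack0
           spec:
             nodeSelector:
               matchLabels:
                 rack: rack0
             virtualRouters:
             - localASN: 64512
               neighbors:
               - peerAddress: '10.0.0.1/32'
                 peerASN: 64512
                 gracefulRestart:
                   enabled: true
                 families:        # <-- advertise only what the peer also advertises
                 - afi: ipv4
                   safi: unicast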
   680  
   681  Optionally, you can set the ``RestartTime`` parameter with the ``restartTimeSeconds`` field. ``RestartTime`` is the time
   682  advertised to the peer within which Cilium BGP control plane is expected to re-establish
   683  the BGP session after a restart. On expiration of ``RestartTime``, the peer removes
   684  the routes previously advertised by the Cilium BGP control plane.
   685  
   686  When the Cilium Agent restarts, it closes the BGP TCP socket, causing the emission of a
   687  TCP FIN packet. On receiving this TCP FIN, the peer changes its BGP state to ``Idle`` and
   688  starts its ``RestartTime`` timer.
   689  
   690  The Cilium agent boot up time varies depending on the deployment. If using ``RestartTime``,
   691  you should set it to a duration greater than the time taken by the Cilium Agent to boot up.
   692  
   693  The default value of ``RestartTime`` is 120 seconds. More details on graceful restart and
   694  ``RestartTime`` can be found in `RFC-4724`_ and `RFC-8538`_.
   695  
   696  .. _RFC-4724 : https://www.rfc-editor.org/rfc/rfc4724.html
   697  .. _RFC-8538 : https://www.rfc-editor.org/rfc/rfc8538.html
   698  
   699  Advertised Path Attributes
   700  ^^^^^^^^^^^^^^^^^^^^^^^^^^
   701  
   702  BGP advertisements can be extended with additional BGP Path Attributes: BGP Communities (`RFC-1997`_) or Local Preference.
   703  These Path Attributes can be configured selectively for each BGP peer and advertisement type.
   704  
   705  The following code block shows an example configuration of ``AdvertisedPathAttributes`` for a BGP neighbor,
   706  which adds a BGP community attribute with the value ``64512:100`` to all Service announcements from the
   707  matching ``CiliumLoadBalancerIPPool``, and sets the Local Preference value ``150`` and the BGP community
   708  ``64512:150`` for all Pod CIDR announcements:
   709  
   710  .. code-block:: yaml
   711  
   712     apiVersion: "cilium.io/v2alpha1"
   713     kind: CiliumBGPPeeringPolicy
   714     metadata:
   715       name: rack0
   716     spec:
   717       nodeSelector:
   718         matchLabels:
   719           rack: rack0
   720       virtualRouters:
   721       - localASN: 64512
   722         neighbors:
   723         - peerAddress: '10.0.0.1/32'
   724           peerASN: 64512
   725           advertisedPathAttributes:
   726           - selectorType: CiliumLoadBalancerIPPool # <-- select CiliumLoadBalancerIPPool and add BGP community 64512:100
   727             selector:
   728               matchLabels:
   729                 environment: production
   730             communities:
   731               standard:
   732               - 64512:100
   733           - selectorType: PodCIDR # <-- select PodCIDR and add local preference 150 and BGP community 64512:150
   734             localPreference: 150
   735             communities:
   736               standard:
   737               - 64512:150
   738  
   739  .. note::
   740    The Local Preference Path Attribute is sent only to ``iBGP`` peers, not to ``eBGP`` peers.
   741  
   742  Each ``AdvertisedPathAttributes`` configuration item consists of two parts:
   743  
   744   - ``SelectorType`` and ``Selector`` together define which BGP advertisements are extended with additional Path Attributes.
   745   - ``Communities`` and/or ``LocalPreference`` define the additional Path Attributes applied to the selected routes.
   746  
   747  ``SelectorType`` has three possible values, each of which defines the object type that the ``Selector`` applies to:
   748  
   749   - ``PodCIDR``: matches ``CiliumNode`` custom resources
   750     (Path Attributes apply to routes announced for PodCIDRs of selected ``CiliumNode`` objects).
   751   - ``CiliumLoadBalancerIPPool``: matches ``CiliumLoadBalancerIPPool`` custom resources
   752     (Path Attributes apply to routes announced for selected ``CiliumLoadBalancerIPPool`` objects).
   753   - ``CiliumPodIPPool``: matches ``CiliumPodIPPool`` custom resources
   754     (Path Attributes apply to routes announced for allocated prefixes of selected ``CiliumPodIPPool`` objects).
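
        For illustration, the fragment below sketches a single ``advertisedPathAttributes`` entry that uses the
        ``CiliumPodIPPool`` selector type. It is meant to sit under a ``neighbors`` item of a
        ``CiliumBGPPeeringPolicy``. The ``pool: blue`` label, the Local Preference ``200`` and the community
        ``64512:200`` are hypothetical values chosen for this sketch:

        .. code-block:: yaml

           advertisedPathAttributes:
           - selectorType: CiliumPodIPPool   # the selector is evaluated against CiliumPodIPPool objects
             selector:
               matchLabels:
                 pool: blue                  # hypothetical label set on the pool
             localPreference: 200            # hypothetical value; sent to iBGP peers only
             communities:
               standard:
               - "64512:200"                 # hypothetical community, quoted so it is read as a string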
   755  
   756  There are two types of additional Path Attributes that can be advertised with the routes: ``Communities`` and ``LocalPreference``.
   757  
   758  ``Communities`` defines a set of community values advertised in the supported BGP Communities Path Attributes.
   759  The values can be of three types:
   760  
   761   - ``Standard``: represents a value of the "standard" 32-bit BGP Communities Attribute (`RFC-1997`_)
   762     as a 4-byte decimal number or two 2-byte decimal numbers separated by a colon (e.g. ``64512:100``).
   763   - ``WellKnown``: represents a value of the "standard" 32-bit BGP Communities Attribute (`RFC-1997`_)
   764     as a well-known string alias to its numeric value. Allowed values and their mapping to the numeric values:
   765  
   766      =============================== ================= =================
   767      Well-Known Value                Hexadecimal Value 16-bit Pair Value
   768      =============================== ================= =================
   769      ``internet``                    ``0x00000000``    ``0:0``
   770      ``planned-shut``                ``0xffff0000``    ``65535:0``
   771      ``accept-own``                  ``0xffff0001``    ``65535:1``
   772      ``route-filter-translated-v4``  ``0xffff0002``    ``65535:2``
   773      ``route-filter-v4``             ``0xffff0003``    ``65535:3``
   774      ``route-filter-translated-v6``  ``0xffff0004``    ``65535:4``
   775      ``route-filter-v6``             ``0xffff0005``    ``65535:5``
   776      ``llgr-stale``                  ``0xffff0006``    ``65535:6``
   777      ``no-llgr``                     ``0xffff0007``    ``65535:7``
   778      ``blackhole``                   ``0xffff029a``    ``65535:666``
   779      ``no-export``                   ``0xffffff01``    ``65535:65281``
   780      ``no-advertise``                ``0xffffff02``    ``65535:65282``
   781      ``no-export-subconfed``         ``0xffffff03``    ``65535:65283``
   782      ``no-peer``                     ``0xffffff04``    ``65535:65284``
   783      =============================== ================= =================
   784  
   785   - ``Large``: represents a value of the BGP Large Communities Attribute (`RFC-8092`_),
   786     as three 4-byte decimal numbers separated by colons (e.g. ``64512:100:50``).
   787  
   788  .. _RFC-1997 : https://www.rfc-editor.org/rfc/rfc1997.html
   789  .. _RFC-8092 : https://www.rfc-editor.org/rfc/rfc8092.html
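
        A minimal sketch that combines all three community types in one entry, assuming the fields are named
        ``standard``, ``wellKnown`` and ``large`` after the types listed above. The selector reuses the label from
        the earlier example and the community values are illustrative. The standard community ``64512:100`` could
        equally be written as the single decimal number ``4227858532`` (``64512 * 65536 + 100``). Values that
        contain colons are quoted here so that YAML parsers read them as strings:

        .. code-block:: yaml

           advertisedPathAttributes:
           - selectorType: CiliumLoadBalancerIPPool
             selector:
               matchLabels:
                 environment: production
             communities:
               standard:
               - "64512:100"      # two 2-byte numbers; same value as "4227858532"
               wellKnown:
               - no-export        # alias for 65535:65281 (0xffffff01)
               large:
               - "64512:100:50"   # three 4-byte numbers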
   790  
   791  ``LocalPreference`` defines the preference value advertised in the BGP Local Preference Path Attribute.
   792  As Local Preference is only valid for ``iBGP`` peers, this value will be ignored for ``eBGP`` peers
   793  (no Local Preference Path Attribute will be advertised).
   794  
   795  Once configured, the additional Path Attributes advertised with the routes for a peer can be verified using the
   796  ``cilium bgp routes`` Cilium CLI command, for example:
   797  
   798  .. code-block:: shell-session
   799  
   800     $ cilium bgp routes advertised ipv4 unicast peer 10.0.0.1
   801  
   802     VRouter   Prefix               NextHop     Age     Attrs
   803     64512     10.1.0.0/24          10.0.0.2    3m31s   [{Origin: i} {LocalPref: 150} {Nexthop: 10.0.0.2}]
   804     64512     192.168.100.190/32   10.0.0.2    3m32s   [{Origin: i} {LocalPref: 100} {Communities: 64512:100} {Nexthop: 10.0.0.2}]
   805  
   806  .. _bgp-control-plane-address-families:
   807  
   808  Address Families
   809  ^^^^^^^^^^^^^^^^
   810  
   811  By default, the BGP Control Plane advertises the IPv4 Unicast and IPv6 Unicast
   812  address families in the Multiprotocol Extensions Capability (`RFC-4760`_) and,
   813  if Graceful Restart is enabled, in the Graceful Restart Capability (`RFC-4724`_). To change the default
   814  behavior and advertise only specific address families, use the
   815  ``families`` field. The ``families`` field is a list of AFI (Address Family
   816  Identifier) and SAFI (Subsequent Address Family Identifier) pairs. The only
   817  options currently supported are ``{afi: ipv4, safi: unicast}`` and ``{afi:
   818  ipv6, safi: unicast}``.
   819  
   820  The following example shows how to advertise only the IPv4 Unicast address family:
   821  
   822  .. _RFC-4760 : https://www.rfc-editor.org/rfc/rfc4760.html
   823  
   824  .. code-block:: yaml
   825  
   826     apiVersion: "cilium.io/v2alpha1"
   827     kind: CiliumBGPPeeringPolicy
   828     metadata:
   829       name: rack0
   830     spec:
   831       nodeSelector:
   832         matchLabels:
   833           rack: rack0
   834       virtualRouters:
   835       - localASN: 64512
   836         neighbors:
   837         - peerAddress: '10.0.0.1/32'
   838           peerASN: 64512
   839           families:
   840           - afi: ipv4
   841             safi: unicast
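
        For comparison, the fragment below sketches the ``virtualRouters`` part of the same policy with both
        supported address families listed explicitly. Based on the description above, this is expected to
        match the default behavior; it is shown only to illustrate the list form of the ``families`` field:

        .. code-block:: yaml

           virtualRouters:
           - localASN: 64512
             neighbors:
             - peerAddress: '10.0.0.1/32'
               peerASN: 64512
               families:
               - afi: ipv4        # IPv4 Unicast
                 safi: unicast
               - afi: ipv6        # IPv6 Unicast
                 safi: unicast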