# Configuring Thanos Secure TLS Cross-Cluster Communication

###### *This guide was contributed by the community thanks to [gmintoco](https://github.com/gmintoco)*

At scale, when running Thanos in global view mode without [Thanos Receive](../components/receive.md), you often have centralized clusters that need secure, TLS-encrypted gRPC routes to remote clusters outside your network in order to reach leaf Prometheus instances with sidecars. Common solutions such as VPC peering and VPNs can be complex to set up, expensive, and hard to manage. This guide explains how to set up proxy servers to establish a secure route for queries.

## Scenario

Let's imagine we have an `Observer Cluster` that hosts [Thanos Querier](../components/query.md) along with [Thanos Store Gateway](../components/store.md). In the same cluster we also have one or more [Thanos Sidecars](../components/sidecar.md) that we would like to connect to from within the cluster.

However, let's say the observer cluster's Querier also needs to connect to several Thanos Sidecar instances in remote clusters, for example through their NGINX Ingress (the configurations below are based on NGINX Ingress).

Ideally, we want to use TLS to encrypt the connections to the remote clusters, but we don't want to use TLS within the cluster (to reduce the number of ingresses, the pain of provisioning certificates, etc.). We may also want to use client certificate authentication with these remote clusters for improved security (see the Envoy v3 example below).

In this scenario you need to use a proxy server, as described below.

## Proxy-based communication using Envoy

Envoy can be deployed as a sidecar container (as in the example here) within the Thanos Querier pod in the Observer Cluster. It performs TLS origination to connect to the secure remote sidecars, and forwards their traffic unencrypted to the local Thanos Querier.

[Envoy](https://www.envoyproxy.io/) is a proxy server with good HTTP/2 and gRPC support, and it is relatively straightforward to configure for this purpose.

- Add an Envoy sidecar container to the Thanos Querier pod (unfortunately, many Thanos charts do not support this). An example pod config is below (`deployment.yaml`).
- Make sure that the Envoy sidecar has the correct certificates (using a mounted Secret) and a valid configuration (using a mounted ConfigMap). An example Envoy config is below (`envoy.yaml`).
- Configure a Service for the Envoy sidecar. An example Service is shown below (`service.yaml`). You may already have another Service for local cluster access (for Thanos Ruler, Grafana, etc.).
- Point the Querier at the Service and the correct port. Example `--endpoint` flags are included in the Querier `args` in `deployment.yaml`.
- Make sure your remote cluster has TLS set up and an ingress that supports HTTP/2. An example is below (`ingress.yaml`).
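
As a minimal sketch of the certificate Secret from the second step, assuming you manage the PEM files yourself, the Secret that `deployment.yaml` mounts at `/certs` could look like this. Each key becomes a file under `/certs`; the key names simply mirror the paths read by the two Envoy configs in this guide (`ca.crt` for v2; `cacerts.pem`, `tls.crt` and `tls.key` for v3), and the PEM bodies are placeholders.

```yaml
kind: Secret
apiVersion: v1
metadata:
  name: thanos-global-test-envoy-certs
  namespace: thanos-global
type: Opaque
stringData:
  # CA used to verify the remote ingress certificates (path read by the v2 config).
  ca.crt: |
    <PEM-encoded CA certificate>
  # The same CA under the file name the v3 config reads.
  cacerts.pem: |
    <PEM-encoded CA certificate>
  # Client certificate and key that the v3 config presents to remote clusters.
  tls.crt: |
    <PEM-encoded client certificate>
  tls.key: |
    <PEM-encoded client private key>
```

If you only use one of the two Envoy configs, only the keys it reads are needed.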

### Observer Cluster: Querier with Envoy `deployment.yaml`

- `[port_name]` (and `[port_name_2]`) is the name of the port specified within the Service (see `service.yaml`)
- `[service-name]` is the name of the Envoy Service
- `[namespace]` is the namespace of the Envoy Service

You may need to change `cluster.local` to match your cluster domain. The `--endpoint` entries for the Thanos Store Gateway etc. may be named differently in your setup (older Thanos releases use `--store` instead of `--endpoint`).

```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: thanos-global-test-querier
  namespace: thanos-global
  labels:
    name: thanos-global-test-querier
spec:
  replicas: 2
  selector:
    matchLabels:
      name: thanos-global-test-querier
  template:
    metadata:
      labels:
        name: thanos-global-test-querier
    spec:
      volumes:
        - name: config
          configMap:
            name: thanos-global-test-envoy-config
            defaultMode: 420
            optional: false
        - name: certs
          secret:
            secretName: thanos-global-test-envoy-certs
            defaultMode: 420
            optional: false
      containers:
        - name: querier
          # The --endpoint flag used below replaced --store in later Thanos
          # releases; v0.17.x only accepts --store, so use a newer image.
          image: 'quay.io/thanos/thanos:v0.32.5'
          args:
            - query
            - '--log.level=info'
            - '--grpc-address=0.0.0.0:10901'
            - '--http-address=0.0.0.0:10902'
            - '--query.replica-label=replica'
            - >-
              --endpoint=dnssrv+_grpc._tcp.thanos-global-test-storegateway.thanos-global.svc.cluster.local
            - >-
              --endpoint=dnssrv+_grpc._tcp.thanos-global-test-sidecar.thanos-global.svc.cluster.local
            - >-
              --endpoint=dnssrv+_grpc._tcp.thanos-global-test-ruler.thanos-global.svc.cluster.local
            - >-
              --endpoint=dnssrv+_[port_name]._tcp.[service-name].[namespace].svc.cluster.local
            - >-
              --endpoint=dnssrv+_[port_name_2]._tcp.[service-name].[namespace].svc.cluster.local
          ports:
            - name: http
              containerPort: 10902
              protocol: TCP
            - name: grpc
              containerPort: 10901
              protocol: TCP
          resources: {}
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /-/ready
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: false
            runAsUser: 1001
            runAsGroup: 0
            runAsNonRoot: false
            readOnlyRootFilesystem: false
            allowPrivilegeEscalation: true
        - name: envoy-sidecar
          image: 'envoyproxy/envoy:v1.16.0'
          args:
            - '-c'
            - /config/envoy.yaml
            - '-l'
            - debug
          ports:
            # Placeholders are quoted so YAML does not parse them as lists;
            # replace them with real port names.
            - name: '[port_name]'
              containerPort: 10000
              protocol: TCP
            - name: '[port_name_2]'
              containerPort: 10001
              protocol: TCP
          resources: {}
          volumeMounts:
            - name: config
              mountPath: /config
              mountPropagation: None
            - name: certs
              mountPath: /certs
              mountPropagation: None
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: thanos-global-test-querier-sa
      serviceAccount: thanos-global-test-querier-sa
      automountServiceAccountToken: false
      shareProcessNamespace: false
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
```
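
To illustrate the placeholders, suppose (hypothetically) that the Envoy Service from `service.yaml`, `thanos-global-test-envoy` in `thanos-global`, names its ports `grpc-remote-1` and `grpc-remote-2`. Kubernetes DNS publishes an SRV record of the form `_<port-name>._<protocol>.<service>.<namespace>.svc.<cluster-domain>` for each named Service port, and that record is what the `dnssrv+` prefix resolves, so the two remote `--endpoint` entries would become:

```yaml
# Excerpt of the querier container args. The port names are hypothetical
# and must match the port names declared in the Envoy Service.
args:
  - query
  - >-
    --endpoint=dnssrv+_grpc-remote-1._tcp.thanos-global-test-envoy.thanos-global.svc.cluster.local
  - >-
    --endpoint=dnssrv+_grpc-remote-2._tcp.thanos-global-test-envoy.thanos-global.svc.cluster.local
  # Alternative without SRV lookups: point at the Service port directly.
  # - '--endpoint=thanos-global-test-envoy.thanos-global.svc.cluster.local:10000'
```

Either form sends the Querier's plaintext gRPC to a local Envoy listener, which then originates TLS towards the remote cluster.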

### Forward proxy Envoy configuration `envoy.yaml`

This is a static Envoy configuration using the v2 API (a v3 example follows below). You will need to update this configuration for every sidecar you would like to talk to. There are also several options for dynamic configuration, such as Envoy's xDS APIs, or using a tool like Terraform (if that's your deployment method) to generate the configs at deployment time. The v2 API has since been removed from Envoy, so this configuration only works with older releases such as the `v1.16.0` image used in `deployment.yaml`. NOTE: This config **does not** send a client certificate to authenticate with remote clusters; see the v3 config for that.

```yaml
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
  - name: sidecar_name
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: sidecar_name, host_rewrite: thanos.sidecardomain.com }
          http_filters:
          - name: envoy.router
  - name: sidecar_name_2
    address:
      socket_address: { address: 0.0.0.0, port_value: 10001 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: sidecar_name_2, host_rewrite: thanos.sidecardomain.com }
          http_filters:
          - name: envoy.router
  clusters:
  - name: sidecar_name
    connect_timeout: 30s
    type: logical_dns
    http2_protocol_options: {}
    dns_lookup_family: V4_ONLY
    lb_policy: round_robin
    hosts: [{ socket_address: { address: thanos.sidecardomain.com, port_value: 443 }}]
    tls_context:
      common_tls_context:
        validation_context:
          trusted_ca:
            filename: /certs/ca.crt
        alpn_protocols:
        - h2
        - http/1.1
      sni: thanos.sidecardomain.com
  - name: sidecar_name_2
    connect_timeout: 30s
    type: logical_dns
    http2_protocol_options: {}
    dns_lookup_family: V4_ONLY
    lb_policy: round_robin
    hosts: [{ socket_address: { address: thanos-2.sidecardomain.com, port_value: 443 }}]
    tls_context:
      common_tls_context:
        validation_context:
          trusted_ca:
            filename: /certs/ca.crt
        alpn_protocols:
        - h2
        - http/1.1
      sni: thanos-2.sidecardomain.com
```
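
The Deployment mounts a ConfigMap named `thanos-global-test-envoy-config` at `/config` and starts Envoy with `-c /config/envoy.yaml`, so that ConfigMap needs an `envoy.yaml` key. A minimal sketch follows; only the `admin` section is written out, and the rest of whichever Envoy config you use goes under the same key. The comment lines inside the block become YAML comments in the resulting `envoy.yaml`, so the file still parses.

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: thanos-global-test-envoy-config
  namespace: thanos-global
data:
  # The key name becomes the file name under the /config mount.
  envoy.yaml: |
    admin:
      access_log_path: /tmp/admin_access.log
      address:
        socket_address: { address: 0.0.0.0, port_value: 9901 }
    # Paste the static_resources section of your Envoy config here,
    # indented by four spaces so it stays inside this block.
```

Envoy reads this static bootstrap file only at startup, so restart the Querier pods after changing the ConfigMap.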

### `envoy.yaml` V3 API

This is an example Envoy config using the v3 API. It differs slightly from the one above (it adds access-log formatting) but is essentially the same in functionality, with one important difference: this config **sends** a client certificate to authenticate with the remote clusters (which must have the signing CA loaded in order to verify it). It implements only a single port/listener, but adding more (the v2 example has two) is straightforward: clone the `sidecar_name` listener and the `sidecar_name` cluster blocks, giving each copy a unique name, listener port and remote host. Note that this config reads the CA from `/certs/cacerts.pem`, whereas the v2 config uses `/certs/ca.crt`.

```yaml
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
  - name: sidecar_name
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10001
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: AUTO
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: /dev/stdout
              log_format:
                text_format: |
                  [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%"
                  %RESPONSE_CODE% %RESPONSE_FLAGS% %RESPONSE_CODE_DETAILS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION%
                  %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%"
                  "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%" "%UPSTREAM_TRANSPORT_FAILURE_REASON%"\n
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: /dev/stdout
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: sidecar_name
                  host_rewrite_literal: thanos.sidecardomain.com
          http_filters:
          - name: envoy.filters.http.router
            # Explicit typed config for the router filter.
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: sidecar_name
    connect_timeout: 30s
    type: LOGICAL_DNS
    http2_protocol_options: {}
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: sidecar_name
      endpoints:
        - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: thanos.sidecardomain.com
                  port_value: 443
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificates:
            - certificate_chain:
                filename: /certs/tls.crt
              private_key:
                filename: /certs/tls.key
          validation_context:
            trusted_ca:
              filename: /certs/cacerts.pem
          alpn_protocols:
          - h2
          - http/1.1
        sni: thanos.sidecardomain.com
```
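
As a sketch of that cloning step, assuming the second remote cluster is `thanos-2.sidecardomain.com` (the host from the v2 example) and that it trusts the same client certificate and CA, the extra listener and cluster could look like this. Port 10000 is chosen only because the v3 example leaves it free while `deployment.yaml` and `service.yaml` already expose it; the `stat_prefix` and route names are hypothetical, and the access-log settings are left out for brevity.

```yaml
# Append this entry under static_resources.listeners in the v3 envoy.yaml.
listeners:
- name: sidecar_name_2
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 10000
  filter_chains:
  - filters:
    - name: envoy.http_connection_manager
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        codec_type: AUTO
        stat_prefix: ingress_http_2
        route_config:
          name: local_route_2
          virtual_hosts:
          - name: local_service_2
            domains: ["*"]
            routes:
            - match:
                prefix: "/"
              route:
                cluster: sidecar_name_2
                host_rewrite_literal: thanos-2.sidecardomain.com
        http_filters:
        - name: envoy.filters.http.router
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
# Append this entry under static_resources.clusters in the v3 envoy.yaml.
clusters:
- name: sidecar_name_2
  connect_timeout: 30s
  type: LOGICAL_DNS
  http2_protocol_options: {}
  dns_lookup_family: V4_ONLY
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: sidecar_name_2
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: thanos-2.sidecardomain.com
              port_value: 443
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      common_tls_context:
        # Same client certificate and CA as the first cluster (assumption).
        tls_certificates:
        - certificate_chain:
            filename: /certs/tls.crt
          private_key:
            filename: /certs/tls.key
        validation_context:
          trusted_ca:
            filename: /certs/cacerts.pem
        alpn_protocols:
        - h2
        - http/1.1
      sni: thanos-2.sidecardomain.com
```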

### Observer Cluster: Querier with Envoy `service.yaml`

This is the Service for the Envoy sidecar. You will need to define a new port for every remote sidecar you would like to add.

```yaml
kind: Service
apiVersion: v1
metadata:
  name: thanos-global-test-envoy
  namespace: thanos-global
  labels:
    name: thanos-global-test-envoy
spec:
  ports:
    # Quoted placeholders; replace them with real port names. These are the
    # names the Querier's dnssrv+ lookups refer to.
    - name: '[port_name]'
      protocol: TCP
      port: 10000
      targetPort: 10000
    - name: '[port_name_2]'
      protocol: TCP
      port: 10001
      targetPort: 10001
  selector:
    name: thanos-global-test-querier
  type: ClusterIP
  sessionAffinity: None
```
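
For example, adding a hypothetical third remote sidecar on port 10002, with the hypothetical port name `grpc-remote-3`, touches three places besides `envoy.yaml` (which needs a listener on 10002 and a matching cluster): a new port in `service.yaml`, the same port on the `envoy-sidecar` container in `deployment.yaml`, and a new Querier `--endpoint` entry. The fragments below are sketches of those additions.

```yaml
# service.yaml: append under spec.ports
- name: grpc-remote-3
  protocol: TCP
  port: 10002
  targetPort: 10002
---
# deployment.yaml: append under the envoy-sidecar container's ports
- name: grpc-remote-3
  containerPort: 10002
  protocol: TCP
---
# deployment.yaml: append to the querier container's args
- >-
  --endpoint=dnssrv+_grpc-remote-3._tcp.thanos-global-test-envoy.thanos-global.svc.cluster.local
```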

### Client clusters: Sidecar `ingress.yaml`

This is an example Ingress for a remote sidecar using the NGINX Ingress controller. You must use TLS (port 443, a limitation of NGINX), because NGINX only supports HTTP/2 on a separate listener (see [kubernetes/ingress-nginx#3938](https://github.com/kubernetes/ingress-nginx/issues/3938)).

The ingress must have TLS certificates configured, and the CA that signed them must be available to the Envoy sidecar (in its mounted certificate Secret) so that Envoy can verify the remote endpoint. If you use the v3 Envoy config with client certificates, the remote side must also trust the CA that signed Envoy's client certificate.

```yaml
# networking.k8s.io/v1: the extensions/v1beta1 Ingress API was removed in Kubernetes 1.22.
kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: monitoring-rancher-monitor-thanos-gateway
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/grpc-backend: 'true'
    nginx.ingress.kubernetes.io/protocol: h2c
    nginx.ingress.kubernetes.io/proxy-read-timeout: '160'
spec:
  tls:
    - hosts:
        - thanos.sidecardomain.com
      secretName: thanos-tls
  rules:
    - host: thanos.sidecardomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: monitoring-rancher-monitor-prometheus
                port:
                  number: 10901
```
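
If you use the v3 Envoy config, the remote ingress should also request and verify the client certificate that Envoy presents. A minimal sketch using ingress-nginx's client-certificate annotations follows; `monitoring/thanos-client-ca` is a hypothetical Secret that must hold the CA that signed Envoy's client certificate under the key `ca.crt`, and the annotations would be merged into the existing `metadata.annotations` of `ingress.yaml`.

```yaml
metadata:
  annotations:
    # Require and verify a client certificate on every request.
    nginx.ingress.kubernetes.io/auth-tls-verify-client: 'on'
    # <namespace>/<secret-name> of the Secret holding the client CA as ca.crt.
    nginx.ingress.kubernetes.io/auth-tls-secret: monitoring/thanos-client-ca
    # Maximum depth of the client certificate chain to verify.
    nginx.ingress.kubernetes.io/auth-tls-verify-depth: '1'
```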

## Summary

This guide outlined a scenario, a potential solution, and a collection of example configurations. With a setup like this, a central Thanos Querier can reach in-cluster sidecars, Store Gateways, Receivers, etc. over plain gRPC, while also accessing external resources (e.g. the StoreAPIs of remote sidecars) securely, using TLS encryption and client-certificate authentication.