# Configuring Thanos Secure TLS Cross-Cluster Communication

###### *This guide was contributed by the community thanks to [gmintoco](https://github.com/gmintoco)*

At some scale, running Thanos in global view mode without [Thanos Receive](../components/receive.md) often means you have centralized clusters that need secure, TLS-encrypted gRPC routes to remote clusters outside your network. These routes let them reach leaf Prometheus instances and their sidecars. Common solutions such as VPC peering and VPNs can be complex to set up, expensive, and hard to manage. This guide explains how to set up proxy servers that establish a secure route for queries.

## Scenario

Let's imagine we have an `Observer Cluster` that hosts [Thanos Querier](../components/query.md) along with [Thanos Store Gateway](../components/store.md). In the same cluster we also have one or more [Thanos Sidecars](../components/sidecar.md) that we would like to connect to within the cluster.

However, let's say the observer cluster's querier also needs to connect to several Thanos Sidecar instances in remote clusters. These are exposed, for example, through each remote cluster's NGINX Ingress, which is the setup the configurations below are based on.

Ideally, we want TLS to encrypt the connections to the remote clusters but not the connections within the cluster (to reduce the number of ingresses, avoid the pain of provisioning certificates, etc.). We may also want client certificate authentication to these remote clusters for improved security (see the Envoy v3 example below).

In this scenario you need a proxy server, as described below.

## Proxy-based communication using Envoy

Envoy can run as a sidecar container (as in the example shown here) within the Thanos Querier pod on the Observer Cluster.
It performs TLS origination to connect to the secure remote sidecars, while forwarding their traffic back to Thanos Querier locally and unencrypted.

[Envoy](https://www.envoyproxy.io/) is a proxy server with good HTTP/2 and gRPC support, and it is relatively straightforward to configure for this purpose.

- Add an Envoy sidecar container to the Thanos Querier pod (unfortunately, many Thanos charts don't support this either). An example pod config is below (see `deployment.yaml`).
- Make sure that the Envoy sidecar has the correct certificates (using a mounted secret) and a valid configuration (using a mounted configmap). An example Envoy config is below (`envoy.yaml`).
- Configure a service for the Envoy sidecar. An example service is shown below (`service.yaml`). You may already have another service for local cluster access (for Thanos Ruler, Grafana, etc.).
- Point the querier at the service and the correct port. Example `--endpoint` flags are below (see the querier `args` in `deployment.yaml`). Older Thanos releases, such as the `v0.17.2` image used in `deployment.yaml`, name this flag `--store`.
- Make sure your remote cluster has TLS set up and an appropriate HTTP/2-capable ingress. An example is below (`ingress.yaml`).

### Observer Cluster: Querier with Envoy `deployment.yaml`

- `[port_name]` and `[port_name_2]` are the names of the ports specified within the service (see `service.yaml`)
- `[service-name]` is the name of the Envoy service
- `[namespace]` is the namespace of the Envoy service

You may need to change `cluster.local` depending on your cluster domain. The `--endpoint` entries for Thanos Store Gateway etc. may be named differently in your setup.

```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: thanos-global-test-querier
  namespace: thanos-global
  labels:
    name: thanos-global-test-querier
spec:
  replicas: 2
  selector:
    matchLabels:
      name: thanos-global-test-querier
  template:
    metadata:
      labels:
        name: thanos-global-test-querier
    spec:
      volumes:
        - name: config
          configMap:
            name: thanos-global-test-envoy-config
            defaultMode: 420
            optional: false
        - name: certs
          secret:
            secretName: thanos-global-test-envoy-certs
            defaultMode: 420
            optional: false
      containers:
        - name: querier
          image: 'thanosio/thanos:v0.17.2'
          args:
            - query
            - '--log.level=info'
            - '--grpc-address=0.0.0.0:10901'
            - '--http-address=0.0.0.0:10902'
            - '--query.replica-label=replica'
            - >-
              --endpoint=dnssrv+_grpc._tcp.thanos-global-test-storegateway.thanos-global.svc.cluster.local
            - >-
              --endpoint=dnssrv+_grpc._tcp.thanos-global-test-sidecar.thanos-global.svc.cluster.local
            - >-
              --endpoint=dnssrv+_grpc._tcp.thanos-global-test-ruler.thanos-global.svc.cluster.local
            - >-
              --endpoint=dnssrv+_[port_name]._tcp.[service-name].[namespace].svc.cluster.local
            - >-
              --endpoint=dnssrv+_[port_name_2]._tcp.[service-name].[namespace].svc.cluster.local
          ports:
            - name: http
              containerPort: 10902
              protocol: TCP
            - name: grpc
              containerPort: 10901
              protocol: TCP
          resources: {}
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /-/ready
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: false
            runAsUser: 1001
            runAsGroup: 0
            runAsNonRoot: false
            readOnlyRootFilesystem: false
            allowPrivilegeEscalation: true
        - name: envoy-sidecar
          image: 'envoyproxy/envoy:v1.16.0'
          args:
            - '-c'
            - /config/envoy.yaml
            - '-l'
            - debug
          ports:
            - name: [port_name]
              containerPort: 10000
              protocol: TCP
            - name: [port_name_2]
              containerPort: 10001
              protocol: TCP
          resources: {}
          volumeMounts:
            - name: config
              mountPath: /config
              mountPropagation: None
            - name: certs
              mountPath: /certs
              mountPropagation: None
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: thanos-global-test-querier-sa
      serviceAccount: thanos-global-test-querier-sa
      automountServiceAccountToken: false
      shareProcessNamespace: false
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
```

### Forward proxy Envoy configuration `envoy.yaml`

This is a static v2 Envoy configuration (a v3 example follows). You will need to update this configuration for every sidecar you would like to talk to. There are also several options for dynamic configuration, such as Envoy xDS (and its associated dynamic config modes), or a tool like Terraform (if that's your deployment method) to generate the configs at deployment time. NOTE: This config **does not** send a client certificate to authenticate with remote clusters; see the Envoy v3 config for that.
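
Both Envoy configurations in this section reach the sidecar through the ConfigMap and Secret that `deployment.yaml` mounts at `/config` and `/certs`, as do the certificates they reference. The sketch below shows what those two objects might look like. The object names and namespace come from `deployment.yaml`. The certificate contents are placeholders. The stub `envoy.yaml` only carries the `admin` block, so you would paste in whichever full configuration you choose.

```yaml
# Sketch only: names match deployment.yaml; certificate bodies are placeholders.
kind: ConfigMap
apiVersion: v1
metadata:
  name: thanos-global-test-envoy-config
  namespace: thanos-global
data:
  # Mounted as /config/envoy.yaml, the file passed to Envoy with `-c`.
  envoy.yaml: |
    admin:
      access_log_path: /tmp/admin_access.log
      address:
        socket_address: { address: 0.0.0.0, port_value: 9901 }
    # static_resources: listeners and clusters from one of the examples in this section
---
kind: Secret
apiVersion: v1
metadata:
  name: thanos-global-test-envoy-certs
  namespace: thanos-global
type: Opaque
stringData:
  # Each key becomes a file under /certs.
  # The v2 example reads ca.crt; the v3 example reads cacerts.pem, tls.crt and tls.key,
  # so adjust the key names to match the configuration you deploy.
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  tls.key: |
    -----BEGIN PRIVATE KEY-----
    ...
    -----END PRIVATE KEY-----
```

You can also create equivalent objects from files on disk with `kubectl create configmap thanos-global-test-envoy-config --from-file=envoy.yaml -n thanos-global` and `kubectl create secret generic thanos-global-test-envoy-certs --from-file=ca.crt --from-file=tls.crt --from-file=tls.key -n thanos-global`. The static v2 configuration itself follows.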

```yaml
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
  - name: sidecar_name
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: sidecar_name, host_rewrite: thanos.sidecardomain.com }
          http_filters:
          - name: envoy.router
  - name: sidecar_name_2
    address:
      socket_address: { address: 0.0.0.0, port_value: 10001 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                # Rewrite to the second remote host so its ingress matches the request.
                route: { cluster: sidecar_name_2, host_rewrite: thanos-2.sidecardomain.com }
          http_filters:
          - name: envoy.router
  clusters:
  - name: sidecar_name
    connect_timeout: 30s
    type: logical_dns
    http2_protocol_options: {}
    dns_lookup_family: V4_ONLY
    lb_policy: round_robin
    hosts: [{ socket_address: { address: thanos.sidecardomain.com, port_value: 443 }}]
    tls_context:
      common_tls_context:
        validation_context:
          trusted_ca:
            filename: /certs/ca.crt
        alpn_protocols:
        - h2
        - http/1.1
      sni: thanos.sidecardomain.com
  - name: sidecar_name_2
    connect_timeout: 30s
    type: logical_dns
    http2_protocol_options: {}
    dns_lookup_family: V4_ONLY
    lb_policy: round_robin
    hosts: [{ socket_address: { address: thanos-2.sidecardomain.com, port_value: 443 }}]
    tls_context:
      common_tls_context:
        validation_context:
          trusted_ca:
            filename: /certs/ca.crt
        alpn_protocols:
        - h2
        - http/1.1
      sni: thanos-2.sidecardomain.com
```

### `envoy.yaml` V3 API

This is an example Envoy config using the v3 API. It differs slightly from the one above (more log formatting) but is essentially the same in functionality. This config **sends** a client certificate to authenticate with remote clusters (they must have the CA loaded in order to verify it). It only implements a single port/listener, but adding more (the v2 example has two) is fairly trivial: simply clone the `sidecar_name` listener and the `sidecar_name` cluster blocks, giving each copy its own name, port and remote hostname.

```yaml
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
  - name: sidecar_name
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: AUTO
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: /dev/stdout
              log_format:
                text_format: |
                  [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%"
                  %RESPONSE_CODE% %RESPONSE_FLAGS% %RESPONSE_CODE_DETAILS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION%
                  %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%"
                  "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%" "%UPSTREAM_TRANSPORT_FAILURE_REASON%"\n
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: /dev/stdout
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: sidecar_name
                  host_rewrite_literal: thanos.sidecardomain.com
          http_filters:
          - name: envoy.filters.http.router
  clusters:
  - name: sidecar_name
    connect_timeout: 30s
    type: LOGICAL_DNS
    http2_protocol_options: {}
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: sidecar_name
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: thanos.sidecardomain.com
                port_value: 443
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificates:
          - certificate_chain:
              filename: /certs/tls.crt
            private_key:
              filename: /certs/tls.key
          validation_context:
            trusted_ca:
              filename: /certs/cacerts.pem
          alpn_protocols:
          - h2
          - http/1.1
        sni: thanos.sidecardomain.com
```

### Observer Cluster: Querier with Envoy `service.yaml`

This is the service for the Envoy sidecar. You will need to define a new port for every sidecar you want to add.

```yaml
kind: Service
apiVersion: v1
metadata:
  name: thanos-global-test-envoy
  namespace: thanos-global
  labels:
    name: thanos-global-test-envoy
spec:
  ports:
    - name: [port_name]
      protocol: TCP
      port: 10000
      targetPort: 10000
    - name: [port_name_2]
      protocol: TCP
      port: 10001
      targetPort: 10001
  selector:
    name: thanos-global-test-querier
  type: ClusterIP
  sessionAffinity: None
```

### Client clusters: Sidecar `ingress.yaml`

This is an example ingress for a remote sidecar using NGINX Ingress.
You must use TLS (port 443, a limitation of NGINX), as HTTP/2 is only supported on a separate listener (see [here](https://github.com/kubernetes/ingress-nginx/issues/3938)).

You must have certificates configured, and the CA added to the Envoy sidecar as described earlier, to allow verification (including client-certificate verification if you use the v3 Envoy config).

```yaml
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: monitoring-rancher-monitor-thanos-gateway
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/grpc-backend: 'true'
    nginx.ingress.kubernetes.io/protocol: h2c
    nginx.ingress.kubernetes.io/proxy-read-timeout: '160'
spec:
  tls:
    - hosts:
        - thanos.sidecardomain.com
      secretName: thanos-tls
  rules:
    - host: thanos.sidecardomain.com
      http:
        paths:
          - path: /
            backend:
              serviceName: monitoring-rancher-monitor-prometheus
              servicePort: 10901
```

## Summary

This guide has outlined a scenario, a potential solution and a collection of example configurations. After implementing a setup like this, you can expect a central Thanos instance to reach sidecars, store gateways, receivers, etc. over standard unsecured gRPC. At the same time, it can securely access externally located resources (e.g. the StoreAPIs of remote sidecars) using client-certificate authentication and HTTPS/TLS encryption.
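
The client-cluster Ingress example uses the `extensions/v1beta1` API, which was removed in Kubernetes 1.22. On clusters that serve `networking.k8s.io/v1` (Kubernetes 1.19 and later), a roughly equivalent manifest might look like the sketch below. It keeps the original annotations, host, TLS secret and backend. The `ingressClassName: nginx` value is an assumption that depends on how your ingress-nginx controller is installed. The commented-out `auth-tls-*` annotations show one way to have NGINX verify the client certificate sent by the v3 Envoy config. They reference a hypothetical `monitoring/thanos-client-ca` Secret, which would need a `ca.crt` key.

```yaml
kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: monitoring-rancher-monitor-thanos-gateway
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/grpc-backend: 'true'
    nginx.ingress.kubernetes.io/protocol: h2c
    nginx.ingress.kubernetes.io/proxy-read-timeout: '160'
    # Optional client-certificate verification (hypothetical CA secret name):
    # nginx.ingress.kubernetes.io/auth-tls-secret: monitoring/thanos-client-ca
    # nginx.ingress.kubernetes.io/auth-tls-verify-client: 'on'
spec:
  ingressClassName: nginx  # assumption: depends on your controller installation
  tls:
    - hosts:
        - thanos.sidecardomain.com
      secretName: thanos-tls
  rules:
    - host: thanos.sidecardomain.com
      http:
        paths:
          - path: /
            pathType: Prefix  # required in v1
            backend:
              service:
                name: monitoring-rancher-monitor-prometheus
                port:
                  number: 10901
```

The main structural changes from the original are `pathType`, which is mandatory in `v1`, and the backend, which moves from `serviceName`/`servicePort` to the nested `service.name`/`service.port.number` form.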