# Object Storage & Data Format

Thanos uses object storage as the primary storage for metrics and their related metadata. In this document you can learn how to configure your object storage and what the data layout and format are for the primary Thanos components that are "block" aware, like `sidecar`, `compact`, `receive` and `store gateway`.

## Configuring Access to Object Storage

Thanos supports any object store that can be implemented against the Thanos [objstore.Bucket interface](https://github.com/thanos-io/objstore/blob/main/objstore.go).

All clients can be configured using `--objstore.config-file` to reference the configuration file or `--objstore.config` to put the YAML config directly.

### How to use our special `config` flags?

**You can either pass the YAML file defined below in `--objstore.config-file` or pass the YAML content directly using `--objstore.config`.** We recommend the latter, as it gives an explicit static view of the configuration for each component. It also saves you the fuss of creating and managing an additional file.

Don't be afraid of multiline flags!

In Kubernetes it is as easy as this (Thanos sidecar example):

```yaml
- args:
  - sidecar
  - |
      --objstore.config=type: GCS
      config:
        bucket: <bucket>
  - --prometheus.url=http://localhost:9090
  - |
      --tracing.config=type: STACKDRIVER
      config:
        service_name: ""
        project_id: <project>
        sample_factor: 16
  - --tsdb.path=/prometheus-data
```

### Supported Clients

Current object storage client implementations:

| Provider | Maturity | Aimed For | Auto-tested on CI | Maintainers |
|----------|----------|-----------|-------------------|-------------|
| [Google Cloud Storage](#gcs) | Stable | Production Usage | yes | @bwplotka |
| [AWS/S3](#s3) (and all S3-compatible storages e.g. disk-based [Minio](https://min.io/)) | Stable | Production Usage | yes | @bwplotka |
| [Azure Storage Account](#azure) | Stable | Production Usage | no | @vglafirov |
| [OpenStack Swift](#openstack-swift) | Beta (working PoC) | Production Usage | yes | @FUSAKLA |
| [Tencent COS](#tencent-cos) | Beta | Production Usage | no | @jojohappy,@hanjm |
| [AliYun OSS](#aliyun-oss) | Beta | Production Usage | no | @shaulboozhiao,@wujinhu |
| [Local Filesystem](#filesystem) | Stable | Testing and Demo only | yes | @bwplotka |
| [Oracle Cloud Infrastructure Object Storage](#oracle-cloud-infrastructure-object-storage) | Beta | Production Usage | yes | @aarontams,@gaurav-05,@ericrrath |

**Missing support for some object storage?** Check out the [how to add your client section](#how-to-add-a-new-client-to-thanos).

NOTE: Currently Thanos requires strong (read-after-write) consistency from the object store implementation, for singleton compaction purposes.

#### S3

Thanos uses the [minio client](https://github.com/minio/minio-go) library to upload Prometheus data into AWS S3.

You can configure an S3 bucket as an object store with YAML, either by passing the configuration directly to the `--objstore.config` parameter, or (preferably) by passing the path to a configuration file to the `--objstore.config-file` option.

NOTE: The Minio client was designed mainly for AWS S3, but it can be configured against other S3-compatible object storages, e.g. Ceph.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=s3.Config"
type: S3
config:
  bucket: ""
  endpoint: ""
  region: ""
  aws_sdk_auth: false
  access_key: ""
  insecure: false
  signature_version2: false
  secret_key: ""
  session_token: ""
  put_user_metadata: {}
  http_config:
    idle_conn_timeout: 1m30s
    response_header_timeout: 2m
    insecure_skip_verify: false
    tls_handshake_timeout: 10s
    expect_continue_timeout: 1s
    max_idle_conns: 100
    max_idle_conns_per_host: 100
    max_conns_per_host: 0
    tls_config:
      ca_file: ""
      cert_file: ""
      key_file: ""
      server_name: ""
      insecure_skip_verify: false
    disable_compression: false
  trace:
    enable: false
  list_objects_version: ""
  bucket_lookup_type: auto
  part_size: 67108864
  sse_config:
    type: ""
    kms_key_id: ""
    kms_encryption_context: {}
    encryption_key: ""
  sts_endpoint: ""
prefix: ""
```

At a minimum, you will need to provide a value for the `bucket`, `endpoint`, `access_key`, and `secret_key` keys. The rest of the keys are optional.

However, if you set `aws_sdk_auth: true`, Thanos will use the default authentication methods of the AWS SDK for Go based on [known environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html) (`AWS_PROFILE`, `AWS_WEB_IDENTITY_TOKEN_FILE`, etc.) and known AWS config files (`~/.aws/config`). If you turn this on, then `bucket` and `endpoint` are the required config keys.

The field `prefix` can be used to transparently use prefixes in your S3 bucket.
This allows you to separate blocks coming from different sources into paths with different prefixes, making it easier to understand what's going on (i.e. you don't have to use Thanos tooling to know which blocks came from where).

The AWS region to endpoint mapping can be found in this [link](https://docs.aws.amazon.com/general/latest/gr/s3.html).

Make sure you use the correct signature version. Currently AWS requires signature v4, so it needs `signature_version2: false`. Otherwise, you will get an `Access Denied` error. On the other hand, several S3-compatible APIs use `signature_version2: true`.

You can configure the timeout settings for the HTTP client by setting the `http_config.idle_conn_timeout` and `http_config.response_header_timeout` keys. As a rule of thumb, if you are seeing errors like `timeout awaiting response headers` in your logs, you may want to increase the value of `http_config.response_header_timeout`.

Please refer to the documentation of [the Transport type](https://golang.org/pkg/net/http/#Transport) in the `net/http` package for detailed information on what each option does.

`part_size` is specified in bytes and refers to the minimum file size used for multipart uploads, as some custom S3 implementations may have different requirements. The value shown in the generated config above, `67108864`, corresponds to 64 MiB. A value of `0` means to use a default 128 MiB size.

Set `list_objects_version: "v1"` for S3-compatible APIs that don't support ListObjectsV2 (e.g. some versions of Ceph). The default value (`""`) is equivalent to `"v2"`.

`http_config.tls_config` allows configuring TLS connections. Please refer to the documentation of [tls_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config) for detailed information on what each option does.

`bucket_lookup_type` can be `auto`, `virtual-hosted` or `path`. Read more about it [here](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html).
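
To make `bucket_lookup_type` and `prefix` more concrete, the following minimal Go sketch builds the URL an object would be addressed at under the two explicit lookup styles. It assumes the URL patterns described in the AWS virtual-hosting guide linked above, and it assumes that `prefix` is simply prepended to every object key as a path segment. `objectURL` is a hypothetical helper, not part of Thanos or the minio client, and the `auto` mode, where the client library picks the style itself, is deliberately not modeled.

```go
package main

import (
	"fmt"
	"path"
)

// objectURL is a hypothetical helper showing how an object would be addressed
// under the two explicit bucket lookup styles. "auto" is left to the client
// library and is rejected here.
func objectURL(lookupType, endpoint, bucket, prefix, key string) (string, error) {
	// Assumption: the configured prefix is prepended to every object key.
	// path.Join skips empty elements, so an empty prefix leaves the key as-is.
	fullKey := path.Join(prefix, key)
	switch lookupType {
	case "virtual-hosted":
		// The bucket name becomes part of the host name.
		return fmt.Sprintf("https://%s.%s/%s", bucket, endpoint, fullKey), nil
	case "path":
		// The bucket name becomes the first path segment.
		return fmt.Sprintf("https://%s/%s/%s", endpoint, bucket, fullKey), nil
	default:
		return "", fmt.Errorf("lookup type %q is not modeled in this sketch", lookupType)
	}
}

func main() {
	key := "01DN3SK96XDAEKRB1AN30AAW6E/meta.json"
	for _, style := range []string{"virtual-hosted", "path"} {
		u, err := objectURL(style, "s3.us-east-1.amazonaws.com", "my-bucket", "team-a", key)
		if err != nil {
			panic(err)
		}
		fmt.Println(style, u)
	}
}
```

With the values in `main`, the virtual-hosted style puts `my-bucket` into the host name while the path style keeps it as the first path segment; in both cases the `team-a` prefix sits in front of the block ULID.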

For debug and testing purposes you can set:

* `insecure: true` to switch to plain insecure HTTP instead of HTTPS

* `http_config.insecure_skip_verify: true` to disable TLS certificate verification (if your S3-based storage is using a self-signed certificate, for example)

* `trace.enable: true` to enable the minio client's verbose logging. Each request and response will be logged into the debug logger, so debug-level logging must be enabled for this functionality.

##### S3 Server-Side Encryption

SSE can be configured using the `sse_config`. [SSE-S3](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html), [SSE-KMS](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html), and [SSE-C](https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html) are supported.

* If type is set to `SSE-S3` you do not need to configure other options.

* If type is set to `SSE-KMS` you must set `kms_key_id`. The `kms_encryption_context` is optional, as [AWS provides a default encryption context](https://docs.aws.amazon.com/kms/latest/developerguide/services-s3.html#s3-encryption-context).

* If type is set to `SSE-C` you must provide a path to the encryption key using `encryption_key`.

If the SSE config block is set but the `type` is not one of `SSE-S3`, `SSE-KMS`, or `SSE-C`, an error is raised.

You will also need to apply the following AWS IAM policy for the user to access the KMS key:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KMSAccess",
      "Effect": "Allow",
      "Action": [
        "kms:GenerateDataKey",
        "kms:Encrypt",
        "kms:Decrypt"
      ],
      "Resource": "arn:aws:kms:<region>:<account>:key/<KMS key id>"
    }
  ]
}
```

##### Credentials

By default Thanos will try to retrieve credentials from the following sources:

1. From the config file if BOTH `access_key` and `secret_key` are present.
2. From the standard AWS environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
3. From `~/.aws/credentials`
4. IAM credentials retrieved from an instance profile.

NOTE: Getting the access key from the config file and the secret key from another method (and vice versa) is not supported.

##### AWS Policies

Example working AWS IAM policy for user:

* For deployment (policy for Thanos services):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket>/*",
        "arn:aws:s3:::<bucket>"
      ]
    }
  ]
}
```

(No bucket policy)

To test the policy, set env vars for S3 access for an *empty, not used* bucket as well as:

```
THANOS_TEST_OBJSTORE_SKIP=GCS,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI,OBS
THANOS_ALLOW_EXISTING_BUCKET_USE=true
```

And run: `GOCACHE=off go test -v -run TestObjStore_AcceptanceTest_e2e ./pkg/...`

* For testing (policy to run e2e tests):

We need access to CreateBucket and DeleteBucket and access to all buckets:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:CreateBucket",
        "s3:DeleteBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket>/*",
        "arn:aws:s3:::<bucket>"
      ]
    }
  ]
}
```

With this policy you should be able to set `THANOS_TEST_OBJSTORE_SKIP=GCS,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI,OBS`, unset `S3_BUCKET`, and run all tests using `make test`.

Details about AWS policies: https://docs.aws.amazon.com/AmazonS3/latest/dev/using-with-s3-actions.html

##### STS Endpoint

If you want to use IAM credentials retrieved from an instance profile, Thanos needs to authenticate through AWS STS. For this purpose you can specify your own STS endpoint.

By default Thanos will use the endpoint https://sts.amazonaws.com and the corresponding AWS regional endpoints.

##### S3 Storage Class

By default, the `STANDARD` S3 storage class will be used. To specify a storage class, add it to the `put_user_metadata` section of the config file.

For example, the config file below specifies the `STANDARD_IA` storage class.

```yaml
type: S3
prefix: thanos-test-standard-ia
config:
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
  bucket: MY_BUCKET
  put_user_metadata:
    X-Amz-Storage-Class: STANDARD_IA
  trace:
    enable: true
```

#### GCS

To configure a Google Cloud Storage bucket as an object store, you need to set `bucket` to the GCS bucket name and configure Google Application credentials.

For example:

```yaml mdox-exec="go run scripts/cfggen/main.go --name=gcs.Config"
type: GCS
config:
  bucket: ""
  service_account: ""
prefix: ""
```

##### Using GOOGLE_APPLICATION_CREDENTIALS

Application credentials are configured via a JSON file and only the bucket needs to be specified. The client looks for:

1. A JSON file whose path is specified by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
2. A JSON file in a location known to the gcloud command-line tool. On Windows, this is `%APPDATA%/gcloud/application_default_credentials.json`. On other systems, `$HOME/.config/gcloud/application_default_credentials.json`.
3. On Google App Engine it uses the `appengine.AccessToken` function.
4. On Google Compute Engine and Google App Engine Managed VMs, it fetches credentials from the metadata server. (In this final case any provided scopes are ignored.)

You can read more on how to get the application credentials JSON file at [https://cloud.google.com/docs/authentication/production](https://cloud.google.com/docs/authentication/production).

##### Using inline a Service Account

Another possibility is to inline the ServiceAccount into the Thanos configuration and only maintain one file. This feature was added so that the Prometheus Operator only needs to take care of one secret file.

```yaml
type: GCS
config:
  bucket: "thanos"
  service_account: |-
    {
      "type": "service_account",
      "project_id": "project",
      "private_key_id": "abcdefghijklmnopqrstuvwxyz12345678906666",
      "private_key": "-----BEGIN PRIVATE KEY-----\...\n-----END PRIVATE KEY-----\n",
      "client_email": "project@thanos.iam.gserviceaccount.com",
      "client_id": "123456789012345678901",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/thanos%40gitpods.iam.gserviceaccount.com"
    }
```

##### GCS Policies

**Note:** GCS Policies should be applied at the project level, not at the bucket level.

For deployment:

`Storage Object Creator` and `Storage Object Viewer`

For testing:

`Storage Object Admin` for the ability to create and delete temporary buckets.

To test that the policy is working as expected, exec into the sidecar container, e.g.:

```sh
kubectl exec -it -n <namespace> <prometheus with sidecar pod name> -c <sidecar container name> -- /bin/sh
```

Then test that you can at least list objects in the bucket, e.g.:

```sh
thanos tools bucket ls --objstore.config="${OBJSTORE_CONFIG}"
```

#### Azure

To use Azure Storage as the Thanos object store, you need to precreate a storage account from the Azure portal or using the Azure CLI. Follow the instructions from the Azure Storage documentation: [https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal)

To configure an Azure Storage account as an object store, you need to provide a path to the Azure storage config file in the `--objstore.config-file` flag.

The config file format is the following:

```yaml mdox-exec="go run scripts/cfggen/main.go --name=azure.Config"
type: AZURE
config:
  storage_account: ""
  storage_account_key: ""
  storage_connection_string: ""
  container: ""
  endpoint: ""
  user_assigned_id: ""
  max_retries: 0
  reader_config:
    max_retry_requests: 0
  pipeline_config:
    max_tries: 0
    try_timeout: 0s
    retry_delay: 0s
    max_retry_delay: 0s
  http_config:
    idle_conn_timeout: 0s
    response_header_timeout: 0s
    insecure_skip_verify: false
    tls_handshake_timeout: 0s
    expect_continue_timeout: 0s
    max_idle_conns: 0
    max_idle_conns_per_host: 0
    max_conns_per_host: 0
    tls_config:
      ca_file: ""
      cert_file: ""
      key_file: ""
      server_name: ""
      insecure_skip_verify: false
    disable_compression: false
  msi_resource: ""
prefix: ""
```

If `msi_resource` is used, authentication is done via system-assigned managed identity.
The value for Azure should be `https://<storage-account-name>.blob.core.windows.net`.

If `user_assigned_id` is used, authentication is done via user-assigned managed identity. When using `user_assigned_id`, the `msi_resource` defaults to `https://<storage_account>.<endpoint>`.

The generic `max_retries` will be used as the value for the `pipeline_config`'s `max_tries` and the `reader_config`'s `max_retry_requests`. For more control, `max_retries` can be left at `0` and the specific retry values set instead.

#### OpenStack Swift

Thanos uses the [ncw/swift](https://github.com/ncw/swift) client to upload Prometheus data into [OpenStack Swift](https://docs.openstack.org/swift/latest/).

Below is an example configuration file for Thanos to use an OpenStack Swift container as an object store. Note that if the `name` of a user, project or tenant is used, one must also specify its domain by ID or name. Various examples for OpenStack authentication can be found in the [official documentation](https://developer.openstack.org/api-ref/identity/v3/index.html?expanded=password-authentication-with-scoped-authorization-detail#password-authentication-with-unscoped-authorization).

By default, OpenStack Swift has a limit for maximum file size of 5 GiB. Thanos index files are often larger than that. To resolve this issue, Thanos uses [Static Large Objects (SLO)](https://docs.openstack.org/swift/latest/overview_large_objects.html), which are uploaded as segments. These are by default put into the `segments` directory of the same container. The default limit for using SLO is 1 GiB, which is also the maximum size of a segment. If you don't want to use the same container for the segments (best practice is to use `<container_name>_segments` to avoid polluting the listing of the container objects), you can use the `large_object_segments_container_name` option to override the default and put the segments into another container.
*In rare cases you can switch to [Dynamic Large Objects (DLO)](https://docs.openstack.org/swift/latest/overview_large_objects.html) by setting `use_dynamic_large_objects` to true, but use it with caution since it relies even more on eventual consistency.*

```yaml mdox-exec="go run scripts/cfggen/main.go --name=swift.Config"
type: SWIFT
config:
  auth_version: 0
  auth_url: ""
  username: ""
  user_domain_name: ""
  user_domain_id: ""
  user_id: ""
  password: ""
  domain_id: ""
  domain_name: ""
  application_credential_id: ""
  application_credential_name: ""
  application_credential_secret: ""
  project_id: ""
  project_name: ""
  project_domain_id: ""
  project_domain_name: ""
  region_name: ""
  container_name: ""
  large_object_chunk_size: 1073741824
  large_object_segments_container_name: ""
  retries: 3
  connect_timeout: 10s
  timeout: 5m
  use_dynamic_large_objects: false
prefix: ""
```

#### Tencent COS

To use Tencent COS as the storage store, you should first apply for a Tencent Account and create an object storage bucket.
For details, see the Tencent Cloud documentation: [https://cloud.tencent.com/document/product/436](https://cloud.tencent.com/document/product/436)

To configure a Tencent Account to use COS as the storage store, you need to set these parameters in YAML format stored in a file:

```yaml mdox-exec="go run scripts/cfggen/main.go --name=cos.Config"
type: COS
config:
  bucket: ""
  region: ""
  app_id: ""
  endpoint: ""
  secret_key: ""
  secret_id: ""
  http_config:
    idle_conn_timeout: 1m30s
    response_header_timeout: 2m
    insecure_skip_verify: false
    tls_handshake_timeout: 10s
    expect_continue_timeout: 1s
    max_idle_conns: 100
    max_idle_conns_per_host: 100
    max_conns_per_host: 0
    tls_config:
      ca_file: ""
      cert_file: ""
      key_file: ""
      server_name: ""
      insecure_skip_verify: false
    disable_compression: false
prefix: ""
```

The `secret_key` and `secret_id` fields are required. The `http_config` field is optional and can be used to tune the HTTP transport settings. There are two ways to configure the required bucket information:

1. Provide the values of the `bucket`, `region` and `app_id` keys.
2. Provide the value of the `endpoint` key in URL format when you want to specify a VPC internal endpoint. Please refer to the [endpoint](https://intl.cloud.tencent.com/document/product/436/6224) documentation for more details.

Set the `--objstore.config-file` flag to reference the configuration file.

#### AliYun OSS

In order to use AliYun OSS object storage, you should first create a bucket with the proper Storage Class and ACLs and get the access key on the AliYun cloud. Go to [https://www.alibabacloud.com/product/oss](https://www.alibabacloud.com/product/oss) for more details.

To use AliYun OSS object storage, please specify the following YAML configuration file in the `--objstore.config*` flag.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=oss.Config"
type: ALIYUNOSS
config:
  endpoint: ""
  bucket: ""
  access_key_id: ""
  access_key_secret: ""
prefix: ""
```

Use `--objstore.config-file` to reference this configuration file.

#### Baidu BOS

In order to use Baidu BOS object storage, you should apply for a Baidu Account and create an object storage bucket first. Refer to [Baidu Cloud Documents](https://cloud.baidu.com/doc/BOS/index.html) for more details. To use Baidu BOS object storage, please specify the following YAML configuration file in the `--objstore.config*` flag.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=bos.Config"
type: BOS
config:
  bucket: ""
  endpoint: ""
  access_key: ""
  secret_key: ""
prefix: ""
```

#### Filesystem

This storage type is used when the user wants to store and access the bucket in the local filesystem. We treat the filesystem the same way we would treat object storage, so all optimizations for remote buckets apply even though the files might be local.

NOTE: This storage type is experimental and might be inefficient. It is NOT advised to use it as the main storage for metrics in a production environment. In particular, there is no planned support for distributed filesystems like NFS. This is mainly useful for testing and demos.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=filesystem.Config"
type: FILESYSTEM
config:
  directory: ""
prefix: ""
```

#### Oracle Cloud Infrastructure Object Storage

To configure Oracle Cloud Infrastructure (OCI) Object Storage as the Thanos object store, you need to provide appropriate authentication credentials for your OCI tenancy. The OCI object storage client implementation for Thanos supports the default API signing key, instance principal, and raw provider authentication described below.
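
The provider sections below differ mainly in which keys have to be filled in. The following minimal Go sketch shows such a pre-flight check. It assumes the three `provider` values documented below, that `bucket` and `compartment_ocid` are needed by every provider (they appear in every example), and that every key of the raw example except `passphrase` is required. `ociConfig` and `validate` are hypothetical names; this is not the validation performed by the Thanos OCI client.

```go
package main

import (
	"errors"
	"fmt"
)

// ociConfig is a hypothetical mirror of a subset of the OCI YAML keys.
type ociConfig struct {
	Provider        string // "default", "instance-principal" or "raw"
	Bucket          string
	CompartmentOCID string
	TenancyOCID     string // raw provider only
	UserOCID        string // raw provider only
	Region          string // raw provider only
	Fingerprint     string // raw provider only
	PrivateKey      string // raw provider only
	Passphrase      string // raw provider only, optional
}

// validate is a hypothetical check derived from the examples in this document.
func validate(c ociConfig) error {
	// Assumption: every provider needs bucket and compartment_ocid.
	if c.Bucket == "" || c.CompartmentOCID == "" {
		return errors.New("bucket and compartment_ocid must be set")
	}
	switch c.Provider {
	case "default", "instance-principal":
		// Credentials come from the OCI CLI configuration or the instance itself.
		return nil
	case "raw":
		// Assumption: every raw key except passphrase is required.
		// A slice (not a map) keeps the reported missing key deterministic.
		required := []struct{ key, value string }{
			{"tenancy_ocid", c.TenancyOCID},
			{"user_ocid", c.UserOCID},
			{"region", c.Region},
			{"fingerprint", c.Fingerprint},
			{"privatekey", c.PrivateKey},
		}
		for _, r := range required {
			if r.value == "" {
				return fmt.Errorf("raw provider requires %s", r.key)
			}
		}
		return nil
	default:
		return fmt.Errorf("unknown provider %q", c.Provider)
	}
}

func main() {
	cfg := ociConfig{Provider: "raw", Bucket: "thanos", CompartmentOCID: "ocid1.compartment.oc1..a"}
	if err := validate(cfg); err != nil {
		fmt.Println("invalid config:", err)
	}
}
```

Running `main` reports that the raw provider is missing `tenancy_ocid`, the first credential key in the checked list.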

##### API Signing Key

The default API signing key authentication provider leverages the same [configuration as the OCI CLI](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/cliconcepts.htm), which is usually stored at `$HOME/.oci/config` or in variables whose names start with the string `OCI_CLI`. If the same configuration is found in multiple places, the provider will prefer the first one.

The following example configures the provider to look for an existing API signing key for authentication:

```yaml
type: OCI
config:
  provider: "default"
  bucket: ""
  compartment_ocid: ""
  part_size: ""                 # Optional part size to override the OCI default of 128 MiB, value is in bytes.
  max_request_retries: ""       # Optional maximum number of retries for a request.
  request_retry_interval: ""    # Optional sleep duration in seconds between retry requests.
  http_config:
    idle_conn_timeout: 1m30s     # Optional maximum amount of time an idle (keep-alive) connection will remain idle before closing itself. Zero means no limit.
    response_header_timeout: 2m  # Optional amount of time to wait for a server's response headers after fully writing the request.
    tls_handshake_timeout: 10s   # Optional maximum amount of time to wait for a TLS handshake. Zero means no timeout.
    expect_continue_timeout: 1s  # Optional amount of time to wait for a server's first response headers if the request has an "Expect: 100-continue" header. Zero means no timeout and causes the body to be sent immediately.
    insecure_skip_verify: false  # Optional. If true, crypto/tls accepts any certificate presented by the server and any host name in that certificate.
    max_idle_conns: 100          # Optional maximum number of idle (keep-alive) connections across all hosts. Zero means no limit.
    max_idle_conns_per_host: 100 # Optional maximum idle (keep-alive) connections to keep per-host. If zero, DefaultMaxIdleConnsPerHost=2 is used.
    max_conns_per_host: 0        # Optional maximum total number of connections per host.
    disable_compression: false   # Optional. If true, prevents the Transport from requesting compression.
    client_timeout: 90s          # Optional time limit for requests made by the HTTP Client.
```

##### Instance Principal Provider

For example:

```yaml
type: OCI
config:
  provider: "instance-principal"
  bucket: ""
  compartment_ocid: ""
```

You can also include any of the optional configuration, just like in the API Signing Key example above.

##### Raw Provider

For example:

```yaml
type: OCI
config:
  provider: "raw"
  bucket: ""
  compartment_ocid: ""
  tenancy_ocid: ""
  user_ocid: ""
  region: ""
  fingerprint: ""
  privatekey: ""
  passphrase: ""         # Optional passphrase of the encrypted private API signing key
```

You can also include any of the optional configuration, just like in the API Signing Key example above.

##### OCI Policies

Regardless of the method you use for authentication (raw, instance-principal), you need the following two policies in order for Thanos (sidecar or receive) to be able to write TSDB blocks to OCI object storage. The difference lies in whom you are giving the permissions to.

For using instance-principal and a dynamic group:

```
Allow dynamic-group thanos to read buckets in compartment id ocid1.compartment.oc1..a
Allow dynamic-group thanos to manage objects in compartment id ocid1.compartment.oc1..a
```

For using the raw provider and an IAM group:

```
Allow group thanos to read buckets in compartment id ocid1.compartment.oc1..a
Allow group thanos to manage objects in compartment id ocid1.compartment.oc1..a
```

### How to add a new client to Thanos?

The following checklist allows adding a new Go code client to the supported providers:

1. Create a new directory under `pkg/objstore/<provider>`.
2. Implement the [objstore.Bucket interface](https://github.com/thanos-io/objstore/blob/main//objstore.go).
3. Add a `NewTestBucket` constructor for testing purposes that creates and deletes a temporary bucket.
4. Use the created `NewTestBucket` in the [ForeachStore method](https://github.com/thanos-io/objstore/blob/main/objtesting/foreach.go) to ensure we can run tests against the new provider. (In PR)
5. Run the [TestObjStoreAcceptanceTest](https://github.com/thanos-io/objstore/blob/main//objtesting/acceptance_e2e_test.go) against your provider to ensure it fits. Fix any errors found until the test passes. (In PR)
6. Add the client implementation to the factory in the [factory](https://github.com/thanos-io/objstore/blob/main/client/factory.go) code. (Use as few flags as possible in every command.)
7. Add the client struct config to [bucketcfggen](../scripts/cfggen/main.go) to allow config auto-generation.

At that point, anyone can use your provider by spec.

Check the checklist in [thanos-io/objstore](https://github.com/thanos-io/objstore#how-to-add-a-new-client-to-thanos) for more comprehensive information!

## Data in Object Storage

Thanos supports writing and reading data in native Prometheus `TSDB blocks` in [TSDB format](https://github.com/prometheus/prometheus/tree/master/tsdb/docs/format). This is the format used by the [Prometheus](https://prometheus.io) TSDB database for persisting data on the local disk. With its efficient index and [chunk](design.md#chunk) binary formats, it is also well suited to being read directly from object storage using the range GET API.

The following sections explain this format in detail, together with the additional files and entries that the Thanos system supports.

### TSDB Block

Official docs for the Prometheus TSDB format can be found [here](https://github.com/prometheus/prometheus/tree/master/tsdb/docs/format), but this section lists the most important elements.

A TSDB block is a set of blobs (files) in a single directory (or `prefix`, in object storage terms) named with a [ULID](https://github.com/ulid/spec), e.g. `01ARZ3NDEKTSV4RRFFQ69G5FAV`.

**Those files contain series (labels with compressed samples) for a particular time duration (e.g. 2h) from a particular `Source` (e.g. Prometheus or Thanos Receive).**

In the Thanos system, all files are **strictly immutable**. (NOTE: In Prometheus too, but with some caveats like tombstones.) This means that any modification like `rewrite`, `deletion` or `compaction` has to be done by creating a new block and removing (with delay!) the old one.

> NOTE: Any other unknown file present in this directory is ignored when reading the data. However, such files can be removed when the block is deleted from object storage/disk.

An example block file structure (on the local filesystem) can look like this:

```
01DN3SK96XDAEKRB1AN30AAW6E:
total 2209344
drwxr-xr-x 2 bwplotka bwplotka       4096 Dec 10  2019 chunks
-rw-r--r-- 1 bwplotka bwplotka 1962383742 Dec 10  2019 index
-rw-r--r-- 1 bwplotka bwplotka       6761 Dec 10  2019 meta.json
-rw-r--r-- 1 bwplotka bwplotka        111 Dec 10  2019 deletion-mark.json    # <-- Optional marker.
-rw-r--r-- 1 bwplotka bwplotka        124 Dec 10  2019 no-compact-mark.json  # <-- Optional marker.

01DN3SK96XDAEKRB1AN30AAW6E/chunks:
total 8202452
-rw-r--r-- 1 bwplotka bwplotka 536870490 Dec 10  2019 000001
-rw-r--r-- 1 bwplotka bwplotka 536869843 Dec 10  2019 000002
-rw-r--r-- 1 bwplotka bwplotka 536869848 Dec 10  2019 000003
-rw-r--r-- 1 bwplotka bwplotka 536868209 Dec 10  2019 000004
-rw-r--r-- 1 bwplotka bwplotka 536869517 Dec 10  2019 000005
-rw-r--r-- 1 bwplotka bwplotka 536870654 Dec 10  2019 000006
-rw-r--r-- 1 bwplotka bwplotka 536855168 Dec 10  2019 000007
-rw-r--r-- 1 bwplotka bwplotka 536859441 Dec 10  2019 000008
-rw-r--r-- 1 bwplotka bwplotka 536862863 Dec 10  2019 000009
-rw-r--r-- 1 bwplotka bwplotka 536868432 Dec 10  2019 000010
-rw-r--r-- 1 bwplotka bwplotka 536861395 Dec 10  2019 000011
-rw-r--r-- 1 bwplotka bwplotka 536870859 Dec 10  2019 000012
-rw-r--r-- 1 bwplotka bwplotka 536854971 Dec 10  2019 000013
-rw-r--r-- 1 bwplotka bwplotka 536846973 Dec 10  2019 000014
-rw-r--r-- 1 bwplotka bwplotka 536866732 Dec 10  2019 000015
-rw-r--r-- 1 bwplotka bwplotka 346266827 Dec 10  2019 000016
```

Let's look at each file one by one.

#### Metadata file (meta.json)

> NOTE: Currently supported meta.json version: v1. Currently supported meta.json Thanos section version: v1.

This file is an important entry that describes the block and its data.

This file allows you to find, for example:

* The block ID (`ulid`)
* The time range covered by the block (`minTime` and `maxTime`)
* Important statistics (`stats.numSeries`)
* How many times the block was re-compacted (`compaction.level`)
* Which initial smaller block IDs are part of this block (`compaction.sources`)
* Which smaller (including intermediate) block IDs are part of this block (`compaction.parents`)
* The Thanos section (only present for blocks generated by Thanos components like `sidecar`, `receive` or `compact`):
  * External labels of the block, identifying its producer (`thanos.labels`)
  * Downsampling resolution, if downsampling was done on this block (`thanos.downsample.resolution`). `0` means no downsampling.
  * Which component created the block (`thanos.source`)
  * Files that are part of this block and their sizes (`thanos.files`)

> NOTE: In theory, you can modify this data manually. However, components like Compactor and Store Gateway currently cache meta.json indefinitely (sometimes on disk, if configured), so a manual cache removal and restart might be needed.

Example meta.json file:

```json
{
  "ulid": "01DN3SK96XDAEKRB1AN30AAW6E",
  "minTime": 1567641600000,
  "maxTime": 1568851200000,
  "stats": {
    "numSamples": 5397517846,
    "numSeries": 8377876,
    "numChunks": 67874256
  },
  "compaction": {
    "level": 4,
    "sources": [
      "01DKZNX70TQQ0R025G66ZF1V5P",
      "01DKZWS55317K7JGVMCSBR68Z2", // Trimmed items for readability.
      "01DN3GH4A71RD6NYQ2VZPBQTFH"
    ],
    "parents": [
      {
        "ulid": "01DM4WK3F9ZGW19W16MZJJFF6T",
        "minTime": 1567641600000,
        "maxTime": 1567814400000
      },
      {
        "ulid": "01DMA1BXHK3G2KDKAPMBTVATRT",
        "minTime": 1567814400000,
        "maxTime": 1567987200000
      },
      {
        "ulid": "01DMF65TY6JSTCDVTPZ094B5D6",
        "minTime": 1567987200000,
        "maxTime": 1568160000000
      },
      {
        "ulid": "01DMMB0SK28FKC55RNK7ZZWS1A",
        "minTime": 1568160000000,
        "maxTime": 1568332800000
      },
      {
        "ulid": "01DMSFSXNE8Y76G5KCQ2BABYFA",
        "minTime": 1568332800000,
        "maxTime": 1568505600000
      },
      {
        "ulid": "01DMYMM5SW0FPJSHQQQM05FBN9",
        "minTime": 1568505600000,
        "maxTime": 1568678400000
      },
      {
        "ulid": "01DN3SDE1M9W1JG7JFSM5QFP2Y",
        "minTime": 1568678400000,
        "maxTime": 1568851200000
      }
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster": "eu1",
      "monitor": "prometheus",
      "tenant": "team-a",
      "replica": "1"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "compactor",
    "files": [
      {
        "rel_path": "index",
        "size_bytes": 1313
      }, // Trimmed items for readability.
    ],
    "version": 1
  }
}
```

The Go definition of this format can be found [here](https://github.com/thanos-io/thanos/blob/main/pkg/block/metadata/meta.go).

##### External Labels

External labels are extremely important block metadata. They are stored in the `thanos.labels` section of `meta.json` and identify the producer and owner of a block. Different Thanos components use this information as follows:

* Those labels are visible when data is queried. You can aggregate across them in PromQL, etc.
* [Querier](components/query.md) uses them to filter out Store APIs that do not need to be touched during query requests.
* Many object storage readers, like [compactor](components/compact.md) and [store gateway](components/store.md), group blocks by external labels. This grouping allows horizontal scalability like sharding or concurrency.
* Some of those labels can be chosen as **replication** labels. Querier and Compactor will then deduplicate blocks that belong to the same HA group.
* Some of those labels can be chosen as **tenancy** labels. This enables read, write and storage isolation.

The `thanos.labels` section of `meta.json` is filled in during block upload/creation. For example:

* Each TSDB block produced by Prometheus is labelled by `sidecar` with the Prometheus [external labels](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#configuration-file) before upload to object storage.
* Each TSDB block produced by `compact` is labelled with whatever labels its source blocks had. The exception is the deduplication process, which removes the chosen replica label(s).
* Each TSDB block produced by `receive` is labelled with the labels given in the repeated [receive](components/receive.md) `--labels` flag.

The following information is recommended for those labels.

Useful Prometheus external labels, for example:

* Replication information, e.g. `replica="0"`
* Cluster, environment and zone (i.e. the target's origin), e.g. `cluster="eu-1-production"` or `cluster="1",env="production",region="us-west1"`
* Tenancy information, e.g. `tenant="organizationABC"`

> NOTE: Be careful with receive external labels. Remote Write clients can stream any labels. If a streamed label duplicates one of the receiver's external labels, it is masked by the value the receiver has specified. This is why it's recommended to add a `receive_` prefix to all receive labels (e.g. to avoid confusion with Prometheus replicas).

Useful Receive external labels, for example:

* Replication information, e.g. `receive_replica="0"` (to avoid confusion with the `replica` label often set for Prometheus).
* Cluster, environment and zone (i.e. the target's origin), e.g. `receive_cluster="eu-west1-production-1"` or `receive_cluster="1",receive_env="production",receive_region="us-west1"`
* Tenancy information, e.g. `tenant="organizationABC"`

#### Index Format (index)

> NOTE: Currently supported index file versions: v1 and v2

This file stores the index that allows efficient lookup of series and their samples.

**All entries are sorted lexicographically unless stated otherwise.**

At a high level, it allows finding:

* Label names
* Label values for a label name
* All series labels
* Chunk references for given (or all) series. These can be used to find the [chunk](design.md#chunk) with samples in the [chunk files](#chunks-file-format)

The following describes the format of the `index` file found in each block directory. It is terminated by a table of contents, which serves as an entry point into the index.

```
┌────────────────────────────┬─────────────────────┐
│ magic(0xBAAAD700) <4b>     │ version(1) <1 byte> │
├────────────────────────────┴─────────────────────┤
│ ┌──────────────────────────────────────────────┐ │
│ │                 Symbol Table                 │ │
│ ├──────────────────────────────────────────────┤ │
│ │                    Series                    │ │
│ ├──────────────────────────────────────────────┤ │
│ │                 Label Index 1                │ │
│ ├──────────────────────────────────────────────┤ │
│ │                      ...                     │ │
│ ├──────────────────────────────────────────────┤ │
│ │                 Label Index N                │ │
│ ├──────────────────────────────────────────────┤ │
│ │                  Postings 1                  │ │
│ ├──────────────────────────────────────────────┤ │
│ │                      ...                     │ │
│ ├──────────────────────────────────────────────┤ │
│ │                  Postings N                  │ │
│ ├──────────────────────────────────────────────┤ │
│ │              Label Offset Table              │ │
│ ├──────────────────────────────────────────────┤ │
│ │             Postings Offset Table            │ │
│ ├──────────────────────────────────────────────┤ │
│ │                      TOC                     │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
```

When the index is written, an arbitrary number of padding bytes may be added between the main sections outlined above. When sequentially scanning through the file, any zero bytes after a section's specified length must be skipped.

Most of the sections described below start with a `len` field. It always specifies the number of bytes that follow it, up to (but excluding) the trailing CRC32 checksum. The checksum is always calculated over those `len` bytes.

##### Symbol Table

The symbol table holds a sorted list of deduplicated strings that occur in label pairs of the stored series. Subsequent sections can reference these strings, which significantly reduces the total index size.

The section contains a sequence of string entries, each prefixed with the string's length in raw bytes. All strings are UTF-8 encoded. Strings are referenced by sequential indexing. The strings are sorted in lexicographically ascending order.

```
┌────────────────────┬─────────────────────┐
│ len <4b>           │ #symbols <4b>       │
├────────────────────┴─────────────────────┤
│ ┌──────────────────────┬───────────────┐ │
│ │ len(str_1) <uvarint> │ str_1 <bytes> │ │
│ ├──────────────────────┴───────────────┤ │
│ │                . . .                 │ │
│ ├──────────────────────┬───────────────┤ │
│ │ len(str_n) <uvarint> │ str_n <bytes> │ │
│ └──────────────────────┴───────────────┘ │
├──────────────────────────────────────────┤
│ CRC32 <4b>                               │
└──────────────────────────────────────────┘
```

##### Series

The section contains a sequence of series that hold the label set of the series as well as its [chunks](design.md#chunk) within the block. The series are sorted lexicographically by their label sets. Each series entry is aligned to 16 bytes. The ID of a series is its `offset/16`, and this ID is used in all subsequent references. Thereby, a sorted list of series IDs implies a lexicographically sorted list of series label sets.

```
┌───────────────────────────────────────┐
│ ┌───────────────────────────────────┐ │
│ │ series_1                          │ │
│ ├───────────────────────────────────┤ │
│ │               . . .               │ │
│ ├───────────────────────────────────┤ │
│ │ series_n                          │ │
│ └───────────────────────────────────┘ │
└───────────────────────────────────────┘
```

Every series entry first holds its number of labels, followed by tuples of symbol table references that contain the label name and value. The label pairs are lexicographically sorted. After the labels, the number of indexed [chunks](design.md#chunk) is encoded, followed by a sequence of metadata entries. Each entry contains the chunk's minimum (`mint`) and maximum (`maxt`) timestamp and a reference to its position in the chunk file. The `mint` is the time of the first sample and `maxt` is the time of the last sample in the chunk. Holding the time range data in the index allows dropping chunks irrelevant to the queried time range without accessing them directly.

For the first [chunk](design.md#chunk), `mint` is stored as is and `maxt` is stored as a delta to that `mint`. For subsequent chunks, `mint` is stored as a delta to the previous chunk's `maxt`, and `maxt` as a delta to the chunk's own `mint`. Similarly, the reference of the first chunk is stored as is, and each following reference is stored as a delta to the previous one.

```
┌──────────────────────────────────────────────────────────────────────────┐
│ len <uvarint>                                                            │
├──────────────────────────────────────────────────────────────────────────┤
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │                       labels count <uvarint64>                       │ │
│ ├──────────────────────────────────────────────────────────────────────┤ │
│ │              ┌────────────────────────────────────────────┐          │ │
│ │              │ ref(l_i.name) <uvarint32>                  │          │ │
│ │              ├────────────────────────────────────────────┤          │ │
│ │              │ ref(l_i.value) <uvarint32>                 │          │ │
│ │              └────────────────────────────────────────────┘          │ │
│ │                                   ...                                │ │
│ ├──────────────────────────────────────────────────────────────────────┤ │
│ │                       chunks count <uvarint64>                       │ │
│ ├──────────────────────────────────────────────────────────────────────┤ │
│ │              ┌────────────────────────────────────────────┐          │ │
│ │              │ c_0.mint <varint64>                        │          │ │
│ │              ├────────────────────────────────────────────┤          │ │
│ │              │ c_0.maxt - c_0.mint <uvarint64>            │          │ │
│ │              ├────────────────────────────────────────────┤          │ │
│ │              │ ref(c_0.data) <uvarint64>                  │          │ │
│ │              └────────────────────────────────────────────┘          │ │
│ │              ┌────────────────────────────────────────────┐          │ │
│ │              │ c_i.mint - c_i-1.maxt <uvarint64>          │          │ │
│ │              ├────────────────────────────────────────────┤          │ │
│ │              │ c_i.maxt - c_i.mint <uvarint64>            │          │ │
│ │              ├────────────────────────────────────────────┤          │ │
│ │              │ ref(c_i.data) - ref(c_i-1.data) <varint64> │          │ │
│ │              └────────────────────────────────────────────┘          │ │
│ │                                   ...                                │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────────────────────┤
│ CRC32 <4b>                                                               │
└──────────────────────────────────────────────────────────────────────────┘
```

##### Label Index

A label index section indexes the existing (combined) values for one or more label names. The `#names` field determines the number of indexed label names, followed by the total number of entries in the `#entries` field. The body holds `#entries / #names` tuples of symbol table references, each tuple being of `#names` length. The value tuples are sorted in lexicographically increasing order. This section is no longer used.

```
┌───────────────┬────────────────┬───────────────┐
│ len <4b>      │ #names <4b>    │ #entries <4b> │
├───────────────┴────────────────┴───────────────┤
│ ┌────────────────────────────────────────────┐ │
│ │ ref(value_0) <4b>                          │ │
│ ├────────────────────────────────────────────┤ │
│ │ ...                                        │ │
│ ├────────────────────────────────────────────┤ │
│ │ ref(value_n) <4b>                          │ │
│ └────────────────────────────────────────────┘ │
│                      . . .                     │
├────────────────────────────────────────────────┤
│ CRC32 <4b>                                     │
└────────────────────────────────────────────────┘
```

For instance, a single label name with 4 different values will be encoded as:

```
┌────┬───┬───┬──────────────┬──────────────┬──────────────┬──────────────┬───────┐
│ 24 │ 1 │ 4 │ ref(value_0) │ ref(value_1) │ ref(value_2) │ ref(value_3) │ CRC32 │
└────┴───┴───┴──────────────┴──────────────┴──────────────┴──────────────┴───────┘
```

The sequence of label index sections is finalized by a [label offset table](#label-offset-table) containing label offset entries that point to the beginning of each label index section for a given label name.

##### Postings

Postings sections store monotonically increasing lists of references to the series that contain the label pair associated with the list.

```
┌────────────────────┬────────────────────┐
│ len <4b>           │ #entries <4b>      │
├────────────────────┴────────────────────┤
│ ┌─────────────────────────────────────┐ │
│ │ ref(series_1) <4b>                  │ │
│ ├─────────────────────────────────────┤ │
│ │ ...                                 │ │
│ ├─────────────────────────────────────┤ │
│ │ ref(series_n) <4b>                  │ │
│ └─────────────────────────────────────┘ │
├─────────────────────────────────────────┤
│ CRC32 <4b>                              │
└─────────────────────────────────────────┘
```

The sequence of postings sections is finalized by a [postings offset table](#postings-offset-table) containing postings offset entries that point to the beginning of each postings section for a given label pair.

##### Label Offset Table

A label offset table stores a sequence of label offset entries. Every label offset entry holds the label name and the offset to its values in the label index section. They are used to track label index sections. This table is no longer used.

```
┌─────────────────────┬──────────────────────┐
│ len <4b>            │ #entries <4b>        │
├─────────────────────┴──────────────────────┤
│ ┌────────────────────────────────────────┐ │
│ │ n = 1 <1b>                             │ │
│ ├──────────────────────┬─────────────────┤ │
│ │ len(name) <uvarint>  │ name <bytes>    │ │
│ ├──────────────────────┴─────────────────┤ │
│ │ offset <uvarint64>                     │ │
│ └────────────────────────────────────────┘ │
│                    . . .                   │
├────────────────────────────────────────────┤
│ CRC32 <4b>                                 │
└────────────────────────────────────────────┘
```

##### Postings Offset Table

A postings offset table stores a sequence of postings offset entries, sorted by label name and value. Every postings offset entry holds the label name/value pair and the offset to its series list in the postings section. They are used to track postings sections. They are partially read into memory when an index file is loaded.

```
┌─────────────────────┬──────────────────────┐
│ len <4b>            │ #entries <4b>        │
├─────────────────────┴──────────────────────┤
│ ┌────────────────────────────────────────┐ │
│ │ n = 2 <1b>                             │ │
│ ├──────────────────────┬─────────────────┤ │
│ │ len(name) <uvarint>  │ name <bytes>    │ │
│ ├──────────────────────┼─────────────────┤ │
│ │ len(value) <uvarint> │ value <bytes>   │ │
│ ├──────────────────────┴─────────────────┤ │
│ │ offset <uvarint64>                     │ │
│ └────────────────────────────────────────┘ │
│                    . . .                   │
├────────────────────────────────────────────┤
│ CRC32 <4b>                                 │
└────────────────────────────────────────────┘
```

##### TOC

The table of contents serves as an entry point to the entire index and points to various sections in the file. A zero reference means the respective section does not exist, and lookups should return empty results.

```
┌─────────────────────────────────────────┐
│ ref(symbols) <8b>                       │
├─────────────────────────────────────────┤
│ ref(series) <8b>                        │
├─────────────────────────────────────────┤
│ ref(label indices start) <8b>           │
├─────────────────────────────────────────┤
│ ref(label offset table) <8b>            │
├─────────────────────────────────────────┤
│ ref(postings start) <8b>                │
├─────────────────────────────────────────┤
│ ref(postings offset table) <8b>         │
├─────────────────────────────────────────┤
│ CRC32 <4b>                              │
└─────────────────────────────────────────┘
```

#### Chunks File Format

> NOTE: Currently supported chunks file versions: v1.

> NOTE: Don't confuse this with the `chunks format` (XOR-encoded, Gorilla-compressed set of samples). A chunks file contains chunks of multiple series.

The following describes the format of a chunks file, which is created in the `chunks/` directory of a block. The maximum size per segment file is 512MiB.

The index references [chunks](design.md#chunk) in these files by a uint64. The lower 4 bytes hold the in-file offset and the upper 4 bytes hold the segment sequence number.

```
┌──────────────────────────────┐
│ magic(0x85BD40DD) <4 byte>   │
├──────────────────────────────┤
│ version(1) <1 byte>          │
├──────────────────────────────┤
│ padding(0) <3 byte>          │
├──────────────────────────────┤
│ ┌──────────────────────────┐ │
│ │         Chunk 1          │ │
│ ├──────────────────────────┤ │
│ │           ...            │ │
│ ├──────────────────────────┤ │
│ │         Chunk N          │ │
│ └──────────────────────────┘ │
└──────────────────────────────┘
```

##### Chunk

```
┌───────────────┬───────────────────┬──────────────┬────────────────┐
│ len <uvarint> │ encoding <1 byte> │ data <bytes> │ CRC32 <4 byte> │
└───────────────┴───────────────────┴──────────────┴────────────────┘
```

#### Tombstones

Thanos ignores any tombstone files. The sidecar also deletes them on upload.