---
title: Upgrading
weight: 250
---

# Upgrading Grafana Loki

Every attempt is made to keep Grafana Loki backwards compatible, such that upgrades should be low risk and low friction.

Unfortunately Loki is software, software is hard, and sometimes we are forced to make decisions between ease of use and ease of maintenance.

If we have any expectation of difficulty upgrading, we will document it here.

As more versions are released it becomes more likely unexpected problems arise moving between multiple versions at once.
If possible try to stay current and do sequential updates. If you want to skip versions, try it in a development environment before attempting to upgrade production.

# Checking for config changes

Using docker you can check changes between 2 versions of Loki with a command like this:

```
export OLD_LOKI=2.3.0
export NEW_LOKI=2.4.1
export CONFIG_FILE=loki-local-config.yaml
diff --color=always --side-by-side <(docker run --rm -t -v "${PWD}":/config grafana/loki:${OLD_LOKI} -config.file=/config/${CONFIG_FILE} -print-config-stderr 2>&1 | sed '/Starting Loki/q' | tr -d '\r') <(docker run --rm -t -v "${PWD}":/config grafana/loki:${NEW_LOKI} -config.file=/config/${CONFIG_FILE} -print-config-stderr 2>&1 | sed '/Starting Loki/q' | tr -d '\r') | less -R
```

The `tr -d '\r'` is likely not necessary for most people; it seems WSL2 was sneaking in some Windows newline characters.

The output is incredibly verbose as it shows the entire internal config struct used to run Loki; you can play around with the diff command if you prefer to only show changes or a different style of output.

## Main / Unreleased

### Loki

#### Fifocache is deprecated

We introduced a new cache called `embedded-cache`, an in-process cache that makes it possible to run Loki without the need for an external cache (like Memcached, Redis, etc). It can be run in two modes: `distributed: false` (the default, and the same as the old `fifocache`) and `distributed: true`, which runs the cache in a distributed fashion, sharding keys across peers if Loki is run in microservices or SSD mode.

Currently `embedded-cache` with `distributed: true` can be enabled only for the results cache.

#### Evenly spread queriers across kubernetes nodes

We now evenly spread queriers across the available kubernetes nodes, while still allowing more than one querier to be scheduled on the same node.
If you want to run at most a single querier per node, set `$._config.querier.use_topology_spread` to false.

#### Default value for `server.http-listen-port` changed

This value now defaults to 3100, so the Loki process doesn't require special privileges. Previously, it had been set to port 80, which is a privileged port. If you need Loki to listen on port 80, you can set it back to the previous default using `-server.http-listen-port=80`.

#### docker-compose setup has been updated

The docker-compose [setup](https://github.com/grafana/loki/blob/main/production/docker) has been updated to **v2.6.0** and includes many improvements.
Notable changes include:
- authentication (multi-tenancy) is **enabled** by default; you can disable it in `production/docker/config/loki.yaml` by setting `auth_enabled: false`
- storage now uses Minio instead of the local filesystem
  - move your current storage into `.data/minio` and it should work transparently
- a log-generator was added - if you don't need it, simply remove the service from `docker-compose.yaml` or don't start the service

#### Configuration for deletes has changed

The global `deletion_mode` option in the compactor configuration moved to runtime configurations.

- The `deletion_mode` option needs to be removed from your compactor configuration
- The `deletion_mode` global override needs to be set to the desired mode: `disabled`, `filter-only`, or `filter-and-delete`. By default, `filter-and-delete` is enabled.
- Any `allow_delete` per-tenant overrides need to be removed or changed to `deletion_mode` overrides with the desired mode.

## 2.6.0

### Loki

#### Implementation of unwrapped `rate` aggregation changed

The implementation of the `rate()` aggregation function changed back to the previous implementation prior to [#5013](https://github.com/grafana/loki/pulls/5013).
This means that the rate per second is calculated based on the sum of the extracted values, instead of the average increase over time.

If you want the extracted values to be treated as a [Counter](https://prometheus.io/docs/concepts/metric_types/#counter) metric, you should use the new `rate_counter()` aggregation function, which calculates the per-second average rate of increase of the vector.

#### Default value for `azure.container-name` changed

This value now defaults to `loki`; it was previously set to `cortex`. If you are relying on this container name for your chunks or ruler storage, you will have to manually specify `-azure.container-name=cortex` or `-ruler.storage.azure.container-name=cortex` respectively.

## 2.5.0

### Loki

#### `split_queries_by_interval` yaml configuration has moved

It was previously possible to define this value in two places:

```yaml
query_range:
  split_queries_by_interval: 10m
```

and/or

```yaml
limits_config:
  split_queries_by_interval: 10m
```

In 2.5.0 it can only be defined in the `limits_config` section. **Loki will fail to start if you do not remove the `split_queries_by_interval` config from the `query_range` section.**

Additionally, it has a new default value of `30m` rather than `0`.

The CLI flag is not changed and remains `querier.split-queries-by-interval`.

#### Dropped support for old Prometheus rules configuration format

Alerting rules could previously be specified in two formats: the 1.x format (the legacy one, named `v0` internally) and 2.x.
We decided to drop support for the `1.x` format as it is fairly old and keeping support for it required a lot of code.

In case you're still using the legacy format, take a look at
[Alerting Rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) for instructions
on how to write alerting rules in the new format.
For reference, the newer format follows a structure similar to the one below:
```yaml
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High request latency
```

Meanwhile, the legacy format is a string in the following format:
```
ALERT <alert name>
  IF <expression>
  [ FOR <duration> ]
  [ LABELS <label set> ]
  [ ANNOTATIONS <label set> ]
```

#### Changes to default configuration values

* `parallelise_shardable_queries` under the `query_range` config now defaults to `true`.
* `split_queries_by_interval` under the `limits_config` config now defaults to `30m`; it was `0s`.
* `max_chunk_age` in the `ingester` config now defaults to `2h`; previously it was `1h`.
* `query_ingesters_within` under the `querier` config now defaults to `3h`; previously it was `0s`. Any query (or subquery) that has an end time more than `3h` ago will not be sent to the ingesters; this saves work on the ingesters for data they normally don't contain. If you regularly write old data to Loki you may need to return this value to `0s` to always query ingesters.
* `max_concurrent` under the `querier` config now defaults to `10` instead of `20`.
* `match_max_concurrent` under the `frontend_worker` config now defaults to true; this supersedes the `parallelism` setting, which can now be removed from your config. Controlling query parallelism of a single process can now be done with the `querier` `max_concurrent` setting.
* `flush_op_timeout` under the `ingester` configuration block now defaults to `10m`, increased from `10s`. This can help when replaying a large WAL on Loki startup, and avoid `msg="failed to flush" ... context deadline exceeded` errors.

### Promtail

#### `gcplog` labels have changed

- Resource labels have been moved from `__<NAME>` to `__gcp_resource_labels_<NAME>`,
  e.g. if you previously used `__project_id` then you'll need to update your relabel config to use `__gcp_resource_labels_project_id`.
- `resource_type` has been moved to `__gcp_resource_type`

#### `promtail_log_entries_bytes_bucket` histogram has been removed.

This histogram reports the distribution of log line sizes by file. It has 8 buckets for every file being tailed.

This creates a lot of series and we don't think this metric has enough value to offset the amount of series generated, so we are removing it.

While this isn't a direct replacement, two metrics we find more useful are size and line counters configured via pipeline stages; an example of how to configure these metrics can be found in the [metrics pipeline stage docs](https://grafana.com/docs/loki/latest/clients/promtail/stages/metrics/#counter).

### Jsonnet

#### Compactor config defined as command line args moved to yaml config

The following two compactor configs that were defined as command line arguments in jsonnet have moved to the yaml config:

```yaml
# Directory where files can be downloaded for compaction.
# CLI flag: -boltdb.shipper.compactor.working-directory
[working_directory: <string>]

# The shared store used for storing boltdb files.
# Supported types: gcs, s3, azure, swift, filesystem.
# CLI flag: -boltdb.shipper.compactor.shared-store
[shared_store: <string>]
```

## 2.4.0

The following are important changes which should be reviewed and understood prior to upgrading Loki.

### Loki

The following changes pertain to upgrading Loki.

#### The single binary no longer runs a table-manager

Single binary Loki means running Loki with `-target=all`, which is the default if no `-target` flag is passed.

This will impact anyone in the following scenarios:

1. Running a single binary Loki with any index type other than `boltdb-shipper` or `boltdb`
2. Relying on retention with the configs `retention_deletes_enabled` and `retention_period`

Anyone in situation #1 who is not using `boltdb-shipper` or `boltdb` (e.g. `cassandra` or `bigtable`) should modify their Loki command to include `-target=all,table-manager`; this will instruct Loki to run a table-manager for you.

Anyone in situation #2 has two options: the first (and not recommended) is to run Loki with a table-manager by adding `-target=all,table-manager`.

The second and recommended solution is to use deletes via the compactor:

```
compactor:
  retention_enabled: true
limits_config:
  retention_period: [30d]
```

See the [retention docs](../operations/storage/retention) for more info.

#### Log messages on startup: proto: duplicate proto type registered:

PR [#3842](https://github.com/grafana/loki/pull/3842) **cyriltovena**: Fork cortex chunk storage into Loki.

Since Cortex doesn't plan to use the `chunk` package anymore, we decided to fork it into our storage package to
be able to evolve and modify it easily. However, as a side-effect, we still vendor Cortex, which includes this forked
code and protobuf files, resulting in log messages like these at startup:

```
2021-11-04 15:30:02.437911 I | proto: duplicate proto type registered: purgeplan.DeletePlan
2021-11-04 15:30:02.437936 I | proto: duplicate proto type registered: purgeplan.ChunksGroup
2021-11-04 15:30:02.437939 I | proto: duplicate proto type registered: purgeplan.ChunkDetails
...
```

The messages are harmless and we will work to remove them in the future.

#### Change of some default limits to common values

PR [4415](https://github.com/grafana/loki/pull/4415) **DylanGuedes**: the default values of some limits were changed to protect users from overwhelming their cluster with ingestion load caused by relying on default configs.

We suggest you double check if the following parameters are
present in your Loki config: `ingestion_rate_strategy`, `max_global_streams_per_user`,
`max_query_length`, `max_query_parallelism`, `max_streams_per_user`,
`reject_old_samples`, `reject_old_samples_max_age`. If they are not present, we recommend you double check that the new values will not negatively impact your system.
The changes are:

| config | new default | old default |
| --- | --- | --- |
| ingestion_rate_strategy | "global" | "local" |
| max_global_streams_per_user | 5000 | 0 (no limit) |
| max_query_length | "721h" | "0h" (no limit) |
| max_query_parallelism | 32 | 14 |
| max_streams_per_user | 0 (no limit) | 10000 |
| reject_old_samples | true | false |
| reject_old_samples_max_age | "168h" | "336h" |
| per_stream_rate_limit | 3MB | - |
| per_stream_rate_limit_burst | 15MB | - |

#### Change of configuration defaults

| config | new default | old default |
| --- | --- | --- |
| chunk_retain_period | 0s | 30s |
| chunk_idle_period | 30m | 1h |
| chunk_target_size | 1572864 | 1048576 |

* `chunk_retain_period` is necessary when using an index queries cache, which is not enabled by default. If you have configured an `index_queries_cache_config` section, make sure that you set `chunk_retain_period` larger than your cache TTL.
* `chunk_idle_period` is how long before a chunk which receives no logs is flushed.
* `chunk_target_size` was increased to flush slightly larger chunks; if using memcached for the chunks store, make sure it will accept files up to 1.5MB in size.

#### In memory FIFO caches enabled by default

Loki now enables a results cache and chunks cache in memory to improve performance. This can however increase memory usage, as the caches are by default allowed to consume up to 1GB of memory.

If you would like to disable these caches or change this memory limit:

Disable:

```
chunk_store_config:
  chunk_cache_config:
    enable_fifocache: false
query_range:
  results_cache:
    cache:
      enable_fifocache: false
```

Resize:

```
chunk_store_config:
  chunk_cache_config:
    enable_fifocache: true
    fifocache:
      max_size_bytes: 500MB
query_range:
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_bytes: 500MB
```

#### Ingester Lifecycler `final_sleep` now defaults to `0s`

* [4608](https://github.com/grafana/loki/pull/4608) **trevorwhitney**: Change default value of ingester lifecycler's `final_sleep` from `30s` to `0s`

This final sleep exists to keep Loki running for long enough to get one final Prometheus scrape before shutting down; however, it also causes Loki to sit idle for 30s on shutdown, which is an annoying experience for many people.

We decided it would be a better default to disable this sleep behavior, but anyone can set this config variable directly to return to the previous behavior.

#### Ingester WAL now defaults to on, and chunk transfers are disabled by default

* [4543](https://github.com/grafana/loki/pull/4543) **trevorwhitney**: Change more default values and improve application of common storage config
* [4629](https://github.com/grafana/loki/pull/4629) **owen-d**: Default the WAL to enabled in the Loki jsonnet library
* [4624](https://github.com/grafana/loki/pull/4624) **chaudum**: Disable chunk transfers in jsonnet lib

This changes a few default values, resulting in the ingester WAL now being on by default
and chunk transfer retries being disabled by default. Note, this now means Loki will depend on local disk by default for its WAL (write ahead log) directory. This defaults to `wal` but can be overridden via the `--ingester.wal-dir` flag or via `path_prefix` in the common configuration section.
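For example, a minimal sketch of setting that prefix in the common section (the directory shown is a placeholder; the WAL directory is then derived from this prefix):

```yaml
common:
  # Loki derives local data directories, including the ingester WAL directory,
  # from this prefix unless they are overridden individually.
  path_prefix: /loki
```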
Below are config snippets with the previous defaults, and another with the new values.

Previous defaults:
```yaml
ingester:
  max_transfer_retries: 10
  wal:
    enabled: false
```

New defaults:
```yaml
ingester:
  max_transfer_retries: 0
  wal:
    enabled: true
```

#### Memberlist config now automatically applies to all non-configured rings

* [4400](https://github.com/grafana/loki/pull/4400) **trevorwhitney**: Config: automatically apply memberlist config to all rings when provided

This change affects the behavior of the ingester, distributor, and ruler rings. Previously, if you wanted to use memberlist for all of these rings, you
had to provide a `memberlist` configuration as well as specify `store: memberlist` for the `kvstore` of each of the rings you wanted to use memberlist.
For example, your configuration might look something like this:

```yaml
memberlist:
  join_members:
    - loki.namespace.svc.cluster.local
distributor:
  ring:
    kvstore:
      store: memberlist
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
ruler:
  ring:
    kvstore:
      store: memberlist
```

Now, if you provide a `memberlist` configuration with at least one `join_members`, Loki will default all rings to use a `kvstore` of type `memberlist`.
You can change this behavior by overriding specific configurations. For example, if you wanted to use `consul` for your `ruler` ring, but `memberlist`
for the `ingester` and `distributor`, you could do so with the following config (although we don't know why someone would want to do this):

```yaml
memberlist:
  join_members:
    - loki.namespace.svc.cluster.local
ruler:
  ring:
    kvstore:
      store: consul
      consul:
        host: consul.namespace.svc.cluster.local:8500
```

#### Changed defaults for some GRPC server settings

* [4435](https://github.com/grafana/loki/pull/4435) **trevorwhitney**: Change default values for two GRPC settings so querier can connect to frontend/scheduler

This changes two default values, `grpc_server_min_time_between_pings` and `grpc_server_ping_without_stream_allowed`, used by the GRPC server.

*Previous Values*:
```
server:
  grpc_server_min_time_between_pings: '5m'
  grpc_server_ping_without_stream_allowed: false
```

*New Values*:
```
server:
  grpc_server_min_time_between_pings: '10s'
  grpc_server_ping_without_stream_allowed: true
```

[This issue](https://github.com/grafana/loki/issues/4375) has some more information on the change.

#### Some metric prefixes have changed from `cortex_` to `loki_`

* [#3842](https://github.com/grafana/loki/pull/3842)/[#4253](https://github.com/grafana/loki/pull/4253) **jordanrushing**: Metrics related to chunk storage and runtime config have changed their prefixes from `cortex_` to `loki_`.

```
cortex_runtime_config* -> loki_runtime_config*
cortex_chunks_store* -> loki_chunks_store*
```

#### Recording rules storage is now durable

* [4344](https://github.com/grafana/loki/pull/4344) **dannykopping**: per-tenant WAL

Previously, samples generated by recording rules would only be buffered in memory before being remote-written to Prometheus; from this
version, the `ruler` now writes these samples to a per-tenant Write-Ahead Log for durability.
More details about the
per-tenant WAL can be found [here](https://grafana.com/docs/loki/latest/operations/recording-rules/).

The `ruler` now requires persistent storage - please see the
[Operations](https://grafana.com/docs/loki/latest/operations/recording-rules/#deployment) page for more details about deployment.

### Promtail

The following changes pertain to upgrading Promtail.

#### Promtail no longer inserts a `promtail_instance` label when scraping the `gcplog` target

* [4556](https://github.com/grafana/loki/pull/4556) **james-callahan**: Remove `promtail_instance` label that was being added by promtail when scraping `gcplog` target.

## 2.3.0

### Loki

#### Query restriction introduced for queries which do not have at least one equality matcher

PR [3216](https://github.com/grafana/loki/pull/3216) **sandeepsukhani**: check for stream selectors to have at least one equality matcher

This change now rejects any query which does not contain at least one equality matcher; an example may better illustrate:

`{namespace=~".*"}`

This query will now be rejected; however, there are several ways to modify it so that it succeeds:

Add at least one equals label matcher:

`{cluster="us-east-1",namespace=~".*"}`

Use `.+` instead of `.*`:

`{namespace=~".+"}`

This difference may seem subtle, but if we break it down: `.` matches any character, `*` matches zero or more of the preceding character, and `+` matches one or more of the preceding character. The `.*` case will match empty values where `.+` will not; this is the important difference. `{namespace=""}` is an invalid request (unless you add another equals label matcher like the example above).

The reasoning for this change has to do with how index lookups work in Loki: if you don't have at least one equality matcher, Loki has to perform a complete index table scan, which is an expensive and slow operation.

## 2.2.0

### Loki

**Be sure to upgrade to 2.0 or 2.1 BEFORE upgrading to 2.2**

In Loki 2.2 we changed the internal version of our chunk format from v2 to v3. This is a transparent change and is only relevant if you ever try to _downgrade_ a Loki installation. We incorporated the code to read v3 chunks in 2.0.1 and 2.1, as well as 2.2 and any future releases.

**If you upgrade to 2.2+ any chunks created can only be read by 2.0.1, 2.1 and 2.2+**

This makes it important to first upgrade to 2.0, 2.0.1, or 2.1 **before** upgrading to 2.2 so that if you need to rollback for any reason you can do so easily.

**Note:** 2.0 and 2.0.1 are identical in every aspect except 2.0.1 contains the code necessary to read the v3 chunk format. Therefore if you are on 2.0, upgrade to 2.2, and then want to rollback, you must rollback to 2.0.1.

### Loki Config

**Read this if you use the query-frontend and have `sharded_queries_enabled: true`**

We discovered that query scheduling related to sharded queries over long time ranges could lead to unfair work scheduling by one single query in the per-tenant work queue.

The `max_query_parallelism` setting is designed to limit how many split and sharded units of 'work' for a single query are allowed to be put into the per-tenant work queue at one time.
The previous behavior would split the query by time using the `split_queries_by_interval` and compare this value to `max_query_parallelism` when filling the queue; however, with sharding enabled, every split was then sharded into 16 additional units of work after the `max_query_parallelism` limit was applied.

In 2.2 we changed this behavior to apply the `max_query_parallelism` after splitting _and_ sharding a query, resulting in more fair and expected queue scheduling per query.

**What this means:** Loki will be putting much less work into the work queue per query if you are using the query frontend and have `sharded_queries_enabled: true` (which you should). **You may need to increase your `max_query_parallelism` setting if you are noticing slower query performance.** In practice, you may not see a difference unless you were running a cluster with a LOT of queriers or queriers with a very high `parallelism` frontend_worker setting.

You could consider multiplying your current `max_query_parallelism` setting by 16 to obtain the previous behavior, though in practice we suspect few people would really want it this high unless you have a significant querier worker pool.

**Also be aware to make sure `max_outstanding_per_tenant` is always greater than `max_query_parallelism` or large queries will automatically fail with a 429 back to the user.**

### Promtail

For 2.0 we eliminated the long deprecated `entry_parser` configuration in Promtail configs; however, in doing so we introduced a very confusing and erroneous default behavior:

If you did not specify a `pipeline_stages` entry you would be provided with a default which included the `docker` pipeline stage. This can lead to some very confusing results.

In [3404](https://github.com/grafana/loki/pull/3404), we corrected this behavior.

**If you are using docker, and any of your `scrape_configs` are missing a `pipeline_stages` definition**, you should add the following to obtain the correct behaviour:

```yaml
pipeline_stages:
  - docker: {}
```

## 2.1.0

The upgrade from 2.0.0 to 2.1.0 should be fairly smooth; please be aware of these two things:

### Helm charts have moved!

Helm charts are now located at: https://github.com/grafana/helm-charts/

The helm repo URL is now: https://grafana.github.io/helm-charts

### Fluent Bit plugin renamed

Fluent Bit officially supports Loki as an output plugin now! WoooHOOO!

However this created a naming conflict with our existing output plugin (the new native output uses the name `loki`) so we have renamed our plugin.

In time our plan is to deprecate and eliminate our output plugin in favor of the native Loki support. However, until then you can continue using the plugin with the following change:

Old:

```
[Output]
    Name loki
```

New:

```
[Output]
    Name grafana-loki
```

## 2.0.0

This is a major Loki release and there are some very important upgrade considerations.
For the most part, there are very few impactful changes and for most this will be a seamless upgrade.
2.0.0 Upgrade Topics:

* [IMPORTANT If you are using a docker image, read this!](#important-if-you-are-using-a-docker-image-read-this)
* [IMPORTANT boltdb-shipper upgrade considerations](#important-boltdb-shipper-upgrade-considerations)
* [IMPORTANT results_cache.max_freshness removed from yaml config](#important-results_cachemax_freshness-removed-from-yaml-config)
* [Promtail removed entry_parser config](#promtail-config-removed)
* [If you would like to use the new single store index and v11 schema](#upgrading-schema-to-use-boltdb-shipper-andor-v11-schema)

### **IMPORTANT If you are using a docker image, read this!**

(This includes Helm, Tanka, docker-compose etc.)

The default config file in the docker image, as well as the default helm values.yaml and jsonnet for Tanka, all specify a schema definition to make things easier to get started.

>**If you have not specified your own config file with your own schema definition (or you do not have a custom schema definition in your values.yaml), upgrading to 2.0 will break things!**

In 2.0 the defaults are now the v11 schema and the `boltdb-shipper` index type.

If you are using an index type of `aws`, `bigtable`, or `cassandra` this means you have already defined a custom schema and there is _nothing_ further you need to do regarding the schema.
You could however consider adding a new schema entry to use the new `boltdb-shipper` type if you want to move away from these separate index stores and instead use just one object store.

#### What to do

The minimum action required is to create a config which specifies the schema to match what the previous defaults were.

(Keep in mind this will only tell Loki to use the old schema default; if you would like to upgrade to v11 and/or move to the single store boltdb-shipper, [see below](#upgrading-schema-to-use-boltdb-shipper-andor-v11-schema).)

There are three places we have hard coded the schema definition:

##### Helm

Helm has shipped with the same internal schema in the values.yaml file for a very long time.

If you are providing your own values.yaml file then there is no _required_ action because you already have a schema definition.

**If you are not providing your own values.yaml file, you will need to make one!**

We suggest using the included [values.yaml file from the 1.6.0 tag](https://raw.githubusercontent.com/grafana/loki/v1.6.0/production/helm/loki/values.yaml)

This matches what the default values.yaml file had prior to 2.0 and is necessary for Loki to work post 2.0.

As mentioned above, you should also consider looking at moving to the v11 schema and boltdb-shipper; [see below](#upgrading-schema-to-use-boltdb-shipper-andor-v11-schema) for more information.

##### Tanka

This likely only affects a small portion of tanka users because the default schema config for Loki was forcing `GCS` and `bigtable`.
**If your `main.jsonnet` (or somewhere in your manually created jsonnet) does not have a schema config section then you will need to add one like this!**

```jsonnet
{
  _config+:: {
    using_boltdb_shipper: false,
    loki+: {
      schema_config+: {
        configs: [{
          from: '2018-04-15',
          store: 'bigtable',
          object_store: 'gcs',
          schema: 'v11',
          index: {
            prefix: '%s_index_' % $._config.table_prefix,
            period: '168h',
          },
        }],
      },
    },
  }
}
```

>**NOTE** If you had set `index_period_hours` to a value other than 168h (the previous default) you must update the `period:` in the config above to match what you chose.

>**NOTE** We have changed the default index store to `boltdb-shipper`. It's important to add `using_boltdb_shipper: false,` until you are ready to change (if you want to change).

Changing the jsonnet config to use the `boltdb-shipper` type is the same as [below](#upgrading-schema-to-use-boltdb-shipper-andor-v11-schema) where you need to add a new schema section.

**HOWEVER** Be aware that when you change `using_boltdb_shipper: true` the deployment type for the ingesters and queriers will change to statefulsets! Statefulsets are required for the ingester and querier using boltdb-shipper.

##### Docker (e.g. docker-compose)

For docker related cases you will have to mount a Loki config file separate from what's shipped inside the container.

We recommend taking the previous default file from the [1.6.0 tag on github](https://raw.githubusercontent.com/grafana/loki/v1.6.0/cmd/loki/loki-docker-config.yaml)

How you get this mounted and in use by Loki might vary based on how you are using the image, but this is a common example:

```shell
docker run -d --name=loki --mount type=bind,source="path to loki-config.yaml",target=/etc/loki/local-config.yaml
```

The Loki docker image is expecting to find the config file at `/etc/loki/local-config.yaml`

### IMPORTANT: boltdb-shipper upgrade considerations.

Significant changes have taken place between 1.6.0 and 2.0.0 for the boltdb-shipper index type; if you are already running this index and are upgrading, some extra caution is warranted.

Please strongly consider taking a complete backup of the `index` directory in your object store; this location might be slightly different depending on what store you use.
It should be a folder named `index` with a bunch of folders inside with names like `index_18561`, `index_18560`...

The chunks directory should not need any special backups.

If you have an environment to test this in please do so before upgrading against critical data.

There are 2 significant changes warranting the backup of this data because they will make rolling back impossible:
* A compactor is included which will take existing index files and compact them to one per day and remove non compacted files
* All index files are now gzipped before uploading

The second part is important because 1.6.0 does not understand how to read the gzipped files, so any new files uploaded or any files compacted become unreadable to 1.6.0 or earlier.

_THIS BEING SAID_ we are not expecting problems; our testing so far has not uncovered any problems, but some extra precaution might save data loss in unforeseen circumstances!

Please report any problems via GitHub issues or reach us on the #loki slack channel.
**Note if you are using boltdb-shipper and were running with high availability and separate filesystems**

This was a poorly documented and even more experimental mode we toyed with using boltdb-shipper. For now we have removed the documentation and also any kind of support for this mode.

To use boltdb-shipper in 2.0 you need shared storage (S3, GCS, etc); the mode of running with separate filesystem stores in HA using a ring is not officially supported.

We didn't do anything explicitly to limit this functionality; however, we have not had any time to actually test this, which is why we removed the docs and are listing it as not supported.

#### If running in microservices, deploy ingesters before queriers

Ingesters now expose a new RPC method that queriers use when the index type is `boltdb-shipper`.
Queriers generally roll out faster than ingesters, so if new queriers query older ingesters using the new RPC, the queries would fail.
To avoid any query downtime during the upgrade, roll out ingesters before queriers.

#### If running the compactor, ensure it has delete permissions for the object storage.

The compactor is an optional but suggested component that combines and deduplicates the boltdb-shipper index files. When compacting index files, the compactor writes a new file and deletes unoptimized files. Ensure that the compactor has appropriate permissions for deleting files, for example, s3:DeleteObject permission for AWS S3.

### IMPORTANT: `results_cache.max_freshness` removed from YAML config

The `max_freshness` config from `results_cache` has been removed in favour of another flag called `max_cache_freshness_per_query` in `limits_config` which has the same effect.
If you happen to have `results_cache.max_freshness` set, please use `limits_config.max_cache_freshness_per_query` YAML config instead.

### Promtail config removed

The long deprecated `entry_parser` config in Promtail has been removed; use [pipeline_stages]({{< relref "../clients/promtail/configuration/#pipeline_stages" >}}) instead.

### Upgrading schema to use boltdb-shipper and/or v11 schema

If you would also like to take advantage of the new Single Store (boltdb-shipper) index, as well as the v11 schema if you aren't already using it, you can do so by adding a new schema entry.

Here is an example:

```yaml
schema_config:
  configs:
    - from: 2018-04-15           ①
      store: boltdb              ①④
      object_store: filesystem   ①④
      schema: v11                ②
      index:
        prefix: index_           ①
        period: 168h             ①
    - from: 2020-10-24           ③
      store: boltdb-shipper
      object_store: filesystem   ④
      schema: v11
      index:
        prefix: index_
        period: 24h              ⑤
```
① Make sure all of these match your current schema config
② Make sure this matches your previous schema version, Helm for example is likely v9
③ Make sure this is a date in the **FUTURE**, keep in mind Loki only knows UTC so make sure it's a future UTC date
④ Make sure this matches your existing config (e.g. maybe you were using gcs for your object_store)
⑤ 24h is required for boltdb-shipper

There are more examples on the [Storage description page]({{< relref "../storage/_index.md#examples" >}}) including the information you need to setup the `storage` section for boltdb-shipper.
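For reference, a minimal `storage_config` sketch for boltdb-shipper with a filesystem object store generally looks like the block below; the directories are placeholders and the filesystem store is only one option, so check the storage page above for the object store you actually use:

```yaml
storage_config:
  boltdb_shipper:
    # Local directory where active index boltdb files are written before upload.
    active_index_directory: /loki/boltdb-shipper-active
    # Local cache for index files downloaded from the shared store.
    cache_location: /loki/boltdb-shipper-cache
    # Object store holding the uploaded index files (gcs, s3, azure, filesystem, ...).
    shared_store: filesystem
  filesystem:
    directory: /loki/chunks
```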
## 1.6.0

### Important: Ksonnet port changed and removed NET_BIND_SERVICE capability from Docker image

In 1.5.0 we changed the Loki user to not run as root, which created problems binding to port 80.
To address this we updated the docker image to add the NET_BIND_SERVICE capability to the loki process,
which allowed Loki to bind to port 80 as a non root user, so long as the underlying system allowed that
linux capability.

This has proved to be a problem for many reasons and in PR [2294](https://github.com/grafana/loki/pull/2294/files)
the capability was removed.

It is now no longer possible for Loki to be started with a port less than 1024 with the published docker image.

The default for Helm has always been port 3100, and Helm users should be unaffected unless they changed the default.

**Ksonnet users however should closely check their configuration; in PR 2294 the loki port was changed from 80 to 3100**

### IMPORTANT: If you run Loki in microservices mode, special rollout instructions

A new ingester GRPC API has been added to speed up metric queries; to ensure a rollout without query errors **make sure you upgrade all ingesters first.**
Once this is done you can then proceed with the rest of the deployment; this is to ensure that queriers won't look for an API not yet available.

If you roll out everything at once, queriers with this new code will attempt to query ingesters which may not have the new method on the API and queries will fail.

This will only affect reads (queries) and not writes, and only for the duration of the rollout.

### IMPORTANT: Scrape config changes to both Helm and Ksonnet will affect labels created by Promtail

PR [2091](https://github.com/grafana/loki/pull/2091) makes several changes to the Promtail scrape config:

````
This is triggered by https://github.com/grafana/jsonnet-libs/pull/261

The above PR changes the instance label to be actually unique within
a scrape config. It also adds a pod and a container target label
so that metrics can easily be joined with metrics from cAdvisor, KSM,
and the Kubelet.

This commit adds the same to the Loki scrape config. It also removes
the container_name label. It is the same as the container label
and was already added to Loki previously. However, the
container_name label is deprecated and has disappeared in K8s 1.16,
so that it will soon become useless for direct joining.
````

TL;DR

The following labels have been changed in both the Helm and Ksonnet Promtail scrape configs:

`instance` -> `pod`
`container_name` -> `container`

### Experimental boltdb-shipper changes

PR [2166](https://github.com/grafana/loki/pull/2166) now forces the index to have a period of exactly `24h`:

Loki will fail to start with an error if the active schema or upcoming schema are not set to a period of `24h`.

You can add a new schema config like this:

```yaml
schema_config:
  configs:
    - from: 2020-01-01      <----- This is your current entry, date will be different
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 168h
    - from: [INSERT FUTURE DATE HERE]   <----- Add another entry, set a future date
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h   <--- This must be 24h
```
If you are not on `schema: v11` this would be a good opportunity to make that change _in the new schema config_ also.

**NOTE** If the current time in your timezone is after midnight UTC already, set the date one additional day forward.

There was also a significant overhaul of the boltdb-shipper internals. This should not be visible to a user, but as this
feature is experimental and under development, bugs are possible!

The most noticeable change, if you look in the storage, is that Loki no longer updates an existing file and instead creates a
new index file every 15 minutes. This is an important move to make sure objects in the object store are immutable and
will simplify future operations like compaction and deletion.

### Breaking CLI flags changes

The following CLI flags were changed to improve consistency; they are not expected to be widely used:

```diff
- querier.query_timeout
+ querier.query-timeout

- distributor.extra-query-delay
+ querier.extra-query-delay

- max-chunk-batch-size
+ store.max-chunk-batch-size

- ingester.concurrent-flushed
+ ingester.concurrent-flushes
```

### Loki Canary metric name changes

When adding some new features to the canary we realized the existing metrics were not compliant with standards for counter names, so the following metrics have been renamed:

```nohighlight
loki_canary_total_entries -> loki_canary_entries_total
loki_canary_out_of_order_entries -> loki_canary_out_of_order_entries_total
loki_canary_websocket_missing_entries -> loki_canary_websocket_missing_entries_total
loki_canary_missing_entries -> loki_canary_missing_entries_total
loki_canary_unexpected_entries -> loki_canary_unexpected_entries_total
loki_canary_duplicate_entries -> loki_canary_duplicate_entries_total
loki_canary_ws_reconnects -> loki_canary_ws_reconnects_total
loki_canary_response_latency -> loki_canary_response_latency_seconds
```

### Ksonnet Changes

In `production/ksonnet/loki/config.libsonnet` the variable `storage_backend` used to have a default value of `'bigtable,gcs'`.
This has been changed to providing no default and will error if not supplied in your environment jsonnet;
here is an example of what you should add to have the same behavior as the default (namespace and cluster should already be defined):

```jsonnet
_config+:: {
  namespace: 'loki-dev',
  cluster: 'us-central1',
  storage_backend: 'gcs,bigtable',
```

Defaulting to `gcs,bigtable` was confusing for anyone using ksonnet with other storage backends as it would manifest itself with obscure bigtable errors.

## 1.5.0

Note: The required upgrade path outlined for version 1.4.0 below is still true for moving to 1.5.0 from any release older than 1.4.0 (e.g. 1.3.0->1.5.0 needs to also look at the 1.4.0 upgrade requirements).

### Breaking config changes!

Loki 1.5.0 vendors Cortex v1.0.0 (congratulations!), which has a [massive list of changes](https://cortexmetrics.io/docs/changelog/#1-0-0-2020-04-02).

While changes in the command line flags affect Loki as well, we usually recommend people use the configuration file instead.

Cortex has done a lot of cleanup in the configuration files, and you are strongly urged to take a look at the [annotated diff for the config file](https://cortexmetrics.io/docs/changelog/#config-file-breaking-changes) before upgrading to Loki 1.5.0.

The following fields were removed from the YAML configuration completely: `claim_on_rollout` (always true), `normalise_tokens` (always true).

#### Test Your Config

To see if your config needs to change, one way to quickly test is to download a 1.5.0 (or newer) binary from the [release page](https://github.com/grafana/loki/releases/tag/v1.5.0)

Then run the binary providing your config file `./loki-linux-amd64 -config.file=myconfig.yaml`

If there are configs which are no longer valid you will see errors immediately:

```shell
./loki-linux-amd64 -config.file=loki-local-config.yaml
failed parsing config: loki-local-config.yaml: yaml: unmarshal errors:
  line 35: field dynamodbconfig not found in type aws.StorageConfig
```

Referencing the [list of diffs](https://cortexmetrics.io/docs/changelog/#config-file-breaking-changes) we can see this config changed:

```diff
- dynamodbconfig:
+ dynamodb:
```

Several other AWS related configs also changed and you would need to update those as well.

### Loki Docker Image User and File Location Changes

To improve security, in 1.5.0 the Docker container no longer runs the loki process as `root`; instead the process runs as user `loki` with UID `10001` and GID `10001`.

This may affect people in a couple of ways:

#### Loki Port

If you are running Loki with a config that opens a port number above 1024 (which is the default, 3100 for HTTP and 9095 for GRPC) everything should work fine in regards to ports.

If you are running Loki with a config that opens a port number less than 1024, Linux normally requires root permissions to do this, HOWEVER in the Docker container we run `setcap cap_net_bind_service=+ep /usr/bin/loki`.

This capability lets the loki process bind to a port less than 1024 when run as a non root user.

Not every environment will allow this capability, however; it's possible to restrict this capability in linux. If this restriction is in place, you will be forced to run Loki with a config that has HTTP and GRPC ports above 1024.
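If you do need to move off a privileged port, the listen ports can be set explicitly in the `server` block; this is just a sketch showing the standard unprivileged defaults:

```yaml
server:
  # Both ports are above 1024, so no extra Linux capability is required.
  http_listen_port: 3100
  grpc_listen_port: 9095
```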
#### Filesystem

**Please note the location where Loki looks for files with the provided config in the docker image has changed**

In 1.4.0 and earlier the included config file in the docker container was using the directories:

```
/tmp/loki/index
/tmp/loki/chunks
```

In 1.5.0 this has changed:

```
/loki/index
/loki/chunks
```

This will mostly affect anyone using docker-compose or docker to run Loki while specifying a volume to persist storage.

**There are two concerns to track here: one is the correct ownership of the files and the other is making sure your mounts are updated to the new location.**

One possible upgrade path would look like this:

Say you were running Loki with this command: `docker run -d --name=loki --mount source=loki-data,target=/tmp/loki -p 3100:3100 grafana/loki:1.4.0`

This would mount a docker volume named `loki-data` to the `/tmp/loki` folder, which is where Loki persists the `index` and `chunks` folders in 1.4.0.

To move to 1.5.0 you can do the following (please note that your container names, paths, volumes, etc. may be different):

```
docker stop loki
docker rm loki
docker run --rm --name="loki-perm" -it --mount source=loki-data,target=/mnt ubuntu /bin/bash
cd /mnt
chown -R 10001:10001 ./*
exit
docker run -d --name=loki --mount source=loki-data,target=/loki -p 3100:3100 grafana/loki:1.5.0
```

Notice the change to `target=/loki` for 1.5.0, the new data directory location specified in the [included Loki config file](https://github.com/grafana/loki/tree/master/cmd/loki/loki-docker-config.yaml).

The intermediate step of using an ubuntu image to change the ownership of the Loki files to the new user might not be necessary if you can easily access these files to run the `chown` command directly.
That is, if you have access to `/var/lib/docker/volumes` or if you mounted to a different local filesystem directory, you can change the ownership directly without using a container.

### Loki Duration Configs

If you get an error like:

```nohighlight
./loki-linux-amd64-1.5.0 -log.level=debug -config.file=/etc/loki/config.yml
failed parsing config: /etc/loki/config.yml: not a valid duration string: "0"
```

This is because of some underlying changes that no longer allow durations without a unit.
Unfortunately the yaml parser doesn't give a line number but it's likely to be one of these two:

```yaml
chunk_store_config:
  max_look_back_period: 0s  # DURATION VALUES MUST HAVE A UNIT EVEN IF THEY ARE ZERO

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s      # DURATION VALUES MUST HAVE A UNIT EVEN IF THEY ARE ZERO
```

### Promtail Config Changes

The underlying backoff library used in Promtail had a config change which wasn't originally noted in the release notes:

If you get this error:

```nohighlight
Unable to parse config: /etc/promtail/promtail.yaml: yaml: unmarshal errors:
  line 3: field maxbackoff not found in type util.BackoffConfig
  line 4: field maxretries not found in type util.BackoffConfig
  line 5: field minbackoff not found in type util.BackoffConfig
```

The new values are:

```yaml
min_period:
max_period:
max_retries:
```

## 1.4.0

Loki 1.4.0 vendors Cortex v0.7.0-rc.0 which contains [several breaking config changes](https://github.com/cortexproject/cortex/blob/v0.7.0-rc.0/CHANGELOG).

One such config change which will affect Loki users:

In the [cache_config](../../configuration#cache_config):

`defaul_validity` has changed to `default_validity`

Also in the unlikely case you were configuring your schema via arguments and not a config file, this is no longer supported. This is not something we had ever provided as an option via docs and is unlikely anyone is doing, but worth mentioning.

The other config changes should not be relevant to Loki.

### Required Upgrade Path

The newly vendored version of Cortex removes code related to de-normalized tokens in the ring. What you need to know is this:

*Note:* A "shared ring" as mentioned below refers to using *consul* or *etcd* for values in the following config:

```yaml
kvstore:
  # The backend storage to use for the ring. Supported values are
  # consul, etcd, inmemory
  store: <string>
```

- Running without using a shared ring (inmemory): No action required
- Running with a shared ring and upgrading from v1.3.0 -> v1.4.0: No action required
- Running with a shared ring and upgrading from any version less than v1.3.0 (e.g. v1.2.0) -> v1.4.0: **ACTION REQUIRED**

There are two options for upgrade if you are not on version 1.3.0 and are using a shared ring:

- Upgrade first to v1.3.0 **BEFORE** upgrading to v1.4.0

OR

**Note:** If you are running a single binary you only need to add this flag to your single binary command.

1. Add the following configuration to your ingesters command: `-ingester.normalise-tokens=true`
1. Restart your ingesters with this config
1. Proceed with upgrading to v1.4.0
1. Remove the config option (only do this after everything is running v1.4.0)

**Note:** It's also possible to enable this flag via config file, see the [`lifecycler_config`](https://github.com/grafana/loki/tree/v1.3.0/docs/configuration#lifecycler_config) configuration option.
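For reference, a config-file sketch of that option might look like the following. This assumes the yaml key is `normalise_tokens` under the ingester's lifecycler block, as in the Cortex lifecycler config of that era; verify it against the linked `lifecycler_config` documentation for your version before relying on it:

```yaml
ingester:
  lifecycler:
    # Assumed key name; confirm against the lifecycler_config docs.
    normalise_tokens: true
```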
If using the Helm Loki chart:

```yaml
extraArgs:
  ingester.normalise-tokens: true
```

If using the Helm Loki-Stack chart:

```yaml
loki:
  extraArgs:
    ingester.normalise-tokens: true
```

#### What will go wrong

If you attempt to add a v1.4.0 ingester to a ring created by Loki v1.2.0 or older which does not have the commandline argument `-ingester.normalise-tokens=true` (or configured via [config file](https://github.com/grafana/loki/tree/v1.3.0/docs/configuration#lifecycler_config)), the v1.4.0 ingester will remove all the entries in the ring for all the other ingesters as it cannot "see" them.

This will result in distributors failing to write and a general ingestion failure for the system.

If this happens to you, you will want to rollback your deployment immediately. You need to remove the v1.4.0 ingester from the ring ASAP; this should allow the existing ingesters to re-insert their tokens. You will also want to remove any v1.4.0 distributors, as they will not understand the old ring either and will fail to send traffic.