---
layout: "guides"
page_title: "Spread"
sidebar_current: "guides-advanced-scheduling"
description: |-
  The following guide walks the user through using the spread stanza in Nomad.
---

# Increasing Failure Tolerance with Spread

The Nomad scheduler uses a bin packing algorithm when making job placements on nodes to optimize resource utilization and density of applications. Although bin packing ensures optimal resource utilization, it can lead to some nodes carrying a majority of allocations for a given job. This can cause cascading failures where the failure of a single node or a single datacenter can lead to application unavailability.

The [spread stanza][spread-stanza] solves this problem by allowing operators to distribute their workloads in a customized way based on [attributes][attributes] and/or [client metadata][client-metadata]. By using spread criteria in their job specification, Nomad job operators can ensure that failures across a domain such as datacenter or rack don't affect application availability.
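For example, spreading over a custom rack identifier might look like the sketch below, assuming each client sets a `rack` key in its [client metadata][client-metadata] (the rest of this guide spreads over the built-in datacenter attribute instead):

```hcl
job "docs" {
  group "example" {
    # Illustrative only: prefer an even distribution of this group's
    # allocations across values of the hypothetical "rack" meta key.
    spread {
      attribute = "${meta.rack}"
      weight    = 100
    }
  }
}
```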

## Reference Material

- The [spread][spread-stanza] stanza documentation
- [Scheduling][scheduling] with Nomad

## Estimated Time to Complete

20 minutes

## Challenge

Consider a Nomad application that needs to be deployed to multiple datacenters within a region. Datacenter `dc1` has four nodes while `dc2` has one node. This application has 10 instances, and 7 of them must be deployed to `dc1` since it receives more user traffic and we need to make sure the application doesn't suffer downtime caused by having too few running instances to process requests. The remaining 3 allocations can be deployed to `dc2`.

## Solution

Use the `spread` stanza in the Nomad [job specification][job-specification] to ensure that 70% of the workload is placed in datacenter `dc1` and 30% is placed in `dc2`. The Nomad operator can use the [percent][percent] option with a [target][target] to customize the spread.
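The relevant stanza is sketched below; the complete job file that uses it appears in Step 2.

```hcl
# Prefer placing 70% of this job's allocations in dc1 and 30% in dc2.
spread {
  attribute = "${node.datacenter}"
  weight    = 100

  target "dc1" {
    percent = 70
  }

  target "dc2" {
    percent = 30
  }
}
```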

## Prerequisites

To perform the tasks described in this guide, you need to have a Nomad
environment with Consul installed. You can use this [repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud) to easily provision a sandbox environment. This guide will assume a cluster with one server node and five client nodes.

-> **Please Note:** This guide is for demo purposes and only uses a single
server node. In a production cluster, 3 or 5 server nodes are recommended.

## Steps

### Step 1: Place One of the Client Nodes in a Different Datacenter

We are going to customize the spread for our job placement between the datacenters our nodes are located in. Choose one of your client nodes and edit `/etc/nomad.d/nomad.hcl` to change its location to `dc2`. A snippet of an example configuration file with the required change is shown below.

```hcl
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"
datacenter = "dc2"

# Enable the client
client {
  enabled = true
...
```

After making the change on your chosen client node, restart the Nomad service:

```shell
$ sudo systemctl restart nomad
```

If everything worked correctly, you should be able to run the `nomad` [node status][node-status] command and see that one of your nodes is now in datacenter `dc2`.

```shell
$ nomad node status
ID        DC   Name              Class   Drain  Eligibility  Status
5d16d949  dc2  ip-172-31-62-240  <none>  false  eligible     ready
7b381152  dc1  ip-172-31-59-115  <none>  false  eligible     ready
10cc48cc  dc1  ip-172-31-58-46   <none>  false  eligible     ready
93f1e628  dc1  ip-172-31-58-113  <none>  false  eligible     ready
12894b80  dc1  ip-172-31-62-90   <none>  false  eligible     ready
```

### Step 2: Create a Job with the `spread` Stanza

Create a file with the name `redis.nomad` and place the following content in it:

```hcl
job "redis" {
  datacenters = ["dc1", "dc2"]
  type        = "service"

  spread {
    attribute = "${node.datacenter}"
    weight    = 100

    target "dc1" {
      percent = 70
    }

    target "dc2" {
      percent = 30
    }
  }

  group "cache1" {
    count = 10

    task "redis" {
      driver = "docker"

      config {
        image = "redis:latest"

        port_map {
          db = 6379
        }
      }

      resources {
        network {
          port "db" {}
        }
      }

      service {
        name = "redis-cache"
        port = "db"

        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

Note that we used the `spread` stanza and specified the [datacenter][attributes]
attribute while targeting `dc1` and `dc2` with the `percent` option. This tells the Nomad scheduler to attempt to place 70% of the workload on `dc1` and 30% of the workload on `dc2`.
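
As a side note, if the `target` blocks are omitted, Nomad attempts to spread allocations evenly across all values of the attribute; a minimal sketch of that variant (not used in this guide) is shown below:

```hcl
# Illustrative variant: with no explicit targets, allocations are spread
# evenly across all observed values of the datacenter attribute.
spread {
  attribute = "${node.datacenter}"
  weight    = 100
}
```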

### Step 3: Register the Job `redis.nomad`

Run the Nomad job with the following command:

```shell
$ nomad run redis.nomad
==> Monitoring evaluation "c3dc5ebd"
    Evaluation triggered by job "redis"
    Allocation "7a374183" created: node "5d16d949", group "cache1"
    Allocation "f4361df1" created: node "7b381152", group "cache1"
    Allocation "f7af42dc" created: node "5d16d949", group "cache1"
    Allocation "0638edf2" created: node "10cc48cc", group "cache1"
    Allocation "49bc6038" created: node "12894b80", group "cache1"
    Allocation "c7e5679a" created: node "5d16d949", group "cache1"
    Allocation "cf91bf65" created: node "7b381152", group "cache1"
    Allocation "d16b606c" created: node "12894b80", group "cache1"
    Allocation "27866df0" created: node "93f1e628", group "cache1"
    Allocation "8531a6fc" created: node "7b381152", group "cache1"
    Evaluation status changed: "pending" -> "complete"
```

Note that three of the ten allocations have been placed on node `5d16d949`. This is the node we configured to be in datacenter `dc2`. The Nomad scheduler has distributed 30% of the workload to `dc2`, as we specified in the `spread` stanza.

Keep in mind that the Nomad scheduler still factors other components into the overall scoring of nodes when making placements, so you should not expect the `spread` stanza to strictly enforce your distribution preferences the way a [constraint][constraint-stanza] would. We will take a detailed look at the scoring in the next few steps.
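
For contrast, a hard placement rule would use the [constraint][constraint-stanza] stanza instead; the sketch below (illustrative only, not part of this guide's job) would restrict every allocation to `dc1` rather than spreading them:

```hcl
# Illustrative hard rule: allocations may only be placed on dc1 nodes.
constraint {
  attribute = "${node.datacenter}"
  value     = "dc1"
}
```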

### Step 4: Check the Status of the `redis` Job

At this point, we are going to check the status of our job and verify where our allocations have been placed. Run the following command:

```shell
$ nomad status redis
```

You should see 10 instances of your job running in the `Summary` section of the output as shown below:

```shell
...
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache1      0       0         10       0       0         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
0638edf2  10cc48cc  cache1      0        run      running  2m20s ago  2m ago
27866df0  93f1e628  cache1      0        run      running  2m20s ago  1m57s ago
49bc6038  12894b80  cache1      0        run      running  2m20s ago  1m58s ago
7a374183  5d16d949  cache1      0        run      running  2m20s ago  2m1s ago
8531a6fc  7b381152  cache1      0        run      running  2m20s ago  2m2s ago
c7e5679a  5d16d949  cache1      0        run      running  2m20s ago  1m55s ago
cf91bf65  7b381152  cache1      0        run      running  2m20s ago  1m57s ago
d16b606c  12894b80  cache1      0        run      running  2m20s ago  2m1s ago
f4361df1  7b381152  cache1      0        run      running  2m20s ago  2m3s ago
f7af42dc  5d16d949  cache1      0        run      running  2m20s ago  1m54s ago
```

You can cross-check this output with the results of the `nomad node status` command to verify that 30% of your workload has been placed on the node in `dc2` (in our case, that node is `5d16d949`).
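
If you prefer not to count allocations by hand, one rough way to tally them per node is to filter the status output with standard shell tools (the counts below correspond to the sample output above; your IDs and counts will differ):

```shell
# Tally running allocations per node ID ($2 = Node ID, $6 = Status in the
# Allocations table); match node IDs to datacenters with `nomad node status`.
$ nomad status redis | awk '$6 == "running" {print $2}' | sort | uniq -c
      1 10cc48cc
      2 12894b80
      3 5d16d949
      3 7b381152
      1 93f1e628
```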

### Step 5: Obtain Detailed Scoring Information on Job Placement

The Nomad scheduler will not always spread your workload exactly as specified in the `spread` stanza, even if the resources are available. This is because spread scoring is combined with other scheduling metrics before a placement decision is made. In this step, we will take a look at some of those other factors.

Using the output from the previous step, take any allocation that has been placed on a node and use the `nomad` [alloc status][alloc status] command with the [verbose][verbose] option to obtain detailed scoring information on it. In this example, we will use the allocation ID `0638edf2` (your allocation IDs will be different).

```shell
$ nomad alloc status -verbose 0638edf2
```

The resulting output will show the `Placement Metrics` section at the bottom.

```shell
...
Placement Metrics
Node                                  node-affinity  allocation-spread  binpack  job-anti-affinity  node-reschedule-penalty  final score
10cc48cc-2913-af54-74d5-d7559f373ff2  0              0.429              0.33     0                  0                        0.379
93f1e628-e509-b1ab-05b7-0944056f781d  0              0.429              0.515    -0.2               0                        0.248
12894b80-4943-4d5c-5716-c626c6b99be3  0              0.429              0.515    -0.2               0                        0.248
7b381152-3802-258b-4155-6d7dfb344dd4  0              0.429              0.515    -0.2               0                        0.248
5d16d949-85aa-3fd3-b5f4-51094cbeb77a  0              0.333              0.515    -0.2               0                        0.216
```

Note that the results from the `allocation-spread`, `binpack`, `job-anti-affinity`, `node-reschedule-penalty`, and `node-affinity` columns are combined to produce the numbers listed in the `final score` column for each node. The Nomad scheduler uses the final score for each node in deciding where to make placements.
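
As a rough sanity check against the table above, each node's final score appears to be the average of its non-zero scoring components (this is an observation about the sample output, not a statement about the scheduler's internals):

```text
10cc48cc: (0.429 + 0.330) / 2        ≈ 0.379
93f1e628: (0.429 + 0.515 - 0.2) / 3  ≈ 0.248
5d16d949: (0.333 + 0.515 - 0.2) / 3  ≈ 0.216
```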

## Next Steps

Change the values of the `percent` options on your targets in the `spread` stanza and observe how the placement behavior and the final score given to each node change (use the `nomad alloc status` command as shown in the previous step).
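
For example, an even split between the two datacenters could be expressed as follows (the percentages are illustrative; try a few combinations):

```hcl
spread {
  attribute = "${node.datacenter}"
  weight    = 100

  target "dc1" {
    percent = 50
  }

  target "dc2" {
    percent = 50
  }
}
```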

[alloc status]: /docs/commands/alloc/status.html
[attributes]: /docs/runtime/interpolation.html#node-variables-
[client-metadata]: /docs/configuration/client.html#meta
[constraint-stanza]: /docs/job-specification/constraint.html
[job-specification]: /docs/job-specification/index.html
[node-status]: /docs/commands/node/status.html
[percent]: /docs/job-specification/spread.html#percent
[spread-stanza]: /docs/job-specification/spread.html
[scheduling]: /docs/internals/scheduling/scheduling.html
[target]: /docs/job-specification/spread.html#target
[verbose]: /docs/commands/alloc/status.html#verbose