---
layout: "docs"
page_title: "Preemption"
sidebar_current: "docs-internals-scheduling-preemption"
description: |-
  Learn about how preemption works in Nomad.
---

# Preemption

Preemption allows Nomad to kill existing allocations in order to place allocations for a higher priority job.
The evicted allocation is temporarily displaced until the cluster has capacity to run it. This allows operators to
run high priority jobs even under resource contention across the cluster.

~> **Advanced Topic!** This page covers technical details of Nomad. You do not
~> need to understand these details to effectively use Nomad. The details are
~> documented here for those who wish to learn about them without having to
~> go spelunking through the source code.

# Preemption in Nomad

Every job in Nomad has a priority associated with it. Priorities impact scheduling at the evaluation and planning
stages by sorting the respective queues accordingly (higher priority jobs get moved ahead in the queues).

Prior to Nomad 0.9, when a cluster was at capacity, any allocations that resulted from a newly scheduled or updated
job remained in the pending state until sufficient resources became available, regardless of the defined priority.
This led to priority inversion, where a low priority task could prevent high priority tasks from completing.

Nomad 0.9 brings preemption capabilities to system jobs. The Nomad scheduler will evict lower priority running allocations
to free up capacity for new allocations resulting from relatively higher priority jobs, sending evicted allocations back
into the plan queue.

# Details

Preemption for system jobs is enabled by default in Nomad 0.9. Operators can use the [scheduler config](/api/operator.html#update-scheduler-configuration) API endpoint to disable preemption.

Nomad uses the [job priority](/docs/job-specification/job.html#priority) field to determine which running allocations can be preempted.
To prevent a cascade of preemptions among jobs that are close in priority, only allocations from jobs whose priority is more than
10 below that of the job needing placement are eligible for preemption.

For example, consider a node with the following distribution of allocations:

| Job             | Priority | Allocations | Capacity used per allocation |
| --------------- | -------- | ----------- | ---------------------------- |
| cache           | 70       | a6          | a6: 2 GB memory, 0.5 GB disk, 1 CPU |
| batch-analytics | 50       | a4, a5      | a4: 1 GB memory, 0.5 GB disk, 0.5 CPU; a5: 1 GB memory, 0.5 GB disk, 0.5 CPU |
| email-marketing | 20       | a1, a2      | a1: 0.5 GB memory, 0.8 GB disk; a2: 0.5 GB memory, 0.2 GB disk |

If a job `webapp` with priority `75` needs placement on the above node, only allocations from `batch-analytics` and `email-marketing` are considered
eligible to be preempted, because their priorities are more than `10` below `75`. Allocations from the `cache` job will not be preempted for `webapp`,
because the difference between the two priorities (`75 - 70 = 5`) is less than the required delta of `10`.
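
Here is a minimal Go sketch of this eligibility filter, applied to the jobs from the table above. The threshold of 10 comes from this page, and the strict `> 10` comparison follows the "more than 10" wording. The `eligible` function is hypothetical and is not Nomad's own code.

```go
package main

import "fmt"

// minPriorityDelta is the priority gap described on this page.
const minPriorityDelta = 10

// eligible reports whether an allocation from a job with allocPriority may be
// preempted to place a job with jobPriority (strict "more than 10" reading).
func eligible(jobPriority, allocPriority int) bool {
	return jobPriority-allocPriority > minPriorityDelta
}

func main() {
	jobs := []struct {
		name     string
		priority int
	}{
		{"cache", 70},
		{"batch-analytics", 50},
		{"email-marketing", 20},
	}
	const webappPriority = 75
	for _, j := range jobs {
		fmt.Printf("%s (priority %d): delta %d, eligible=%v\n",
			j.name, j.priority, webappPriority-j.priority, eligible(webappPriority, j.priority))
	}
}
```

Only `cache` is reported as `eligible=false`, because its delta of 5 does not exceed 10.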
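
Allocations are selected starting from the lowest priority, and scored according
to how closely they fit the job's required capacity. For example, if the `75` priority job needs 1 GB disk and 2 GB memory, Nomad will preempt
allocations `a1`, `a2` and `a4` to satisfy those requirements. `a1` and `a2`, from the lowest priority job `email-marketing`, together free 1 GB of memory
and 1 GB of disk, and one allocation from `batch-analytics` frees the remaining 1 GB of memory.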

# Preemption Visibility

Operators can use the [allocation API](/api/allocations.html#read-allocation) or the `nomad alloc status` command to see
whether an allocation has been preempted. Preempted allocations have their `DesiredStatus` set to `evict`. The `Allocation` object
in the API also has two additional fields related to preemption.

- `PreemptedAllocs` - This field is set on an allocation that caused preemption. It contains the IDs of the allocations
  that were preempted to place this allocation. In the above example, allocations created for the job `webapp` will have the values
  `a1`, `a2` and `a4` set.
- `PreemptedByAllocID` - This field is set on allocations that were preempted by the scheduler. It contains the ID of the allocation
  that preempted it. In the above example, allocations `a1`, `a2` and `a4` will have this field set to the ID of the allocation from the job `webapp`.
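
The Go sketch below makes the relationship between the two fields concrete. It models the example with a hypothetical `allocation` struct that carries only the fields named above. This struct is not the real API type, and the `webapp-1` ID is made up. The sketch lists the evicted allocations. It then checks that each one's `PreemptedByAllocID` points back to an allocation whose `PreemptedAllocs` includes it.

```go
package main

import "fmt"

// allocation is a hypothetical stand-in holding only the fields discussed here.
type allocation struct {
	ID                 string
	DesiredStatus      string
	PreemptedAllocs    []string // set on the allocation that caused preemption
	PreemptedByAllocID string   // set on allocations that were preempted
}

// consistent reports whether every preempted allocation is listed by its preemptor.
func consistent(allocs []allocation) bool {
	byID := make(map[string]allocation, len(allocs))
	for _, a := range allocs {
		byID[a.ID] = a
	}
	for _, a := range allocs {
		if a.PreemptedByAllocID == "" {
			continue
		}
		preemptor, ok := byID[a.PreemptedByAllocID]
		if !ok {
			return false
		}
		found := false
		for _, id := range preemptor.PreemptedAllocs {
			if id == a.ID {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	allocs := []allocation{
		{ID: "webapp-1", DesiredStatus: "run", PreemptedAllocs: []string{"a1", "a2", "a4"}},
		{ID: "a1", DesiredStatus: "evict", PreemptedByAllocID: "webapp-1"},
		{ID: "a2", DesiredStatus: "evict", PreemptedByAllocID: "webapp-1"},
		{ID: "a4", DesiredStatus: "evict", PreemptedByAllocID: "webapp-1"},
		{ID: "a5", DesiredStatus: "run"},
	}
	for _, a := range allocs {
		if a.DesiredStatus == "evict" {
			fmt.Printf("%s was preempted by %s\n", a.ID, a.PreemptedByAllocID)
		}
	}
	fmt.Println("links consistent:", consistent(allocs))
}
```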

# Integration with Nomad plan

`nomad plan` allows operators to dry run the scheduler. If the scheduler determines that
preemption is necessary to place the job, it shows additional information in the CLI output for
`nomad plan` as seen below.

```sh
$ nomad plan example.nomad

+ Job: "test"
+ Task Group: "test" (1 create)
  + Task: "test" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Preemptions:

Alloc ID                              Job ID    Task Group
ddef9521                              my-batch  analytics
ae59fe45                              my-batch  analytics
```

Note that the allocations shown in the `nomad plan` output above
are not guaranteed to be the same ones picked when the job is run later.
They give the operator a sample of the type of allocations that could be preempted.