---
layout: "docs"
page_title: "Scheduling"
sidebar_current: "docs-internals-scheduling"
description: |-
  Learn about how scheduling works in Nomad.
---

# Scheduling

Scheduling is a core function of Nomad. It is the process of assigning tasks
from jobs to client machines. This process must respect the constraints as
declared in the job, and optimize for resource utilization. This page documents
the details of how scheduling works in Nomad to help both users and developers
build a mental model. The design is heavily inspired by Google's work on both
[Omega: flexible, scalable schedulers for large compute clusters][Omega] and
[Large-scale cluster management at Google with Borg][Borg].

~> **Advanced Topic!** This page covers technical details of Nomad. You do not
~> need to understand these details to effectively use Nomad. The details are
~> documented here for those who wish to learn about them without having to
~> go spelunking through the source code.

# Scheduling in Nomad

[![Nomad Data Model][img-data-model]][img-data-model]

There are four primary "nouns" in Nomad: jobs, nodes, allocations, and
evaluations. Jobs are submitted by users and represent a _desired state_. A job
is a declarative description of tasks to run which are bounded by constraints
and require resources. Tasks can be scheduled on nodes in the cluster running
the Nomad client. The mapping of tasks in a job to clients is done using
allocations. An allocation is used to declare that a set of tasks in a job
should be run on a particular node. Scheduling is the process of determining
the appropriate allocations and is done as part of an evaluation.

An evaluation is created any time the external state, either desired or
emergent, changes.
The desired state is based on jobs, meaning the desired
state changes if a new job is submitted, an existing job is updated, or a job
is deregistered. The emergent state is based on the client nodes, and so we
must handle the failure of any clients in the system. These events trigger the
creation of a new evaluation, as Nomad must _evaluate_ the state of the world
and reconcile it with the desired state.

This diagram shows the flow of an evaluation through Nomad:

[![Nomad Evaluation Flow][img-eval-flow]][img-eval-flow]

The lifecycle of an evaluation begins with an event causing the evaluation to
be created. Evaluations are created in the `pending` state and are enqueued
into the evaluation broker. There is a single evaluation broker which runs on
the leader server. The evaluation broker is used to manage the queue of pending
evaluations, provide priority ordering, and ensure at-least-once delivery.

Nomad servers run scheduling workers, defaulting to one per CPU core, which are
used to process evaluations. The workers dequeue evaluations from the broker,
and then invoke the appropriate scheduler as specified by the job. Nomad ships
with a `service` scheduler that optimizes for long-lived services, a `batch`
scheduler that is used for fast placement of batch jobs, a `system` scheduler
that is used to run jobs on every node, and a `core` scheduler which is used
for internal maintenance. Nomad can be extended to support custom schedulers as
well.

Schedulers are responsible for processing an evaluation and generating an
allocation _plan_. The plan is the set of allocations to evict, update, or
create. The specific logic used to generate a plan may vary by scheduler, but
generally the scheduler needs to first reconcile the desired state with the
real state to determine what must be done.
New allocations need to be placed
and existing allocations may need to be updated, migrated, or stopped.

Placing allocations is split into two distinct phases: feasibility checking and
ranking. In the first phase the scheduler finds nodes that are feasible by
filtering unhealthy nodes, those missing necessary drivers, and those failing
the specified constraints.

The second phase is ranking, where the scheduler scores feasible nodes to find
the best fit. Scoring is primarily based on bin packing, which is used to
optimize the resource utilization and density of applications, but is also
augmented by affinity and anti-affinity rules. Nomad automatically applies a job
anti-affinity rule which discourages colocating multiple instances of a task
group. The combination of this anti-affinity and bin packing optimizes for
density while reducing the probability of correlated failures.

Once the scheduler has ranked enough nodes, the highest ranking node is
selected and added to the allocation plan.

When planning is complete, the scheduler submits the plan to the leader which
adds the plan to the plan queue. The plan queue manages pending plans, provides
priority ordering, and allows Nomad to handle concurrency races. Multiple
schedulers run in parallel without locking or reservations, making
Nomad optimistically concurrent. As a result, schedulers might overlap work on
the same node and cause resource over-subscription. The plan queue allows the
leader node to protect against this and do partial or complete rejections of a
plan.

As the leader processes plans, it creates allocations when there is no conflict
and otherwise informs the scheduler of a failure in the plan result. The plan
result provides feedback to the scheduler, allowing it to terminate or explore
alternate plans if the previous plan was partially or completely rejected.
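
The placement and plan-verification flow described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not Nomad's implementation: the names `Node`, `feasible`, `score`, `place`, and `apply_plan`, the CPU-only resource model, the single `os` constraint, and the `0.5` anti-affinity weight are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical, simplified view of a client node (not Nomad's real struct).
    name: str
    cpu: int                      # total CPU capacity, arbitrary units
    used: int = 0                 # CPU already allocated
    healthy: bool = True
    os: str = "linux"
    drivers: frozenset = frozenset({"docker"})
    groups: list = field(default_factory=list)  # task groups already placed here


def feasible(node, driver, required_os):
    """Phase 1: filter unhealthy nodes, missing drivers, failed constraints."""
    return node.healthy and driver in node.drivers and node.os == required_os


def score(node, group, cpu):
    """Phase 2: bin-packing score, reduced by an anti-affinity penalty."""
    if node.cpu - node.used < cpu:
        return None  # the allocation does not fit on this node
    utilization = (node.used + cpu) / node.cpu        # fuller nodes score higher
    penalty = 0.5 * node.groups.count(group)          # assumed illustrative weight
    return utilization - penalty


def place(snapshot, group, driver, required_os, cpu):
    """Return the name of the highest-ranking feasible node, or None."""
    ranked = [(score(n, group, cpu), n.name)
              for n in snapshot if feasible(n, driver, required_os)]
    ranked = [r for r in ranked if r[0] is not None]
    return max(ranked)[1] if ranked else None


def apply_plan(nodes, plan):
    """Leader side: accept placements that still fit, reject the rest."""
    accepted, rejected = [], []
    for node_name, group, cpu in plan:
        node = nodes[node_name]
        if node.cpu - node.used >= cpu:
            node.used += cpu
            node.groups.append(group)
            accepted.append((node_name, group))
        else:
            rejected.append((node_name, group))  # would over-subscribe the node
    return accepted, rejected


if __name__ == "__main__":
    nodes = {n.name: n for n in [Node("a", 1000, used=600), Node("b", 1000)]}
    # Two schedulers plan concurrently against the same state, without locking.
    plan_web = [(place(nodes.values(), "web", "docker", "linux", 300), "web", 300)]
    plan_api = [(place(nodes.values(), "api", "docker", "linux", 300), "api", 300)]
    print(apply_plan(nodes, plan_web))   # first plan is checked by the leader
    print(apply_plan(nodes, plan_api))   # second plan is checked against new state
    # A scheduler whose plan was rejected can plan again on refreshed state.
    print(place(nodes.values(), "api", "docker", "linux", 300))
```

In this sketch both schedulers pick the same node because each plans without locks or reservations. Serializing the checks in `apply_plan` is what prevents over-subscription, and the rejected entries play the role of the plan result that tells a scheduler to try an alternate plan.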

Once the scheduler has finished processing an evaluation, it updates the status
of the evaluation and acknowledges delivery with the evaluation broker. This
completes the lifecycle of an evaluation. Allocations that were created,
modified or deleted as a result will be picked up by client nodes and will
begin execution.

[Omega]: https://research.google.com/pubs/pub41684.html
[Borg]: https://research.google.com/pubs/pub43438.html
[img-data-model]: /assets/images/nomad-data-model.png
[img-eval-flow]: /assets/images/nomad-evaluation-flow.png