---
layout: "docs"
page_title: "Inspecting State - Operating a Job"
sidebar_current: "docs-operating-a-job-inspecting-state"
description: |-
  Nomad exposes a number of tools and techniques for inspecting a running job.
  This is helpful in ensuring the job started successfully. Additionally, it
  can inform us of any errors that occurred while starting the job.
---

# Inspecting State

A successful job submission is not an indication of a successfully running job.
This is the nature of a highly-optimistic scheduler. A successful job submission
means the server was able to issue the proper scheduling commands. It does not
indicate the job is actually running. To verify the job is running, we need to
inspect its state.

This section uses the job named "docs" from the [previous
sections](/docs/operating-a-job/submitting-jobs.html), but these operations
and commands largely apply to all jobs in Nomad.

## Job Status

After a job is submitted, you can query the status of that job using the status
command:

```shell
$ nomad status
```

Here is some sample output:

```text
ID    Type     Priority  Status
docs  service  50        running
```

At a high level, we can see that our job is currently running, but what does
"running" actually mean?
By supplying the name of a job to the status command,
we can ask Nomad for more detailed job information:

```shell
$ nomad status docs
```

Here is some sample output:

```text
ID          = docs
Name        = docs
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
example     0       0         3        0       0         0

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
04d9627d  42d788a3  a1f934c9  example     run      running  <timestamp>
e7b8d4f5  42d788a3  012ea79b  example     run      running  <timestamp>
5cbf23a1  42d788a3  1e1aa1e0  example     run      running  <timestamp>
```

Here we can see that there are three instances of this task running, each with
its own allocation. For more information on the `status` command, please see the
[CLI documentation for <tt>status</tt>](/docs/commands/status.html).

## Evaluation Status

You can think of an evaluation as a submission to the scheduler. The example
below shows status output for a job where some allocations were placed
successfully, but the cluster did not have enough resources to place all of the
desired allocations.


If we issue the status command with the `-evals` flag, we can see there is an
outstanding evaluation for this hypothetical job:

```text
$ nomad status -evals docs
ID          = docs
Name        = docs
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Evaluations
ID        Priority  Triggered By  Status    Placement Failures
5744eb15  50        job-register  blocked   N/A - In Progress
8e38e6cf  50        job-register  complete  true

Placement Failure
Task Group "example":
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
12681940  8e38e6cf  4beef22f  example     run      running  <timestamp>
395c5882  8e38e6cf  4beef22f  example     run      running  <timestamp>
4d7c6f84  8e38e6cf  4beef22f  example     run      running  <timestamp>
843b07b8  8e38e6cf  4beef22f  example     run      running  <timestamp>
a8bc6d3e  8e38e6cf  4beef22f  example     run      running  <timestamp>
b0beb907  8e38e6cf  4beef22f  example     run      running  <timestamp>
da21c1fd  8e38e6cf  4beef22f  example     run      running  <timestamp>
```

In the above example we see that the job has a "blocked" evaluation that is in
progress. When Nomad cannot place all the desired allocations, it creates a
blocked evaluation that waits for more resources to become available.

The `eval-status` command enables us to examine any evaluation in more detail.
This is rarely necessary, but it can be useful for seeing why some of a job's
allocations were not placed.
For example, if we run it on the completed evaluation for the job
named docs, which had a placement failure according to the above output, we
might see:

```text
$ nomad eval-status 8e38e6cf
ID                 = 8e38e6cf
Status             = complete
Status Description = complete
Type               = service
TriggeredBy        = job-register
Job ID             = docs
Priority           = 50
Placement Failures = true

Failed Placements
Task Group "example" (failed to place 3 allocations):
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Evaluation "5744eb15" waiting for additional capacity to place remainder
```

For more information on the `eval-status` command, please see the [CLI
documentation for <tt>eval-status</tt>](/docs/commands/eval-status.html).

## Allocation Status

You can think of an allocation as an instruction to schedule. Just like an
application or service, an allocation has logs and state. The `alloc-status`
command gives us the most recent events that occurred for a task, its resource
usage, port allocations and more:

```text
$ nomad alloc-status 04d9627d
ID            = 04d9627d
Eval ID       = 42d788a3
Name          = docs.example[2]
Node ID       = a1f934c9
Job ID        = docs
Client Status = running

Task "server" is "running"
Task Resources
CPU        Memory          Disk     IOPS  Addresses
0/100 MHz  728 KiB/10 MiB  300 MiB  0     http: 10.1.1.196:5678

Recent Events:
Time                   Type      Description
10/09/16 00:36:06 UTC  Started   Task started by client
10/09/16 00:36:05 UTC  Received  Task received by client
```

The `alloc-status` command is a good starting point for debugging an
application that did not start. Suppose a user meant to start a Docker
container from the image "redis:2.8", but accidentally put a comma instead of a
period, typing "redis:2,8".

When the job is executed, it produces a failed allocation.
The `alloc-status`
command will give us the reason why:

```text
$ nomad alloc-status 04d9627d
# ...

Recent Events:
Time                   Type            Description
06/28/16 15:50:22 UTC  Not Restarting  Error was unrecoverable
06/28/16 15:50:22 UTC  Driver Failure  failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format
06/28/16 15:50:22 UTC  Received        Task received by client
```

Unfortunately, not all failures are as easy to debug. If the `alloc-status`
command shows many restarts, there is likely an application-level issue during
startup. For example:

```text
$ nomad alloc-status 04d9627d
# ...

Recent Events:
Time                   Type        Description
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
```

To debug these failures, we will need to use the `logs` command, which is
discussed in the [accessing logs](/docs/operating-a-job/accessing-logs.html)
section of this documentation.

For more information on the `alloc-status` command, please see the [CLI
documentation for <tt>alloc-status</tt>](/docs/commands/alloc-status.html).
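
As a rough illustration of spotting the restart loop shown earlier in this section, the sketch below counts `Restarting` events in `alloc-status` output. It is not part of Nomad. It assumes the "Recent Events" column layout from the sample output above (date, time, timezone, then event type), which may differ between Nomad versions, and the `events` and `restarts` variable names are made up for this example. The sample text is embedded so the snippet runs without a cluster. Against a live cluster, the same text would come from `nomad alloc-status <alloc-id>`.

```shell
#!/bin/sh
# Sample "Recent Events" rows copied from the restart example above.
# Assumption: in practice this text would come from `nomad alloc-status <alloc-id>`.
events=$(cat <<'EOF'
Recent Events:
Time                   Type        Description
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
EOF
)

# The event type is the fourth whitespace-separated field (after date, time,
# and timezone). "Not Restarting" rows are not counted, because their fourth
# field is "Not".
restarts=$(printf '%s\n' "$events" | awk '$4 == "Restarting" { n++ } END { print n + 0 }')

echo "restarts: $restarts"
```

A count that keeps growing between runs suggests the task is crashing during startup, at which point the task logs are the next place to look.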