---
layout: "guides"
page_title: "Inspecting State - Operating a Job"
sidebar_current: "guides-operating-a-job-inspecting-state"
description: |-
  Nomad exposes a number of tools and techniques for inspecting a running job.
  This is helpful in ensuring the job started successfully. Additionally, it
  can inform us of any errors that occurred while starting the job.
---

# Inspecting State

A successful job submission is not an indication of a successfully-running job.
This is the nature of a highly-optimistic scheduler. A successful job submission
means the server was able to issue the proper scheduling commands. It does not
indicate the job is actually running. To verify the job is running, we need to
inspect its state.

This section will use the job named "docs" from the [previous
sections](/guides/operating-a-job/submitting-jobs.html), but these operations
and commands largely apply to all jobs in Nomad.

## Job Status

After a job is submitted, you can query the status of that job using the job
status command:

```text
$ nomad job status
ID    Type     Priority  Status
docs  service  50        running
```

At a high level, we can see that our job is currently running, but what does
"running" actually mean?
By supplying the name of a job to the job status
command, we can ask Nomad for more detailed job information:

```text
$ nomad job status docs
ID          = docs
Name        = docs
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
example     0       0         3        0       0         0

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
04d9627d  42d788a3  a1f934c9  example     run      running  <timestamp>
e7b8d4f5  42d788a3  012ea79b  example     run      running  <timestamp>
5cbf23a1  42d788a3  1e1aa1e0  example     run      running  <timestamp>
```

Here we can see that there are three instances of this task running, each with
its own allocation. For more information on the `status` command, please see the
[CLI documentation for <tt>status</tt>](/docs/commands/status.html).

## Evaluation Status

You can think of an evaluation as a submission to the scheduler. The example
below shows status output for a job where some allocations were placed
successfully, but the cluster did not have enough resources to place all of the
desired allocations.

If we issue the status command with the `-evals` flag, we can see that there is
an outstanding evaluation for this hypothetical job:

```text
$ nomad job status -evals docs
ID          = docs
Name        = docs
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Evaluations
ID        Priority  Triggered By  Status    Placement Failures
5744eb15  50        job-register  blocked   N/A - In Progress
8e38e6cf  50        job-register  complete  true

Placement Failure
Task Group "example":
  * Resources exhausted on 1 nodes
  * Dimension "cpu" exhausted on 1 nodes

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
12681940  8e38e6cf  4beef22f  example     run      running  <timestamp>
395c5882  8e38e6cf  4beef22f  example     run      running  <timestamp>
4d7c6f84  8e38e6cf  4beef22f  example     run      running  <timestamp>
843b07b8  8e38e6cf  4beef22f  example     run      running  <timestamp>
a8bc6d3e  8e38e6cf  4beef22f  example     run      running  <timestamp>
b0beb907  8e38e6cf  4beef22f  example     run      running  <timestamp>
da21c1fd  8e38e6cf  4beef22f  example     run      running  <timestamp>
```

In the above example we see that the job has a "blocked" evaluation that is in
progress. When Nomad cannot place all the desired allocations, it creates a
blocked evaluation that waits for more resources to become available.

The `eval status` command enables us to examine any evaluation in more detail.
This is rarely necessary, but it can be useful for seeing why not all of a
job's allocations were placed.
For example, if we run it on the completed
evaluation of the job named docs, which had a placement failure according to
the above output, we might see:

```text
$ nomad eval status 8e38e6cf
ID                 = 8e38e6cf
Status             = complete
Status Description = complete
Type               = service
TriggeredBy        = job-register
Job ID             = docs
Priority           = 50
Placement Failures = true

Failed Placements
Task Group "example" (failed to place 3 allocations):
  * Resources exhausted on 1 nodes
  * Dimension "cpu" exhausted on 1 nodes

Evaluation "5744eb15" waiting for additional capacity to place remainder
```

For more information on the `eval status` command, please see the [CLI
documentation for <tt>eval status</tt>](/docs/commands/eval-status.html).

## Allocation Status

You can think of an allocation as an instruction to schedule. Just like an
application or service, an allocation has logs and state. The `alloc status`
command gives us the most recent events that occurred for a task, its resource
usage, port allocations and more:

```text
$ nomad alloc status 04d9627d
ID            = 04d9627d
Eval ID       = 42d788a3
Name          = docs.example[2]
Node ID       = a1f934c9
Job ID        = docs
Client Status = running

Task "server" is "running"
Task Resources
CPU        Memory          Disk     IOPS  Addresses
0/100 MHz  728 KiB/10 MiB  300 MiB  0     http: 10.1.1.196:5678

Recent Events:
Time                   Type      Description
10/09/16 00:36:06 UTC  Started   Task started by client
10/09/16 00:36:05 UTC  Received  Task received by client
```

The `alloc status` command is a good starting point for debugging an
application that did not start. Suppose a user meant to start a Docker
container named "redis:2.8", but accidentally typed a comma instead of a
period: "redis:2,8".

When the job is executed, it produces a failed allocation.
The `alloc status`
command tells us why:

```text
$ nomad alloc status 04d9627d
# ...

Recent Events:
Time                   Type            Description
06/28/16 15:50:22 UTC  Not Restarting  Error was unrecoverable
06/28/16 15:50:22 UTC  Driver Failure  failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format
06/28/16 15:50:22 UTC  Received        Task received by client
```

Unfortunately, not all failures are as easy to debug. If the `alloc status`
command shows many restarts, there is likely an application-level issue during
startup. For example:

```text
$ nomad alloc status 04d9627d
# ...

Recent Events:
Time                   Type        Description
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
```

To debug these failures, we will need to use the "logs" command, which is
discussed in the [accessing logs](/guides/operating-a-job/accessing-logs.html)
section of this documentation.

For more information on the `alloc status` command, please see the [CLI
documentation for <tt>alloc status</tt>](/docs/commands/alloc/status.html).
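
When an allocation has a long event history, it can help to tally the event types rather than read them one by one. The following is a minimal Python sketch, not part of Nomad itself: `count_event_types` and `SAMPLE_EVENTS` are hypothetical names, and the parser assumes the "Recent Events" table separates its columns with two or more spaces, as in the sample output shown in this guide. Real output may differ between Nomad versions.

```python
import re
from collections import Counter

# Sample text copied from the restart example in this guide (an assumption
# about the layout of real output, not a captured log).
SAMPLE_EVENTS = """\
Recent Events:
Time                   Type        Description
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
"""


def count_event_types(status_output: str) -> Counter:
    """Count event types in the 'Recent Events' tables of alloc status text."""
    counts = Counter()
    in_events = False
    for line in status_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Recent Events"):
            in_events = True
            continue
        if not in_events:
            continue
        if not stripped:
            # A blank line is treated as the end of the current table.
            in_events = False
            continue
        if stripped.startswith("Time"):
            # Skip the header row.
            continue
        # Assumed layout: time, type, description, separated by 2+ spaces.
        parts = re.split(r"\s{2,}", stripped, maxsplit=2)
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


if __name__ == "__main__":
    for event_type, total in count_event_types(SAMPLE_EVENTS).most_common():
        print(f"{event_type}: {total}")
```

To try this on a live allocation, the output of `nomad alloc status <alloc-id>` could be saved to a file and its contents passed to `count_event_types`. Repeated "Restarting" and "Terminated" entries are the pattern this guide associates with an application-level startup problem.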