---
layout: "docs"
page_title: "Operating a Job: Inspecting State"
sidebar_current: "docs-jobops-inspection"
description: |-
  Learn how to inspect a Nomad Job.
---

# Inspecting state

Once a job is submitted, the next step is to ensure it is running. This section
assumes we have submitted a job with the name _example_.

To get a high-level overview of our job we can use the [`nomad status`
command](/docs/commands/status.html). This command displays the list of
running allocations, as well as any recent placement failures. The example below
shows that the job has some allocations placed but did not have enough resources
to place all of the desired allocations. We run with `-evals` to see that there
is an outstanding evaluation for the job:

```
$ nomad status -evals example
ID          = example
Name        = example
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Evaluations
ID        Priority  Triggered By  Status    Placement Failures
5744eb15  50        job-register  blocked   N/A - In Progress
8e38e6cf  50        job-register  complete  true

Placement Failure
Task Group "cache":
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
12681940  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
395c5882  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
4d7c6f84  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
843b07b8  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
a8bc6d3e  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
b0beb907  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
da21c1fd  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
```

In the above example we see that
the job has a "blocked" evaluation that is in
progress. When Nomad cannot place all the desired allocations, it creates a
blocked evaluation that waits for more resources to become available. We can use
the [`eval-status` command](/docs/commands/eval-status.html) to examine any
evaluation in more detail. This is rarely necessary, but it can be useful to see
why some of a job's allocations were not placed. For example, if we run it on
evaluation `8e38e6cf` of the _example_ job, which had a placement failure
according to the above output, we see:

```
$ nomad eval-status 8e38e6cf
ID                 = 8e38e6cf
Status             = complete
Status Description = complete
Type               = service
TriggeredBy        = job-register
Job ID             = example
Priority           = 50
Placement Failures = true

Failed Placements
Task Group "cache" (failed to place 3 allocations):
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Evaluation "5744eb15" waiting for additional capacity to place remainder
```

More interesting, though, is the [`alloc-status`
command](/docs/commands/alloc-status.html).
This command gives us the most
recent events that occurred for a task, its resource usage, port allocations and
more:

```
$ nomad alloc-status 12
ID            = 12681940
Eval ID       = 8e38e6cf
Name          = example.cache[1]
Node ID       = 4beef22f
Job ID        = example
Client Status = running
Created At    = 06/28/16 15:37:44 UTC

Task "redis" is "running"
Task Resources
CPU    Memory           Disk     IOPS  Addresses
2/500  6.3 MiB/256 MiB  300 MiB  0     db: 127.0.0.1:57161

Recent Events:
Time                   Type        Description
06/28/16 15:46:42 UTC  Started     Task started by client
06/28/16 15:46:10 UTC  Restarting  Task restarting in 30.863215327s
06/28/16 15:46:10 UTC  Terminated  Exit Code: 137, Exit Message: "Docker container exited with non-zero exit code: 137"
06/28/16 15:37:46 UTC  Started     Task started by client
06/28/16 15:37:44 UTC  Received    Task received by client
```

In the above example we force-killed the Docker container so that we could see
in the event history that Nomad detected the failure and restarted the
allocation.

The `alloc-status` command is a good starting point for debugging an
application that did not start. In this example task we are trying to start a
Redis image using `redis:2.8`, but the user has accidentally put a comma instead
of a period, typing `redis:2,8`.

When the job is run, it produces an allocation that fails.
The `alloc-status`
command gives us the reason why:

```
$ nomad alloc-status c0f1
ID            = c0f1b34c
Eval ID       = 4df393cb
Name          = example.cache[0]
Node ID       = 13063955
Job ID        = example
Client Status = failed
Created At    = 06/28/16 15:50:22 UTC

Task "redis" is "dead"
Task Resources
CPU  Memory   Disk     IOPS  Addresses
500  256 MiB  300 MiB  0     db: 127.0.0.1:23285

Recent Events:
Time                   Type            Description
06/28/16 15:50:22 UTC  Not Restarting  Error was unrecoverable
06/28/16 15:50:22 UTC  Driver Failure  failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format
06/28/16 15:50:22 UTC  Received        Task received by client
```

Not all failures are this easy to debug. If the `alloc-status` command shows
many restarts occurring, as in the example below, it is a good hint that the
error is occurring at the application level during startup. These failures can
be debugged by looking at the logs, which is covered in the [Nomad Job Logging
documentation](/docs/jobops/logs.html).

```
$ nomad alloc-status e6b6
ID            = e6b625a1
Eval ID       = 68b742e8
Name          = example.cache[0]
Node ID       = 83ef596c
Job ID        = example
Client Status = pending
Created At    = 06/28/16 15:55:48

Task "redis" is "pending"
Task Resources
CPU  Memory   Disk     IOPS  Addresses
500  256 MiB  300 MiB  0     db: 127.0.0.1:30153

Recent Events:
Time                   Type        Description
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
```
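
When there are many allocations to check, it can help to count event types rather than read each event table by eye. The following is only a minimal sketch and not part of Nomad. It assumes that the `Recent Events` table is laid out as in the output above, with columns separated by two or more spaces. The helper names `parse_events` and `count_by_type` are hypothetical, and the sample text is copied from the last example rather than read from a live cluster.

```python
import re

# Sample copied from the alloc-status output above (assumed layout:
# columns separated by two or more spaces).
SAMPLE = """\
Recent Events:
Time                   Type        Description
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
"""


def parse_events(text):
    """Return (time, type, description) tuples from a Recent Events table."""
    events = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("Recent Events"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # a blank line ends the table
        # Split into at most three columns on runs of two or more spaces.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=2)
        if len(parts) == 3 and parts[0] != "Time":  # skip the header row
            events.append(tuple(parts))
    return events


def count_by_type(events):
    """Count how many times each event type appears."""
    counts = {}
    for _time, kind, _description in events:
        counts[kind] = counts.get(kind, 0) + 1
    return counts


events = parse_events(SAMPLE)
counts = count_by_type(events)
print(counts)
```

For the sample above, the `Restarting` count matches the two restarts visible in the table. A high `Restarting` count paired with `Terminated` events points to the application-level startup failure described earlier, and the logs are the next place to look.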