---
layout: "docs"
page_title: "Inspecting State - Operating a Job"
sidebar_current: "docs-operating-a-job-inspecting-state"
description: |-
  Nomad exposes a number of tools and techniques for inspecting a running job.
  This is helpful in ensuring the job started successfully. Additionally, it
  can inform us of any errors that occurred while starting the job.
---

# Inspecting State

A successful job submission is not an indication of a successfully-running job.
This is the nature of a highly-optimistic scheduler. A successful job submission
means the server was able to issue the proper scheduling commands. It does not
indicate the job is actually running. To verify the job is running, we need to
inspect its state.

This section will utilize the job named "docs" from the [previous
sections](/docs/operating-a-job/submitting-jobs.html), but these operations
and commands largely apply to all jobs in Nomad.

## Job Status

After a job is submitted, you can query the status of that job using the status
command:

```shell
$ nomad status
```

Here is some sample output:

```text
ID    Type     Priority  Status
docs  service  50        running
```

At a high level, we can see that our job is currently running, but what does
"running" actually mean? By supplying the name of a job to the status command,
we can ask Nomad for more detailed job information:

```shell
$ nomad status docs
```

Here is some sample output:

```text
ID          = docs
Name        = docs
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
example     0       0         3        0       0         0

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status    Created At
04d9627d  42d788a3  a1f934c9  example     run      running   <timestamp>
e7b8d4f5  42d788a3  012ea79b  example     run      running   <timestamp>
5cbf23a1  42d788a3  1e1aa1e0  example     run      running   <timestamp>
```

Here we can see that there are three instances of this task running, each with
its own allocation. For more information on the `status` command, please see the
[CLI documentation for <tt>status</tt>](/docs/commands/status.html).
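
The same details are also exposed over Nomad's HTTP API, which can be handy
for scripting. The following is a minimal sketch, assuming an agent is
reachable at the default address `127.0.0.1:4646`:

```shell
# Per-task-group allocation counts for the "docs" job
$ curl http://127.0.0.1:4646/v1/job/docs/summary

# Full job definition and status
$ curl http://127.0.0.1:4646/v1/job/docs
```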

## Evaluation Status

You can think of an evaluation as a submission to the scheduler. The example
below shows status output for a job where some allocations were placed
successfully, but there were not enough resources to place all of the desired
allocations.

If we issue the status command with the `-evals` flag, we can see there is an
outstanding evaluation for this hypothetical job:

```text
$ nomad status -evals docs
ID          = docs
Name        = docs
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Evaluations
ID        Priority  Triggered By  Status    Placement Failures
5744eb15  50        job-register  blocked   N/A - In Progress
8e38e6cf  50        job-register  complete  true

Placement Failure
Task Group "example":
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
12681940  8e38e6cf  4beef22f  example     run      running  <timestamp>
395c5882  8e38e6cf  4beef22f  example     run      running  <timestamp>
4d7c6f84  8e38e6cf  4beef22f  example     run      running  <timestamp>
843b07b8  8e38e6cf  4beef22f  example     run      running  <timestamp>
a8bc6d3e  8e38e6cf  4beef22f  example     run      running  <timestamp>
b0beb907  8e38e6cf  4beef22f  example     run      running  <timestamp>
da21c1fd  8e38e6cf  4beef22f  example     run      running  <timestamp>
```

In the above example we see that the job has a "blocked" evaluation that is in
progress. When Nomad cannot place all the desired allocations, it creates a
blocked evaluation that waits for more resources to become available.
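
A job's evaluations, including any blocked ones, can also be listed via the
HTTP API. This is a minimal sketch, again assuming the default agent address:

```shell
# List all evaluations created for the "docs" job
$ curl http://127.0.0.1:4646/v1/job/docs/evaluations
```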

The `eval-status` command enables us to examine any evaluation in more detail.
This is rarely necessary, but it can be useful to see why all of a job's
allocations were not placed. For example, if we run it against the evaluation
`8e38e6cf`, which reported a placement failure for the "docs" job in the output
above, we might see:

```text
$ nomad eval-status 8e38e6cf
ID                 = 8e38e6cf
Status             = complete
Status Description = complete
Type               = service
TriggeredBy        = job-register
Job ID             = docs
Priority           = 50
Placement Failures = true

Failed Placements
Task Group "example" (failed to place 3 allocations):
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Evaluation "5744eb15" waiting for additional capacity to place remainder
```

For more information on the `eval-status` command, please see the [CLI
documentation for <tt>eval-status</tt>](/docs/commands/eval-status.html).

## Allocation Status

You can think of an allocation as an instruction to schedule. Just like an
application or service, an allocation has logs and state. The `alloc-status`
command gives us the most recent events that occurred for a task, its resource
usage, port allocations and more:

```text
$ nomad alloc-status 04d9627d
ID            = 04d9627d
Eval ID       = 42d788a3
Name          = docs.example[2]
Node ID       = a1f934c9
Job ID        = docs
Client Status = running

Task "server" is "running"
Task Resources
CPU        Memory          Disk     IOPS  Addresses
0/100 MHz  728 KiB/10 MiB  300 MiB  0     http: 10.1.1.196:5678

Recent Events:
Time                   Type      Description
10/09/16 00:36:06 UTC  Started   Task started by client
10/09/16 00:36:05 UTC  Received  Task received by client
```
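
If an abbreviated ID is ambiguous, or you need the full UUIDs (for example to
query the HTTP API, which expects the complete allocation ID), the `-verbose`
flag prints unabbreviated identifiers. A sketch, assuming the default agent
address:

```shell
# Show full-length IDs and additional detail
$ nomad alloc-status -verbose 04d9627d

# Fetch the raw allocation object; substitute the full ID printed above
$ curl http://127.0.0.1:4646/v1/allocation/<full-allocation-id>
```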

The `alloc-status` command is a good starting point for debugging an
application that did not start. Hypothetically, assume a user meant to start a
Docker container from the image "redis:2.8", but accidentally typed a comma
instead of a period, producing "redis:2,8".
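
In job-file terms, the mistake would look something like the fragment below
(a hypothetical excerpt of the "docs" job, assuming the Docker driver):

```hcl
task "server" {
  driver = "docker"

  config {
    # Typo: a comma instead of a period in the image tag
    image = "redis:2,8" # should be "redis:2.8"
  }
}
```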

When the job is executed, it produces a failed allocation. The `alloc-status`
command will give us the reason why:

```text
$ nomad alloc-status 04d9627d
# ...

Recent Events:
Time                   Type            Description
06/28/16 15:50:22 UTC  Not Restarting  Error was unrecoverable
06/28/16 15:50:22 UTC  Driver Failure  failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format
06/28/16 15:50:22 UTC  Received        Task received by client
```

Unfortunately, not all failures are as easily debuggable. If the `alloc-status`
command shows many restarts, there is likely an application-level issue during
startup. For example:

```text
$ nomad alloc-status 04d9627d
# ...

Recent Events:
Time                   Type        Description
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
```

To debug these failures, we will need to use the `nomad logs` command, which is
discussed in the [accessing logs](/docs/operating-a-job/accessing-logs.html)
section of this documentation.
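
As a preview, the invocation looks roughly like this (a sketch assuming the
task is named "server", as in the allocation shown earlier):

```shell
# Display the task's stdout; add -stderr for the error stream
$ nomad logs 04d9627d server
$ nomad logs -stderr 04d9627d server
```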

For more information on the `alloc-status` command, please see the [CLI
documentation for <tt>alloc-status</tt>](/docs/commands/alloc-status.html).