---
layout: "docs"
page_title: "Operating a Job: Inspecting State"
sidebar_current: "docs-jobops-inspection"
description: |-
  Learn how to inspect a Nomad Job.
---

# Inspecting state

Once a job is submitted, the next step is to ensure it is running. This section
assumes we have submitted a job with the name _example_.

To get a high-level overview of our job, we can use the [`nomad status`
command](/docs/commands/status.html). This command displays the list of
running allocations, as well as any recent placement failures. The example below
shows that the job has some allocations placed but did not have enough resources
to place all of the desired allocations. We run with `-evals` to see that there
is an outstanding evaluation for the job:

```
$ nomad status -evals example
ID          = example
Name        = example
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Evaluations
ID        Priority  Triggered By  Status    Placement Failures
5744eb15  50        job-register  blocked   N/A - In Progress
8e38e6cf  50        job-register  complete  true

Placement Failure
Task Group "cache":
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
12681940  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
395c5882  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
4d7c6f84  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
843b07b8  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
a8bc6d3e  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
b0beb907  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
da21c1fd  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
```
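
If this check needs to run from a script, the text output can be parsed. The
following is a minimal Python sketch, assuming the section layout shown in the
sample (a title line, a column header, then rows, ended by a blank line) and
single-word task group names. The helper names `section_rows`,
`blocked_evaluations`, and `allocation_status_counts` are hypothetical, and the
embedded sample is a trimmed copy of the output above. The text layout may
change between Nomad versions, so treat this as illustrative only:

```python
from collections import Counter

# Trimmed copy of the `nomad status` output shown above (assumed layout).
STATUS_OUTPUT = """\
Evaluations
ID        Priority  Triggered By  Status    Placement Failures
5744eb15  50        job-register  blocked   N/A - In Progress
8e38e6cf  50        job-register  complete  true

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
12681940  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
395c5882  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
"""


def section_rows(text, title):
    """Return the data rows of the section named `title` (header skipped)."""
    lines = [line.rstrip() for line in text.splitlines()]
    try:
        start = lines.index(title)
    except ValueError:
        return []  # section not present in this output
    rows = []
    for line in lines[start + 2:]:  # skip the title and the column header
        if not line:
            break  # a blank line ends the section
        rows.append(line)
    return rows


def blocked_evaluations(text):
    """IDs of evaluations whose Status column reads 'blocked'."""
    ids = []
    for row in section_rows(text, "Evaluations"):
        fields = row.split()
        # Columns: ID, Priority, Triggered By, Status, ...
        if len(fields) >= 4 and fields[3] == "blocked":
            ids.append(fields[0])
    return ids


def allocation_status_counts(text):
    """Count allocations per value of the Status column."""
    counts = Counter()
    for row in section_rows(text, "Allocations"):
        fields = row.split()
        # Columns: ID, Eval ID, Node ID, Task Group, Desired, Status, ...
        if len(fields) >= 6:
            counts[fields[5]] += 1
    return counts


print(blocked_evaluations(STATUS_OUTPUT))
print(dict(allocation_status_counts(STATUS_OUTPUT)))
```

A non-empty list from `blocked_evaluations` is the scripted equivalent of
spotting the `blocked` row by eye.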

In the above example we see that the job has a "blocked" evaluation that is in
progress. When Nomad cannot place all the desired allocations, it creates a
blocked evaluation that waits for more resources to become available. We can use
the [`eval-status` command](/docs/commands/eval-status.html) to examine any
evaluation in more detail. This is rarely necessary, but it can be useful for
seeing why not all of a job's allocations were placed. For example, if we run it
on the completed evaluation of the _example_ job, which had a placement failure
according to the above output, we see:

```
$ nomad eval-status 8e38e6cf
ID                 = 8e38e6cf
Status             = complete
Status Description = complete
Type               = service
TriggeredBy        = job-register
Job ID             = example
Priority           = 50
Placement Failures = true

Failed Placements
Task Group "cache" (failed to place 3 allocations):
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Evaluation "5744eb15" waiting for additional capacity to place remainder
```
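
The failed-placement summary can also be pulled out programmatically. The sketch
below, in Python, assumes the exact wording of the sample output above; the
regular expressions and the function names `failed_placements` and
`blocked_eval_id` are illustrative and are not part of Nomad:

```python
import re

# Copy of the relevant part of the `nomad eval-status` output shown above.
EVAL_OUTPUT = """\
Failed Placements
Task Group "cache" (failed to place 3 allocations):
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Evaluation "5744eb15" waiting for additional capacity to place remainder
"""

# Patterns assumed from the sample text; they are not a documented format.
GROUP_RE = re.compile(
    r'^Task Group "([^"]+)" \(failed to place (\d+) allocations?\):$'
)
BLOCKED_RE = re.compile(
    r'^Evaluation "([0-9a-f]+)" waiting for additional capacity'
)


def failed_placements(text):
    """Map each task group to its unplaced count and the listed reasons."""
    result = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = GROUP_RE.match(stripped)
        if match:
            current = match.group(1)
            result[current] = {"unplaced": int(match.group(2)), "reasons": []}
        elif current and stripped.startswith("* "):
            result[current]["reasons"].append(stripped[2:])
        elif not stripped:
            current = None  # a blank line ends the task group's bullet list
    return result


def blocked_eval_id(text):
    """Return the ID of the evaluation waiting for capacity, if any."""
    for line in text.splitlines():
        match = BLOCKED_RE.match(line)
        if match:
            return match.group(1)
    return None


print(failed_placements(EVAL_OUTPUT))
print(blocked_eval_id(EVAL_OUTPUT))
```

The ID returned by `blocked_eval_id` is the blocked evaluation listed in the
earlier `nomad status` output, which ties the two views together.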

More interesting though is the [`alloc-status`
command](/docs/commands/alloc-status.html). This command gives us the most
recent events that occurred for a task, its resource usage, port allocations and
more:

```
$ nomad alloc-status 12
ID            = 12681940
Eval ID       = 8e38e6cf
Name          = example.cache[1]
Node ID       = 4beef22f
Job ID        = example
Client Status = running
Created At    = 06/28/16 15:37:44 UTC

Task "redis" is "running"
Task Resources
CPU    Memory           Disk     IOPS  Addresses
2/500  6.3 MiB/256 MiB  300 MiB  0     db: 127.0.0.1:57161

Recent Events:
Time                   Type        Description
06/28/16 15:46:42 UTC  Started     Task started by client
06/28/16 15:46:10 UTC  Restarting  Task restarting in 30.863215327s
06/28/16 15:46:10 UTC  Terminated  Exit Code: 137, Exit Message: "Docker container exited with non-zero exit code: 137"
06/28/16 15:37:46 UTC  Started     Task started by client
06/28/16 15:37:44 UTC  Received    Task received by client
```
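
The exit code in a `Terminated` event often hints at how the task died. By
common Unix convention, an exit code above 128 means the process was ended by
signal number `code - 128`, so 137 corresponds to signal 9 (`SIGKILL`). The
Python sketch below applies that convention to an event description; the
function names and the small signal table are assumptions made for
illustration, not something Nomad provides:

```python
import re

# A few widely used signal numbers; extend as needed for your platform.
COMMON_SIGNALS = {1: "SIGHUP", 2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}

# Matches the "Exit Code: N" fragment seen in Terminated event descriptions.
EXIT_CODE_RE = re.compile(r"Exit Code: (\d+)")


def describe_exit_code(code):
    """Interpret an exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited cleanly"
    if code > 128:
        signal_number = code - 128
        name = COMMON_SIGNALS.get(signal_number, "unknown signal")
        return f"killed by signal {signal_number} ({name})"
    return f"application error (exit code {code})"


def describe_event(description):
    """Describe a Terminated event, or return None if no exit code is found."""
    match = EXIT_CODE_RE.search(description)
    if match is None:
        return None
    return describe_exit_code(int(match.group(1)))


EVENT = (
    'Exit Code: 137, Exit Message: '
    '"Docker container exited with non-zero exit code: 137"'
)
print(describe_event(EVENT))  # killed by signal 9 (SIGKILL)
```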

In the above example we force-killed the Docker container so that we could see
in the event history that Nomad detected the failure and restarted the
allocation.

The `alloc-status` command is a good starting point for debugging an
application that did not start. In this example task we are trying to start a
redis image using `redis:2.8`, but the user has accidentally put a comma instead
of a period, typing `redis:2,8`.

When the job is run, it produces an allocation that fails. The `alloc-status`
command gives us the reason why:

```
$ nomad alloc-status c0f1
ID            = c0f1b34c
Eval ID       = 4df393cb
Name          = example.cache[0]
Node ID       = 13063955
Job ID        = example
Client Status = failed
Created At    = 06/28/16 15:50:22 UTC

Task "redis" is "dead"
Task Resources
CPU  Memory   Disk     IOPS  Addresses
500  256 MiB  300 MiB  0     db: 127.0.0.1:23285

Recent Events:
Time                   Type            Description
06/28/16 15:50:22 UTC  Not Restarting  Error was unrecoverable
06/28/16 15:50:22 UTC  Driver Failure  failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format
06/28/16 15:50:22 UTC  Received        Task received by client
```
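
A typo like this can be caught before the job is submitted. The Python sketch
below checks an image tag against the pattern from Docker's image reference
grammar (a word character followed by up to 127 word characters, periods, or
hyphens). The pattern as written here and the helpers `split_image` and
`has_valid_tag` are assumptions for illustration; digests (`@sha256:...`) are
not handled, and the Docker daemon remains the authority on what it accepts:

```python
import re

# Tag pattern assumed from Docker's reference grammar: [\w][\w.-]{0,127}
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def split_image(ref):
    """Split 'name:tag' into (name, tag); tag is None when absent."""
    name, sep, tag = ref.rpartition(":")
    # A colon followed by a '/' belongs to a registry host:port,
    # for example localhost:5000/redis, and does not introduce a tag.
    if not sep or "/" in tag:
        return ref, None
    return name, tag


def has_valid_tag(ref):
    """True when the reference has no tag or a well-formed one."""
    _, tag = split_image(ref)
    return tag is None or TAG_RE.fullmatch(tag) is not None


for image in ("redis:2.8", "redis:2,8", "localhost:5000/redis"):
    print(image, has_valid_tag(image))
```

Running such a check in a pre-submit script would flag `redis:2,8` without
waiting for the driver failure to appear in the allocation's event history.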

Not all failures are this easy to debug. If the `alloc-status` command shows
many restarts occurring, as in the example below, it is a good hint that the
error is occurring at the application level during startup. These failures can
be debugged by looking at logs, which is covered in the [Nomad Job Logging
documentation](/docs/jobops/logs.html).

```
$ nomad alloc-status e6b6
ID            = e6b625a1
Eval ID       = 68b742e8
Name          = example.cache[0]
Node ID       = 83ef596c
Job ID        = example
Client Status = pending
Created At    = 06/28/16 15:55:48

Task "redis" is "pending"
Task Resources
CPU  Memory   Disk     IOPS  Addresses
500  256 MiB  300 MiB  0     db: 127.0.0.1:30153

Recent Events:
Time                   Type        Description
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
```
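
The pattern in this event list (several restarts, each run lasting about a
second) can be detected mechanically. The Python sketch below parses the
"Recent Events" rows shown above, assuming columns are separated by two or more
spaces and that the newest event is listed first. The thresholds
`min_restarts` and `max_run_seconds` and all function names are hypothetical
choices for illustration:

```python
import re
from datetime import datetime

# Copy of the "Recent Events" rows shown above (newest first).
EVENTS = """\
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
"""


def parse_events(text):
    """Return (time, type, description) tuples ordered oldest first."""
    events = []
    for line in text.splitlines():
        # The timestamp contains single spaces, so split on runs of 2+ spaces.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        when = datetime.strptime(parts[0], "%m/%d/%y %H:%M:%S UTC")
        events.append((when, parts[1], parts[2]))
    events.reverse()  # input is newest first; reversing keeps tie order sane
    return events


def run_durations(events):
    """Seconds each run lasted, pairing every Started with the next Terminated."""
    durations = []
    started = None
    for when, kind, _ in events:
        if kind == "Started":
            started = when
        elif kind == "Terminated" and started is not None:
            durations.append((when - started).total_seconds())
            started = None
    return durations


def looks_like_crash_loop(events, min_restarts=2, max_run_seconds=10):
    """Heuristic: several restarts and every run ended within a few seconds."""
    restarts = sum(1 for _, kind, _ in events if kind == "Restarting")
    runs = run_durations(events)
    return restarts >= min_restarts and bool(runs) and max(runs) <= max_run_seconds


events = parse_events(EVENTS)
print(run_durations(events))
print(looks_like_crash_loop(events))
```

A positive result is only a hint that the application is failing during
startup; the task's logs are still needed to find the actual cause.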