github.com/pachyderm/pachyderm@v1.13.4/doc/docs/master/how-tos/monitor-job-progress.md (about)

# Monitor Job Progress

After a pipeline starts a job, you can run one of the following commands
to monitor its status:

* `pachctl list pipeline`

  This command shows all the pipelines that run in your cluster
  and the status of the last job for each of them. The `STATE` column
  shows the current state of the pipeline. If a pipeline is in the
  `running` state, pods were spun up in the underlying Kubernetes
  cluster for this pipeline. The `running` state does not necessarily mean
  that the pipeline is actively processing a job. If you see `failed` in
  the `STATE` column, the Kubernetes cluster failed to schedule a pod for
  this pipeline.

  The `LAST JOB` column shows the status of the most recent job that ran
  for this pipeline, which can be `success`, `failed`, or `crashing`.
  If a pipeline is in a `failed` state, you need to find the reason for
  the failure and fix it. The `crashing` state indicates that
  the pipeline worker is failing for potentially transient reasons. The
  most common reasons for crashing are image pull failures, such as
  an incorrect image name or registry credentials, and scheduling failures,
  such as insufficient resources on your Kubernetes cluster.

  **Example:**

  ```shell
  NAME    VERSION INPUT                 CREATED       STATE / LAST JOB    DESCRIPTION
  montage 1       (edges:/ ⨯ images:/)  2 seconds ago starting / starting A montage pipeline
  edges   1       images:/*             2 seconds ago running / starting  An edge detection pipeline.
  ```

* `pachctl list job`

  This command shows the jobs that were run for each pipeline. For each job,
  Pachyderm shows the number of datums in the **PROGRESS** column, the amount
  of downloaded and uploaded data, the duration, and other important information.
  The format of the **PROGRESS** column is `DATUMS PROCESSED + DATUMS SKIPPED / TOTAL DATUMS`.

  For more information, see
  [Datum Processing States](../../concepts/pipeline-concepts/datum/datum-processing-states/).

  **Example:**

  ```shell
  $ pachctl list job
  ID                               PIPELINE STARTED       DURATION           RESTART PROGRESS    DL       UL       STATE
  7321952b9a214d3dbb64cc4369cc67da montage  6 minutes ago 1 second           0       1 + 0 / 1   371.9KiB 1.283MiB success
  95adc138e82e48949909364e8b9dbb53 edges    6 minutes ago 1 second           0       2 + 1 / 3   181.1KiB 111.4KiB success
  84fe22432f22492c9fd4f23036c3c8b5 montage  6 minutes ago Less than a second 0       1 + 0 / 1   79.49KiB 378.6KiB success
  2fbbc54ab3514d8a94d1b7a75bab96a7 edges    6 minutes ago Less than a second 0       1 + 0 / 1   57.27KiB 22.22KiB success
  ```

* `pachctl list commit <repo>`

  This command shows the status of the downstream jobs further along
  the DAG that result from each commit.
  For example, the [Hyperparameter Tuning example](https://github.com/pachyderm/pachyderm/tree/master/examples/ml/hyperparameter) has four pipelines,
  or a four-stage pipeline. Each subsequent pipeline takes the results
  from the output repository of the previous pipeline and performs a
  computation on them, so the steps run one after another.
  The **PROGRESS** bar in the output of the `pachctl list commit <first-repo-in-dag>`
  command reflects this sequence.

  Running the command against the first repo in the DAG displays
  a progress bar that shows job progress for all steps in your DAG.

  The following animation shows how the progress bar is updated
  when a job for each pipeline completes.

  <p><small>(Click to enlarge)</small></p>
  [ ![Progress bar](../assets/images/list_commit_progress_bar.gif)](../assets/images/list_commit_progress_bar.gif)

  The progress bar is divided equally among the steps, or pipelines,
  in your DAG. The example above has four steps.
  If one of the jobs fails, the progress bar turns red
  for that pipeline step. To troubleshoot, look into that particular
  pipeline job.

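The **PROGRESS** format that `pachctl list job` prints, `DATUMS PROCESSED + DATUMS SKIPPED / TOTAL DATUMS`, can be turned into a completion percentage with a few lines of shell. The sketch below is only an illustration: `progress_percent` is a hypothetical helper, not a `pachctl` feature, and treating a job with zero datums as 100% complete is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of pachctl): convert a PROGRESS value such as
# "2 + 1 / 3" into an integer completion percentage.
progress_percent() {
  # Word-split the value: $1=processed, $2="+", $3=skipped, $4="/", $5=total.
  set -- $1
  processed=$1
  skipped=$3
  total=$5
  if [ "$total" -eq 0 ]; then
    # Assumption: a job with no datums is reported as complete.
    echo 100
    return
  fi
  # Processed and skipped datums both count as finished work.
  echo $(( (processed + skipped) * 100 / total ))
}

progress_percent "2 + 1 / 3"   # prints 100
progress_percent "1 + 0 / 4"   # prints 25
```

Skipped datums count toward completion here because the job does not need to process them again.
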
!!! note "See Also"
    [Pipeline Troubleshooting](../../troubleshooting/pipeline_troubleshooting/)
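
The `STATE / LAST JOB` column of `pachctl list pipeline` can also be checked from a script. The following is a minimal sketch, assuming the table layout shown in the example on this page. `unhealthy_pipelines` is a hypothetical helper, the sample rows are illustrative input rather than real cluster output, and the `fail` and `crash` prefixes are assumptions based on the state names used in this page, so adjust them to what your `pachctl` version prints.

```shell
#!/bin/sh
# Hypothetical helper: read `pachctl list pipeline` output on stdin and print
# "NAME STATE LAST_JOB" for every pipeline whose state or last job starts
# with "fail" or "crash".
unhealthy_pipelines() {
  awk 'NR > 1 && match($0, /[a-z]+ \/ [a-z]+/) {
    # The first "word / word" pair on the row is the STATE / LAST JOB column.
    split(substr($0, RSTART, RLENGTH), s, " / ")
    if (s[1] ~ /^(fail|crash)/ || s[2] ~ /^(fail|crash)/)
      print $1, s[1], s[2]
  }'
}

# Illustrative input only; these rows are made up for the example.
unhealthy_pipelines <<'EOF'
NAME    VERSION INPUT                 CREATED       STATE / LAST JOB    DESCRIPTION
montage 1       (edges:/ ⨯ images:/)  5 minutes ago crashing / failed   A montage pipeline
edges   1       images:/*             5 minutes ago running / success   An edge detection pipeline.
EOF
# prints: montage crashing failed
```

Against a live cluster, the same helper would be used as `pachctl list pipeline | unhealthy_pipelines`.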