github.com/anth0d/nomad@v0.0.0-20221214183521-ae3a0a2cad06/website/content/docs/concepts/plugins/task-drivers.mdx (about)

---
layout: docs
page_title: Task Driver Plugins
description: Learn how to author a Nomad task driver plugin.
---

# Task Drivers

Task drivers in Nomad are the runtime components that execute workloads. For
a real-world example of a Nomad task driver plugin implementation, see the [LXC
driver source][lxcdriver].

## Authoring Task Driver Plugins

Authoring a task driver (shortened to driver in this documentation) in Nomad
consists of implementing the [DriverPlugin][driverplugin] interface and adding
a main package to launch the plugin. A driver plugin is long-lived and its
lifetime is not bound to the Nomad client. This means that the Nomad client can
be restarted without restarting the driver. Nomad ensures that one instance of
the driver is running: if the driver crashes or otherwise terminates, Nomad
launches another instance of it.

Drivers should maintain as little state as possible. State for a task is stored
by the Nomad client on task creation. This enables a pattern where the driver
keeps only in-memory state for the running tasks and, if necessary, the Nomad
client recovers tasks into the driver state.

The [driver plugin skeleton project][skeletonproject] exists to help bootstrap
the development of new driver plugins. It provides most of the boilerplate
necessary for a driver plugin, along with detailed comments.
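The in-memory state pattern described above can be sketched as a small concurrency-safe map from task ID to handle. This is a minimal sketch, not code from Nomad or the skeleton project: `taskHandle`, `taskStore`, and their methods are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// taskHandle is a hypothetical stand-in for whatever per-task data a driver
// keeps in memory (a process handle, a container ID, and so on).
type taskHandle struct {
	containerID string
}

// taskStore is a concurrency-safe, in-memory map of task ID to handle.
type taskStore struct {
	mu    sync.RWMutex
	tasks map[string]*taskHandle
}

func newTaskStore() *taskStore {
	return &taskStore{tasks: make(map[string]*taskHandle)}
}

// Set records or replaces the handle for a task ID.
func (s *taskStore) Set(id string, h *taskHandle) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tasks[id] = h
}

// Get returns the handle for a task ID, if the driver is tracking it.
func (s *taskStore) Get(id string) (*taskHandle, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	h, ok := s.tasks[id]
	return h, ok
}

// Delete forgets a task ID.
func (s *taskStore) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.tasks, id)
}

func main() {
	store := newTaskStore()
	store.Set("task-1", &taskHandle{containerID: "abc123"})
	if h, ok := store.Get("task-1"); ok {
		fmt.Println("tracking", h.containerID)
	}

	// After a plugin restart the store starts out empty; tasks reappear
	// only once the client recovers them into the driver.
	store = newTaskStore()
	_, ok := store.Get("task-1")
	fmt.Println("known after restart:", ok)
}
```

Because nothing is written to disk, losing this map on a crash is acceptable as long as the driver can rebuild an entry from the handle the client hands back.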

## Task Driver Plugin API

The [base plugin][baseplugin] must be implemented in addition to the following
functions.

### `TaskConfigSchema() (*hclspec.Spec, error)`

This function returns the schema for the driver configuration of the task. For
more information on `hclspec.Spec` see the HCL section in the [base
plugin][baseplugin] documentation.

### `Capabilities() (*Capabilities, error)`

Capabilities define what features the driver implements. Example:

```go
type Capabilities struct {
    // SendSignals marks the driver as being able to send signals
    SendSignals bool

    // Exec marks the driver as being able to execute arbitrary commands
    // such as health checks. Used by the ScriptExecutor interface.
    Exec bool

    // FSIsolation indicates what kind of filesystem isolation the driver supports.
    FSIsolation FSIsolation

    // NetIsolationModes lists the set of isolation modes supported by the driver
    NetIsolationModes []NetIsolationMode

    // MustInitiateNetwork tells Nomad that the driver must create the network
    // namespace and that the CreateNetwork and DestroyNetwork RPCs are implemented.
    MustInitiateNetwork bool

    // MountConfigs tells Nomad which mounting config options the driver supports.
    MountConfigs MountConfigSupport

    // RemoteTasks indicates this driver runs tasks on remote systems
    // instead of locally. The Nomad client can use this information to
    // adjust behavior such as propagating task handles between allocations
    // to avoid downtime when a client is lost.
    RemoteTasks bool
}
```

The file system isolation options are:

- `FSIsolationImage`: The task driver isolates tasks as machine images.
- `FSIsolationChroot`: The task driver isolates tasks with `chroot` or
  `pivot_root`.
- `FSIsolationNone`: The task driver has no filesystem isolation.

The network isolation modes are:

- `NetIsolationModeHost`: The task driver supports disabling network isolation
  and using the host network.
- `NetIsolationModeGroup`: The task driver supports using the task group
  network namespace.
- `NetIsolationModeTask`: The task driver supports isolating the network to
  just the task.
- `NetIsolationModeNone`: There is no network to isolate. This is used for
  tasks that the client manages remotely.
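To make the options above concrete, here is a minimal sketch of the capability values two hypothetical drivers might report. The types are local stand-ins that reuse the names from this page so the example compiles on its own; the string values behind the constants and the helper names `containerCapabilities` and `remoteCapabilities` are assumptions, and a real plugin would use the types from Nomad's `plugins/drivers` package instead.

```go
package main

import "fmt"

// Local stand-ins mirroring the names documented above. The underlying
// string values are assumptions made for this sketch.
type FSIsolation string
type NetIsolationMode string

const (
	FSIsolationNone   FSIsolation = "none"
	FSIsolationChroot FSIsolation = "chroot"
	FSIsolationImage  FSIsolation = "image"
)

const (
	NetIsolationModeHost  NetIsolationMode = "host"
	NetIsolationModeGroup NetIsolationMode = "group"
	NetIsolationModeTask  NetIsolationMode = "task"
	NetIsolationModeNone  NetIsolationMode = "none"
)

// Capabilities is a trimmed copy of the struct shown above.
type Capabilities struct {
	SendSignals       bool
	Exec              bool
	FSIsolation       FSIsolation
	NetIsolationModes []NetIsolationMode
	RemoteTasks       bool
}

// containerCapabilities describes a hypothetical container driver that can
// signal tasks, run commands inside them, and isolate them as images.
func containerCapabilities() *Capabilities {
	return &Capabilities{
		SendSignals: true,
		Exec:        true,
		FSIsolation: FSIsolationImage,
		NetIsolationModes: []NetIsolationMode{
			NetIsolationModeHost,
			NetIsolationModeGroup,
		},
	}
}

// remoteCapabilities describes a hypothetical remote task driver: there is
// no local filesystem or network to isolate.
func remoteCapabilities() *Capabilities {
	return &Capabilities{
		FSIsolation:       FSIsolationNone,
		NetIsolationModes: []NetIsolationMode{NetIsolationModeNone},
		RemoteTasks:       true,
	}
}

func main() {
	fmt.Printf("%+v\n", *containerCapabilities())
	fmt.Printf("%+v\n", *remoteCapabilities())
}
```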

#### Remote Task Drivers

[Remote Task Drivers][rtd] should set `RemoteTasks` to `true`. Remote Task
Drivers are task driver plugins that execute tasks on a different system than
the Nomad client. This means the task's lifecycle is distinct from the Nomad
client's.

For task driver plugin authors there are two important new behaviors when
`RemoteTasks` is `true`:

1. The `TaskHandle` returned by `StartTask` will be propagated to replacement
   allocations if the Nomad client is drained or down. Nomad will call
   `RecoverTask` instead of `StartTask` for remote tasks in replacement
   allocations when a `TaskHandle` has been propagated from the previous
   allocation.
2. If the Nomad client managing a remote task is drained or if the allocation
   was `lost`, the remote task is sent a special `DETACH` kill signal. This
   indicates the plugin should stop managing the remote task, but _not_ stop
   it.

These behaviors are meant to keep remote tasks running even when the Nomad
client managing them is shut down. Like traditional tasks, remote tasks are
stopped when the job is explicitly stopped.
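One way a remote driver could honor the `DETACH` behavior is to branch on the signal before touching the remote workload. The sketch below is an illustration under assumptions, not Nomad's implementation: `detachSignal`, `remoteTask`, `driver`, and the simplified two-argument `StopTask` (the timeout is omitted) are hypothetical, and the remote system is reduced to two booleans.

```go
package main

import "fmt"

// detachSignal mirrors the special "DETACH" kill signal described above.
// The constant name is local to this sketch.
const detachSignal = "DETACH"

// remoteTask is a hypothetical record of a task running on another system.
type remoteTask struct {
	running  bool // whether the remote workload is still running
	detached bool // whether this plugin has stopped managing it
}

type driver struct {
	tasks map[string]*remoteTask
}

// StopTask stops the remote task unless the signal is DETACH, in which case
// the plugin only stops managing the task and leaves it running.
func (d *driver) StopTask(taskID string, signal string) error {
	t, ok := d.tasks[taskID]
	if !ok {
		return fmt.Errorf("task %q not found", taskID)
	}
	if signal == detachSignal {
		t.detached = true
		return nil
	}
	// Stand-in for asking the remote system to stop the workload.
	t.running = false
	return nil
}

func main() {
	d := &driver{tasks: map[string]*remoteTask{
		"a": {running: true},
		"b": {running: true},
	}}
	_ = d.StopTask("a", detachSignal)
	_ = d.StopTask("b", "SIGTERM")
	fmt.Println("a running:", d.tasks["a"].running, "detached:", d.tasks["a"].detached)
	fmt.Println("b running:", d.tasks["b"].running, "detached:", d.tasks["b"].detached)
}
```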

### `Fingerprint(context.Context) (<-chan *Fingerprint, error)`

This function is called by the client when the plugin is started. It allows the
driver to indicate its health to the client. The channel returned should
immediately send an initial `Fingerprint`, then send periodic updates at an
interval that is appropriate for the driver until the context is canceled.

The fingerprint consists of a `HealthState` and `HealthDescription` to inform
the client about its health. Additionally, an `Attributes` field is available
for the driver to add attributes to the client node. The fingerprint
`HealthState` can be one of three states:

- `HealthStateUndetected`: The necessary dependencies for the driver are not
  detected on the system. For example, the Java runtime for the Java driver.
- `HealthStateUnhealthy`: Something is wrong with the driver runtime. For
  example, the Docker daemon is stopped for the Docker driver.
- `HealthStateHealthy`: The driver is detected and working.
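The send-immediately-then-periodically contract described above can be sketched with a goroutine and a ticker. The types below are simplified local stand-ins (for instance, `Attributes` is a plain string map here, which is an assumption), and `fingerprint`, `probe`, and `collectFingerprints` are hypothetical helper names rather than part of the plugin API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Simplified stand-ins for the fingerprint types named above.
type HealthState string

const (
	HealthStateUndetected HealthState = "undetected"
	HealthStateUnhealthy  HealthState = "unhealthy"
	HealthStateHealthy    HealthState = "healthy"
)

type Fingerprint struct {
	Health            HealthState
	HealthDescription string
	Attributes        map[string]string
}

// fingerprint sends one fingerprint right away, then one per interval, and
// closes the channel once ctx is canceled.
func fingerprint(ctx context.Context, interval time.Duration, probe func() *Fingerprint) <-chan *Fingerprint {
	ch := make(chan *Fingerprint)
	go func() {
		defer close(ch)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			// Send the current state, unless the client has gone away.
			select {
			case ch <- probe():
			case <-ctx.Done():
				return
			}
			// Wait for the next interval or for cancellation.
			select {
			case <-ticker.C:
			case <-ctx.Done():
				return
			}
		}
	}()
	return ch
}

// collectFingerprints reads n fingerprints and then cancels the stream.
func collectFingerprints(n int) []HealthState {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	probe := func() *Fingerprint {
		return &Fingerprint{Health: HealthStateHealthy, HealthDescription: "ready"}
	}
	var got []HealthState
	for fp := range fingerprint(ctx, 5*time.Millisecond, probe) {
		got = append(got, fp.Health)
		if len(got) >= n {
			break
		}
	}
	return got
}

func main() {
	fmt.Println(collectFingerprints(3))
}
```

In a real driver, `probe` would check for the runtime's dependencies and report `HealthStateUndetected` or `HealthStateUnhealthy` as appropriate.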

### `StartTask(*TaskConfig) (*TaskHandle, *DriverNetwork, error)`

This function takes a [`TaskConfig`][taskconfig] which includes all of the
configuration needed to launch the task. Additionally, the driver configuration
can be decoded from the `TaskConfig` by calling
`*TaskConfig.DecodeDriverConfig(t interface{})`, passing in a pointer to the
driver-specific configuration struct. The `TaskConfig` includes an `ID` field,
which future operations on the task use to reference it.

Drivers return a [`*TaskHandle`][taskhandle] which contains the information
required for the driver to reattach to the running task in the case of plugin
crashes or restarts. Some of this state will be specific to the driver
implementation, so a `DriverState` field exists to allow the driver to encode
custom state into the struct. The helper methods `GetDriverState` and
`SetDriverState` exist on the `TaskHandle`, removing the need for the driver to
handle serialization.

A `*DriverNetwork` can optionally be returned to describe the network of the
task if it is modified by the driver. An example of this is in the Docker
driver where tasks can be attached to a specific Docker network.

If an error occurs, it is expected that the driver will clean up any created
resources prior to returning the error.

#### Logging

Nomad handles all rotation and plumbing of task logs. In order for task stdout
and stderr to be received by Nomad, they must be written to the correct
location. Prior to starting the task through the driver, the Nomad client
creates FIFOs for stdout and stderr. These paths are given to the driver in the
`TaskConfig`. The [`fifo` package][fifopackage] can be used to support
cross-platform writing to these paths.

#### TaskHandle Schema Versioning

A `Version` field is available on the `TaskHandle` struct to facilitate
backwards-compatible recovery of tasks. This field is opaque to Nomad, but
allows the driver to recover tasks that were created by an older version of the
plugin.

### `RecoverTask(*TaskHandle) error`

A driver is not expected to persist any internal state to disk across restarts.
To support this, Nomad will attempt to recover a task that was previously
started if the driver does not recognize the task ID. During task recovery,
Nomad calls `RecoverTask`, passing the `TaskHandle` that was returned by the
`StartTask` function. If no error is returned, it is expected that the driver
can now operate on the task by referencing the task ID. If an error occurs, the
Nomad client will mark the task as `lost`.
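A recovery routine that also honors the handle's `Version` field might look like the sketch below. It rests on several assumptions: `TaskHandle` is a reduced stand-in with a flat `TaskID` field, driver state is encoded as JSON purely so the example has no external dependencies, and `stateV1`, `driverState`, and `recoverFrom` are hypothetical names. A real plugin would decode state through the `TaskHandle` helper methods instead.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// TaskHandle is a reduced stand-in for the real handle type.
type TaskHandle struct {
	Version     int
	TaskID      string
	DriverState []byte
}

// stateV1 is the state layout written by an older, hypothetical plugin version.
type stateV1 struct {
	PID int `json:"pid"`
}

// driverState is the current (version 2) layout.
type driverState struct {
	PID    int    `json:"pid"`
	LogDir string `json:"log_dir"`
}

type driver struct {
	tasks map[string]driverState
}

// RecoverTask rebuilds in-memory state for a task from its handle,
// upgrading state written by an older plugin version.
func (d *driver) RecoverTask(h *TaskHandle) error {
	if h == nil {
		return errors.New("nil task handle")
	}
	if _, ok := d.tasks[h.TaskID]; ok {
		return nil // already tracked, nothing to recover
	}
	var st driverState
	switch h.Version {
	case 1:
		var old stateV1
		if err := json.Unmarshal(h.DriverState, &old); err != nil {
			return fmt.Errorf("decoding v1 state: %w", err)
		}
		st = driverState{PID: old.PID}
	case 2:
		if err := json.Unmarshal(h.DriverState, &st); err != nil {
			return fmt.Errorf("decoding v2 state: %w", err)
		}
	default:
		return fmt.Errorf("unsupported task handle version %d", h.Version)
	}
	d.tasks[h.TaskID] = st
	return nil
}

// recoverFrom runs RecoverTask against a fresh driver and returns the state.
func recoverFrom(version int, state string) (driverState, error) {
	d := &driver{tasks: make(map[string]driverState)}
	h := &TaskHandle{Version: version, TaskID: "task-1", DriverState: []byte(state)}
	if err := d.RecoverTask(h); err != nil {
		return driverState{}, err
	}
	return d.tasks[h.TaskID], nil
}

func main() {
	st, err := recoverFrom(1, `{"pid": 42}`)
	fmt.Println(st.PID, err)
	_, err = recoverFrom(3, `{}`)
	fmt.Println(err)
}
```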

### `WaitTask(context.Context, id string) (<-chan *ExitResult, error)`

The `WaitTask` function is expected to return a channel that will send an
`*ExitResult` when the task exits or close the channel when the context is
canceled. It is also expected that calling `WaitTask` on an exited task will
immediately send an `*ExitResult` on the returned channel.
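The two outcomes described above (a result on exit, a closed channel on cancellation) map naturally onto a `select`. This is a minimal sketch with assumed names: `ExitResult` is reduced to an exit code, `task` is a hypothetical record whose `done` channel is closed when the task exits, and `waitTask` takes the task directly instead of looking it up by ID.

```go
package main

import (
	"context"
	"fmt"
)

// ExitResult is a reduced stand-in for the real result type.
type ExitResult struct {
	ExitCode int
}

// task is a hypothetical in-memory record. result must be set before done
// is closed.
type task struct {
	done   chan struct{}
	result *ExitResult
}

// waitTask sends the exit result once the task has exited, or closes the
// channel without sending if ctx is canceled first.
func waitTask(ctx context.Context, t *task) <-chan *ExitResult {
	ch := make(chan *ExitResult, 1) // buffered so the goroutine never blocks
	go func() {
		defer close(ch)
		select {
		case <-t.done:
			ch <- t.result
		case <-ctx.Done():
		}
	}()
	return ch
}

// newExitedTask builds a task that has already exited with the given code.
func newExitedTask(code int) *task {
	t := &task{done: make(chan struct{}), result: &ExitResult{ExitCode: code}}
	close(t.done)
	return t
}

// waitExited waits on an already exited task.
func waitExited(code int) *ExitResult {
	return <-waitTask(context.Background(), newExitedTask(code))
}

// waitCanceled waits on a running task with an already canceled context.
func waitCanceled() (*ExitResult, bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	running := &task{done: make(chan struct{})}
	res, ok := <-waitTask(ctx, running)
	return res, ok
}

func main() {
	fmt.Println("exit code:", waitExited(3).ExitCode)
	_, ok := waitCanceled()
	fmt.Println("result received after cancel:", ok)
}
```

Because `done` stays closed after exit, every later call for the same task sends the stored result immediately.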

### `StopTask(taskID string, timeout time.Duration, signal string) error`

The `StopTask` function is expected to stop a running task by sending the given
signal to it. If the task does not stop during the given timeout, the driver
must forcefully kill the task.

`StopTask` does not clean up resources of the task or remove it from the
driver's internal state. A call to `WaitTask` after `StopTask` is valid and
should be handled.
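The signal, wait, then force-kill sequence can be sketched against a small interface. Everything here is an assumption made for illustration: `process`, `fakeProcess`, `stopTask`, and `runStop` are hypothetical, and a real driver would signal an actual process or container rather than a fake.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// process is a hypothetical abstraction over whatever the driver runs.
type process interface {
	Signal(sig string) error
	Kill() error
	Done() <-chan struct{} // closed once the process has exited
}

// stopTask sends the signal, waits up to timeout for the process to exit,
// and forcefully kills it if it is still running after that.
func stopTask(p process, timeout time.Duration, signal string) error {
	if err := p.Signal(signal); err != nil {
		return fmt.Errorf("sending %s: %w", signal, err)
	}
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-p.Done():
		return nil
	case <-timer.C:
		return p.Kill()
	}
}

// fakeProcess exits on any signal unless ignoresSignals is set.
type fakeProcess struct {
	ignoresSignals bool
	killed         bool
	done           chan struct{}
	once           sync.Once
}

func (p *fakeProcess) exit() { p.once.Do(func() { close(p.done) }) }

func (p *fakeProcess) Signal(string) error {
	if !p.ignoresSignals {
		p.exit()
	}
	return nil
}

func (p *fakeProcess) Kill() error {
	p.killed = true
	p.exit()
	return nil
}

func (p *fakeProcess) Done() <-chan struct{} { return p.done }

// runStop stops a fake process and reports whether it had to be killed.
func runStop(ignoresSignals bool) (bool, error) {
	p := &fakeProcess{ignoresSignals: ignoresSignals, done: make(chan struct{})}
	err := stopTask(p, 50*time.Millisecond, "SIGTERM")
	return p.killed, err
}

func main() {
	killed, _ := runStop(false)
	fmt.Println("cooperative process killed:", killed)
	killed, _ = runStop(true)
	fmt.Println("stubborn process killed:", killed)
}
```

Note that `stopTask` leaves the process record in place, in line with the requirement that `StopTask` not remove the task from the driver's internal state.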

### `DestroyTask(taskID string, force bool) error`

The `DestroyTask` function cleans up and removes a task that has terminated. If
`force` is set to `true`, the driver must destroy the task even if it is still
running. If `WaitTask` is called after `DestroyTask`, it should return
`drivers.ErrTaskNotFound` as no task state should exist after `DestroyTask` is
called.
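The interaction between `force`, task removal, and `ErrTaskNotFound` can be sketched as follows. This is an illustration under assumptions: `ErrTaskNotFound` is a local stand-in for `drivers.ErrTaskNotFound`, `task` and `driver` are hypothetical, `WaitTask` is reduced to a lookup, and returning an error for a running task when `force` is `false` is a design choice of this sketch rather than something this page specifies.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTaskNotFound is a local stand-in for drivers.ErrTaskNotFound.
var ErrTaskNotFound = errors.New("task not found for given id")

// task is a hypothetical in-memory record.
type task struct {
	running bool
}

type driver struct {
	tasks map[string]*task
}

// DestroyTask removes a terminated task. A running task is only destroyed
// when force is true; refusing otherwise is an assumption of this sketch.
func (d *driver) DestroyTask(taskID string, force bool) error {
	t, ok := d.tasks[taskID]
	if !ok {
		return ErrTaskNotFound
	}
	if t.running {
		if !force {
			return fmt.Errorf("cannot destroy running task %q", taskID)
		}
		// Stand-in for forcefully killing the task and freeing resources.
		t.running = false
	}
	delete(d.tasks, taskID)
	return nil
}

// WaitTask is reduced to a lookup to show the not-found behavior.
func (d *driver) WaitTask(taskID string) error {
	if _, ok := d.tasks[taskID]; !ok {
		return ErrTaskNotFound
	}
	return nil
}

func main() {
	d := &driver{tasks: map[string]*task{"a": {running: true}}}
	fmt.Println(d.DestroyTask("a", false))
	fmt.Println(d.DestroyTask("a", true))
	fmt.Println(d.WaitTask("a"))
}
```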

### `InspectTask(taskID string) (*TaskStatus, error)`

The `InspectTask` function returns detailed status information for the
referenced `taskID`.

### `TaskStats(context.Context, id string, time.Duration) (<-chan *cstructs.TaskResourceUsage, error)`

The `TaskStats` function returns a channel which the driver should send stats
to at the given interval. The driver must send stats at the given interval
until the given context is canceled or the task terminates.
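A stats loop with the two stop conditions described above (context cancellation and task termination) can be sketched with a ticker. The names are assumptions: `TaskResourceUsage` is reduced to a timestamp, `exited` is a hypothetical channel closed when the task terminates, and `sample` stands in for however the driver measures usage.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// TaskResourceUsage is a reduced stand-in for the real stats type.
type TaskResourceUsage struct {
	Timestamp int64
}

// taskStats sends one sample per interval until ctx is canceled or the
// exited channel is closed, then closes the returned channel.
func taskStats(ctx context.Context, interval time.Duration, exited <-chan struct{}, sample func() *TaskResourceUsage) <-chan *TaskResourceUsage {
	ch := make(chan *TaskResourceUsage)
	go func() {
		defer close(ch)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			// Wait for the next interval unless we are told to stop.
			select {
			case <-ctx.Done():
				return
			case <-exited:
				return
			case <-ticker.C:
			}
			// Deliver the sample, still honoring both stop conditions.
			select {
			case ch <- sample():
			case <-ctx.Done():
				return
			case <-exited:
				return
			}
		}
	}()
	return ch
}

// collectStats reads n samples from a running task and then cancels.
func collectStats(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	exited := make(chan struct{}) // never closed: the task keeps running
	sample := func() *TaskResourceUsage {
		return &TaskResourceUsage{Timestamp: time.Now().UnixNano()}
	}
	count := 0
	for range taskStats(ctx, 5*time.Millisecond, exited, sample) {
		count++
		if count >= n {
			break
		}
	}
	return count
}

func main() {
	fmt.Println("samples collected:", collectStats(3))
}
```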

### `TaskEvents(context.Context) (<-chan *TaskEvent, error)`

The Nomad client publishes events associated with an allocation. The
`TaskEvents` function allows the driver to publish driver-specific events about
tasks, and the Nomad client will associate them with the correct allocation.

An `Eventer` utility, available in the
`github.com/hashicorp/nomad/drivers/shared/eventer` package, implements an
event loop and publishing mechanism for use in the `TaskEvents` function.

### `SignalTask(taskID string, signal string) error`

> Optional - can be skipped by embedding `drivers.DriverSignalTaskNotSupported`

The `SignalTask` function is used by drivers which support sending OS signals
(`SIGHUP`, `SIGKILL`, `SIGUSR1`, etc.) to the task. It is an optional function
and is listed as a capability in the driver `Capabilities` struct.

### `ExecTask(taskID string, cmd []string, timeout time.Duration) (*ExecTaskResult, error)`

> Optional - can be skipped by embedding `drivers.DriverExecTaskNotSupported`

The `ExecTask` function is used by the Nomad client to execute commands inside
the task execution context. For example, the Docker driver executes commands
inside the running container. `ExecTask` is called for Consul script checks.

[lxcdriver]: https://github.com/hashicorp/nomad-driver-lxc
[driverplugin]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/drivers/driver.go#L39-L57
[skeletonproject]: https://github.com/hashicorp/nomad-skeleton-driver-plugin
[baseplugin]: /docs/concepts/plugins/base
[taskconfig]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskConfig
[taskhandle]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskHandle
[fifopackage]: https://godoc.org/github.com/hashicorp/nomad/client/lib/fifo
[rtd]: /plugins/drivers/remote