# subd
The daemon that runs on every dominated system.

This daemon continuously checksum-scans the root file-system and responds to
**poll**, **fetch files** and **update** RPC requests from the
*[dominator](../dominator/README.md)*.
To have a negligible impact on system workload, it lowers its priority
(nice 15 by default), restricts itself to one CPU and automatically rate-limits
its I/O to 2% of the media speed.

## Status page
*Subd* serves a web interface on port `6969` that provides a status page,
access to performance metrics and logs. If *subd* is running on host `myhost`,
the URL of the main status page is `http://myhost:6969/`. An RPC-over-HTTP
interface is also provided on the same port.
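
A monitoring job might probe the status page to confirm that *subd* is up. In the Go sketch below only the port (`6969`) and the root path come from the text above; the `statusURL` and `probeStatus` names, the 5-second timeout and the treatment of any non-200 response as unhealthy are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// statusURL builds the URL of the main subd status page for a host.
func statusURL(host string) string {
	return fmt.Sprintf("http://%s:6969/", host)
}

// probeStatus returns nil if the page at url answers with 200 OK.
func probeStatus(url string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	host := "myhost"
	if len(os.Args) > 1 {
		host = os.Args[1]
	}
	if err := probeStatus(statusURL(host)); err != nil {
		fmt.Fprintln(os.Stderr, "subd not healthy:", err)
		os.Exit(1)
	}
	fmt.Println("subd status page is up")
}
```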

## Startup
*Subd* is started at boot time, usually by one of the provided
[init scripts](../../init.d/). The *subd* process is baby-sat by the init
script: if the process dies, the init script restarts *subd*. It may be
stopped with the command:

```
service subd stop
```

which also kills the baby-sitting init script. It may be started with the
command:

```
service subd start
```

There are many command-line flags that may change the behaviour of *subd*, but
the defaults should be adequate for most deployments. Built-in help is available
with the command:

```
subd -h
```

## Security
RPC access is restricted using TLS client authentication. *Subd* expects a root
certificate in the file `/etc/ssl/CA.pem`; certificates signed by this CA grant
access. It also requires a certificate and key that grant it the ability to
**fetch** files from the objectserver. These should be in the files
`/etc/ssl/subd/cert.pem` and `/etc/ssl/subd/key.pem`, respectively.

If any of these files is missing, *subd* will refuse to start. This prevents
accidental deployments without access control.

## Control and debugging
The *[subtool](../subtool/README.md)* utility may be used to manipulate various
operating parameters of a running *subd* and perform RPC requests.

## DisruptionManager
Disruptive updates can be controlled using an optional *Disruption Manager*,
which *subd* runs to request, check and cancel permission to perform a
disruptive upgrade (an upgrade where a *HighImpact* trigger is called). This may
be used to stop new work from being scheduled on the machine and to wait for
existing work to complete before the upgrade is performed.

The *Disruption Manager* is a simple tool that takes one of the following
arguments:
- **cancel**: cancel a request to disrupt
- **check**: check whether disruption is permitted
- **request**: request permission to disrupt

Regardless of the argument provided, the tool must return one of the following
exit codes:
- **0**: disruption is permitted
- **1**: disruption has been requested (and acknowledged) but is not yet permitted
- **2**: disruption is denied (not currently permitted)

Any other exit code is considered an error, and *subd* may retry soon.

After a **request** to perform a disruptive upgrade, if the exit code is **1**
(disruption requested and acknowledged), the **request** will be re-issued
periodically. If, however, the exit code is **2** (upgrade is not permitted),
the **request** will be re-issued more frequently.
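
The re-issue policy can be expressed as a function from the last outcome to the delay before the next **request**. The intervals are not documented here, so the durations below are placeholder assumptions, as are the `requestOutcome` type and the `nextRequestDelay` name; only the ordering (a denied request is re-issued sooner than an acknowledged one, and errors are retried soon) follows the text.

```go
package main

import (
	"fmt"
	"time"
)

// requestOutcome is the result of a "request" invocation. Exit code 0
// (permitted) needs no re-issue, so it is not represented here.
type requestOutcome int

const (
	outcomeRequested requestOutcome = iota // exit code 1: requested and acknowledged
	outcomeDenied                          // exit code 2: not permitted
	outcomeError                           // any other exit code
)

// Placeholder intervals; the values subd actually uses are not stated.
const (
	acknowledgedInterval = 5 * time.Minute
	deniedInterval       = 1 * time.Minute
	errorInterval        = 10 * time.Second
)

// nextRequestDelay returns how long to wait before re-issuing "request".
func nextRequestDelay(o requestOutcome) time.Duration {
	switch o {
	case outcomeRequested:
		return acknowledgedInterval // re-issued periodically
	case outcomeDenied:
		return deniedInterval // re-issued more frequently
	default:
		return errorInterval // errors are retried soon
	}
}

func main() {
	for _, o := range []requestOutcome{outcomeRequested, outcomeDenied, outcomeError} {
		fmt.Println(o, nextRequestDelay(o))
	}
}
```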

Once a machine enters the *disruption is permitted* state, it must remain in
that state until a **cancel** is issued or more than one hour has passed since
the last **request**.

The *Disruption Manager* may be called frequently (up to once per second) by
every machine in the fleet.