# subd
The daemon that runs on every dominated system.

This daemon continuously checksum scans the root file-system and responds to
**poll**, **fetch files** and **update** RPC requests from the
*[dominator](../dominator/README.md)*.
In order to have a negligible impact on system workload, it lowers its priority
(nice 15 by default), restricts itself to one CPU and automatically rate limits
its I/O to 2% of the media speed.

## Status page
*Subd* provides a web interface on port `6969` which serves a status page and
gives access to performance metrics and logs. If *subd* is running on host
`myhost` then the URL of the main status page is `http://myhost:6969/`. An RPC
over HTTP interface is also provided on the same port.

## Startup
*Subd* is started at boot time, usually by one of the provided
[init scripts](../../init.d/). The *subd* process is baby-sat by the init
script; if the process dies the init script will re-start *subd*. It may be
stopped with the command:

```
service subd stop
```

which also kills the baby-sitting init script. It may be started with the
command:

```
service subd start
```

There are many command-line flags which may change the behaviour of *subd* but
the defaults should be adequate for most deployments. Built-in help is available
with the command:

```
subd -h
```

## Security
RPC access is restricted using TLS client authentication. *Subd* expects a root
certificate in the file `/etc/ssl/CA.pem` which it trusts to sign certificates
which grant access. It also requires a certificate and key which grant it the
ability to **fetch** files from the objectserver. These should be in the files
`/etc/ssl/subd/cert.pem` and `/etc/ssl/subd/key.pem`, respectively.

If any of these files are missing, *subd* will refuse to start.
This prevents accidental deployments without access control.

## Control and debugging
The *[subtool](../subtool/README.md)* utility may be used to manipulate various
operating parameters of a running *subd* and perform RPC requests.

## DisruptionManager
Disruptive updates can be controlled using an optional *Disruption Manager*
which *subd* can run to request, check and cancel requests to perform a
disruptive upgrade (an upgrade where a *HighImpact* trigger is called). This may
be used to request that new work not be scheduled on the machine and to wait
for existing work to complete before performing the upgrade.

The *Disruption Manager* is a simple tool which takes one of the following
arguments:
- **cancel**: cancel a request to disrupt
- **check**: check whether disruptions are permitted
- **request**: request to perform disruption

Regardless of the argument provided, the tool must return one of the following
exit codes:
- **0**: disruption is permitted
- **1**: disruption has been requested (and acknowledged) but not yet permitted
- **2**: disruption is denied (not currently permitted)

Any other exit code is considered an error, and *subd* may retry soon.

After a **request** to perform a disruptive upgrade, if the exit code is **1**
(disruption requested and acknowledged), the **request** will be re-issued
periodically. If however the exit code is **2** (disruption is denied), the
**request** will be re-issued more frequently.

Once a machine enters the `disruption is permitted` state, it must remain in
that state until a `cancel` command is issued, or more than one hour has passed
since the last `request` was made.

The *DisruptionManager* may be called frequently (up to every second) by every
machine in the fleet.