# [SwarmKit](https://github.com/docker/swarmkit)

[GoDoc](https://godoc.org/github.com/docker/swarmkit)
[CircleCI](https://circleci.com/gh/docker/swarmkit)
[Codecov](https://codecov.io/github/docker/swarmkit?branch=master)
[Report](http://doyouevenbadge.com/report/github.com/docker/swarmkit)

*SwarmKit* is a toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

Its main benefits are:

- **Distributed**: *SwarmKit* uses the [Raft Consensus Algorithm](https://raft.github.io/) to coordinate and does not rely on a single point of failure to make decisions.
- **Secure**: Node communication and membership within a *Swarm* are secure out of the box. *SwarmKit* uses mutual TLS for node *authentication*, *role authorization* and *transport encryption*, automating both certificate issuance and rotation.
- **Simple**: *SwarmKit* is operationally simple and minimizes infrastructure dependencies. It does not need an external database to operate.

## Overview

Machines running *SwarmKit* can be grouped together to form a *Swarm*, coordinating tasks with each other.
Once a machine joins, it becomes a *Swarm Node*. Nodes can either be *worker* nodes or *manager* nodes.

- **Worker Nodes** are responsible for running Tasks using an *Executor*. *SwarmKit* comes with a default *Docker Container Executor* that can be easily swapped out.
- **Manager Nodes**, on the other hand, accept specifications from the user and are responsible for reconciling the desired state with the actual cluster state.

An operator can dynamically update a Node's role by promoting a Worker to Manager or demoting a Manager to Worker.

*Tasks* are organized in *Services*.
A service is a higher-level abstraction that allows the user to declare the desired state of a group of tasks.
Services define what type of task should be created as well as how to execute them (e.g. run this many replicas at all times) and how to update them (e.g. rolling updates).

## Features

Some of *SwarmKit*'s main features are:

- **Orchestration**

  - **Desired State Reconciliation**: *SwarmKit* constantly compares the desired state against the current cluster state and reconciles the two if necessary. For instance, if a node fails, *SwarmKit* reschedules its tasks onto a different node.

  - **Service Types**: There are different types of services. The project currently ships with two of them out of the box:

    - **Replicated Services** are scaled to the desired number of replicas.
    - **Global Services** run one task on every available node in the cluster.

  - **Configurable Updates**: At any time, you can change the value of one or more fields for a service. After you make the update, *SwarmKit* reconciles the desired state by ensuring all tasks are using the desired settings. By default, it performs a lockstep update - that is, it updates all tasks at the same time. This can be configured through different knobs:

    - **Parallelism** defines how many updates can be performed at the same time.
    - **Delay** sets the minimum delay between updates. *SwarmKit* will start by shutting down the previous task, bringing up a new one, waiting for it to transition to the *RUNNING* state, *then* waiting for the additional configured delay. Finally, it will move on to other tasks.

  - **Restart Policies**: The orchestration layer monitors tasks and reacts to failures based on the specified policy. The operator can define restart conditions, delays and limits (maximum number of attempts in a given time window). *SwarmKit* can decide to restart a task on a different machine.
    This means that faulty nodes will gradually be drained of their tasks.

- **Scheduling**

  - **Resource Awareness**: *SwarmKit* is aware of the resources available on nodes and will place tasks accordingly.
  - **Constraints**: Operators can limit the set of nodes where a task can be scheduled by defining constraint expressions. Multiple constraints find nodes that satisfy every expression, i.e., an `AND` match. Constraints can match the node attributes in the following table. Note that `engine.labels` are collected from Docker Engine with information like operating system, drivers, etc., while `node.labels` are added by cluster administrators for operational purposes. For example, some nodes may carry security-compliance labels so that tasks with compliance requirements run only on them.

    | node attribute     | matches                               | example                                         |
    |:-------------------|:--------------------------------------|:-----------------------------------------------|
    | node.id            | node's ID                             | `node.id == 2ivku8v2gvtg4`                      |
    | node.hostname      | node's hostname                       | `node.hostname != node-2`                       |
    | node.ip            | node's IP address                     | `node.ip != 172.19.17.0/24`                     |
    | node.role          | node's manager or worker role         | `node.role == manager`                          |
    | node.platform.os   | node's operating system               | `node.platform.os == linux`                     |
    | node.platform.arch | node's architecture                   | `node.platform.arch == x86_64`                  |
    | node.labels        | node's labels added by cluster admins | `node.labels.security == high`                  |
    | engine.labels      | Docker Engine's labels                | `engine.labels.operatingsystem == ubuntu 14.04` |

  - **Strategies**: The project currently ships with a *spread strategy* which will attempt to schedule tasks on the least loaded nodes, provided they meet the constraints and resource requirements.

- **Cluster Management**

  - **State Store**: Manager nodes maintain a strongly consistent, replicated (Raft-based) and extremely fast (in-memory reads) view of the cluster, which allows them to make quick scheduling decisions while tolerating failures.
  - **Topology Management**: Node roles (*Worker* / *Manager*) can be dynamically changed through API/CLI calls.
  - **Node Management**: An operator can alter the desired availability of a node: setting it to *Paused* prevents any further tasks from being scheduled onto it, while *Drained* has the same effect and additionally re-schedules its existing tasks elsewhere (mostly for maintenance scenarios).

- **Security**

  - **Mutual TLS**: All nodes communicate with each other using mutual *TLS*. Swarm managers act as a *Root Certificate Authority*, issuing certificates to new nodes.
  - **Token-based Join**: All nodes require a cryptographic token to join the swarm, which defines that node's role. Tokens can be rotated as often as desired without affecting already-joined nodes.
  - **Certificate Rotation**: TLS certificates are rotated and reloaded transparently on every node, allowing a user to set how frequently rotation should happen (the current default is 3 months, the minimum is 30 minutes).

## Build

Requirements:

- Go 1.6 or higher
- A [working Go](https://golang.org/doc/code.html) environment
- [Protobuf 3.x or higher](https://developers.google.com/protocol-buffers/docs/downloads) to regenerate protocol buffer files (e.g. using `make generate`)

*SwarmKit* is built in Go and leverages a standard project structure to work well with Go tooling.
If you are new to Go, please see [BUILDING.md](BUILDING.md) for a more detailed guide.

Once you have *SwarmKit* checked out in your `$GOPATH`, the `Makefile` can be used for common tasks.

From the project root directory, run the following to build `swarmd` and `swarmctl`:

```sh
$ make binaries
```

## Test

Before running tests for the first time, set up the tooling:

```sh
$ make setup
```

Then run:

```sh
$ make all
```

## Usage Examples

### Setting up a Swarm

These instructions assume that `swarmd` and `swarmctl` are in your `PATH`.

(Before starting, make sure the `/tmp/node-N` directories don't exist.)

Initialize the first node:

```sh
$ swarmd -d /tmp/node-1 --listen-control-api /tmp/node-1/swarm.sock --hostname node-1
```

Before joining the cluster, fetch the join token:

```
$ export SWARM_SOCKET=/tmp/node-1/swarm.sock
$ swarmctl cluster inspect default
ID          : 87d2ecpg12dfonxp3g562fru1
Name        : default
Orchestration settings:
  Task history entries: 5
Dispatcher settings:
  Dispatcher heartbeat period: 5s
Certificate Authority settings:
  Certificate Validity Duration: 2160h0m0s
Join Tokens:
  Worker: SWMTKN-1-3vi7ajem0jed8guusgvyl98nfg18ibg4pclify6wzac6ucrhg3-0117z3s2ytr6egmmnlr6gd37n
  Manager: SWMTKN-1-3vi7ajem0jed8guusgvyl98nfg18ibg4pclify6wzac6ucrhg3-d1ohk84br3ph0njyexw0wdagx
```

In two additional terminals, join two nodes. In the example below, replace `127.0.0.1:4242`
with the address of the first node, and use the `<Worker Token>` acquired above.
In this example, the `<Worker Token>` is `SWMTKN-1-3vi7ajem0jed8guusgvyl98nfg18ibg4pclify6wzac6ucrhg3-0117z3s2ytr6egmmnlr6gd37n`.
If the joining nodes run on the same host as `node-1`, select a different remote
listening port, e.g., `--listen-remote-api 127.0.0.1:4343`.

```sh
$ swarmd -d /tmp/node-2 --hostname node-2 --join-addr 127.0.0.1:4242 --join-token <Worker Token>
$ swarmd -d /tmp/node-3 --hostname node-3 --join-addr 127.0.0.1:4242 --join-token <Worker Token>
```

If joining as a manager, also specify `--listen-control-api`:

```sh
$ swarmd -d /tmp/node-4 --hostname node-4 --join-addr 127.0.0.1:4242 --join-token <Manager Token> --listen-control-api /tmp/node-4/swarm.sock --listen-remote-api 127.0.0.1:4245
```

In a fourth terminal, use `swarmctl` to explore and control the cluster. Before
running `swarmctl`, set the `SWARM_SOCKET` environment variable to the path of the
manager socket that was specified in `--listen-control-api` when starting the
manager.

To list nodes:

```
$ export SWARM_SOCKET=/tmp/node-1/swarm.sock
$ swarmctl node ls
ID                         Name    Membership  Status  Availability  Manager Status
--                         ----    ----------  ------  ------------  --------------
3x12fpoi36eujbdkgdnbvbi6r  node-2  ACCEPTED    READY   ACTIVE
4spl3tyipofoa2iwqgabsdcve  node-1  ACCEPTED    READY   ACTIVE        REACHABLE *
dknwk1uqxhnyyujq66ho0h54t  node-3  ACCEPTED    READY   ACTIVE
zw3rwfawdasdewfq66ho34eaw  node-4  ACCEPTED    READY   ACTIVE        REACHABLE
```

### Creating Services

Start a *redis* service:

```
$ swarmctl service create --name redis --image redis:3.0.5
08ecg7vc7cbf9k57qs722n2le
```

List the running services:

```
$ swarmctl service ls
ID                         Name   Image        Replicas
--                         ----   -----        --------
08ecg7vc7cbf9k57qs722n2le  redis  redis:3.0.5  1/1
```

Inspect the service:

```
$ swarmctl service inspect redis
ID                : 08ecg7vc7cbf9k57qs722n2le
Name              : redis
Replicas          : 1/1
Template
 Container
  Image           : redis:3.0.5

Task ID                    Service  Slot  Image        Desired State  Last State               Node
-------                    -------  ----  -----        -------------  ----------               ----
0xk1ir8wr85lbs8sqg0ug03vr  redis    1     redis:3.0.5  RUNNING        RUNNING 1 minutes ago    node-1
```

### Updating Services

You can update any attribute of a service.

For example, you can scale the service by changing the instance count:

```
$ swarmctl service update redis --replicas 6
08ecg7vc7cbf9k57qs722n2le

$ swarmctl service inspect redis
ID                : 08ecg7vc7cbf9k57qs722n2le
Name              : redis
Replicas          : 6/6
Template
 Container
  Image           : redis:3.0.5

Task ID                    Service  Slot  Image        Desired State  Last State               Node
-------                    -------  ----  -----        -------------  ----------               ----
0xk1ir8wr85lbs8sqg0ug03vr  redis    1     redis:3.0.5  RUNNING        RUNNING 3 minutes ago    node-1
25m48y9fevrnh77til1d09vqq  redis    2     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-3
42vwc8z93c884anjgpkiatnx6  redis    3     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-2
d41f3wnf9dex3mk6jfqp4tdjw  redis    4     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-2
66lefnooz63met6yfrsk6myvg  redis    5     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-1
3a2sawtoyk19wqhmtuiq7z9pt  redis    6     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-3
```

Changing *replicas* from *1* to *6* forced *SwarmKit* to create *5* additional Tasks in order to
comply with the desired state.

Every other field can be changed as well, such as image, args, env, ...

Let's change the image from *redis:3.0.5* to *redis:3.0.6* (e.g. upgrade):

```
$ swarmctl service update redis --image redis:3.0.6
08ecg7vc7cbf9k57qs722n2le

$ swarmctl service inspect redis
ID                : 08ecg7vc7cbf9k57qs722n2le
Name              : redis
Replicas          : 6/6
Update Status
 State            : COMPLETED
 Started          : 3 minutes ago
 Completed        : 1 minute ago
 Message          : update completed
Template
 Container
  Image           : redis:3.0.6

Task ID                    Service  Slot  Image        Desired State  Last State               Node
-------                    -------  ----  -----        -------------  ----------               ----
0udsjss61lmwz52pke5hd107g  redis    1     redis:3.0.6  RUNNING        RUNNING 1 minute ago     node-3
b8o394v840thk10tamfqlwztb  redis    2     redis:3.0.6  RUNNING        RUNNING 1 minute ago     node-1
efw7j66xqpoj3cn3zjkdrwff7  redis    3     redis:3.0.6  RUNNING        RUNNING 1 minute ago     node-3
8ajeipzvxucs3776e4z8gemey  redis    4     redis:3.0.6  RUNNING        RUNNING 1 minute ago     node-2
f05f2lbqzk9fh4kstwpulygvu  redis    5     redis:3.0.6  RUNNING        RUNNING 1 minute ago     node-2
7sbpoy82deq7hu3q9cnucfin6  redis    6     redis:3.0.6  RUNNING        RUNNING 1 minute ago     node-1
```

By default, all tasks are updated at the same time.

This behavior can be changed by defining update options.

For instance, in order to update tasks 2 at a time and wait at least 10 seconds between updates:

```
$ swarmctl service update redis --image redis:3.0.7 --update-parallelism 2 --update-delay 10s
$ watch -n1 "swarmctl service inspect redis"  # watch the update
```

This will update 2 tasks, wait for them to become *RUNNING*, then wait an additional 10 seconds before moving to other tasks.

Update options can be set at service creation and updated later on. If an update command doesn't specify update options, the last set of options will be used.

### Node Management

*SwarmKit* monitors node health. In the case of node failures, it re-schedules tasks to other nodes.
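This failure-handling behavior can be sketched in Go. This is an illustrative simplification, not SwarmKit's actual code: the `Task` type and `reconcile` function below are invented for the example, and placement crudely mimics the spread strategy by picking the least-loaded healthy node.

```go
package main

import "fmt"

// Task is a simplified stand-in for SwarmKit's internal task record.
type Task struct{ Node string }

// reconcile compares the desired replica count against the tasks that are
// still on healthy nodes, and returns replacement tasks placed onto the
// healthy node with the fewest tasks.
func reconcile(desired int, tasks []Task, healthy []string) []Task {
	load := map[string]int{}
	for _, n := range healthy {
		load[n] = 0
	}
	alive := 0
	for _, t := range tasks {
		if _, ok := load[t.Node]; ok { // tasks on failed nodes don't count
			alive++
			load[t.Node]++
		}
	}
	var created []Task
	for i := alive; i < desired; i++ {
		best := healthy[0] // pick the least-loaded healthy node
		for _, n := range healthy {
			if load[n] < load[best] {
				best = n
			}
		}
		load[best]++
		created = append(created, Task{Node: best})
	}
	return created
}

func main() {
	// node-1 has failed: its two tasks must be re-scheduled.
	tasks := []Task{{"node-1"}, {"node-1"}, {"node-2"}}
	for _, t := range reconcile(3, tasks, []string{"node-2", "node-3"}) {
		fmt.Println("rescheduling task onto", t.Node)
	}
}
```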

An operator can manually define the *Availability* of a node and can *Pause* and *Drain* nodes.

Let's put `node-1` into maintenance mode:

```
$ swarmctl node drain node-1

$ swarmctl node ls
ID                         Name    Membership  Status  Availability  Manager Status
--                         ----    ----------  ------  ------------  --------------
3x12fpoi36eujbdkgdnbvbi6r  node-2  ACCEPTED    READY   ACTIVE
4spl3tyipofoa2iwqgabsdcve  node-1  ACCEPTED    READY   DRAIN         REACHABLE *
dknwk1uqxhnyyujq66ho0h54t  node-3  ACCEPTED    READY   ACTIVE

$ swarmctl service inspect redis
ID                : 08ecg7vc7cbf9k57qs722n2le
Name              : redis
Replicas          : 6/6
Update Status
 State            : COMPLETED
 Started          : 2 minutes ago
 Completed        : 1 minute ago
 Message          : update completed
Template
 Container
  Image           : redis:3.0.7

Task ID                    Service  Slot  Image        Desired State  Last State               Node
-------                    -------  ----  -----        -------------  ----------               ----
8uy2fy8dqbwmlvw5iya802tj0  redis    1     redis:3.0.7  RUNNING        RUNNING 23 seconds ago   node-2
7h9lgvidypcr7q1k3lfgohb42  redis    2     redis:3.0.7  RUNNING        RUNNING 2 minutes ago    node-3
ae4dl0chk3gtwm1100t5yeged  redis    3     redis:3.0.7  RUNNING        RUNNING 23 seconds ago   node-3
9fz7fxbg0igypstwliyameobs  redis    4     redis:3.0.7  RUNNING        RUNNING 2 minutes ago    node-3
drzndxnjz3c8iujdewzaplgr6  redis    5     redis:3.0.7  RUNNING        RUNNING 23 seconds ago   node-2
7rcgciqhs4239quraw7evttyf  redis    6     redis:3.0.7  RUNNING        RUNNING 2 minutes ago    node-2
```

As you can see, every Task running on `node-1` was rebalanced to either `node-2` or `node-3` by the reconciliation loop.
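The difference between *Paused* and *Drained* can be summarized in a small Go sketch. The `Availability` type and the two helper functions are invented for illustration and do not correspond to SwarmKit's real API.

```go
package main

import "fmt"

// Availability mirrors the three states an operator can set on a node.
type Availability int

const (
	Active  Availability = iota // normal operation
	Paused                      // no new tasks are scheduled; existing tasks keep running
	Drained                     // no new tasks, and existing tasks are re-scheduled elsewhere
)

// schedulable reports whether new tasks may be placed on a node.
func schedulable(a Availability) bool { return a == Active }

// evict reports whether a node's existing tasks must be moved off of it.
func evict(a Availability) bool { return a == Drained }

func main() {
	for _, a := range []Availability{Active, Paused, Drained} {
		fmt.Printf("availability=%d schedulable=%v evict=%v\n", a, schedulable(a), evict(a))
	}
}
```

Both non-active states stop new scheduling; only *Drained* additionally evicts running tasks, which is why it suits maintenance scenarios.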