# Docker Desktop multi runner deployment

This document discusses how to run a Docker Desktop deployment on a single laptop or desktop.

These instructions are intended for Mac or Windows experimenters. For Linux please see the [Linux Kubernetes local example](examples/local/README.md).

These instructions are generally intended for CPU users, however they can also apply to multiple GPUs within a single host if the [nvidia for docker tooling](https://github.com/NVIDIA/nvidia-docker) is installed.

The motivation behind this style of runner deployment is for cases where Python based applications, or the frameworks and libraries they use, are not capable of scaling beyond a single thread of execution, or are not thread-safe.

<!--ts-->

Table of Contents
=================

* [Docker Desktop multi runner deployment](#docker-desktop-multi-runner-deployment)
* [Table of Contents](#table-of-contents)
* [Introduction](#introduction)
* [Pre-requisites](#pre-requisites)
   * [Docker Desktop](#docker-desktop)
   * [Kubernetes CLI](#kubernetes-cli)
   * [Minio CLI](#minio-cli)
   * [Validation](#validation)
* [Configuration and Deployment](#configuration-and-deployment)
   * [Create storage service](#create-storage-service)
   * [Create the cluster](#create-the-cluster)
   * [Validation](#validation-1)
   * [A note on performance monitoring](#a-note-on-performance-monitoring)
* [Using the Cluster](#using-the-cluster)
   * [Starting experiments](#starting-experiments)
   * [Retrieving results](#retrieving-results)
<!--te-->

# Introduction

Using this document you will be able to run multiple StudioML Go runners on a single Docker host.
# Pre-requisites

Before using the following instructions experimenters will need to have the [Docker Desktop 2.3+ service installed](https://www.docker.com/products/docker-desktop).

This option requires at least 8Gb of memory in the minimal setups.

Any tools and servers used within the deployment are version controlled via the Docker Hub container registry and so do not need to be specified.

## Docker Desktop

Once Docker Desktop is installed use the Windows Start->Docker menu, or the Mac OSX menu bar entry for Docker Desktop, to perform the following actions:

* Use the Preferences Resources tab to increase the amount of RAM allocated to Docker to at least 8Gb.

* Activate the Kubernetes feature using the Preferences option in the menu. The menu should show a green light and the "Kubernetes is running" indication once Kubernetes has initialized and is ready for use. For more details please see [https://docs.docker.com/desktop/](https://docs.docker.com/desktop/).

* Use the Kubernetes menu item to check that the installed Kubernetes instance defaults to the 'docker-desktop' context.

* Export the kubectl configuration for your local cluster, see the instructions in the validation section.

## Kubernetes CLI

kubectl can be installed using the instructions found at:

- kubectl https://kubernetes.io/docs/tasks/tools/install-kubectl/

## Minio CLI

Minio offers a client for the file server inside the docker cluster called [mc](https://docs.min.io/docs/minio-client-quickstart-guide.html).

The quickstart guide details installation for Windows and Mac.
For Mac [Homebrew](https://brew.sh/) is used as shown:

```
brew install minio/stable/mc
```

## Validation

First export the kubectl configuration for your local cluster:

```
docker context export default --kubeconfig ~/.kube/docker.kubeconfig
```

To validate your installation you can now either leave the KUBE_CONFIG and KUBECONFIG environment variables as they are, or set them to point at your exported configuration file '~/.kube/docker.kubeconfig'. This will allow the kubectl tool to default to using your localhost to communicate with the cluster.

Now the kubectl command access can be tested as shown in the following Mac example:

```
$ kubectl get nodes
NAME             STATUS   ROLES    AGE     VERSION
docker-desktop   Ready    master   2m12s   v1.16.6-beta.0
$ kubectl describe nodes
Name:               docker-desktop
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=docker-desktop
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 04 May 2020 15:17:10 -0700
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  docker-desktop
  AcquireTime:     <unset>
  RenewTime:       Mon, 04 May 2020 16:17:12 -0700
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 04 May 2020 16:16:23 -0700   Mon, 04 May 2020 15:17:08 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Mon, 04 May 2020 16:16:23 -0700   Mon, 04 May 2020 15:17:08 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Mon, 04 May 2020 16:16:23 -0700   Mon, 04 May 2020 15:17:08 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 04 May 2020 16:16:23 -0700   Mon, 04 May 2020 15:17:08 -0700   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.65.3
  Hostname:    docker-desktop
Capacity:
  cpu:                6
  ephemeral-storage:  61255492Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             2038544Ki
  pods:               110
Allocatable:
  cpu:                6
  ephemeral-storage:  56453061334
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1936144Ki
  pods:               110
System Info:
  Machine ID:                 cff33312-1793-4201-829d-010a1525d327
  System UUID:                fb714256-0000-0000-a61c-ee3a89604c3a
  Boot ID:                    1d42a706-7f4f-4c91-8ec9-fd53bf1351bc
  Kernel Version:             4.19.76-linuxkit
  OS Image:                   Docker Desktop
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.8
  Kubelet Version:            v1.16.6-beta.0
  Kube-Proxy Version:         v1.16.6-beta.0
Non-terminated Pods:          (11 in total)
  Namespace     Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------     ----                                      ------------  ----------  ---------------  -------------  ---
  docker        compose-78f95d4f8c-6lp49                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
  docker        compose-api-6ffb89dc58-qgnpq              0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   coredns-5644d7b6d9-2xr4r                  100m (1%)     0 (0%)      70Mi (3%)        170Mi (8%)     59m
  kube-system   coredns-5644d7b6d9-vvpzk                  100m (1%)     0 (0%)      70Mi (3%)        170Mi (8%)     59m
  kube-system   etcd-docker-desktop                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   kube-apiserver-docker-desktop             250m (4%)     0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   kube-controller-manager-docker-desktop    200m (3%)     0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   kube-proxy-tdsn2                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         59m
  kube-system   kube-scheduler-docker-desktop             100m (1%)     0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   storage-provisioner                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   vpnkit-controller                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                750m (12%)   0 (0%)
  memory             140Mi (7%)   340Mi (17%)
  ephemeral-storage  0 (0%)       0 (0%)
Events:
  Type    Reason                   Age                From                        Message
  ----    ------                   ----               ----                        -------
  Normal  Starting                 60m                kubelet, docker-desktop     Starting kubelet.
  Normal  NodeHasSufficientMemory  60m (x8 over 60m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    60m (x8 over 60m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     60m (x7 over 60m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  60m                kubelet, docker-desktop     Updated Node Allocatable limit across pods
  Normal  Starting                 59m                kube-proxy, docker-desktop  Starting kube-proxy.
```

# Configuration and Deployment

## Create storage service

Minio is used to create a storage server for runner clusters when AWS is not being used. This step will create a storage service with 10Gb of space. It uses the persistent volume claim feature to retain any data the server has been sent and to prevent restarts from losing the data.
The following steps are a summary of what is needed to stand up the server:

```
kubectl create -f https://raw.githubusercontent.com/minio/minio/RELEASE.2020-05-16T01-33-21Z/docs/orchestration/kubernetes/minio-standalone-pvc.yaml
kubectl create -f https://raw.githubusercontent.com/minio/minio/RELEASE.2020-05-16T01-33-21Z/docs/orchestration/kubernetes/minio-standalone-deployment.yaml
kubectl create -f https://raw.githubusercontent.com/minio/minio/RELEASE.2020-05-16T01-33-21Z/docs/orchestration/kubernetes/minio-standalone-service.yaml
```

More detailed information is available from the [Minio Standalone Deployment](https://github.com/minio/minio/blob/master/docs/orchestration/kubernetes/k8s-yaml.md#minio-standalone-server-deployment) documentation.

## Create the cluster

To create the cluster a Kubernetes deployment YAML file is used, which can be applied using the 'kubectl apply -f [filename]' command. The deployment file can be obtained from this github project at [examples/docker/deployment.yaml](https://raw.githubusercontent.com/leaf-ai/studio-go-runner/master/examples/docker/deployment.yaml).

Before applying this file examine its contents and locate the studioml-go-runner-deployment deployment section, and then its resources subsection. The resources subsection contains the hardware resources that will be assigned to the studioml runner pod. Edit the resources to fit with your local machine's capabilities and the resources needed to run your workloads. The default 'replicas' value in the studioml-go-runner-deployment deployment section is set to 1 to reflect having a single runner.

The runner will divide up the resources it has been allocated to service jobs arriving from your local 'studio run', or completion service.
As jobs are received, the work will be apportioned by the runner, and once the runner has allocated all of the resources it has available it will stop scheduling more workers until sufficient resources are released. On a single node there is no need to run more than one runner, except in testing situations and the like where there might be a functional requirement.

You should also examine the cpu and memory sizings to ensure that the runner deployment pod fits and can be run by the cluster; if it does not, its pods will remain in a 'Pending' state. This can be checked using the 'kubectl describe node' command and examining the hardware assigned to run the cluster.

Once you have checked the deployment file it can be applied as follows:

```
export KUBE_CONFIG=~/.kube/docker.kubeconfig
export KUBECONFIG=~/.kube/docker.kubeconfig
```

or

```
unset KUBE_CONFIG
unset KUBECONFIG
```

then

```
kubectl apply -f deployment.yaml
```

## Validation

Having created the services you can validate access to your freshly deployed services as shown in the following example:

```
$ kubectl get svc
NAME               TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                          AGE
kubernetes         ClusterIP      10.96.0.1        <none>        443/TCP                          20h
minio-service      LoadBalancer   10.104.248.60    localhost     9000:30767/TCP                   10m
rabbitmq-service   LoadBalancer   10.104.168.157   localhost     15672:30790/TCP,5672:31312/TCP   2m22s
```

You will notice that the ports have been exposed to the localhost interface of your Mac or Windows machine. This allows you, for example, to use your browser to access minio on 'http://localhost:9000', using a username of 'minio' and password of 'minio123'. The rabbitMQ administration interface is on 'http://localhost:15672', username 'guest', and password 'guest'.

This is clearly an insecure deployment, intended just for testing and benchmarking purposes.
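You can also confirm that the runner deployment itself came up successfully. The following sketch assumes the default deployment name from the example deployment.yaml file; adjust it if you customized the file:

```shell
# List the pods and confirm they have reached the Running state
kubectl get pods -o wide

# Wait up to two minutes for the runner deployment to report itself available
# (deployment name taken from the example deployment.yaml)
kubectl wait --for=condition=available --timeout=120s \
    deployment/studioml-go-runner-deployment

# Inspect the most recent runner log output to confirm it is polling for work
kubectl logs deployment/studioml-go-runner-deployment --tail=20
```

If the wait times out, 'kubectl describe pod' on the pending pod will usually show whether the requested cpu or memory resources exceed what the node can offer.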
If you wish to deploy these services with your own usernames and passwords, examine the YAML files used for the deployments and modify them with values appropriate for your situation.

For more information on exposing ports from Kubernetes please see [accessing an application in Kubernetes](https://medium.com/@lizrice/accessing-an-application-on-kubernetes-in-docker-1054d46b64b1).

## A note on performance monitoring

There are two basic ways to get a sense of dynamic CPU and memory consumption.

* The first is to use 'docker stats'. This is the simplest and probably best approach.

* The second is to use the Kubernetes Web UI dashboard, more details below.

If you wish to use dashboard style monitoring of your local cluster's resource consumption you can use the Kubernetes Dashboard, which has an introduction at [Web UI (Dashboard)](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/), and detailed access and installation instructions at [https://github.com/kubernetes/dashboard](https://github.com/kubernetes/dashboard/blob/master/README.md).

# Using the Cluster

## Starting experiments

Having deployed the cluster we can now launch studio experiments using the localhost for our queue and for our storage.
To do this your studioml config.yaml file should be updated to look something like the following:

```
database:
    type: s3
    endpoint: http://minio-service.default.svc.cluster.local:9000
    bucket: metadata
    authentication: none

storage:
    type: s3
    endpoint: http://minio-service.default.svc.cluster.local:9000
    bucket: storage

cloud:
    queue:
        rmq: "amqp://guest:guest@rabbitmq-service.default.svc.cluster.local:5672/%2f?connection_attempts=30&retry_delay=.5&socket_timeout=5"

server:
    authentication: None

resources_needed:
    cpus: 1
    hdd: 10gb
    ram: 2gb

env:
    AWS_ACCESS_KEY_ID: minio
    AWS_SECRET_ACCESS_KEY: minio123
    AWS_DEFAULT_REGION: us-west-2

verbose: debug
```

In order to access the minio and rabbitMQ servers, the host names being used will need to match between the experiment host where experiments are launched and the host names inside the compute cluster. To do this the /etc/hosts file of your local experiment host, typically edited using 'sudo vim /etc/hosts', will need the following line added:

```
127.0.0.1 minio-service.default.svc.cluster.local rabbitmq-service.default.svc.cluster.local
```

If you wish you can use one of the examples provided by the StudioML python client to test your configuration, github.com/studioml/studio/examples/keras. Doing this will look like the following example:

```
cd studio/examples/keras
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
studio run --lifetime=30m --max-duration=20m --gpus 0 --queue=rmq_kmutch --force-git train_mnist_keras.py
```

## Retrieving results

There are many ways to retrieve experiment results from the minio server.
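For example, because minio speaks the S3 protocol, any S3 compatible tooling can be pointed at it. The following is a sketch using the AWS CLI, assuming it is installed, reusing the credentials and the 'storage' bucket from the configuration shown earlier:

```shell
# Credentials match the minio deployment used throughout this guide
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123

# List the stored experiment artifacts, then copy them to a local directory
aws --endpoint-url http://localhost:9000 s3 ls s3://storage/
aws --endpoint-url http://localhost:9000 s3 sync s3://storage/experiments ./experiment-results
```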
The Minio Client (mc) mentioned as a prerequisite can be used to recursively extract data from folders on the minio server as shown in the following example:

```
mc config host add docker-desktop http://minio-service.default.svc.cluster.local:9000 minio minio123
mc cp --recursive docker-desktop/storage/experiments experiment-results
```

It should be noted that the bucket names in the above example originate from the ~/.studioml/config.yaml file.

Additional information related to the minio client can be found in the [MinIO Client Complete Guide](https://docs.min.io/docs/minio-client-complete-guide.html).

Copyright © 2020 Cognizant Digital Business, Evolutionary AI. All rights reserved. Issued under the Apache 2.0 license.