# Docker Desktop multi runner deployment

This document discusses how to run a Docker Desktop deployment on a single laptop or desktop.

These instructions are intended for Mac or Windows experimenters. For Linux please see the [Linux Kubernetes local example](examples/local/README.md).

These instructions are generally intended for CPU users, however they can also apply to multiple GPUs within a single host if the [nvidia for docker tooling](https://github.com/NVIDIA/nvidia-docker) is installed.

The motivation behind this style of runner deployment is for cases where Python based applications, or the frameworks and libraries they use, are not capable of scaling beyond a single thread of execution, or are not thread-safe.

<!--ts-->

Table of Contents
=================

* [Docker Desktop multi runner deployment](#docker-desktop-multi-runner-deployment)
* [Table of Contents](#table-of-contents)
* [Introduction](#introduction)
* [Pre-requisites](#pre-requisites)
   * [Docker Desktop](#docker-desktop)
   * [Kubernetes CLI](#kubernetes-cli)
   * [Minio CLI](#minio-cli)
   * [Validation](#validation)
* [Configuration and Deployment](#configuration-and-deployment)
   * [Create storage service](#create-storage-service)
   * [Create the cluster](#create-the-cluster)
   * [Validation](#validation-1)
   * [A note on performance monitoring](#a-note-on-performance-monitoring)
* [Using the Cluster](#using-the-cluster)
   * [Starting experiments](#starting-experiments)
   * [Retrieving results](#retrieving-results)
<!--te-->

# Introduction

Using this document you will be able to run multiple StudioML Go runners on a single Docker host.
# Pre-requisites

Before using the following instructions experimenters will need to have the [Docker Desktop 2.3+ service installed](https://www.docker.com/products/docker-desktop).

This option requires at least 8Gb of memory in the minimal setups.

Any tools and servers used within the deployment are version controlled via the Docker Hub container registry and so do not need to be specified.

## Docker Desktop

Once Docker Desktop is installed use the Windows Start->Docker menu, or the Mac OSX menu bar entry for Docker Desktop, to perform the following actions:

* Use the Preferences Resources tab to increase the amount of RAM allocated to Docker to at least 8Gb.

* Activate the Kubernetes feature using the Preferences option in the menu. The menu should show a green light and the "Kubernetes is running" indication once Kubernetes has initialized and is ready for use. For more details please see [https://docs.docker.com/desktop/](https://docs.docker.com/desktop/).

* Use the Kubernetes menu item to check that the installed Kubernetes instance defaults to the 'docker-desktop' context.

* Export the kubectl configuration for your local cluster, see the instructions in the validation section.

## Kubernetes CLI

kubectl can be installed using the instructions found at:

- kubectl https://kubernetes.io/docs/tasks/tools/install-kubectl/

## Minio CLI

Minio offers a client for the file server inside the docker cluster called [mc](https://docs.min.io/docs/minio-client-quickstart-guide.html).

The quickstart guide details installation for Windows and Mac.
For Mac [Homebrew](https://brew.sh/) is used as shown:

```
brew install minio/stable/mc
```

## Validation

First export the kubectl configuration for your local cluster:

```
docker context export default --kubeconfig ~/.kube/docker.kubeconfig
```

To validate your installation you can now either leave the KUBE_CONFIG and KUBECONFIG environment variables as they are, or set them to point at your exported configuration file '~/.kube/docker.kubeconfig'. This will allow the kubectl tool to default to using your localhost to communicate with the cluster.

Now the kubectl command access can be tested as shown in the following Mac example:

```
$ kubectl get nodes
NAME             STATUS   ROLES    AGE     VERSION
docker-desktop   Ready    master   2m12s   v1.16.6-beta.0
$ kubectl describe nodes
Name:               docker-desktop
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=docker-desktop
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 04 May 2020 15:17:10 -0700
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  docker-desktop
  AcquireTime:     <unset>
  RenewTime:       Mon, 04 May 2020 16:17:12 -0700
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 04 May 2020 16:16:23 -0700   Mon, 04 May 2020 15:17:08 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Mon, 04 May 2020 16:16:23 -0700   Mon, 04 May 2020 15:17:08 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Mon, 04 May 2020 16:16:23 -0700   Mon, 04 May 2020 15:17:08 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 04 May 2020 16:16:23 -0700   Mon, 04 May 2020 15:17:08 -0700   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.65.3
  Hostname:    docker-desktop
Capacity:
  cpu:                6
  ephemeral-storage:  61255492Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             2038544Ki
  pods:               110
Allocatable:
  cpu:                6
  ephemeral-storage:  56453061334
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1936144Ki
  pods:               110
System Info:
  Machine ID:                 cff33312-1793-4201-829d-010a1525d327
  System UUID:                fb714256-0000-0000-a61c-ee3a89604c3a
  Boot ID:                    1d42a706-7f4f-4c91-8ec9-fd53bf1351bc
  Kernel Version:             4.19.76-linuxkit
  OS Image:                   Docker Desktop
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.8
  Kubelet Version:            v1.16.6-beta.0
  Kube-Proxy Version:         v1.16.6-beta.0
Non-terminated Pods:          (11 in total)
  Namespace     Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------     ----                                      ------------  ----------  ---------------  -------------  ---
  docker        compose-78f95d4f8c-6lp49                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
  docker        compose-api-6ffb89dc58-qgnpq              0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   coredns-5644d7b6d9-2xr4r                  100m (1%)     0 (0%)      70Mi (3%)        170Mi (8%)     59m
  kube-system   coredns-5644d7b6d9-vvpzk                  100m (1%)     0 (0%)      70Mi (3%)        170Mi (8%)     59m
  kube-system   etcd-docker-desktop                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   kube-apiserver-docker-desktop             250m (4%)     0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   kube-controller-manager-docker-desktop    200m (3%)     0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   kube-proxy-tdsn2                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         59m
  kube-system   kube-scheduler-docker-desktop             100m (1%)     0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   storage-provisioner                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
  kube-system   vpnkit-controller                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         58m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                750m (12%)   0 (0%)
  memory             140Mi (7%)   340Mi (17%)
  ephemeral-storage  0 (0%)       0 (0%)
Events:
  Type    Reason                   Age                From                        Message
  ----    ------                   ----               ----                        -------
  Normal  Starting                 60m                kubelet, docker-desktop     Starting kubelet.
  Normal  NodeHasSufficientMemory  60m (x8 over 60m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    60m (x8 over 60m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     60m (x7 over 60m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  60m                kubelet, docker-desktop     Updated Node Allocatable limit across pods
  Normal  Starting                 59m                kube-proxy, docker-desktop  Starting kube-proxy.
```

# Configuration and Deployment

## Create storage service

Minio is used to create a storage server for runner clusters when AWS is not being used. This step will create a storage service with 10Gb of space. It uses the persistent volume claim feature to retain any data the server has been sent and to prevent restarts from losing the data.
The following steps are a summary of what is needed to stand up the server:

```
kubectl create -f https://raw.githubusercontent.com/minio/minio/RELEASE.2020-05-16T01-33-21Z/docs/orchestration/kubernetes/minio-standalone-pvc.yaml
kubectl create -f https://raw.githubusercontent.com/minio/minio/RELEASE.2020-05-16T01-33-21Z/docs/orchestration/kubernetes/minio-standalone-deployment.yaml
kubectl create -f https://raw.githubusercontent.com/minio/minio/RELEASE.2020-05-16T01-33-21Z/docs/orchestration/kubernetes/minio-standalone-service.yaml
```

More detailed information is available from the [Minio Standalone Deployment](https://github.com/minio/minio/blob/master/docs/orchestration/kubernetes/k8s-yaml.md#minio-standalone-server-deployment) documentation.

## Create the cluster

To create the cluster a Kubernetes deployment YAML file is used, which can be applied using the 'kubectl apply -f [filename]' command. The deployment file can be obtained from this github project at [examples/docker/deployment.yaml](https://raw.githubusercontent.com/leaf-ai/studio-go-runner/master/examples/docker/deployment.yaml).

Before applying this file examine its contents and locate the studioml-go-runner-deployment deployment section, and then its resources subsection. The resources subsection contains the hardware resources that will be assigned to the studioml runner pod. Edit the resources to fit with your local machine's capabilities and the resources needed to run your workloads. The default 'replicas' value in the studioml-go-runner-deployment deployment section is set to 1 to reflect having a single runner.

The runner will divide up the resources it has been allocated to service jobs arriving from your local 'studio run', or completion service.
As jobs are received, the work will be apportioned by the runner, and once the runner has allocated all of the resources it has available it will stop scheduling more workers until sufficient resources are released. On a single node there is no need to run more than one runner, except in testing situations and the like where there might be a functional requirement.

You should also examine the cpu and memory sizings to ensure that the runner deployment pod fits and can be run by the cluster; if it does not, its pods will remain in a 'Pending' state. This can be checked using the 'kubectl describe node' command and examining the hardware assigned to run the cluster.

Once you have checked the deployment file it can be applied as follows:

```
export KUBE_CONFIG=~/.kube/docker.kubeconfig
export KUBECONFIG=~/.kube/docker.kubeconfig
```

or

```
unset KUBE_CONFIG
unset KUBECONFIG
```

then

```
kubectl apply -f deployment.yaml
```

## Validation

Having created the services you can validate access to your freshly deployed services as shown in the following example:

```
$ kubectl get svc
NAME               TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                          AGE
kubernetes         ClusterIP      10.96.0.1        <none>        443/TCP                          20h
minio-service      LoadBalancer   10.104.248.60    localhost     9000:30767/TCP                   10m
rabbitmq-service   LoadBalancer   10.104.168.157   localhost     15672:30790/TCP,5672:31312/TCP   2m22s
```

You will notice that the ports have been exposed to the localhost interface of your Mac or Windows machine. This allows you, for example, to use your browser to access minio on 'http://localhost:9000', using a username of 'minio' and password of 'minio123'. The rabbitMQ administration interface is on 'http://localhost:15672', username 'guest', and password 'guest'.

This is clearly an insecure deployment, intended just for testing and benchmarking purposes.
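You can also confirm that the runner deployment itself came up successfully. The following sketch assumes the default deployment name from the example deployment.yaml file; adjust it if you customized the file:

```shell
# List the pods and confirm they have reached the Running state
kubectl get pods -o wide

# Wait up to two minutes for the runner deployment to report itself available
# (deployment name taken from the example deployment.yaml)
kubectl wait --for=condition=available --timeout=120s \
    deployment/studioml-go-runner-deployment

# Inspect the most recent runner log output to confirm it is polling for work
kubectl logs deployment/studioml-go-runner-deployment --tail=20
```

If the wait times out, 'kubectl describe pod' on the pending pod will usually show whether the requested cpu or memory resources exceed what the node can offer.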
If you wish to deploy these services with your own usernames and passwords, examine the YAML files used for the deployments and modify them with values appropriate for your situation.

For more information on exposing ports from Kubernetes please see [accessing an application in Kubernetes](https://medium.com/@lizrice/accessing-an-application-on-kubernetes-in-docker-1054d46b64b1).

## A note on performance monitoring

There are two basic ways to get a sense of dynamic CPU and memory consumption.

* The first is to use 'docker stats'. This is the simplest and probably best approach.

* The second is to use the Kubernetes Web UI dashboard, more details below.

If you wish to use dashboard style monitoring of your local cluster's resource consumption you can use the Kubernetes Dashboard, which has an introduction at [Web UI (Dashboard)](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/), and detailed access and installation instructions at [https://github.com/kubernetes/dashboard](https://github.com/kubernetes/dashboard/blob/master/README.md).

# Using the Cluster

## Starting experiments

Having deployed the cluster we can now launch studio experiments using the localhost for our queue and for our storage.
To do this your studioml config.yaml file should be updated to look something like the following:

```
database:
    type: s3
    endpoint: http://minio-service.default.svc.cluster.local:9000
    bucket: metadata
    authentication: none

storage:
    type: s3
    endpoint: http://minio-service.default.svc.cluster.local:9000
    bucket: storage

cloud:
    queue:
        rmq: "amqp://guest:guest@rabbitmq-service.default.svc.cluster.local:5672/%2f?connection_attempts=30&retry_delay=.5&socket_timeout=5"

server:
    authentication: None

resources_needed:
    cpus: 1
    hdd: 10gb
    ram: 2gb

env:
    AWS_ACCESS_KEY_ID: minio
    AWS_SECRET_ACCESS_KEY: minio123
    AWS_DEFAULT_REGION: us-west-2

verbose: debug
```

In order to access the minio and rabbitMQ servers, the host names being used will need to match between the experiment host where experiments are launched and the host names inside the compute cluster. To do this the /etc/hosts file of your local experiment host, typically edited using 'sudo vim /etc/hosts', will need the following line added:

```
127.0.0.1 minio-service.default.svc.cluster.local rabbitmq-service.default.svc.cluster.local
```

If you wish you can use one of the examples provided by the StudioML python client to test your configuration, github.com/studioml/studio/examples/keras. Doing this will look like the following example:

```
cd studio/examples/keras
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
studio run --lifetime=30m --max-duration=20m --gpus 0 --queue=rmq_kmutch --force-git train_mnist_keras.py
```

## Retrieving results

There are many ways to retrieve experiment results from the minio server.
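For example, because minio speaks the S3 protocol, any S3 compatible tooling can be pointed at it. The following is a sketch using the AWS CLI, assuming it is installed, reusing the credentials and the 'storage' bucket from the configuration shown earlier:

```shell
# Credentials match the minio deployment used throughout this guide
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123

# List the stored experiment artifacts, then copy them to a local directory
aws --endpoint-url http://localhost:9000 s3 ls s3://storage/
aws --endpoint-url http://localhost:9000 s3 sync s3://storage/experiments ./experiment-results
```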
The Minio Client (mc) mentioned as a prerequisite can be used to recursively extract data from folders on the minio server as shown in the following example:

```
mc config host add docker-desktop http://minio-service.default.svc.cluster.local:9000 minio minio123
mc cp --recursive docker-desktop/storage/experiments experiment-results
```

It should be noted that the bucket names in the above example originate from the ~/.studioml/config.yaml file.

Additional information related to the minio client can be found in the [MinIO Client Complete Guide](https://docs.min.io/docs/minio-client-complete-guide.html).

Copyright © 2020 Cognizant Digital Business, Evolutionary AI. All rights reserved. Issued under the Apache 2.0 license.