# Google Cloud Platform

Google Cloud Platform provides seamless support for Kubernetes through
[Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) (GKE),
and Pachyderm is fully supported on GKE.
The following section walks you through deploying a Pachyderm cluster on GKE.

## Prerequisites

- [Google Cloud SDK](https://cloud.google.com/sdk/) >= 124.0.0
- [kubectl](https://kubernetes.io/docs/user-guide/prereqs/)
- [pachctl](#install-pachctl)

If you are using the SDK for the first time, follow
the [Google SDK QuickStart Guide](https://cloud.google.com/sdk/docs/quickstarts).

!!! note
    When you follow the QuickStart Guide, you might update your `~/.bash_profile`
    and point your `$PATH` at the location where you extracted
    `google-cloud-sdk`. However, Pachyderm recommends that you extract
    the SDK to `~/bin`.

!!! tip
    You can install `kubectl` by using the Google Cloud SDK and
    running the following command:

    ```shell
    $ gcloud components install kubectl
    ```

## Deploy Kubernetes

To create a new Kubernetes cluster by using GKE, run:

```shell
$ CLUSTER_NAME=<any unique name, e.g. "pach-cluster">

$ GCP_ZONE=<a GCP availability zone, e.g. "us-west1-a">

$ gcloud config set compute/zone ${GCP_ZONE}

$ gcloud config set container/cluster ${CLUSTER_NAME}

$ MACHINE_TYPE=<machine type for the k8s nodes, we recommend "n1-standard-4" or larger>

# By default, the following command spins up a 3-node cluster. You can change the default with `--num-nodes VAL`.
$ gcloud container clusters create ${CLUSTER_NAME} --scopes storage-rw --machine-type ${MACHINE_TYPE}

# By default, GKE clusters have RBAC enabled.
# To allow `pachctl deploy` to give the `pachyderm` service account
# the requisite privileges via clusterrolebindings, you need to grant *your user account* the privileges
# needed to create those clusterrolebindings.
#
# Note that this command is simple and concise, but gives your user account more privileges than necessary. See
# https://docs.pachyderm.io/en/latest/deployment/rbac.html for the complete list of privileges that the
# pachyderm serviceaccount needs.
$ kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value account)
```

!!! note "Important"
    You must create the Kubernetes cluster by using the `gcloud` command-line
    tool rather than the Google Cloud Console, because you can grant the
    `storage-rw` scope only through the command-line tool.

The cluster might take a few minutes to start up. You can check its status in
the [GCP Console](https://console.cloud.google.com/compute/instances).
A `kubeconfig` entry is automatically generated and set as the current
context. As a sanity check, make sure that your cluster is up and running
with the following `kubectl` command:

```shell
# List all pods in the kube-system namespace.
$ kubectl get pods -n kube-system
NAME                                                    READY     STATUS    RESTARTS   AGE
event-exporter-v0.1.7-5c4d9556cf-fd9j2                  2/2       Running   0          1m
fluentd-gcp-v2.0.9-68vhs                                2/2       Running   0          1m
fluentd-gcp-v2.0.9-fzfpw                                2/2       Running   0          1m
fluentd-gcp-v2.0.9-qvk8f                                2/2       Running   0          1m
heapster-v1.4.3-5fbfb6bf55-xgdwx                        3/3       Running   0          55s
kube-dns-778977457c-7hbrv                               3/3       Running   0          1m
kube-dns-778977457c-dpff4                               3/3       Running   0          1m
kube-dns-autoscaler-7db47cb9b7-gp5ns                    1/1       Running   0          1m
kube-proxy-gke-pach-cluster-default-pool-9762dc84-bzcz  1/1       Running   0          1m
kube-proxy-gke-pach-cluster-default-pool-9762dc84-hqkr  1/1       Running   0          1m
kube-proxy-gke-pach-cluster-default-pool-9762dc84-jcbg  1/1       Running   0          1m
kubernetes-dashboard-768854d6dc-t75rp                   1/1       Running   0          1m
l7-default-backend-6497bcdb4d-w72k5                     1/1       Running   0          1m
```

If you *don't* see something similar to the above output,
you can point `kubectl` to the new cluster manually by running
the following command:

```shell
# Update your kubeconfig to point at your newly created cluster.
$ gcloud container clusters get-credentials ${CLUSTER_NAME}
```

## Deploy Pachyderm

To deploy Pachyderm, you need to:

1. [Create storage resources](#set-up-the-storage-resources),
2. [Install the Pachyderm CLI tool, `pachctl`](#install-pachctl), and
3. [Deploy Pachyderm on the Kubernetes cluster](#deploy-pachyderm-on-the-kubernetes-cluster).

### Set up the Storage Resources

Pachyderm needs a [GCS bucket](https://cloud.google.com/storage/docs/)
and a [persistent disk](https://cloud.google.com/compute/docs/disks/)
to function correctly. To specify the size of the persistent
disk and the bucket name, and to create the bucket, run the following
commands:

```shell
# For the persistent disk, 10GB is a good size to start with.
# This stores PFS metadata. For reference, 1GB
# should work fine for 1000 commits on 1000 files.
$ STORAGE_SIZE=<the size of the volume that you are going to create, in GBs, e.g. "10">

# The bucket name must be globally unique across all of Google Cloud Storage, not only within your project or region.
$ BUCKET_NAME=<The name of the GCS bucket where your data will be stored>

# Create the bucket.
$ gsutil mb gs://${BUCKET_NAME}
```

To check that everything has been set up correctly, run:

```shell
$ gsutil ls
# You should see the bucket you created.
```

### Install `pachctl`

`pachctl` is a command-line utility for interacting with a Pachyderm cluster. You can install it locally as follows:

```shell
# For macOS:
$ brew tap pachyderm/tap && brew install pachyderm/tap/pachctl@1.10

# For Linux (64 bit) or Windows 10+ on WSL:
$ curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v1.10.0/pachctl_1.10.0_amd64.deb && sudo dpkg -i /tmp/pachctl.deb
```

To check that the installation was successful, run `pachctl version --client-only`:

```shell
$ pachctl version --client-only
1.9.7
```

### Deploy Pachyderm on the Kubernetes cluster

Now you can deploy a Pachyderm cluster by running this command:

```shell
$ pachctl deploy google ${BUCKET_NAME} ${STORAGE_SIZE} --dynamic-etcd-nodes=1
serviceaccount "pachyderm" created
storageclass "etcd-storage-class" created
service "etcd-headless" created
statefulset "etcd" created
service "etcd" created
service "pachd" created
deployment "pachd" created
service "dash" created
deployment "dash" created
secret "pachyderm-storage-secret" created

Pachyderm is launching. Check its status with "kubectl get all"
Once launched, access the dashboard by running "pachctl port-forward"
```

!!! note
    The `--dynamic-etcd-nodes=1` flag deploys a single etcd node, which
    Pachyderm uses to manage its metadata.

!!! note "Important"
    If you require RBAC authorization or run into any RBAC errors,
    see [Configure RBAC](rbac.md).

It may take a few minutes for the `pachd` pod to start because Kubernetes
pulls the Pachyderm container images from DockerHub. You can check the cluster
status with `kubectl`, which should output something similar to the following
when Pachyderm is up and running:

```shell
$ kubectl get pods
NAME                     READY     STATUS    RESTARTS   AGE
dash-482120938-np8cc     2/2       Running   0          4m
etcd-0                   1/1       Running   0          4m
pachd-3677268306-9sqm0   1/1       Running   0          4m
```

If you see a few restarts on the `pachd` pod, you can safely ignore them.
Kubernetes tried to start those containers before the components they
depend on were ready, so it restarted them.

Finally, assuming your `pachd` pod is running as shown above, set up
port forwarding so that `pachctl` can talk to the cluster:

```shell
# Forward the ports. We background this process because it blocks.
$ pachctl port-forward &
```

And you're done! To verify that the cluster is working, run
`pachctl version` or create a new repo:

```shell
$ pachctl version
COMPONENT           VERSION
pachctl             1.9.7
pachd               1.9.7
```

### Increasing Ingress Throughput

One way to improve ingress performance is to restrict `pachd` to
a specific, more powerful node in the cluster. You can accomplish this
by using [node taints](https://cloud.google.com/kubernetes-engine/docs/how-to/node-taints)
in GKE. By creating a node taint for `pachd`, you configure the
Kubernetes scheduler to run only the `pachd` pod on that node. Then,
deploy Pachyderm with `--pachd-cpu-request` and `--pachd-memory-request`
set to match the resource limits of the machine type.
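
The taint and deploy steps above might look like the following sketch. It assumes a hypothetical node name (`NODE_NAME`; choose a real one from `kubectl get nodes`) and illustrative request values (`PACHD_CPU`, `PACHD_MEMORY`) for an `n1-standard-4` node, so adjust them to your machine type and keep them within the node's allocatable resources. The `run` helper is also hypothetical: it only prints each command unless you run the script with `DRY_RUN=0`, which lets you review the commands before applying them.

```shell
#!/usr/bin/env bash
# Sketch: dedicate one GKE node to pachd with a node taint, then deploy
# Pachyderm with pachd resource requests sized for that node.
# NODE_NAME, PACHD_CPU, PACHD_MEMORY, the default BUCKET_NAME, and the
# run() helper are hypothetical; substitute values for your cluster.
set -euo pipefail

NODE_NAME="${NODE_NAME:-my-pachd-node}"        # pick one from `kubectl get nodes`
BUCKET_NAME="${BUCKET_NAME:-my-pach-bucket}"   # reuses your BUCKET_NAME if set
STORAGE_SIZE="${STORAGE_SIZE:-10}"             # reuses your STORAGE_SIZE if set
PACHD_CPU="${PACHD_CPU:-3}"                    # stay within the node's allocatable CPU
PACHD_MEMORY="${PACHD_MEMORY:-10G}"            # stay within the node's allocatable memory

# Print each command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Repel new pods from the node unless they tolerate dedicated=pachd:NoSchedule.
run kubectl taint nodes "${NODE_NAME}" dedicated=pachd:NoSchedule

# Request most of the node's resources for pachd.
run pachctl deploy google "${BUCKET_NAME}" "${STORAGE_SIZE}" \
  --dynamic-etcd-nodes=1 \
  --pachd-cpu-request="${PACHD_CPU}" \
  --pachd-memory-request="${PACHD_MEMORY}"
```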
Finally, modify the `pachd` deployment so that its pods have a matching toleration:

```yaml
# In the pachd deployment, add this under spec.template.spec:
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "pachd"
  effect: "NoSchedule"
```

### Increasing upload performance

The most straightforward approach to increasing upload performance is
to [leverage SSDs as the boot disk](https://cloud.google.com/kubernetes-engine/docs/how-to/custom-boot-disks) in
your cluster, because SSDs provide higher throughput and lower latency than
HDDs. Additionally, you can increase the size of the SSD for
further performance gains, because the number of IOPS increases with
disk size.

### Increasing merge performance

You can tune merge performance directly in
the [Pachyderm pipeline spec](../../../reference/pipeline_spec/).
Specifically, you can increase the number of hashtrees (`hashtree_spec`)
in the pipeline spec. This number determines the number of shards for the
filesystem metadata. In general, this number should be lower than the number
of workers (`parallelism_spec`), and you should increase it only if merge time
is too slow. Merge time is the period after the number of processed datums
plus skipped datums equals the total number of datums, but before the job is done.
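
As an illustration of that guidance, the following sketch writes a minimal pipeline spec with `hashtree_spec` set lower than `parallelism_spec`, assuming both fields accept the `{"constant": N}` form. The pipeline itself (the name `my-pipeline`, the `ubuntu:18.04` image, the input repo `data`, and the copy command), the counts 8 and 4, and the `DRY_RUN` switch are hypothetical. By default, the script only prints the `pachctl update pipeline` command; run it with `DRY_RUN=0` to apply the spec to your cluster.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal pipeline spec whose hashtree_spec is lower than
# its parallelism_spec. The pipeline name, image, input repo, command,
# counts, and the DRY_RUN switch are hypothetical.
set -euo pipefail

WORKERS=8    # parallelism_spec: number of workers
HASHTREES=4  # hashtree_spec: number of metadata shards

# Keep the number of hashtrees below the number of workers.
if [ "${HASHTREES}" -ge "${WORKERS}" ]; then
  echo "HASHTREES (${HASHTREES}) should be lower than WORKERS (${WORKERS})" >&2
  exit 1
fi

cat > pipeline.json <<EOF
{
  "pipeline": {"name": "my-pipeline"},
  "transform": {
    "image": "ubuntu:18.04",
    "cmd": ["sh"],
    "stdin": ["cp -r /pfs/data/* /pfs/out/"]
  },
  "input": {"pfs": {"repo": "data", "glob": "/*"}},
  "parallelism_spec": {"constant": ${WORKERS}},
  "hashtree_spec": {"constant": ${HASHTREES}}
}
EOF

# Print the update command by default; apply it only when DRY_RUN=0.
if [ "${DRY_RUN:-1}" = "0" ]; then
  pachctl update pipeline -f pipeline.json
else
  echo "+ pachctl update pipeline -f pipeline.json"
fi
```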