# Google Cloud Platform

Google Cloud Platform provides seamless support for Kubernetes.
Therefore, Pachyderm is fully supported on [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) (GKE).
The following section walks you through deploying a Pachyderm cluster on GKE.

## Prerequisites

- [Google Cloud SDK](https://cloud.google.com/sdk/) >= 124.0.0
- [kubectl](https://kubernetes.io/docs/user-guide/prereqs/)
- [pachctl](#install-pachctl)

If this is the first time you are using the SDK, follow
the [Google SDK QuickStart Guide](https://cloud.google.com/sdk/docs/quickstarts).

!!! note
    When you follow the QuickStart Guide, you might update your `~/.bash_profile`
    and point your `$PATH` at the location where you extracted
    `google-cloud-sdk`. However, Pachyderm recommends that you extract
    the SDK to `~/bin`.

!!! tip
    You can install `kubectl` by using the Google Cloud SDK and
    running the following command:

    ```shell
    gcloud components install kubectl
    ```

## Deploy Kubernetes

To create a new Kubernetes cluster by using GKE, run:

```shell
CLUSTER_NAME=<any unique name, e.g. "pach-cluster">

GCP_ZONE=<a GCP availability zone, e.g. "us-west1-a">

gcloud config set compute/zone ${GCP_ZONE}

gcloud config set container/cluster ${CLUSTER_NAME}

MACHINE_TYPE=<machine type for the k8s nodes, we recommend "n1-standard-4" or larger>

# By default, the following command spins up a 3-node cluster. You can change the default with `--num-nodes VAL`.
gcloud container clusters create ${CLUSTER_NAME} --scopes storage-rw --machine-type ${MACHINE_TYPE}

# By default, GKE clusters have RBAC enabled. To allow 'pachctl deploy' to give the 'pachyderm' service account
# the requisite privileges via clusterrolebindings, you will need to grant *your user account* the privileges
# needed to create those clusterrolebindings.
#
# Note that this command is simple and concise, but gives your user account more privileges than necessary. See
# https://docs.pachyderm.io/en/latest/deployment/rbac.html for the complete list of privileges that the
# pachyderm serviceaccount needs.
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value account)
```

!!! note "Important"
    You must create the Kubernetes cluster by using the `gcloud` command-line
    tool rather than the Google Cloud Console, as you can grant the
    `storage-rw` scope through the command-line tool only.

This might take a few minutes to start up. You can check the status on
the [GCP Console](https://console.cloud.google.com/compute/instances).
A `kubeconfig` entry is automatically generated and set as the current
context. As a sanity check, make sure your cluster is up and running
by running the following `kubectl` command:

```shell
# List all pods in the kube-system namespace.
kubectl get pods -n kube-system
```

**System Response:**

```shell
NAME                                                     READY     STATUS    RESTARTS   AGE
event-exporter-v0.1.7-5c4d9556cf-fd9j2                   2/2       Running   0          1m
fluentd-gcp-v2.0.9-68vhs                                 2/2       Running   0          1m
fluentd-gcp-v2.0.9-fzfpw                                 2/2       Running   0          1m
fluentd-gcp-v2.0.9-qvk8f                                 2/2       Running   0          1m
heapster-v1.4.3-5fbfb6bf55-xgdwx                         3/3       Running   0          55s
kube-dns-778977457c-7hbrv                                3/3       Running   0          1m
kube-dns-778977457c-dpff4                                3/3       Running   0          1m
kube-dns-autoscaler-7db47cb9b7-gp5ns                     1/1       Running   0          1m
kube-proxy-gke-pach-cluster-default-pool-9762dc84-bzcz   1/1       Running   0          1m
kube-proxy-gke-pach-cluster-default-pool-9762dc84-hqkr   1/1       Running   0          1m
kube-proxy-gke-pach-cluster-default-pool-9762dc84-jcbg   1/1       Running   0          1m
kubernetes-dashboard-768854d6dc-t75rp                    1/1       Running   0          1m
l7-default-backend-6497bcdb4d-w72k5                      1/1       Running   0          1m
```

If you *don't* see something similar to the above output,
you can point `kubectl` to the new cluster manually by running
the following command:

```shell
# Update your kubeconfig to point at your newly created cluster.
gcloud container clusters get-credentials ${CLUSTER_NAME}
```

## Deploy Pachyderm

To deploy Pachyderm, we need to:

1. [Create storage resources](#set-up-the-storage-resources),
2. [Install the Pachyderm CLI tool, `pachctl`](#install-pachctl), and
3. [Deploy Pachyderm on the Kubernetes cluster](#deploy-pachyderm-on-the-kubernetes-cluster)

### Set up the Storage Resources

Pachyderm needs a [GCS bucket](https://cloud.google.com/storage/docs/)
and a [persistent disk](https://cloud.google.com/compute/docs/disks/)
to function correctly. You can specify the size of the persistent
disk, specify the bucket name, and create the bucket by running the following
commands:

```shell
# For the persistent disk, 10GB is a good size to start with.
# This stores PFS metadata. For reference, 1GB
# should work fine for 1000 commits on 1000 files.
STORAGE_SIZE=<the size of the volume that you are going to create, in GBs, e.g. "10">

# The Pachyderm bucket name needs to be globally unique across the entire GCP region.
BUCKET_NAME=<The name of the GCS bucket where your data will be stored>

# Create the bucket.
gsutil mb gs://${BUCKET_NAME}
```

To check that everything has been set up correctly, run:

```shell
gsutil ls
# You should see the bucket you created.
```

### Install `pachctl`

`pachctl` is a command-line utility for interacting with a Pachyderm cluster. You can install it locally as follows:

```shell
# For macOS:
brew tap pachyderm/tap && brew install pachyderm/tap/pachctl@1.11

# For Linux (64-bit) or Windows 10+ on WSL:
curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v{{ config.pach_latest_version }}/pachctl_{{ config.pach_latest_version }}_amd64.deb && sudo dpkg -i /tmp/pachctl.deb
```

You can then run `pachctl version --client-only` to check that the installation was successful.

```shell
pachctl version --client-only
{{ config.pach_latest_version }}
```

### Deploy Pachyderm on the Kubernetes cluster

Now you can deploy a Pachyderm cluster by running this command:

```shell
pachctl deploy google ${BUCKET_NAME} ${STORAGE_SIZE} --dynamic-etcd-nodes=1
```

**System Response:**

```shell
serviceaccount "pachyderm" created
storageclass "etcd-storage-class" created
service "etcd-headless" created
statefulset "etcd" created
service "etcd" created
service "pachd" created
deployment "pachd" created
service "dash" created
deployment "dash" created
secret "pachyderm-storage-secret" created

Pachyderm is launching. Check its status with "kubectl get all"
Once launched, access the dashboard by running "pachctl port-forward"
```

!!! note
    Pachyderm uses one etcd node to manage Pachyderm metadata.

!!! note "Important"
    If RBAC authorization is a requirement or you run into any RBAC
    errors, see [Configure RBAC](rbac.md).

It may take a few minutes for the `pachd` pods to be running because Pachyderm
pulls containers from DockerHub. You can see the cluster status with
`kubectl`, which should output the following when Pachyderm is up and running:

```shell
kubectl get pods
```

**System Response:**

```shell
NAME                     READY     STATUS    RESTARTS   AGE
dash-482120938-np8cc     2/2       Running   0          4m
etcd-0                   1/1       Running   0          4m
pachd-3677268306-9sqm0   1/1       Running   0          4m
```

If you see a few restarts on the `pachd` pod, you can safely ignore them.
That simply means that Kubernetes tried to bring up those containers
before other components were ready, so it restarted them.

Finally, assuming your `pachd` is running as shown above, set up
port forwarding so that `pachctl` can talk to the cluster:

```shell
# Forward the ports. We background this process because it blocks.
pachctl port-forward &
```

And you're done! You can test to make sure the cluster is working
by running `pachctl version` or even creating a new repo.

```shell
pachctl version
```

**System Response:**

```shell
COMPONENT           VERSION
pachctl             {{ config.pach_latest_version }}
pachd               {{ config.pach_latest_version }}
```

### Increasing Ingress Throughput

One way to improve ingress performance is to restrict `pachd` to
a specific, more powerful node in the cluster. This is
accomplished by the use of [node taints](https://cloud.google.com/kubernetes-engine/docs/how-to/node-taints)
in GKE.
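Creating the tainted node itself is not shown in this guide; as a rough sketch, you might add a dedicated node pool with a taint at cluster-creation time (the pool name `pachd-pool` and the machine type below are hypothetical placeholders, and flag availability can vary by `gcloud` version):

```shell
# Sketch only: add a dedicated node pool whose taint keeps ordinary pods off it.
# "pachd-pool" and "n1-standard-8" are illustrative; pick names and sizes for your cluster.
gcloud container node-pools create pachd-pool \
  --cluster=${CLUSTER_NAME} \
  --machine-type=n1-standard-8 \
  --num-nodes=1 \
  --node-taints=dedicated=pachd:NoSchedule
```

With the taint in place, only pods that carry a matching toleration are scheduled onto that pool.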
By creating a node taint for `pachd`, you configure the
Kubernetes scheduler to run only the `pachd` pod on that node. After
that's completed, you can deploy Pachyderm with the `--pachd-cpu-request`
and `--pachd-memory-request` flags set to match the resource limits of the
machine type. Finally, you need to modify the `pachd` deployment
so that it has an appropriate toleration:

```yaml
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "pachd"
  effect: "NoSchedule"
```

### Increasing upload performance

The most straightforward approach to increasing upload performance is
to [leverage SSDs as the boot disk](https://cloud.google.com/kubernetes-engine/docs/how-to/custom-boot-disks) in
your cluster because SSDs provide higher throughput and lower latency than
HDD disks. Additionally, you can increase the size of the SSD for
further performance gains because the number of IOPS increases with
disk size.

### Increasing merge performance

Performance tweaks for merges can be made directly in
the [Pachyderm pipeline spec](../../../reference/pipeline_spec/).
More specifically, you can increase the number of hashtrees (hashtree spec)
in the pipeline spec. This number determines the number of shards for the
filesystem metadata. In general, this number should be lower than the number
of workers (parallelism spec) and should not be increased unless merge time
(the interval after the number of processed plus skipped datums equals the
total number of datums, but before the job finishes) is too slow.
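To illustrate how these two specs relate, a minimal 1.x pipeline spec with sharded metadata might look like the sketch below. The pipeline name, image, command, and input repo are hypothetical placeholders, and you should confirm that the hashtree spec field is supported in your Pachyderm version before relying on it:

```json
{
  "pipeline": { "name": "edges" },
  "transform": {
    "image": "pachyderm/opencv",
    "cmd": ["python3", "/edges.py"]
  },
  "input": { "pfs": { "repo": "images", "glob": "/*" } },
  "parallelism_spec": { "constant": 8 },
  "hashtree_spec": { "constant": 4 }
}
```

Here the hashtree count (4) stays below the worker count (8), matching the guidance above.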