# Azure

You can deploy Pachyderm in a new or existing Microsoft® Azure® Kubernetes
Service (AKS) environment and use Azure resources to run your Pachyderm
workloads.
To deploy Pachyderm to AKS, you need to:

1. [Install Prerequisites](#install-prerequisites)
2. [Deploy Kubernetes](#deploy-kubernetes)
3. [Add storage resources](#add-storage-resources)
4. [Deploy Pachyderm](#deploy-pachyderm)

## Install Prerequisites

Before you can deploy Pachyderm on Azure, you need to configure a few
prerequisites on your client machine. Unless a version is explicitly specified, use the
latest available version of the components listed below.
Install the following prerequisites:

* [Azure CLI 2.0.1 or later](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
* [jq](https://stedolan.github.io/jq/download/)
* [kubectl](https://docs.microsoft.com/cli/azure/aks?view=azure-cli-latest#az_aks_install_cli)
* [pachctl](#install-pachctl)

### Install `pachctl`

`pachctl` is the primary command-line utility for interacting with Pachyderm clusters.
You can run the tool on Linux®, macOS®, and Microsoft® Windows® 10 or later operating
systems and install it by using your favorite command-line package manager.
This section describes how you can install `pachctl` by using
`brew` and `curl`.

If you are installing `pachctl` on Windows, you need to first install
Windows Subsystem for Linux (WSL).

To install `pachctl`, complete the following steps:

* To install on macOS by using `brew`, run the following command:

    ```shell
    brew tap pachyderm/tap && brew install pachyderm/tap/pachctl@1.10
    ```

* To install on 64-bit Linux or Windows 10 or later, run the following command:

    ```shell
    curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v1.10.0/pachctl_1.10.0_amd64.deb && sudo dpkg -i /tmp/pachctl.deb
    ```

Verify your installation by running `pachctl version --client-only`:

```shell
pachctl version --client-only
```

**System Response:**

```shell
COMPONENT           VERSION
pachctl             1.10.0
```

## Deploy Kubernetes

You can deploy Kubernetes on Azure by following the official [Azure Kubernetes Service documentation](https://docs.microsoft.com/azure/aks/tutorial-kubernetes-deploy-cluster) or by
following the steps in this section. When you deploy Kubernetes on Azure,
you need to specify the following parameters:

<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;border-color:#ccc;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#fff;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#f0f0f0;}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-0pky">Variable</th>
    <th class="tg-0pky">Description</th>
  </tr>
  <tr>
    <td class="tg-0pky">RESOURCE_GROUP</td>
    <td class="tg-0pky">A unique name for the resource group where Pachyderm is deployed. For example, <code>pach-resource-group</code>.</td>
  </tr>
  <tr>
    <td class="tg-0pky">LOCATION</td>
    <td class="tg-0pky">An Azure region where AKS is available. For example, <code>centralus</code>.</td>
  </tr>
  <tr>
    <td class="tg-0pky">NODE_SIZE</td>
    <td class="tg-0pky">The size of the Kubernetes virtual machine (VM) instances.
    To avoid performance issues, Pachyderm recommends that you
    set this value to at least <code>Standard_DS4_v2</code>, which gives you 8 vCPUs, 28 GiB of memory, and a 56 GiB SSD.</td>
  </tr>
  <tr>
    <td class="tg-0pky">CLUSTER_NAME</td>
    <td class="tg-0pky">A unique name for the Pachyderm cluster. For example, <code>pach-aks-cluster</code>.</td>
  </tr>
</table>

To deploy Kubernetes on Azure, complete the following steps:

1. Log in to Azure:

    ```shell
    az login
    ```

    **System Response:**

    ```shell
    Note, we have launched a browser for you to login. For old experience with
    device code, use "az login --use-device-code"
    ```

    If you have not already logged in, this command opens a browser window. Log in with your Azure credentials.
    After you log in, the following message appears in the command prompt:

    ```shell
    You have logged in. Now let us find all the subscriptions to which you have access...
    [
      {
        "cloudName": "AzureCloud",
        "id": "your_id",
        "isDefault": true,
        "name": "Microsoft Azure Sponsorship",
        "state": "Enabled",
        "tenantId": "your_tenant_id",
        "user": {
          "name": "your_contact_id",
          "type": "user"
        }
      }
    ]
    ```

1. Create an Azure resource group:

    ```shell
    az group create --name="${RESOURCE_GROUP}" --location="${LOCATION}"
    ```

    **Example:**

    ```shell
    az group create --name="test-group" --location=centralus
    ```

    **System Response:**

    ```shell
    {
      "id": "/subscriptions/6c9f2e1e-0eba-4421-b4cc-172f959ee110/resourceGroups/test-group",
      "location": "centralus",
      "managedBy": null,
      "name": "test-group",
      "properties": {
        "provisioningState": "Succeeded"
      },
      "tags": null,
      "type": null
    }
    ```

1. Create an AKS cluster:

    ```shell
    az aks create --resource-group "${RESOURCE_GROUP}" --name "${CLUSTER_NAME}" --generate-ssh-keys --node-vm-size "${NODE_SIZE}"
    ```

    **Example:**

    ```shell
    az aks create --resource-group test-group --name test-cluster --generate-ssh-keys --node-vm-size Standard_DS4_v2
    ```

    **System Response:**

    ```shell
    {
      "aadProfile": null,
      "addonProfiles": null,
      "agentPoolProfiles": [
        {
          "availabilityZones": null,
          "count": 3,
          "enableAutoScaling": null,
          "maxCount": null,
          "maxPods": 110,
          "minCount": null,
          "name": "nodepool1",
          "orchestratorVersion": "1.12.8",
          "osDiskSizeGb": 100,
          "osType": "Linux",
          "provisioningState": "Succeeded",
          "type": "AvailabilitySet",
          "vmSize": "Standard_DS4_v2",
          "vnetSubnetId": null
        }
      ],
      ...
    ```

1. Confirm the version of the Kubernetes server. If `kubectl` does not yet
   point at the new cluster, first fetch its credentials with
   `az aks get-credentials`, as described in [Deploy Pachyderm](#deploy-pachyderm):

    ```shell
    kubectl version
    ```

    **System Response:**

    ```shell
    Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-03-01T23:36:43Z", GoVersion:"go1.12", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
    ```

!!! note "See Also:"
    - [Azure Virtual Machine sizes](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general)


## Add storage resources

Pachyderm requires you to deploy an object store and a persistent
volume in your cloud environment to function correctly.
For best
results, use fast disks, such as *Premium SSD
Managed Disks* that are available with the Azure Premium Storage offering.

You need to specify the following parameters when you create storage
resources:

<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;border-color:#ccc;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#fff;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#f0f0f0;}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-0pky">Variable</th>
    <th class="tg-0pky">Description</th>
  </tr>
  <tr>
    <td class="tg-0pky">STORAGE_ACCOUNT</td>
    <td class="tg-0pky">The name of the storage account where you store your data. The name must be unique across Azure.</td>
  </tr>
  <tr>
    <td class="tg-0pky">CONTAINER_NAME</td>
    <td class="tg-0pky">The name of the Azure blob container where you store your data.</td>
  </tr>
  <tr>
    <td class="tg-0pky">STORAGE_SIZE</td>
    <td class="tg-0pky">The size of the persistent volume to create, in GB. Allocate at least 10 GB.</td>
  </tr>
</table>

To create these resources, follow these steps:

1. Clone the [Pachyderm GitHub repo](https://github.com/pachyderm/pachyderm).
1. Change to the root directory of the `pachyderm` repository.
1. Create an Azure storage account:

    ```shell
    az storage account create \
      --resource-group="${RESOURCE_GROUP}" \
      --location="${LOCATION}" \
      --sku=Premium_LRS \
      --name="${STORAGE_ACCOUNT}" \
      --kind=BlockBlobStorage
    ```

    **System Response:**

    ```shell
    {
      "accessTier": null,
      "creationTime": "2019-06-20T16:05:55.616832+00:00",
      "customDomain": null,
      "enableAzureFilesAadIntegration": null,
      "enableHttpsTrafficOnly": false,
      "encryption": {
        "keySource": "Microsoft.Storage",
        "keyVaultProperties": null,
        "services": {
          "blob": {
            "enabled": true,
      ...
    ```

    Make sure that you set the Stock Keeping Unit (SKU) to `Premium_LRS`
    and the `kind` parameter to `BlockBlobStorage`. This
    configuration results in storage that uses SSDs rather than
    standard Hard Disk Drives (HDD).
    If you choose an HDD-based storage option, your Pachyderm
    cluster will be too slow and might malfunction.

1. Verify that your storage account has been successfully created:

    ```shell
    az storage account list
    ```

1. Obtain the key for the storage account (`STORAGE_ACCOUNT`) that you use to deploy Pachyderm:

    ```shell
    STORAGE_KEY="$(az storage account keys list \
                  --account-name="${STORAGE_ACCOUNT}" \
                  --resource-group="${RESOURCE_GROUP}" \
                  --output=json \
                  | jq -r '.[0].value'
    )"
    ```

1. To inspect the generated key, look in the **Storage accounts > Access keys**
   section of the Azure Portal or run the following command:

    ```shell
    az storage account keys list --account-name="${STORAGE_ACCOUNT}"
    ```

    **System Response:**

    ```shell
    [
      {
        "keyName": "key1",
        "permissions": "Full",
        "value": ""
      }
    ]
    ```

1. Create a new storage container within your storage account:

    ```shell
    az storage container create --name "${CONTAINER_NAME}" \
      --account-name "${STORAGE_ACCOUNT}" \
      --account-key "${STORAGE_KEY}"
    ```

!!! note "See Also:"
    - [Azure Storage](https://azure.microsoft.com/documentation/articles/storage-introduction/)


## Deploy Pachyderm

After you complete all the sections above, you can deploy Pachyderm
on Azure. If you have previously tried to run Pachyderm locally,
make sure that you are using the right Kubernetes context. Otherwise,
you might accidentally deploy your cluster on Minikube.

1. Verify the cluster context:

    ```shell
    kubectl config current-context
    ```

    This command should return the name of your Kubernetes cluster that
    runs on Azure.

    * If a different context is displayed, configure `kubectl`
      to use your Azure configuration:

        ```shell
        az aks get-credentials --resource-group "${RESOURCE_GROUP}" --name "${CLUSTER_NAME}"
        ```

        **System Response:**

        ```shell
        Merged "${CLUSTER_NAME}" as current context in /Users/test-user/.kube/config
        ```

1. Run the following command:

    ```shell
    pachctl deploy microsoft "${CONTAINER_NAME}" "${STORAGE_ACCOUNT}" "${STORAGE_KEY}" "${STORAGE_SIZE}" --dynamic-etcd-nodes 1
    ```

    **Example:**

    ```shell
    pachctl deploy microsoft test-container teststorage <key> 10 --dynamic-etcd-nodes 1
    serviceaccount/pachyderm configured
    clusterrole.rbac.authorization.k8s.io/pachyderm configured
    clusterrolebinding.rbac.authorization.k8s.io/pachyderm configured
    service/etcd-headless created
    statefulset.apps/etcd created
    service/etcd configured
    service/pachd configured
    deployment.apps/pachd configured
    service/dash configured
    deployment.apps/dash configured
    secret/pachyderm-storage-secret configured

    Pachyderm is launching.
    Check its status with "kubectl get all"
    Once launched, access the dashboard by running "pachctl port-forward"
    ```

    Because Pachyderm pulls containers from Docker Hub, it might take some time
    before the `pachd` pods start. You can check the status of the
    deployment by periodically running `kubectl get all`.

1. When Pachyderm is up and running, get the information about the pods:

    ```shell
    kubectl get pods
    ```

    **System Response:**

    ```shell
    NAME                     READY     STATUS    RESTARTS   AGE
    dash-482120938-vdlg9     2/2       Running   0          54m
    etcd-0                   1/1       Running   0          54m
    pachd-1971105989-mjn61   1/1       Running   0          54m
    ```

    **Note:** Sometimes Kubernetes tries to start `pachd` pods before
    the `etcd` pods are ready, which might result in the `pachd` pods
    restarting. You can safely ignore those restarts.

1. To connect to the cluster from your local machine, such as your laptop,
   set up port forwarding to enable communication between `pachctl` and the cluster:

    ```shell
    pachctl port-forward
    ```

1. Verify that the cluster is up and running:

    ```shell
    pachctl version
    ```

    **System Response:**

    ```shell
    COMPONENT           VERSION
    pachctl             1.10.0
    pachd               1.10.0
    ```
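
The commands in this guide depend on several shell variables, and a missing or malformed value typically surfaces only as an Azure CLI error partway through the deployment. The following is a minimal sketch of pre-flight checks that you could run first. The helper names `valid_storage_account`, `valid_container_name`, and `missing_vars` are hypothetical and are not part of `pachctl` or the Azure CLI. The patterns encode Azure's naming rules as commonly documented (storage accounts: 3 to 24 lowercase letters and digits; blob containers: 3 to 63 lowercase letters, digits, and single hyphens), so treat them as assumptions and confirm them against the current Azure documentation.

```shell
# Hypothetical pre-flight helpers (illustrative names, POSIX sh compatible).

# Succeeds if $1 looks like a valid storage account name:
# 3-24 characters, lowercase letters and digits only.
valid_storage_account() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

# Succeeds if $1 looks like a valid blob container name:
# 3-63 characters, lowercase letters, digits, and single hyphens,
# starting and ending with a letter or digit.
valid_container_name() {
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 63 ] &&
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

# Prints the name of every deployment variable that is unset or empty.
# Note: POSIX sh has no local variables, so `var` and `value` are globals.
missing_vars() {
  for var in RESOURCE_GROUP LOCATION NODE_SIZE CLUSTER_NAME \
             STORAGE_ACCOUNT CONTAINER_NAME STORAGE_SIZE; do
    eval "value=\${$var:-}"
    [ -n "$value" ] || echo "$var"
  done
}
```

For example, after you source these functions, running `missing_vars` prints the name of each variable that still needs a value, and `valid_storage_account "${STORAGE_ACCOUNT}" || echo "invalid storage account name"` flags a name that Azure is likely to reject.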