# Azure

You can deploy Pachyderm in a new or existing Microsoft® Azure® Kubernetes
Service (AKS) environment and use Azure resources to run your Pachyderm
workloads.
To deploy Pachyderm to AKS, you need to:

1. [Install Prerequisites](#install-prerequisites)
2. [Deploy Kubernetes](#deploy-kubernetes)
3. [Add Storage Resources](#add-storage-resources)
4. [Deploy Pachyderm](#deploy-pachyderm)

## Install Prerequisites

Before you can deploy Pachyderm on Azure, install the following
prerequisites on your client machine. Unless a version is explicitly
specified, use the latest available version of each component:

* [Azure CLI 2.0.1 or later](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
* [jq](https://stedolan.github.io/jq/download/)
* [kubectl](https://docs.microsoft.com/cli/azure/aks?view=azure-cli-latest#az_aks_install_cli)
* [pachctl](#install-pachctl)

### Install `pachctl`

`pachctl` is the primary command-line utility for interacting with Pachyderm clusters.
You can run the tool on Linux®, macOS®, and Microsoft® Windows® 10 or later operating
systems and install it by using your favorite command-line package manager.
This section describes how you can install `pachctl` by using
`brew` and `curl`.

If you are installing `pachctl` on Windows, you need to first install
Windows Subsystem for Linux (WSL).

To install `pachctl`, use one of the following methods:

* To install on macOS by using `brew`, run the following command:

  ```shell
  brew tap pachyderm/tap && brew install pachyderm/tap/pachctl@1.11
  ```

* To install on 64-bit Linux or on Windows 10 or later (in WSL), run the following command:

  ```shell
  curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v{{ config.pach_latest_version }}/pachctl_{{ config.pach_latest_version }}_amd64.deb && sudo dpkg -i /tmp/pachctl.deb
  ```

After the installation completes, verify it by running `pachctl version --client-only`:

```shell
pachctl version --client-only
```

**System Response:**

```shell
COMPONENT           VERSION
pachctl             {{ config.pach_latest_version }}
```

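Before you continue, it can help to confirm that every prerequisite is on your `PATH`. The following is a minimal sketch, not part of Pachyderm; the `missing_tools` helper is a hypothetical name introduced here for illustration. It prints each tool that it cannot find and returns a non-zero status if any is missing.

```shell
# Print the name of every tool that is not on PATH, one per line.
# Returns 0 if all tools are found and 1 otherwise.
# `missing_tools` is a hypothetical helper, not a Pachyderm or Azure command.
missing_tools() {
  missing_rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing_rc=1
    fi
  done
  return "$missing_rc"
}

# Check the prerequisites listed in this section.
missing_tools az jq kubectl pachctl ||
  echo "Install the tools listed above before you continue." >&2
```
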
## Deploy Kubernetes

You can deploy Kubernetes on Azure by following the official
[Azure Kubernetes Service (AKS) documentation](https://docs.microsoft.com/azure/aks/tutorial-kubernetes-deploy-cluster) or by
following the steps in this section. When you deploy Kubernetes on Azure,
you need to specify the following parameters:

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;border-color:#ccc;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#fff;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#f0f0f0;}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-0pky">Variable</th>
    <th class="tg-0pky">Description</th>
  </tr>
  <tr>
    <td class="tg-0pky">RESOURCE_GROUP</td>
    <td class="tg-0pky">A unique name for the resource group where Pachyderm is deployed. For example, <code>pach-resource-group</code>.</td>
  </tr>
  <tr>
    <td class="tg-0pky">LOCATION</td>
    <td class="tg-0pky">An Azure region where AKS is available. For example, <code>centralus</code>.</td>
  </tr>
  <tr>
    <td class="tg-0pky">NODE_SIZE</td>
    <td class="tg-0pky">The size of the Kubernetes virtual machine (VM) instances. To avoid performance issues, Pachyderm recommends that you
    set this value to at least <code>Standard_DS4_v2</code>, which gives you 8 vCPUs, 28 GiB of memory, and 56 GiB of SSD storage.</td>
  </tr>
  <tr>
    <td class="tg-0pky">CLUSTER_NAME</td>
    <td class="tg-0pky">A unique name for the Pachyderm cluster. For example, <code>pach-aks-cluster</code>.</td>
  </tr>
</table>

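The commands in this section read these parameters from shell variables. As a minimal sketch, you could set them once per terminal session as follows. The values are only the examples from the table above and are assumptions, not requirements.

```shell
# Example values taken from the table above; replace them with your own.
export RESOURCE_GROUP="pach-resource-group"
export LOCATION="centralus"
export NODE_SIZE="Standard_DS4_v2"
export CLUSTER_NAME="pach-aks-cluster"

# Print a short summary so that you can spot typos before creating resources.
echo "Cluster ${CLUSTER_NAME} (${NODE_SIZE}) in ${RESOURCE_GROUP}, ${LOCATION}"
```
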
To deploy Kubernetes on Azure, complete the following steps:

1. Log in to Azure:

   ```shell
   az login
   ```

   **System Response:**

   ```shell
   Note, we have launched a browser for you to login. For old experience with
   device code, use "az login --use-device-code"
   ```

   If you have not already logged in, this command opens a browser window. Log in with your Azure credentials.
   After you log in, the following message appears in the command prompt:

   ```shell
   You have logged in. Now let us find all the subscriptions to which you have access...
   [
     {
       "cloudName": "AzureCloud",
       "id": "your_id",
       "isDefault": true,
       "name": "Microsoft Azure Sponsorship",
       "state": "Enabled",
       "tenantId": "your_tenant_id",
       "user": {
         "name": "your_contact_id",
         "type": "user"
       }
     }
   ]
   ```

1. Create an Azure resource group:

   ```shell
   az group create --name=${RESOURCE_GROUP} --location=${LOCATION}
   ```

   **Example:**

   ```shell
   az group create --name="pach-resource-group" --location=centralus
   ```

   **System Response:**

   ```shell
   {
     "id": "/subscriptions/6c9f2e1e-0eba-4421-b4cc-172f959ee110/resourceGroups/pach-resource-group",
     "location": "centralus",
     "managedBy": null,
     "name": "pach-resource-group",
     "properties": {
       "provisioningState": "Succeeded"
     },
     "tags": null,
     "type": null
   }
   ```

1. Create an AKS cluster:

   ```shell
   az aks create --resource-group ${RESOURCE_GROUP} --name ${CLUSTER_NAME} --generate-ssh-keys --node-vm-size ${NODE_SIZE}
   ```

   **Example:**

   ```shell
   az aks create --resource-group pach-resource-group --name pach-aks-cluster --generate-ssh-keys --node-vm-size Standard_DS4_v2
   ```

   **System Response:**

   ```shell
   {
     "aadProfile": null,
     "addonProfiles": null,
     "agentPoolProfiles": [
       {
         "availabilityZones": null,
         "count": 3,
         "enableAutoScaling": null,
         "maxCount": null,
         "maxPods": 110,
         "minCount": null,
         "name": "nodepool1",
         "orchestratorVersion": "1.12.8",
         "osDiskSizeGb": 100,
         "osType": "Linux",
         "provisioningState": "Succeeded",
         "type": "AvailabilitySet",
         "vmSize": "Standard_DS4_v2",
         "vnetSubnetId": null
       }
     ],
   ...
   ```

1. Confirm the version of the Kubernetes server. If `kubectl` is not yet
   configured for the new cluster, first run the `az aks get-credentials`
   command described in [Deploy Pachyderm](#deploy-pachyderm):

   ```shell
   kubectl version
   ```

   **System Response:**

   ```shell
   Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-03-01T23:36:43Z", GoVersion:"go1.12", Compiler:"gc", Platform:"darwin/amd64"}
   Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
   ```

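The steps above can be collected into one script. The following is a sketch under stated assumptions: the `run` and `deploy_kubernetes` helpers and the `DRY_RUN` switch are hypothetical names introduced here, and the default values are only the examples from this section. With `DRY_RUN=1` (the default in this sketch), the script prints each command instead of running it, so you can review the sequence first. The interactive `az login` step is left out, and `az aks get-credentials` runs before `kubectl version` so that `kubectl` can reach the new cluster.

```shell
# Print each command instead of running it unless DRY_RUN is set to 0.
# DRY_RUN, run, and deploy_kubernetes are hypothetical names for this sketch.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Fall back to the example values from this section if nothing is set.
RESOURCE_GROUP="${RESOURCE_GROUP:-pach-resource-group}"
LOCATION="${LOCATION:-centralus}"
NODE_SIZE="${NODE_SIZE:-Standard_DS4_v2}"
CLUSTER_NAME="${CLUSTER_NAME:-pach-aks-cluster}"

# Run the steps in order and stop at the first failure.
deploy_kubernetes() {
  run az group create --name="${RESOURCE_GROUP}" --location="${LOCATION}" &&
  run az aks create --resource-group "${RESOURCE_GROUP}" --name "${CLUSTER_NAME}" \
    --generate-ssh-keys --node-vm-size "${NODE_SIZE}" &&
  run az aks get-credentials --resource-group "${RESOURCE_GROUP}" --name "${CLUSTER_NAME}" &&
  run kubectl version
}

deploy_kubernetes
```
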
!!! note "See Also:"
    - [Azure Virtual Machine sizes](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general)


## Add Storage Resources

Pachyderm requires an object store and a persistent volume in your
cloud environment to function correctly. For best results, use fast
disks, such as *Premium SSD Managed Disks*, which are available with
the Azure Premium Storage offering.

You need to specify the following parameters when you create storage
resources:

<table class="tg">
  <tr>
    <th class="tg-0pky">Variable</th>
    <th class="tg-0pky">Description</th>
  </tr>
  <tr>
    <td class="tg-0pky">STORAGE_ACCOUNT</td>
    <td class="tg-0pky">The name of the storage account where you store your data. The name must be unique across Azure and consist of 3 to 24 lowercase letters and numbers.</td>
  </tr>
  <tr>
    <td class="tg-0pky">CONTAINER_NAME</td>
    <td class="tg-0pky">The name of the Azure blob container where you store your data.</td>
  </tr>
  <tr>
    <td class="tg-0pky">STORAGE_SIZE</td>
    <td class="tg-0pky">The size of the persistent volume to create, in GB. Allocate at least 10 GB.</td>
  </tr>
</table>

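Azure rejects storage account names that are not 3 to 24 characters long or that contain anything other than lowercase letters and numbers, so it can be worth checking the name locally first. The following is a minimal sketch; the `valid_storage_account_name` helper and the example values are hypothetical and are not part of Pachyderm or the Azure CLI.

```shell
# Return 0 if the name has 3 to 24 characters, all lowercase letters or digits.
# LC_ALL=C keeps the character ranges independent of the current locale.
# `valid_storage_account_name` is a hypothetical helper for this sketch.
valid_storage_account_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

# Example values only; replace them with your own.
export STORAGE_ACCOUNT="teststorage"
export CONTAINER_NAME="test-container"
export STORAGE_SIZE=10

if ! valid_storage_account_name "${STORAGE_ACCOUNT}"; then
  echo "Invalid storage account name: ${STORAGE_ACCOUNT}" >&2
fi
```
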
To create these resources, follow these steps:

1. Clone the [Pachyderm GitHub repo](https://github.com/pachyderm/pachyderm).
1. Change the directory to the root directory of the `pachyderm` repository.
1. Create an Azure storage account:

   ```shell
   az storage account create \
     --resource-group="${RESOURCE_GROUP}" \
     --location="${LOCATION}" \
     --sku=Premium_LRS \
     --name="${STORAGE_ACCOUNT}" \
     --kind=BlockBlobStorage
   ```

   **System Response:**

   ```shell
   {
     "accessTier": null,
     "creationTime": "2019-06-20T16:05:55.616832+00:00",
     "customDomain": null,
     "enableAzureFilesAadIntegration": null,
     "enableHttpsTrafficOnly": false,
     "encryption": {
       "keySource": "Microsoft.Storage",
       "keyVaultProperties": null,
       "services": {
         "blob": {
           "enabled": true,
     ...
   ```

   Make sure that you set the Stock Keeping Unit (SKU) to `Premium_LRS`
   and the `kind` parameter to `BlockBlobStorage`. This configuration
   results in storage that uses SSDs rather than standard hard disk
   drives (HDDs). If you choose an HDD-based storage option, your
   Pachyderm cluster will be too slow and might malfunction.

1. Verify that your storage account has been successfully created:

   ```shell
   az storage account list
   ```

1. Obtain the key for the storage account (`STORAGE_ACCOUNT`) in the resource group
   that you use to deploy Pachyderm, and store it in the `STORAGE_KEY` variable:

   ```shell
   STORAGE_KEY="$(az storage account keys list \
                 --account-name="${STORAGE_ACCOUNT}" \
                 --resource-group="${RESOURCE_GROUP}" \
                 --output=json \
                 | jq '.[0].value' -r
              )"
   ```

1. Optionally, confirm the key. You can find it in the **Storage accounts > Access keys**
   section in the Azure Portal or by running the following command:

   ```shell
   az storage account keys list --account-name=${STORAGE_ACCOUNT}
   ```

   **System Response:**

   ```shell
   [
     {
       "keyName": "key1",
       "permissions": "Full",
       "value": ""
     }
   ]
   ```

1. Create a new storage container within your storage account:

   ```shell
   az storage container create --name ${CONTAINER_NAME} \
             --account-name ${STORAGE_ACCOUNT} \
             --account-key "${STORAGE_KEY}"
   ```

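If the key lookup in the steps above fails, `STORAGE_KEY` can end up empty, or set to the literal string `null`, which is what `jq -r` prints when the requested field is missing. The following minimal sketch guards against both cases before you use the key; the `have_storage_key` helper is a hypothetical name introduced here.

```shell
# Return 0 only if the value looks usable: non-empty and not the string "null".
# `have_storage_key` is a hypothetical helper for this sketch.
have_storage_key() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

if ! have_storage_key "${STORAGE_KEY:-}"; then
  echo "STORAGE_KEY is not set; rerun the 'az storage account keys list' step." >&2
fi
```
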
!!! note "See Also:"
    - [Azure Storage](https://azure.microsoft.com/documentation/articles/storage-introduction/)


## Deploy Pachyderm

After you complete all the sections above, you can deploy Pachyderm
on Azure. If you have previously tried to run Pachyderm locally,
make sure that you are using the right Kubernetes context. Otherwise,
you might accidentally deploy your cluster on Minikube.

1. Verify the cluster context:

   ```shell
   kubectl config current-context
   ```

   This command should return the name of your Kubernetes cluster that
   runs on Azure.

   If a different context is displayed, configure `kubectl`
   to use your Azure configuration:

   ```shell
   az aks get-credentials --resource-group ${RESOURCE_GROUP} --name ${CLUSTER_NAME}
   ```

   **System Response:**

   ```shell
   Merged "${CLUSTER_NAME}" as current context in /Users/test-user/.kube/config
   ```

1. Deploy Pachyderm by running the following command:

   ```shell
   pachctl deploy microsoft ${CONTAINER_NAME} ${STORAGE_ACCOUNT} "${STORAGE_KEY}" ${STORAGE_SIZE} --dynamic-etcd-nodes 1
   ```

   **Example:**

   ```shell
   pachctl deploy microsoft test-container teststorage <key> 10 --dynamic-etcd-nodes 1
   serviceaccount/pachyderm configured
   clusterrole.rbac.authorization.k8s.io/pachyderm configured
   clusterrolebinding.rbac.authorization.k8s.io/pachyderm configured
   service/etcd-headless created
   statefulset.apps/etcd created
   service/etcd configured
   service/pachd configured
   deployment.apps/pachd configured
   service/dash configured
   deployment.apps/dash configured
   secret/pachyderm-storage-secret configured

   Pachyderm is launching. Check its status with "kubectl get all"
   Once launched, access the dashboard by running "pachctl port-forward"
   ```

   Because Pachyderm pulls container images from Docker Hub, it might take
   some time before the `pachd` pods start. You can check the status of the
   deployment by periodically running `kubectl get all`.

1. When Pachyderm is up and running, get the information about the pods:

   ```shell
   kubectl get pods
   ```

   **System Response:**

   ```shell
   NAME                      READY     STATUS    RESTARTS   AGE
   dash-482120938-vdlg9      2/2       Running   0          54m
   etcd-0                    1/1       Running   0          54m
   pachd-1971105989-mjn61    1/1       Running   0          54m
   ```

   **Note:** Sometimes Kubernetes tries to start the `pachd` pods before
   the `etcd` pods are ready, which might result in the `pachd` pods
   restarting. You can safely ignore those restarts.

1. To connect to the cluster from your local machine, such as your laptop,
   set up port forwarding so that `pachctl` can communicate with the cluster:

   ```shell
   pachctl port-forward
   ```

1. Verify that the cluster is up and running:

   ```shell
   pachctl version
   ```

   **System Response:**

   ```shell
   COMPONENT           VERSION
   pachctl             {{ config.pach_latest_version }}
   pachd               {{ config.pach_latest_version }}
   ```
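
Instead of reading the `kubectl get pods` output by eye, you could check it with a small filter. The following is a minimal sketch that assumes the default column layout shown above, where `STATUS` is the third column; the `all_pods_running` helper is a hypothetical name introduced here. It checks only the `STATUS` column, not the `READY` counts.

```shell
# Read `kubectl get pods` output on stdin. Succeed only if at least one pod
# is listed and every pod reports the status "Running".
# `all_pods_running` is a hypothetical helper for this sketch.
all_pods_running() {
  awk 'NR > 1 { seen = 1; if ($3 != "Running") bad = 1 }
       END { if (seen && !bad) exit 0; exit 1 }'
}

# Usage (requires access to the cluster):
#   kubectl get pods | all_pods_running && echo "All Pachyderm pods are running"
```
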