# Migrate to a Minor or Major Version

!!! info
    If you need to upgrade Pachyderm from one patch
    to another, such as from x.xx.0 to x.xx.1, see
    [Upgrade Pachyderm](upgrades.md).

As new versions of Pachyderm are released, you might need to update your
cluster to get access to bug fixes and new features.

Migrations involve moving between major releases, such as 1.x.x to
2.x.x, or between minor releases, such as 1.11.x to 1.12.0.

!!! tip
    Pachyderm follows the [Semantic Versioning](https://semver.org/)
    specification to manage the release process.

Pachyderm stores all of its state in the following places:

* In `etcd`, which in turn stores its state in one or more persistent volumes
created when the Pachyderm cluster was deployed. `etcd` stores
metadata about your pipelines, repositories, and other Pachyderm primitives.

* In an object store bucket, such as AWS S3, MinIO, or Azure Blob Storage,
which stores the actual data.

In a migration, the data structures stored in those locations need to be
read, transformed, and rewritten. Therefore, this process involves the
following steps:

1. Back up your cluster by exporting the existing Pachyderm cluster's repos,
pipelines, and input commits to a backup file and, optionally, to an S3 bucket.
1. Bring up a new Pachyderm cluster adjacent to the old Pachyderm cluster,
either in a separate namespace or in a separate Kubernetes cluster.
1. Restore the old cluster's repos, commits, and pipelines into the new
   cluster.
    37  
!!! warning
    Whether you are upgrading or migrating your cluster, you must back it up
    to guarantee that you can restore it afterwards.

## Step 1 - Back up Your Cluster

Before migrating your cluster, create a backup that you can restore your
cluster from. For large amounts of data stored in an S3 object store,
we recommend that you use your cloud provider's capabilities to copy your
data into a new bucket while backing up information about Pachyderm objects
to a local file. For smaller deployments, you can copy everything into a
local file and then restore from that file.

To back up your cluster, complete the following steps:

1. Back up your cluster by running the `pachctl extract` command with the
`--no-objects` flag as described in [Back up Your Cluster](../backup_restore/).
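   If you keep more than one extract around, a date-stamped file name helps
   you tell them apart. A minimal sketch; the file-name pattern is only a
   suggestion, and the extract itself is commented out because it needs a
   running Pachyderm 1.x cluster:

   ```shell
   # Build a date-stamped name for the backup file; the pattern is
   # only a suggestion.
   backup_file="pachyderm-backup-$(date +%F).tar"
   echo "$backup_file"

   # The actual extract requires a running Pachyderm 1.x cluster, so it
   # is commented out here:
   # pachctl extract --no-objects > "$backup_file"
   ```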
    55  
1. In your cloud provider, create a new S3 bucket with the same permissions
policy that you assigned to the original cluster bucket. For example,
if your cluster is on EKS, create the same permissions policy as described
in [Deploy Pachyderm with an IAM Role](../../deploy/amazon_web_services/aws-deploy-pachyderm/#deploy-pachyderm-with-an-iam-role).

1. Clone the S3 bucket that you used for the old cluster to this new bucket.
   Follow the instructions for your cloud provider:

   * If you use Google Cloud, see the [gsutil instructions](https://cloud.google.com/storage/docs/gsutil/commands/cp).
   * If you use Microsoft Azure, see the [azcopy instructions](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux?toc=%2fazure%2fstorage%2ffiles%2ftoc.json).
   * If you use Amazon EKS, see the [AWS CLI instructions](https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html).

   **Example:**

   ```shell
   aws s3 sync s3://mybucket s3://mybucket2
   ```
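   To get a feel for what a successful clone should leave behind, here is a
   local stand-in that uses temporary directories instead of buckets; with
   real buckets you would use the provider tools listed above:

   ```shell
   # Two temporary directories stand in for the old and new buckets.
   src=$(mktemp -d) && dst=$(mktemp -d)
   echo "some data" > "$src/object1"
   mkdir -p "$src/pfs" && echo "more data" > "$src/pfs/object2"

   # Copy everything, preserving the layout, as a bucket sync would:
   cp -R "$src/." "$dst/"

   # A clone is only good if the two trees match exactly:
   diff -r "$src" "$dst" && echo "clone matches"
   ```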
    73  
1. Proceed to [Step 2](#step-2-restore-all-paused-pipelines).

## Step 2 - Restore All Paused Pipelines

If you want to minimize downtime and run your pipelines while you are migrating
your cluster, you can restart all paused pipelines and data loading operations
after the backup and clone operations are complete.

To restore all paused pipelines, complete the following steps:

1. Run the `pachctl start pipeline` command on each paused pipeline one by one,
   or use a multi-line shell script to restart all pipelines at once:

=== "one-by-one"
    ```shell
    pachctl start pipeline <pipeline-name>
    ```

=== "all-at-once"
    ```shell
    pachctl list pipeline --raw \
    | jq -r '.pipeline.name' \
    | xargs -P3 -n1 -I{} pachctl start pipeline {}
    ```

   You might need to install `jq` and other utilities to run the script.
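   If you are unsure what the `jq` filter in the all-at-once script does, you
   can try it on canned input. This sketch assumes `pachctl list pipeline
   --raw` emits one JSON object per pipeline; the pipeline names below are
   made up:

   ```shell
   # Sample of the JSON stream that `pachctl list pipeline --raw` emits:
   # one object per pipeline. The names are hypothetical.
   cat > /tmp/pipelines.json <<'EOF'
   {"pipeline":{"name":"edges"},"state":"PIPELINE_PAUSED"}
   {"pipeline":{"name":"montage"},"state":"PIPELINE_PAUSED"}
   EOF

   # The filter from the script above: pull out just the pipeline names,
   # one per line, ready for xargs.
   jq -r '.pipeline.name' /tmp/pipelines.json
   ```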
   100  
1. Confirm that each pipeline has started by using the `list pipeline` command:

   ```shell
   pachctl list pipeline
   ```

   * If you switched the ports to stop data loading from outside sources,
   change the ports back:

     1. Back up the current configuration:

        ```shell
        kubectl get svc/pachd -o json >pachd_service_backup_30649.json
        kubectl get svc/etcd -o json >etcd_svc_backup_32379.json
        kubectl get svc/dash -o json >dash_svc_backup_30080.json
        ```

     1. Modify the services to accept traffic on their original ports, which
     you changed earlier to avoid collisions with the migration cluster:

        ```shell
        # Modify the pachd API endpoint to run on 30650:
        kubectl get svc/pachd -o json | sed 's/30649/30650/g' | kubectl apply -f -
        # Modify the pachd trace port to run on 30651:
        kubectl get svc/pachd -o json | sed 's/30648/30651/g' | kubectl apply -f -
        # Modify the pachd api-over-http port to run on 30652:
        kubectl get svc/pachd -o json | sed 's/30647/30652/g' | kubectl apply -f -
        # Modify the pachd SAML authentication port to run on 30654:
        kubectl get svc/pachd -o json | sed 's/30646/30654/g' | kubectl apply -f -
        # Modify the pachd git API callback port to run on 30655:
        kubectl get svc/pachd -o json | sed 's/30644/30655/g' | kubectl apply -f -
        # Modify the pachd s3 port to run on 30600:
        kubectl get svc/pachd -o json | sed 's/30611/30600/g' | kubectl apply -f -
        # Modify the etcd client port to run on 32379:
        kubectl get svc/etcd -o json | sed 's/32378/32379/g' | kubectl apply -f -
        # Modify the dashboard ports to run on 30080 and 30081:
        kubectl get svc/dash -o json | sed 's/30079/30080/g' | kubectl apply -f -
        kubectl get svc/dash -o json | sed 's/30078/30081/g' | kubectl apply -f -
        ```
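        The `sed` calls above do a blind string replacement on the whole
        Service JSON, so it is worth checking what they change before piping
        the result into `kubectl apply`. A runnable sketch on a trimmed-down
        stand-in for the pachd Service (the real object has many more fields):

        ```shell
        # A trimmed-down stand-in for `kubectl get svc/pachd -o json` output.
        cat > /tmp/pachd_svc.json <<'EOF'
        {"spec": {"ports": [{"name": "api-grpc-port", "nodePort": 30649}]}}
        EOF

        # The same substitution as above, but written to a file instead of
        # being piped into `kubectl apply`:
        sed 's/30649/30650/g' /tmp/pachd_svc.json > /tmp/pachd_svc_patched.json
        cat /tmp/pachd_svc_patched.json
        ```

        Because the replacement is textual, confirm that the old port number
        does not occur anywhere else in the JSON (for example, inside a
        `resourceVersion`) before applying the patched manifest.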
   140  
1. Modify your environment so that you can access `pachd` on the old port:

   ```shell
   pachctl config update context `pachctl config get active-context` --pachd-address=<cluster ip>:30650
   ```

1. Verify that you can access `pachd`:

   ```shell
   pachctl version
   ```

   **System Response:**

   ```
   COMPONENT           VERSION
   pachctl             {{ config.pach_latest_version }}
   pachd               {{ config.pach_latest_version }}
   ```

   If the command above hangs, you might need to adjust your firewall rules.
   Your old Pachyderm cluster can operate while you are creating a migrated
   one.

1. Proceed to [Step 3](#step-3-deploy-a-pachyderm-cluster-with-the-cloned-bucket).
   176  
## Step 3 - Deploy a Pachyderm Cluster with the Cloned Bucket

After you create a backup of your existing cluster, you need to create a new
Pachyderm cluster by using the bucket you cloned in [Step 1](#step-1-back-up-your-cluster).

This new cluster can be deployed:

* On the same Kubernetes cluster in a separate namespace.
* On a different Kubernetes cluster within the same cloud provider.

If you are deploying in a namespace on the same Kubernetes cluster,
you might need to modify the Kubernetes ingress to the Pachyderm deployment
in the new namespace to avoid port conflicts in the same cluster.
Consult with your Kubernetes administrator for information on avoiding
ingress conflicts.

If you have issues with the extracted data, rerun the instructions in
[Step 1](#step-1-back-up-your-cluster).

To deploy a Pachyderm cluster with a cloned bucket, complete the following
steps:

1. Upgrade `pachctl` to the latest version:

   ```shell
   brew upgrade pachyderm/tap/pachctl@1.11
   ```

1. If you are deploying your cluster in a separate Kubernetes namespace,
create a new namespace:

   ```shell
   kubectl create namespace <new-cluster-namespace>
   ```
   219  
1. Deploy your cluster in a separate namespace or on a separate Kubernetes
cluster by using the `pachctl deploy` command for your cloud provider with the
`--namespace` flag.

**Examples:**

=== "AWS EKS"
    ```shell
    pachctl deploy amazon <bucket-name> <region> <storage-size> --dynamic-etcd-nodes=<number> --iam-role <iam-role> --namespace=<namespace-name>
    ```

=== "GKE"
    ```shell
    pachctl deploy google <bucket-name> <storage-size> --dynamic-etcd-nodes=1 --namespace=<namespace-name>
    ```

=== "Azure"
    ```shell
    pachctl deploy microsoft <account-name> <storage-account> <storage-key> <storage-size> --dynamic-etcd-nodes=<number> --namespace=<namespace-name>
    ```

**Note:** The parameters for your Pachyderm cluster deployment might be
different. For more information, see [Deploy Pachyderm](../../deploy/).

1. Verify that your cluster has been deployed:

=== "In a namespace"
    ```shell
    kubectl get pod --namespace=<new-cluster>
    ```

=== "On a cluster"
    ```shell
    kubectl get pod
    ```

   If you have deployed your new cluster in a namespace, Pachyderm should
   have created a new context for this deployment. Verify that you are
   using this context.

1. Proceed to [Step 4](#step-4-restore-your-cluster).
   261  
## Step 4 - Restore your Cluster

After you have created a new cluster, you can restore your backup to this
new cluster. If you have deployed your new cluster in a namespace, Pachyderm
should have created a new context for this deployment. You need to switch to
this new context to access the correct cluster. Before you run the
`pachctl restore` command, your new cluster should be empty.

To restore your cluster, complete the following steps:

1. If you deployed your new cluster into a different namespace on the same
Kubernetes cluster as your old cluster, verify that you are in the correct
namespace:

   ```shell
   pachctl config get context `pachctl config get active-context`
   ```

   **Example System Response:**

   ``` hl_lines="5"
   {
     "source": "IMPORTED",
     "cluster_name": "test-migration.us-east-1.eksctl.io",
     "auth_info": "user@test-migration.us-east-1.eksctl.io",
     "namespace": "new-cluster"
   }
   ```

   Your active context must have the namespace you deployed your new
   cluster into.
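   Rather than eyeballing the JSON, you can pull out the `namespace` field
   with `jq` and compare it to the namespace you deployed into. A sketch on
   canned context JSON; the values mirror the example above and are
   hypothetical:

   ```shell
   # Canned copy of the context JSON printed by `pachctl config get context`.
   cat > /tmp/context.json <<'EOF'
   {
     "source": "IMPORTED",
     "cluster_name": "test-migration.us-east-1.eksctl.io",
     "auth_info": "user@test-migration.us-east-1.eksctl.io",
     "namespace": "new-cluster"
   }
   EOF

   # Extract just the namespace and compare it to the expected one:
   expected="new-cluster"
   actual=$(jq -r '.namespace' /tmp/context.json)
   [ "$actual" = "$expected" ] && echo "active context targets $expected"
   ```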
   292  
1. Check that the cluster does not have any existing Pachyderm objects:

   ```shell
   pachctl list repo && pachctl list pipeline
   ```

   You should get empty output.

1. Restore your cluster from the backup you created in
[Step 1](#step-1-back-up-your-cluster):

=== "Local File"
    ```shell
    pachctl restore < path/to/your/backup/file
    ```

=== "S3 Bucket"
    ```shell
    pachctl restore --url s3://path/to/backup
    ```

   This S3 bucket is different from the S3 bucket to which you cloned
   your Pachyderm data; it is the bucket you allocated to hold the
   Pachyderm backup that was created without objects.

1. Configure any external data loading systems to point at the new,
upgraded Pachyderm cluster and play back transactions from the checkpoint
established at [Pause External Data Operations](./backup-migrations/#pause-external-data-loading-operations).
Perform any reconfiguration of data loading or unloading operations.
Confirm that the data output is as expected and that the new cluster is
operating correctly.
   323  
1. Disable the old cluster:

   * If you deployed the new cluster on the same Kubernetes cluster,
   switch to the old cluster's Pachyderm context:

     ```shell
     pachctl config set active-context <old-context>
     ```

   * If you deployed the new cluster on a different Kubernetes cluster,
   switch to the old cluster's Kubernetes context:

     ```shell
     kubectl config use-context <old cluster>
     ```

   Then, undeploy your old cluster:

   ```shell
   pachctl undeploy
   ```

1. Reconfigure the new cluster as necessary.
   You might need to reconfigure the following:

   - Data loading operations from Pachyderm to outside processes
   so that they work as expected.
   - Kubernetes ingress and port changes made to avoid conflicts
   with the old cluster.