
# Migrate to a Minor or Major Version

!!! info
    If you need to upgrade Pachyderm from one patch
    release to another, such as from x.xx.0 to x.xx.1, see
    [Upgrade Pachyderm](upgrades.md).

As new versions of Pachyderm are released, you might need to update your
cluster to get access to bug fixes and new features.

Migrations involve moving between major releases, such as 1.x.x to
2.x.x, or between minor releases, such as 1.10.x to {{ config.pach_latest_version }}.

!!! tip
    Pachyderm follows the [Semantic Versioning](https://semver.org/)
    specification to manage the release process.

Pachyderm stores all of its state in the following places:

* In `etcd`, which in turn stores its state in one or more persistent volumes
that were created when the Pachyderm cluster was deployed. `etcd` stores
metadata about your pipelines, repositories, and other Pachyderm primitives.

* In an object store bucket, such as AWS S3, MinIO, or Azure Blob Storage,
which stores the actual data.

During a migration, the data structures stored in those locations need to be
read, transformed, and rewritten. Therefore, the process involves the
following steps:

1. Back up your cluster by exporting the existing Pachyderm cluster's repos,
pipelines, and input commits to a backup file and, optionally, to an S3 bucket.
1. Bring up a new Pachyderm cluster adjacent to the old Pachyderm cluster,
either in a separate namespace or in a separate Kubernetes cluster.
1. Restore the old cluster's repos, commits, and pipelines into the new
   cluster.

!!! warning
    Whether you are upgrading or migrating your cluster, you must back it up
    to guarantee that you can restore it afterwards.
## Step 1 - Back up Your Cluster

Before migrating your cluster, create a backup from which you can restore
your cluster. For large amounts of data stored in an S3 object store,
we recommend that you use your cloud provider's capabilities to copy your
data into a new bucket while backing up information about Pachyderm objects
to a local file. For smaller deployments, you can copy everything into a
local file and then restore from that file.

To back up your cluster, complete the following steps:

1. Back up your cluster by running the `pachctl extract` command with the
`--no-objects` flag as described in [Back up Your Cluster](../backup_restore/).

1. In your cloud provider, create a new S3 bucket with the same permissions
policy that you assigned to the original cluster bucket. For example,
if your cluster is on EKS, create the same permissions policy as described
in [Deploy Pachyderm with an IAM Role](../../deploy/amazon_web_services/aws-deploy-pachyderm/#deploy-pachyderm-with-an-iam-role).

1. Clone the S3 bucket that you used for the old cluster to this new bucket.
   Follow the instructions for your cloud provider:

   * If you use Google Cloud, see the [gsutil instructions](https://cloud.google.com/storage/docs/gsutil/commands/cp).
   * If you use Microsoft Azure, see the [azcopy instructions](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux?toc=%2fazure%2fstorage%2ffiles%2ftoc.json).
   * If you use Amazon EKS, see the [AWS CLI instructions](https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html).

   **Example:**

   ```shell
   aws s3 sync s3://mybucket s3://mybucket2
   ```
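   For the other providers, the clone commands take roughly the following
   form. The bucket, container, and account names below are placeholders;
   check the linked instructions for the exact flags that your CLI version
   supports:

   ```shell
   # Google Cloud: copy every object from the old bucket to the new one.
   gsutil -m cp -r gs://mybucket gs://mybucket2

   # Azure: recursively copy a container between storage accounts
   # (requires an azcopy login or SAS tokens on the URLs).
   azcopy copy "https://oldaccount.blob.core.windows.net/mycontainer" "https://newaccount.blob.core.windows.net/mycontainer" --recursive
   ```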

1. Proceed to [Step 2](#step-2-restore-all-paused-pipelines).

## Step 2 - Restore All Paused Pipelines

If you want to minimize downtime and run your pipelines while you are
migrating your cluster, you can restart all paused pipelines and data loading
operations after the backup and clone operations are complete.

To restore all paused pipelines, complete the following steps:

1. Run the `pachctl start pipeline` command on each paused pipeline, or
   use the following shell script to restart all pipelines at once:

   ```pachctl tab="Command"
   pachctl start pipeline <pipeline-name>
   ```

   ```shell tab="Script"
   pachctl list pipeline --raw \
   | jq -r '.pipeline.name' \
   | xargs -P3 -n1 -I{} pachctl start pipeline {}
   ```

   You might need to install `jq` and other utilities to run the script.
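   To see what the `jq` stage of the script does, you can run it against a
   sample of the JSON stream that `pachctl list pipeline --raw` emits. The
   pipeline names below are made up for illustration:

   ```shell
   # Simulate `pachctl list pipeline --raw`: one JSON object per pipeline.
   printf '%s\n' \
       '{"pipeline":{"name":"edges"},"state":"PIPELINE_PAUSED"}' \
       '{"pipeline":{"name":"montage"},"state":"PIPELINE_PAUSED"}' \
   | jq -r '.pipeline.name'
   # Prints:
   # edges
   # montage
   ```

   `xargs -P3 -n1` then feeds each name to its own `pachctl start pipeline`
   invocation, running up to three of them in parallel.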

1. Confirm that each pipeline has started by using the `list pipeline` command:

   ```shell
   pachctl list pipeline
   ```

   * If you switched the ports to stop data loading from outside sources,
     change the ports back:

     1. Back up the current configuration:

        ```shell
        kubectl get svc/pachd -o json >pachd_service_backup_30649.json
        kubectl get svc/etcd -o json >etcd_svc_backup_32379.json
        kubectl get svc/dash -o json >dash_svc_backup_30080.json
        ```

     1. Modify the services to accept traffic on their original ports so
        that they do not collide with the migration cluster:

        ```shell
        # Modify the pachd API endpoint to run on 30650:
        kubectl get svc/pachd -o json | sed 's/30649/30650/g' | kubectl apply -f -
        # Modify the pachd trace port to run on 30651:
        kubectl get svc/pachd -o json | sed 's/30648/30651/g' | kubectl apply -f -
        # Modify the pachd api-over-http port to run on 30652:
        kubectl get svc/pachd -o json | sed 's/30647/30652/g' | kubectl apply -f -
        # Modify the pachd SAML authentication port to run on 30654:
        kubectl get svc/pachd -o json | sed 's/30646/30654/g' | kubectl apply -f -
        # Modify the pachd git API callback port to run on 30655:
        kubectl get svc/pachd -o json | sed 's/30644/30655/g' | kubectl apply -f -
        # Modify the pachd s3 port to run on 30600:
        kubectl get svc/pachd -o json | sed 's/30611/30600/g' | kubectl apply -f -
        # Modify the etcd client port to run on 32379:
        kubectl get svc/etcd -o json | sed 's/32378/32379/g' | kubectl apply -f -
        # Modify the dashboard ports to run on 30080 and 30081:
        kubectl get svc/dash -o json | sed 's/30079/30080/g' | kubectl apply -f -
        kubectl get svc/dash -o json | sed 's/30078/30081/g' | kubectl apply -f -
        ```
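        Each command above follows the same pattern: export the Service as
        JSON, rewrite a port number with `sed`, and apply the result back to
        the cluster. On a minimal, made-up Service fragment the substitution
        looks like this:

        ```shell
        # sed rewrites every occurrence of the number in the JSON, which is
        # why each port in the mapping above uses a distinct value.
        printf '%s' '{"spec":{"ports":[{"name":"api-grpc-port","nodePort":30649}]}}' \
        | sed 's/30649/30650/g'
        # Prints: {"spec":{"ports":[{"name":"api-grpc-port","nodePort":30650}]}}
        ```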

1. Modify your environment so that you can access `pachd` on the old port:

   ```shell
   pachctl config update context `pachctl config get active-context` --pachd-address=<cluster ip>:30650
   ```

1. Verify that you can access `pachd`:

   ```shell
   pachctl version
   ```

   **System Response:**

   ```
   COMPONENT           VERSION
   pachctl             {{ config.pach_latest_version }}
   pachd               {{ config.pach_latest_version }}
   ```

   If the command above hangs, you might need to adjust your firewall rules.
   Your old Pachyderm cluster can continue to operate while you are creating
   the migrated one.

1. Proceed to [Step 3](#step-3-deploy-a-pachyderm-cluster-with-the-cloned-bucket).

## Step 3 - Deploy a Pachyderm Cluster with the Cloned Bucket

After you create a backup of your existing cluster, you need to create a new
Pachyderm cluster by using the bucket that you cloned in [Step 1](#step-1-back-up-your-cluster).

This new cluster can be deployed:

* On the same Kubernetes cluster in a separate namespace.
* On a different Kubernetes cluster within the same cloud provider.

If you are deploying in a separate namespace on the same Kubernetes cluster,
you might need to modify the Kubernetes ingress to the Pachyderm deployment
in the new namespace to avoid port conflicts in the same cluster.
Consult with your Kubernetes administrator for information on avoiding
ingress conflicts.

If you have issues with the extracted data, rerun the instructions in
[Step 1](#step-1-back-up-your-cluster).

To deploy a Pachyderm cluster with a cloned bucket, complete the following
steps:

1. Upgrade your `pachctl` version to the latest version:

   ```shell
   brew upgrade pachyderm/tap/pachctl@1.11
   ```

   * If you are deploying your cluster in a separate Kubernetes namespace,
   create the new namespace:

     ```shell
     kubectl create namespace <new-cluster-namespace>
     ```

1. Deploy your cluster in a separate namespace or on a separate Kubernetes
cluster by using the `pachctl deploy` command for your cloud provider with
the `--namespace` flag.

   **Examples:**

   ```pachctl tab="AWS EKS"
   pachctl deploy amazon <bucket-name> <region> <storage-size> --dynamic-etcd-nodes=<number> --iam-role <iam-role> --namespace=<namespace-name>
   ```

   ```shell tab="GKE"
   pachctl deploy google <bucket-name> <storage-size> --dynamic-etcd-nodes=1 --namespace=<namespace-name>
   ```

   ```shell tab="Azure"
   pachctl deploy microsoft <account-name> <storage-account> <storage-key> <storage-size> --dynamic-etcd-nodes=<number> --namespace=<namespace-name>
   ```

   **Note:** The parameters for your Pachyderm cluster deployment might be
   different. For more information, see [Deploy Pachyderm](../../deploy/).

1. Verify that your cluster has been deployed:

   ```shell tab="In a Namespace"
   kubectl get pod --namespace=<new-cluster>
   ```

   ```shell tab="On a Separate Cluster"
   kubectl get pod
   ```

   If you have deployed your new cluster in a namespace, Pachyderm should
   have created a new context for this deployment. Verify that you are
   using this context.

1. Proceed to [Step 4](#step-4-restore-your-cluster).

## Step 4 - Restore your Cluster

After you have created a new cluster, you can restore your backup to this
new cluster. If you have deployed your new cluster in a namespace, Pachyderm
should have created a new context for this deployment. You need to switch to
this new context to access the correct cluster. Before you run the
`pachctl restore` command, your new cluster should be empty.

To restore your cluster, complete the following steps:

1. If you deployed your new cluster into a different namespace on the same
Kubernetes cluster as your old cluster, verify that you are on the correct
namespace:

   ```shell
   pachctl config get context `pachctl config get active-context`
   ```

   **Example System Response:**

   ``` hl_lines="5"
   {
     "source": "IMPORTED",
     "cluster_name": "test-migration.us-east-1.eksctl.io",
     "auth_info": "user@test-migration.us-east-1.eksctl.io",
     "namespace": "new-cluster"
   }
   ```

   Your active context must have the namespace into which you deployed your
   new cluster.
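   If you script this check, you can pull the namespace out of the context
   JSON with `jq`. This is a sketch that pipes in a trimmed version of the
   example response above instead of calling `pachctl`:

   ```shell
   # Simulated `pachctl config get context` output, trimmed to two fields.
   printf '%s' '{"source":"IMPORTED","namespace":"new-cluster"}' \
   | jq -r '.namespace'
   # Prints: new-cluster
   ```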

1. Check that the cluster does not have any existing Pachyderm objects:

   ```shell
   pachctl list repo && pachctl list pipeline
   ```

   Both commands should return empty output.

1. Restore your cluster from the backup that you created in
[Step 1](#step-1-back-up-your-cluster):

   ```pachctl tab="From a Local File"
   pachctl restore < path/to/your/backup/file
   ```

   ```pachctl tab="From an S3 Bucket"
   pachctl restore --url s3://path/to/backup
   ```

   This S3 bucket is different from the S3 bucket to which you cloned
   your Pachyderm data. It is merely a bucket that you allocated to hold
   the Pachyderm backup without objects.

1. Configure any external data loading systems to point at the new,
upgraded Pachyderm cluster and play back transactions from the checkpoint
established in [Pause External Data Operations](./backup-migrations/#pause-external-data-loading-operations).
Perform any reconfiguration of data loading or unloading operations.
Confirm that the data output is as expected and that the new cluster is
operating as expected.

1. Disable the old cluster:

   * If you have deployed the new cluster on the same Kubernetes cluster,
   switch to the old cluster's Pachyderm context:

     ```shell
     pachctl config set active-context <old-context>
     ```

   * If you have deployed the new cluster to a different Kubernetes cluster,
   switch to the old cluster's Kubernetes context:

     ```shell
     kubectl config use-context <old cluster>
     ```

   Then, undeploy your old cluster:

   ```pachctl
   pachctl undeploy
   ```

1. Reconfigure the new cluster as necessary.
   You might need to reconfigure the following:

   - Data loading operations from Pachyderm to outside processes so that
   they work as expected.
   - Kubernetes ingress and the port changes that you made to avoid
   conflicts with the old cluster.