# Migration

!!! info
    If you need to upgrade Pachyderm from one point release
    to another, such as from 1.9.4 to 1.9.5, see
    [Upgrade Pachyderm](upgrades.md).

- [Introduction](#introduction)
- [Note about 1.7 to 1.8 migrations](#note-about-17-to-18-migrations)
- [General migration procedure](#general-migration-procedure)
  - [Before you start: backups](#before-you-start-backups)
  - [Migration steps](#migration-steps)
    - [1. Pause all pipeline and data loading operations](#1-pause-all-pipeline-and-data-loading-operations)
    - [2. Extract a Pachyderm backup with the --no-objects flag](#2-extract-a-pachyderm-backup-with-the-no-objects-flag)
    - [3. Clone your object store bucket](#3-clone-your-object-store-bucket)
    - [4. Restart all pipeline and data loading ops](#4-restart-all-pipeline-and-data-loading-ops)
    - [5. Deploy a 1.X Pachyderm cluster with cloned bucket](#5-deploy-a-1x-pachyderm-cluster-with-cloned-bucket)
    - [6. Restore the new 1.X Pachyderm cluster from your backup](#6-restore-the-new-1x-pachyderm-cluster-from-your-backup)
    - [7. Load transactional data from checkpoint into new cluster](#7-load-transactional-data-from-checkpoint-into-new-cluster)
    - [8. Disable the old cluster](#8-disable-the-old-cluster)
    - [9. Reconfigure new cluster as necessary](#9-reconfigure-new-cluster-as-necessary)

## Introduction

As new versions of Pachyderm are released, you may need to update your cluster to get access to bug fixes and new features.
These updates fall into two categories: upgrades and migrations.

An upgrade moves between point releases within the same major release,
like 1.7.2 to 1.7.3.
Upgrades are typically a simple process that requires little to no downtime.

Migrations involve moving between major releases,
like 1.8.6 to 1.9.0.
Migration is covered in this document.

In general,
Pachyderm stores all of its state in two places:
`etcd`
(which in turn stores its state in one or more persistent volumes,
which were created when the Pachyderm cluster was deployed)
and an object store bucket
(something like AWS S3, MinIO, or Azure Blob Storage).

In a migration,
the data structures stored in those locations need to be read, transformed, and rewritten, so the process involves:

1. bringing up a new Pachyderm cluster adjacent to the old Pachyderm cluster
1. exporting the old Pachyderm cluster's repos, pipelines, and input commits
1. importing the old cluster's repos, commits, and pipelines into the new
   cluster.

*You must perform a migration to move between major releases*,
such as 1.8.7 to 1.9.0.

Whether you're doing an upgrade or a migration, we recommend that you [back up Pachyderm](../backup_restore/#general-backup-procedure) first.
That way, you can restore your cluster to its previous, good state if something goes wrong.

## Note about 1.7 to 1.8 migrations

In Pachyderm 1.8,
we rearchitected core parts of the platform to [improve speed and scalability](http://www.pachyderm.io/2018/11/15/performance-improvements.html).
Migrating from 1.7.x to 1.8.x using the procedure below can be a fairly lengthy process.
If your requirements fit, it may be easier to create a new 1.8 or greater cluster and reload your latest source data into your input repositories.

You may wish to keep your original 1.7 cluster around in a suspended state, reactivating it in case you need access to its provenance data.

## General migration procedure

### Before you start: backups

Please refer to [the documentation on backing up your cluster](../backup_restore/#general-backup-procedure).

### Migration steps

#### 1. Pause all pipeline and data loading operations

Stop every pipeline in the directed acyclic graphs (DAGs) that make up your Pachyderm cluster. You can either stop all pipelines at once with the multiline shell command shown below, or run the `stop pipeline` command manually for each pipeline.

`pachctl stop pipeline <pipeline-name>`

You can confirm that each pipeline is paused by using the `list pipeline` command:

`pachctl list pipeline`

Alternatively, the following shell script runs `stop pipeline` on all pipelines. You may need to install the utilities used in the script, like `jq` and `xargs`, on your system.

```
pachctl list pipeline --raw \
  | jq -r '.pipeline.name' \
  | xargs -P3 -n1 -I{} pachctl stop pipeline {}
```
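
The `xargs` pipeline above runs up to three `stop pipeline` calls in parallel, which can make it hard to tell which pipeline a failure belongs to. As a hedged alternative, the sketch below applies an action to each pipeline one at a time and reports failures by name. The function name `for_each_pipeline` is hypothetical. It relies only on the `pachctl` and `jq` invocations already shown in this section, and it assumes that pipeline names contain no whitespace.

```shell
# Apply "stop" or "start" to every pipeline, one at a time.
# Returns non-zero if the action failed for at least one pipeline.
for_each_pipeline() {
  action=$1
  failed=0
  # Assumes pipeline names contain no whitespace, so word splitting is safe.
  for name in $(pachctl list pipeline --raw | jq -r '.pipeline.name'); do
    if pachctl "$action" pipeline "$name"; then
      echo "$action: $name"
    else
      echo "$action FAILED: $name" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example invocations (commented out so that pasting the sketch runs nothing):
# for_each_pipeline stop
# for_each_pipeline start
```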

It's also a useful practice, for simple to moderately complex deployments, to keep a terminal window open that shows the state of all running Kubernetes pods.

`watch -n 5 kubectl get pods`

You may need to install the `watch` and `kubectl` commands on your system, and configure `kubectl` to point at the cluster that Pachyderm is running in.

#### Pausing data loading operations

**Input repositories** or **input repos** in Pachyderm are repositories created with the `pachctl create repo` command.
They're designed to be the repos at the top of a directed acyclic graph of pipelines.
Pipelines have their own output repos, which are not considered input repos.
If any processes external to Pachyderm put data into input repos by any method
(the Pachyderm APIs, `pachctl put file`, etc.),
they need to be paused.
See [Loading data from other sources into pachyderm](../backup_restore/#loading-data-from-other-sources-into-pachyderm) for design considerations for those processes that will minimize downtime during a restore or migration.

Alternatively, you can use the following commands to stop all data loading into Pachyderm from outside processes.

```
# Once you have stopped all running Pachyderm pipelines, such as with this command,
# $ pachctl list pipeline --raw \
#   | jq -r '.pipeline.name' \
#   | xargs -P3 -n1 -I{} pachctl stop pipeline {}

# all pipelines in your cluster should be suspended. To stop all
# data loading processes, we're going to modify the pachd Kubernetes service so that
# it only accepts traffic on port 30649 (instead of the usual 30650). This way,
# any background users and services that send requests to your Pachyderm cluster
# while 'extract' is running will not interfere with the process.
#
# Back up the Pachyderm service spec, in case you need to restore it quickly
$ kubectl get svc/pachd -o json >pachd_service_backup_30650.json

# Modify the service to accept traffic on port 30649
# Note that you'll likely also need to modify your cloud provider's firewall
# rules to allow traffic on this port
$ kubectl get svc/pachd -o json | sed 's/30650/30649/g' | kubectl apply -f -

# Modify your environment so that *you* can talk to pachd on this new port
$ pachctl config update context $(pachctl config get active-context) --pachd-address=<cluster ip>:30649

# Make sure you can talk to pachd (if not, firewall rules are a common culprit)
$ pachctl version
COMPONENT           VERSION
pachctl             1.9.7
pachd               1.9.7
```
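
Before applying the `sed` substitution to a live service, it can help to see what it does to a spec. The sketch below runs the same substitution against a hypothetical, heavily trimmed service fragment; the real output of `kubectl get svc/pachd -o json` contains many more fields, and the port name and values here are illustrative only. Because `sed` rewrites every occurrence of `30650`, reviewing the rewritten spec before piping it to `kubectl apply` is a reasonable precaution.

```shell
# Hypothetical, trimmed-down fragment of a pachd service spec (illustration only).
sample_spec='{"spec":{"ports":[{"name":"api-grpc-port","port":650,"nodePort":30650}]}}'

# Apply the same substitution used in the step above, without touching the cluster.
rewritten_spec=$(printf '%s\n' "$sample_spec" | sed 's/30650/30649/g')

# Print the rewritten spec for review; only the nodePort value changes here.
printf '%s\n' "$rewritten_spec"
```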

#### 2. Extract a Pachyderm backup with the --no-objects flag

This step and the following step, [3. Clone your object store bucket](#3-clone-your-object-store-bucket), can be run simultaneously.

Using the `pachctl extract` command, create the backup you need.

`pachctl extract --no-objects > path/to/your/backup/file`

You can also use the `-u` or `--url` flag to put the backup directly into an object store.

`pachctl extract --no-objects --url s3://...`

Note that this S3 bucket is different from the S3 bucket you will create when you clone your object store.
It is merely a bucket you allocated to hold the Pachyderm backup without objects.
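
Because the rest of the migration depends on this backup, it can be worth failing early if the extract produced nothing. The following is a minimal sketch, not part of Pachyderm itself: the function name `extract_metadata_backup` is hypothetical, and it wraps the same `pachctl extract --no-objects` command shown above. A non-empty file is only a basic sanity check and does not prove that the backup is complete.

```shell
# Write a metadata-only backup to the given path and fail if the command
# errors or the resulting file is empty.
extract_metadata_backup() {
  backup_file=$1
  if ! pachctl extract --no-objects > "$backup_file"; then
    echo "pachctl extract failed" >&2
    return 1
  fi
  if [ ! -s "$backup_file" ]; then
    echo "backup file $backup_file is empty" >&2
    return 1
  fi
  echo "wrote $(wc -c < "$backup_file" | tr -d ' ') bytes to $backup_file"
}

# Example invocation (commented out so that pasting the sketch runs nothing):
# extract_metadata_backup path/to/your/backup/file
```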

#### 3. Clone your object store bucket

This step and the prior step,
[2. Extract a Pachyderm backup with the --no-objects flag](#2-extract-a-pachyderm-backup-with-the-no-objects-flag),
can be run simultaneously.
Run the command that will clone a bucket in your object store.

Below, we give an example using the Amazon Web Services CLI to clone one bucket to another,
[taken from the documentation for that command](https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html).
Similar commands are available for [Google Cloud](https://cloud.google.com/storage/docs/gsutil/commands/cp)
and [Azure Blob Storage](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux?toc=%2fazure%2fstorage%2ffiles%2ftoc.json).

`aws s3 sync s3://mybucket s3://mybucket2`
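
Since steps 2 and 3 can be run simultaneously, one way to do that from a single shell is to start both commands in the background and wait for each. The sketch below assumes an AWS bucket and reuses the two commands shown in these steps. The function name `extract_and_clone` and its argument order are hypothetical.

```shell
# Run the metadata extract (step 2) and the bucket clone (step 3) concurrently,
# then wait for both and return non-zero if either one failed.
extract_and_clone() {
  backup_file=$1
  src_bucket=$2
  dst_bucket=$3

  pachctl extract --no-objects > "$backup_file" &
  extract_pid=$!

  aws s3 sync "$src_bucket" "$dst_bucket" &
  clone_pid=$!

  rc=0
  wait "$extract_pid" || { echo "extract failed" >&2; rc=1; }
  wait "$clone_pid" || { echo "bucket clone failed" >&2; rc=1; }
  return "$rc"
}

# Example invocation (commented out so that pasting the sketch runs nothing):
# extract_and_clone path/to/your/backup/file s3://mybucket s3://mybucket2
```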

#### 4. Restart all pipeline and data loading ops

Once the backup and clone operations are complete,
restart all paused pipelines and data loading operations,
setting a checkpoint for the started operations that you can use in step [7. Load transactional data from checkpoint into new cluster](#7-load-transactional-data-from-checkpoint-into-new-cluster), below.
See [Loading data from other sources into pachyderm](../backup_restore/#loading-data-from-other-sources-into-pachyderm) to understand why designing this checkpoint into your data loading systems is important.

Start every pipeline in the directed acyclic graphs (DAGs) that make up your Pachyderm cluster.
You can either start all pipelines at once with the multiline shell command shown below,
or run the `start pipeline` command manually for each pipeline.

`pachctl start pipeline <pipeline-name>`

You can confirm that each pipeline is started by using the `list pipeline` command:

`pachctl list pipeline`

The following shell script runs `start pipeline` on all pipelines.
You may need to install some of the utilities used in the script, like `jq`, on your system.

```
pachctl list pipeline --raw \
  | jq -r '.pipeline.name' \
  | xargs -P3 -n1 -I{} pachctl start pipeline {}
```

If you used the port-changing technique
[above](#1-pause-all-pipeline-and-data-loading-operations)
to stop all data loading into Pachyderm from outside processes,
you should change the ports back.

```
# Once you have restarted all running Pachyderm pipelines, such as with this command,
# $ pachctl list pipeline --raw \
#   | jq -r '.pipeline.name' \
#   | xargs -P3 -n1 -I{} pachctl start pipeline {}

# all pipelines in your cluster should be restarted. To restart all data loading
# processes, we're going to change the pachd Kubernetes service so that
# it only accepts traffic on port 30650 again (from 30649).
#
# Back up the Pachyderm service spec, in case you need to restore it quickly
$ kubectl get svc/pachd -o json >pachd_service_backup_30649.json

# Modify the service to accept traffic on port 30650, again
$ kubectl get svc/pachd -o json | sed 's/30649/30650/g' | kubectl apply -f -

# Modify your environment so that *you* can talk to pachd on the old port
$ pachctl config update context $(pachctl config get active-context) --pachd-address=<cluster ip>:30650

# Make sure you can talk to pachd (if not, firewall rules are a common culprit)
$ pachctl version
COMPONENT           VERSION
pachctl             1.7.11
pachd               1.7.11
```

Your old Pachyderm cluster can operate while you're creating a migrated one.
For this to work, it's important that your data loading operations follow the design criteria in "[Loading data from other sources into pachyderm](../backup_restore/#loading-data-from-other-sources-into-pachyderm)".
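
What counts as a checkpoint depends on how your data loading systems are built: it might be a message offset, a commit ID, or a timestamp. As one purely illustrative possibility, the sketch below records a UTC timestamp at the moment loading is restarted, so that step 7 has a known point to replay from. The `CHECKPOINT_FILE` variable, its default path, and the `record_checkpoint` function are all hypothetical.

```shell
# Hypothetical location for the checkpoint marker; override it as needed.
CHECKPOINT_FILE=${CHECKPOINT_FILE:-./migration_checkpoint.txt}

# Record the current UTC time as a simple checkpoint marker.
# A real loader may need to track offsets or commit IDs instead.
record_checkpoint() {
  date -u +%Y-%m-%dT%H:%M:%SZ > "$CHECKPOINT_FILE"
  echo "checkpoint recorded: $(cat "$CHECKPOINT_FILE")"
}

# Example invocation (commented out so that pasting the sketch runs nothing):
# record_checkpoint
```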

#### 5. Deploy a 1.X Pachyderm cluster with cloned bucket

Create a Pachyderm cluster using the bucket you cloned in [3. Clone your object store bucket](#3-clone-your-object-store-bucket).

You'll want to bring up this new Pachyderm cluster in a different namespace.
In the steps below, you'll check whether there was a problem with the extracted data
and whether steps [2](#2-extract-a-pachyderm-backup-with-the-no-objects-flag) and
[3](#3-clone-your-object-store-bucket) need to be run again.
Once your new cluster is up and you're connected to it, go on to the next step.

Note that you may need to modify the Kubernetes ingress to the Pachyderm deployment in the new namespace to avoid port conflicts in the same cluster.
Please consult with your Kubernetes administrator for information on avoiding ingress conflicts,
or check with us in your Pachyderm support channel if you need help.

_Important: Use the_ `kubectl config current-context` _command to confirm you're talking to the correct Kubernetes cluster configuration for the new cluster._
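
If you script the remaining steps, the context check can be turned into a guard that refuses to continue when `kubectl` points at the wrong cluster. This is a minimal sketch: the function name `require_kube_context` and the context name in the example are hypothetical, and the only real command it relies on is `kubectl config current-context`.

```shell
# Return non-zero unless kubectl's current context matches the expected name.
require_kube_context() {
  expected=$1
  actual=$(kubectl config current-context)
  if [ "$actual" != "$expected" ]; then
    echo "refusing to continue: current context is '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Example invocation with a hypothetical context name (commented out so that
# pasting the sketch runs nothing):
# require_kube_context new-pachyderm-cluster && pachctl restore < path/to/your/backup/file
```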

#### 6. Restore the new 1.X Pachyderm cluster from your backup

Using the Pachyderm cluster you deployed in the previous step, [5. Deploy a 1.X Pachyderm cluster with cloned bucket](#5-deploy-a-1x-pachyderm-cluster-with-cloned-bucket), run `pachctl restore` with the backup you created in [2. Extract a Pachyderm backup with the --no-objects flag](#2-extract-a-pachyderm-backup-with-the-no-objects-flag).

!!! note "Important"
    Use the `kubectl config current-context` command to confirm you're
    talking to the correct Kubernetes cluster configuration.

`pachctl restore < path/to/your/backup/file`

You can also use the `-u` or `--url` flag to get the backup directly from the object store you placed it in.

`pachctl restore --url s3://...`

Note that this S3 bucket is different from the S3 bucket you cloned above.
It is merely a bucket you allocated to hold the Pachyderm backup without objects.

#### 7. Load transactional data from checkpoint into new cluster

Configure an instance of your data loading systems to point at the new, migrated Pachyderm cluster
and play back transactions from the checkpoint you established in [4. Restart all pipeline and data loading ops](#4-restart-all-pipeline-and-data-loading-ops).

Perform any reconfiguration to data loading or unloading operations.

Confirm that the data output is as expected and the new cluster is operating as expected.
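
One lightweight way to start that confirmation is to compare the names of the repos (or pipelines) on the old and new clusters. The sketch below compares two files of names, one per line, ignoring order. The function name `compare_name_lists` and the file names are hypothetical. How you capture the lists is up to you; the commented example assumes that `pachctl list repo --raw` emits JSON with a `.repo.name` field, by analogy with the pipeline command used earlier in this document, so verify that against your `pachctl` version first.

```shell
# Compare two files of names (one per line), ignoring order.
# Prints any differences and returns non-zero if the lists differ.
compare_name_lists() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  sort "$1" > "$old_sorted"
  sort "$2" > "$new_sorted"
  if diff "$old_sorted" "$new_sorted"; then
    rc=0
  else
    rc=1
  fi
  rm -f "$old_sorted" "$new_sorted"
  return "$rc"
}

# Example (assumption: `pachctl list repo --raw` emits JSON with .repo.name):
# pachctl list repo --raw | jq -r '.repo.name' > old_repos.txt   # against the old cluster
# pachctl list repo --raw | jq -r '.repo.name' > new_repos.txt   # against the new cluster
# compare_name_lists old_repos.txt new_repos.txt
```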

#### 8. Disable the old cluster

Once you've confirmed that the new cluster is operating, you can disable the old cluster.

#### 9. Reconfigure new cluster as necessary

You may also need to reconfigure:

- data loading operations from Pachyderm to processes outside of it, so that they work as expected
- Kubernetes ingress and port changes made to avoid conflicts with the old cluster