# Migration

!!! info
    If you need to upgrade Pachyderm from one point release
    to another, such as from 1.9.4 to 1.9.5, see
    [Upgrade Pachyderm](upgrades.md).

- [Introduction](#introduction)
- [Note about 1.7 to 1.8 migrations](#note-about-1-7-to-1-8-migrations)
- [General migration procedure](#general-migration-procedure)
    - [Before you start: backups](#before-you-start-backups)
    - [Migration steps](#migration-steps)
        - [1. Pause all pipeline and data loading operations](#1-pause-all-pipeline-and-data-loading-operations)
        - [2. Extract a pachyderm backup with the --no-objects flag](#2-extract-a-pachyderm-backup-with-the-no-objects-flag)
        - [3. Clone your object store bucket](#3-clone-your-object-store-bucket)
        - [4. Restart all pipeline and data loading ops](#4-restart-all-pipeline-and-data-loading-ops)
        - [5. Deploy a 1.X Pachyderm cluster with cloned bucket](#5-deploy-a-1x-pachyderm-cluster-with-cloned-bucket)
        - [6. Restore the new 1.X Pachyderm cluster from your backup](#6-restore-the-new-1x-pachyderm-cluster-from-your-backup)
        - [7. Load transactional data from checkpoint into new cluster](#7-load-transactional-data-from-checkpoint-into-new-cluster)
        - [8. Disable the old cluster](#8-disable-the-old-cluster)
        - [9. Reconfigure new cluster as necessary](#9-reconfigure-new-cluster-as-necessary)

## Introduction

As new versions of Pachyderm are released, you may need to update your cluster to get access to bug fixes and new features.
These updates fall into two categories: upgrades and migrations.

An upgrade is a move between point releases within the same major release,
like 1.7.2 to 1.7.3.
Upgrades are typically a simple process that requires little to no downtime.

Migrations involve moving between major releases,
like 1.8.6 to 1.9.0.
Migration is covered in this document.
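
As a minimal sketch of this rule, the shell function below compares the first two components of two version strings: if they match (as in 1.7.2 and 1.7.3), the change is an upgrade; if they differ (as in 1.8.6 and 1.9.0), it's a migration. `migration_kind` is a hypothetical helper written for this page, not part of `pachctl`, and it assumes plain `X.Y.Z` version strings without prefixes or suffixes.

```shell
# Hypothetical helper, not part of pachctl: classify a version change.
# It compares only the first two components (for example "1.9" in "1.9.4").
migration_kind() {
  from_series=$(printf '%s\n' "$1" | cut -d. -f1,2)
  to_series=$(printf '%s\n' "$2" | cut -d. -f1,2)
  if [ "$from_series" = "$to_series" ]; then
    echo upgrade      # same release series, e.g. 1.7.2 -> 1.7.3
  else
    echo migration    # different release series, e.g. 1.8.6 -> 1.9.0
  fi
}

migration_kind 1.7.2 1.7.3   # prints: upgrade
migration_kind 1.8.6 1.9.0   # prints: migration
```
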

In general,
Pachyderm stores all of its state in two places:
`etcd`
(which in turn stores its state in one or more persistent volumes
created when the Pachyderm cluster was deployed)
and an object store bucket
(such as AWS S3, MinIO, or Azure Blob Storage).

In a migration,
the data structures stored in those locations need to be read, transformed, and rewritten, so the process involves:

1. bringing up a new Pachyderm cluster adjacent to the old Pachyderm cluster
1. exporting the old Pachyderm cluster's repos, pipelines, and input commits
1. importing the old cluster's repos, commits, and pipelines into the new
   cluster

*You must perform a migration to move between major releases*,
such as 1.8.7 to 1.9.0.

Whether you're doing an upgrade or a migration, we recommend that you [back up Pachyderm](../backup_restore/#general-backup-procedure) first.
A backup lets you restore your cluster to its previous, good state if something goes wrong.

## Note about 1.7 to 1.8 migrations

In Pachyderm 1.8,
we rearchitected core parts of the platform to [improve speed and scalability](http://www.pachyderm.io/2018/11/15/performance-improvements.html).
Migrating from 1.7.x to 1.8.x using the procedure below can be a fairly lengthy process.
If your requirements allow it, it may be easier to create a new 1.8 or later cluster and reload your latest source data into your input repositories.

You may wish to keep your original 1.7 cluster in a suspended state, so that you can reactivate it if you need access to its provenance data.

## General migration procedure

### Before you start: backups

Please refer to [the documentation on backing up your cluster](../backup_restore/#general-backup-procedure).

### Migration steps

#### 1. Pause all pipeline and data loading operations

From the directed acyclic graphs (DAGs) that define your Pachyderm cluster, stop each pipeline step.
You can run the `stop pipeline` command manually for each pipeline,
or use the shell script below to stop all of them at once.

`pachctl stop pipeline <pipeline-name>`

You can confirm that each pipeline is paused with the `list pipeline` command:

`pachctl list pipeline`

Alternatively, the following shell script runs `stop pipeline` on all pipelines.
You may need to install the utilities it uses, such as `jq` and `xargs`, on your system.

```
pachctl list pipeline --raw \
  | jq -r '.pipeline.name' \
  | xargs -P3 -I{} pachctl stop pipeline {}
```

For simple to moderately complex deployments, it's also useful to keep a terminal window open that shows the state of all running Kubernetes pods:

`watch -n 5 kubectl get pods`

You may need to install the `watch` and `kubectl` commands on your system, and configure `kubectl` to point at the cluster that Pachyderm is running in.

##### Pausing data loading operations

**Input repositories**, or **input repos**, in Pachyderm are repositories created with the `pachctl create repo` command.
They're designed to be the repos at the top of a directed acyclic graph of pipelines.
Each pipeline has its own output repo, which is not considered an input repo.
If any processes external to Pachyderm put data into input repos by any method
(the Pachyderm APIs, `pachctl put file`, etc.),
you need to pause them.
See [Loading data from other sources into pachyderm](../backup_restore/#loading-data-from-other-sources-into-pachyderm) for design considerations that minimize downtime for those processes during a restore or migration.

Alternatively, you can use the following commands to stop all data loading into Pachyderm from outside processes.

```
# Once you have stopped all running Pachyderm pipelines, for example with this command,
# $ pachctl list pipeline --raw \
#   | jq -r '.pipeline.name' \
#   | xargs -P3 -I{} pachctl stop pipeline {}
#
# all pipelines in your cluster should be suspended. To stop all
# data loading processes, we're going to modify the pachd Kubernetes service so that
# it only accepts traffic on port 30649 (instead of the usual 30650). This way,
# any background users and services that send requests to your Pachyderm cluster
# while 'extract' is running will not interfere with the process.
#
# Back up the Pachyderm service spec, in case you need to restore it quickly
$ kubectl get svc/pachd -o json > pach_service_backup_30650.json

# Modify the service to accept traffic on port 30649.
# Note that you'll likely also need to modify your cloud provider's firewall
# rules to allow traffic on this port.
$ kubectl get svc/pachd -o json | sed 's/30650/30649/g' | kubectl apply -f -

# Modify your environment so that *you* can talk to pachd on this new port
$ pachctl config update context "$(pachctl config get active-context)" --pachd-address=<cluster ip>:30649

# Make sure you can talk to pachd (if not, firewall rules are a common culprit)
$ pachctl version
COMPONENT           VERSION
pachctl             1.9.7
pachd               1.9.7
```

#### 2. Extract a Pachyderm backup with the --no-objects flag

This step and the following step, [3. Clone your object store bucket](#3-clone-your-object-store-bucket), can be run simultaneously.

Use the `pachctl extract` command to create the backup:

`pachctl extract --no-objects > path/to/your/backup/file`

You can also use the `-u` or `--url` flag to put the backup directly into an object store.

`pachctl extract --no-objects --url s3://...`

Note that this S3 bucket is different from the bucket you will create when you clone your object store.
It is only a bucket you allocate to hold the Pachyderm backup without objects.

#### 3. Clone your object store bucket

This step and the prior step,
[2. Extract a pachyderm backup with the --no-objects flag](#2-extract-a-pachyderm-backup-with-the-no-objects-flag),
can be run simultaneously.
Run the command that clones a bucket in your object store.

Below, we give an example that uses the Amazon Web Services CLI to clone one bucket to another,
[taken from the documentation for that command](https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html).
Similar commands are available for [Google Cloud](https://cloud.google.com/storage/docs/gsutil/commands/cp)
and [Azure Blob Storage](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux?toc=%2fazure%2fstorage%2ffiles%2ftoc.json).

`aws s3 sync s3://mybucket s3://mybucket2`

#### 4. Restart all pipeline and data loading ops

Once the backup and clone operations are complete,
restart all paused pipelines and data loading operations,
setting a checkpoint for the restarted operations that you can use in step [7. Load transactional data from checkpoint into new cluster](#7-load-transactional-data-from-checkpoint-into-new-cluster), below.
See [Loading data from other sources into pachyderm](../backup_restore/#loading-data-from-other-sources-into-pachyderm) to understand why designing this checkpoint into your data loading systems is important.

From the directed acyclic graphs (DAGs) that define your Pachyderm cluster,
start each pipeline.
You can run the `start pipeline` command manually for each pipeline,
or use the shell script below to start all of them at once.

`pachctl start pipeline <pipeline-name>`

You can confirm that each pipeline is started with the `list pipeline` command:

`pachctl list pipeline`

The following shell script runs `start pipeline` on all pipelines.
You may need to install some of the utilities it uses, such as `jq`, on your system.

```
pachctl list pipeline --raw \
  | jq -r '.pipeline.name' \
  | xargs -P3 -I{} pachctl start pipeline {}
```

If you used the port-changing technique
[above](#1-pause-all-pipeline-and-data-loading-operations)
to stop all data loading into Pachyderm from outside processes,
change the ports back:

```
# Once you have restarted all Pachyderm pipelines, for example with this command,
# $ pachctl list pipeline --raw \
#   | jq -r '.pipeline.name' \
#   | xargs -P3 -I{} pachctl start pipeline {}
#
# all pipelines in your cluster should be restarted. To restart all data loading
# processes, we're going to change the pachd Kubernetes service so that
# it only accepts traffic on port 30650 again (instead of 30649).
#
# Back up the Pachyderm service spec, in case you need to restore it quickly
$ kubectl get svc/pachd -o json > pachd_service_backup_30649.json

# Modify the service to accept traffic on port 30650 again
$ kubectl get svc/pachd -o json | sed 's/30649/30650/g' | kubectl apply -f -

# Modify your environment so that *you* can talk to pachd on the old port
$ pachctl config update context "$(pachctl config get active-context)" --pachd-address=<cluster ip>:30650

# Make sure you can talk to pachd (if not, firewall rules are a common culprit)
$ pachctl version
COMPONENT           VERSION
pachctl             1.7.11
pachd               1.7.11
```

Your old Pachyderm cluster can keep operating while you create the migrated one.
For this to work, your data loading operations must follow the design criteria in "[Loading data from other sources into pachyderm](../backup_restore/#loading-data-from-other-sources-into-pachyderm)".

#### 5. Deploy a 1.X Pachyderm cluster with cloned bucket

Create a Pachyderm cluster that uses the bucket you cloned in [3. Clone your object store bucket](#3-clone-your-object-store-bucket).

Bring up this new Pachyderm cluster in a different namespace.
In the steps below, you'll check whether there was a problem with the extracted data
that requires running steps [2](#2-extract-a-pachyderm-backup-with-the-no-objects-flag) and
[3](#3-clone-your-object-store-bucket) again.
Once your new cluster is up and you're connected to it, go on to the next step.

Note that you may need to modify the Kubernetes ingress for the Pachyderm deployment in the new namespace to avoid port conflicts with the old deployment in the same cluster.
Please consult your Kubernetes administrator for information on avoiding ingress conflicts,
or check with us in your Pachyderm support channel if you need help.

!!! note "Important"
    Use the `kubectl config current-context` command to confirm that you're
    talking to the correct Kubernetes cluster configuration for the new cluster.

#### 6. Restore the new 1.X Pachyderm cluster from your backup

Using the Pachyderm cluster you deployed in the previous step, [5. Deploy a 1.x Pachyderm cluster with cloned bucket](#5-deploy-a-1x-pachyderm-cluster-with-cloned-bucket), run `pachctl restore` with the backup you created in [2. Extract a pachyderm backup with the --no-objects flag](#2-extract-a-pachyderm-backup-with-the-no-objects-flag).

!!! note "Important"
    Use the `kubectl config current-context` command to confirm that you're
    talking to the correct Kubernetes cluster configuration.
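
The check in the note above can also be scripted so that the restore only runs when `kubectl` points at the cluster you expect. This is a minimal sketch, not part of Pachyderm: `require_kube_context` is a hypothetical helper, and `new-pachyderm-cluster` is a placeholder for the kubectl context of your new deployment. It checks only the context name, not the namespace.

```shell
# Hypothetical pre-restore guard: succeed only if kubectl's current
# context matches the expected context name passed as $1.
require_kube_context() {
  expected="$1"
  current=$(kubectl config current-context) || return 1
  if [ "$current" != "$expected" ]; then
    echo "refusing to continue: kubectl context is '$current', expected '$expected'" >&2
    return 1
  fi
  echo "kubectl context OK: $current"
}

# Example usage (the context name is a placeholder):
#   require_kube_context new-pachyderm-cluster \
#     && pachctl restore < path/to/your/backup/file
```

Depending on how your `pachctl` context is configured, `pachctl` may reach pachd through its own `pachd-address` rather than through the kubectl context, so it's worth also checking `pachctl config get active-context` before restoring.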

`pachctl restore < path/to/your/backup/file`

You can also use the `-u` or `--url` flag to get the backup directly from the object store where you placed it:

`pachctl restore --url s3://...`

Note that this S3 bucket is different from the bucket you cloned above.
It is only a bucket you allocated to hold the Pachyderm backup without objects.

#### 7. Load transactional data from checkpoint into new cluster

Configure an instance of your data loading systems to point at the new, upgraded Pachyderm cluster,
and play back transactions from the checkpoint you established in [4. Restart all pipeline and data loading operations](#4-restart-all-pipeline-and-data-loading-ops).

Perform any necessary reconfiguration of data loading or unloading operations.

Confirm that the data output is correct and that the new cluster is operating as expected.

#### 8. Disable the old cluster

Once you've confirmed that the new cluster is operating, you can disable the old cluster.

#### 9. Reconfigure new cluster as necessary

You may also need to reconfigure:

- operations that move data from Pachyderm to processes outside of it, so that they work as expected
- Kubernetes ingress and port changes you made to avoid conflicts with the old cluster
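
Before you disable the old cluster in step 8, a simple cross-check on the restore is to compare the repository names in both clusters. The sketch below is one way to do that, not an official Pachyderm tool: `compare_repo_lists` is a hypothetical helper that takes two files with one repo name per line and reports names that appear in only one of them. The commented example assumes that `jq` is installed, that your `pachctl` config has a context for each cluster (`old-cluster` and `new-cluster` are placeholders) selectable with `pachctl config set active-context`, and that `pachctl list repo --raw` exposes each name at `.repo.name`, the same way the pipeline scripts above read `.pipeline.name`.

```shell
# Hypothetical helper: compare two files of repo names (one per line).
# Prints "repo lists match" and returns 0 if both contain the same names;
# otherwise prints the names found in only one file and returns 1.
compare_repo_lists() {
  old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
  sort -u "$1" > "$old_sorted"
  sort -u "$2" > "$new_sorted"
  only_old=$(comm -23 "$old_sorted" "$new_sorted")   # lines only in $1
  only_new=$(comm -13 "$old_sorted" "$new_sorted")   # lines only in $2
  rm -f "$old_sorted" "$new_sorted"
  if [ -z "$only_old" ] && [ -z "$only_new" ]; then
    echo "repo lists match"
    return 0
  fi
  # Unquoted on purpose: printf repeats the format once per repo name.
  [ -n "$only_old" ] && printf 'only in old cluster: %s\n' $only_old
  [ -n "$only_new" ] && printf 'only in new cluster: %s\n' $only_new
  return 1
}

# Example usage (context names are placeholders):
#   pachctl config set active-context old-cluster
#   pachctl list repo --raw | jq -r '.repo.name' > old_repos.txt
#   pachctl config set active-context new-cluster
#   pachctl list repo --raw | jq -r '.repo.name' > new_repos.txt
#   compare_repo_lists old_repos.txt new_repos.txt
```

Matching names only show that the repos were restored; they don't replace checking the data output as described in step 7.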