# Migration

!!! info
    If you need to upgrade Pachyderm from one minor version
    to another, such as from 1.9.4 to 1.9.5, see
    [Upgrade Pachyderm](upgrades.md).

- [Introduction](#introduction)
- [Note about 1.7 to 1.8 migrations](#note-about-1-7-to-1-8-migrations)
- [General migration procedure](#general-migration-procedure)
    - [Before you start: backups](#before-you-start-backups)
    - [Migration steps](#migration-steps)
        - [1. Pause all pipeline and data loading operations](#1-pause-all-pipeline-and-data-loading-operations)
        - [2. Extract a Pachyderm backup with the --no-objects flag](#2-extract-a-pachyderm-backup-with-the-no-objects-flag)
        - [3. Clone your object store bucket](#3-clone-your-object-store-bucket)
        - [4. Restart all pipeline and data loading ops](#4-restart-all-pipeline-and-data-loading-ops)
        - [5. Deploy a 1.X Pachyderm cluster with cloned bucket](#5-deploy-a-1x-pachyderm-cluster-with-cloned-bucket)
        - [6. Restore the new 1.X Pachyderm cluster from your backup](#6-restore-the-new-1x-pachyderm-cluster-from-your-backup)
        - [7. Load transactional data from checkpoint into new cluster](#7-load-transactional-data-from-checkpoint-into-new-cluster)
        - [8. Disable the old cluster](#8-disable-the-old-cluster)
        - [9. Reconfigure new cluster as necessary](#9-reconfigure-new-cluster-as-necessary)

## Introduction

As new versions of Pachyderm are released, you may need to update your cluster to get access to bug fixes and new features.
These updates fall into two categories: upgrades and migrations.

An upgrade is a move between point releases within the same major release,
like 1.7.2 to 1.7.3.
Upgrades are typically a simple process that requires little to no downtime.

A migration is a move between major releases,
like 1.8.6 to 1.9.0.
This document covers migrations.

In general,
Pachyderm stores all of its state in two places:
`etcd`
(which in turn stores its state in one or more persistent volumes
that were created when the Pachyderm cluster was deployed)
and an object store bucket
(such as AWS S3, MinIO, or Azure Blob Storage).

In a migration,
the data structures stored in those locations need to be read, transformed, and rewritten, so the process involves:

1. bringing up a new Pachyderm cluster adjacent to the old Pachyderm cluster
1. exporting the old Pachyderm cluster's repos, pipelines, and input commits
1. importing the old cluster's repos, commits, and pipelines into the new
   cluster.

*You must perform a migration to move between major releases*,
such as 1.8.7 to 1.9.0.

Whether you're doing an upgrade or a migration, we recommend that you [back up Pachyderm](../backup_restore/#general-backup-procedure) before you start.
A backup lets you restore your cluster to its previous, good state.

## Note about 1.7 to 1.8 migrations

In Pachyderm 1.8,
we rearchitected core parts of the platform to [improve speed and scalability](http://www.pachyderm.io/2018/11/15/performance-improvements.html).
Migrating from 1.7.x to 1.8.x using the procedure below can be a fairly lengthy process.
If your requirements fit, it may be easier to create a new 1.8 or later cluster and reload your latest source data into your input repositories.

You may wish to keep your original 1.7 cluster around in a suspended state, reactivating it if you need access to its provenance data.

## General migration procedure

### Before you start: backups

Please refer to [the documentation on backing up your cluster](../backup_restore/#general-backup-procedure).

### Migration steps
#### 1. Pause all pipeline and data loading operations

From the directed acyclic graphs (DAGs) that define your Pachyderm cluster, stop each pipeline.
You can either run the `stop pipeline` command manually for each pipeline,
or use the shell script shown below.

`pachctl stop pipeline <pipeline-name>`

You can confirm that each pipeline is paused by using the `list pipeline` command:

`pachctl list pipeline`

The following shell script runs `stop pipeline` on all pipelines.
You may need to install the utilities it uses, such as `jq` and `xargs`, on your system.

```shell
pachctl list pipeline --raw \
  | jq -r '.pipeline.name' \
  | xargs -P3 -I{} pachctl stop pipeline {}
```
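
The state change after `stop pipeline` may not show up immediately, so you may want to wait until every pipeline actually reports a paused state before you continue. The sketch below assumes that `pachctl list pipeline --raw` prints one JSON object per pipeline with a top-level `state` field whose paused value is `PIPELINE_PAUSED`; check this against your own output first. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait until every pipeline reports PIPELINE_PAUSED.
# Assumption: `pachctl list pipeline --raw` prints one JSON object per
# pipeline with a top-level "state" field; verify against your own output.

# Prints "<pipeline-name> <state>" for every pipeline. The body runs in a
# subshell with pipefail so a pachctl failure is not mistaken for "no pipelines".
list_pipeline_states() (
  set -o pipefail
  pachctl list pipeline --raw | jq -r '"\(.pipeline.name) \(.state)"'
)

# Reads "<name> <state>" lines on stdin and prints the names that are not paused.
unpaused_pipelines() {
  awk 'NF && $2 != "PIPELINE_PAUSED" { print $1 }'
}

# Polls every INTERVAL seconds (default 5), at most MAX_ATTEMPTS times
# (default 60). Returns 0 once nothing is left unpaused, 1 otherwise.
wait_until_paused() {
  local interval=${1:-5} max_attempts=${2:-60} attempt states remaining
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if ! states=$(list_pipeline_states); then
      echo "could not list pipelines" >&2
      return 1
    fi
    remaining=$(printf '%s\n' "$states" | unpaused_pipelines)
    if [ -z "$remaining" ]; then
      echo "all pipelines are paused"
      return 0
    fi
    # $remaining is left unquoted on purpose to print the names on one line.
    echo "attempt $attempt: still waiting on:" $remaining >&2
    sleep "$interval"
  done
  return 1
}
```

For example, `wait_until_paused 5 60 && echo "safe to extract"` polls every five seconds for up to five minutes.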

For simple to moderately complex deployments, it's also useful to keep a terminal window open that shows the state of all running Kubernetes pods:

`watch -n 5 kubectl get pods`

You may need to install the `watch` and `kubectl` commands on your system, and configure `kubectl` to point at the cluster that Pachyderm is running in.
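
Instead of scanning the pod list by eye, you could print only the pods whose status is neither `Running` nor `Completed`. This is a sketch that assumes the default `kubectl get pods --no-headers` column layout (NAME, READY, STATUS, RESTARTS, AGE); the function names are hypothetical, and a pod can be `Running` while still not ready, which this check does not catch.

```shell
#!/usr/bin/env bash
# Sketch: list pods whose STATUS column is neither Running nor Completed.
# Assumes the default `kubectl get pods --no-headers` columns:
#   NAME READY STATUS RESTARTS AGE

# Thin wrapper so the kubectl call can be replaced, for example in tests.
# Extra arguments (such as --namespace <ns>) are passed through.
get_pods() {
  kubectl get pods --no-headers "$@"
}

# Prints "<pod> <status>" for every pod that is not Running or Completed.
unhealthy_pods() {
  get_pods "$@" | awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

An empty result from `unhealthy_pods` means every pod is `Running` or `Completed`.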

##### Pausing data loading operations

**Input repositories**, or **input repos**, in Pachyderm are repositories created with the `pachctl create repo` command.
They're designed to be the repos at the top of a directed acyclic graph of pipelines.
Pipelines have their own output repos associated with them, which are not considered input repos.
If any processes external to Pachyderm put data into input repos by any method
(the Pachyderm APIs, `pachctl put file`, etc.),
you need to pause them.
See [Loading data from other sources into Pachyderm](../backup_restore/#loading-data-from-other-sources-into-pachyderm) for design considerations that minimize downtime for those processes during a restore or migration.

Alternatively, you can use the following commands to stop all data loading into Pachyderm from outside processes.

```shell
# Once you have stopped all running Pachyderm pipelines, such as with this command,
# $ pachctl list pipeline --raw \
#   | jq -r '.pipeline.name' \
#   | xargs -P3 -I{} pachctl stop pipeline {}

# all pipelines in your cluster should be suspended. To stop all
# data loading processes, we're going to modify the pachd Kubernetes service so that
# it only accepts traffic on port 30649 (instead of the usual 30650). This way,
# any background users and services that send requests to your Pachyderm cluster
# while 'extract' is running will not interfere with the process.
#
# Back up the Pachyderm service spec, in case you need to restore it quickly
$ kubectl get svc/pachd -o json > pach_service_backup_30650.json

# Modify the service to accept traffic on port 30649
# Note that you'll likely also need to modify your cloud provider's firewall
# rules to allow traffic on this port
$ kubectl get svc/pachd -o json | sed 's/30650/30649/g' | kubectl apply -f -

# Modify your environment so that *you* can talk to pachd on this new port
$ pachctl config update context $(pachctl config get active-context) --pachd-address=<cluster ip>:30649

# Make sure you can talk to pachd (if not, firewall rules are a common culprit)
$ pachctl version
COMPONENT           VERSION
pachctl             1.9.7
pachd               1.9.7
```
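
The `sed 's/30650/30649/g'` substitution above rewrites every occurrence of the string `30650` in the service spec. If you want the edit to touch only `nodePort` values, a narrower substitution such as the hypothetical `swap_nodeport` helper below is one option. It is a sketch that assumes `kubectl get -o json` writes each port as `"nodePort": <number>` on its own line, and it relies on `sed -E` (extended regular expressions).

```shell
#!/usr/bin/env bash
# Sketch: rewrite only "nodePort": OLD entries in JSON read from stdin,
# leaving other fields (and longer numbers such as 306501) untouched.
# Hypothetical helper; assumes `kubectl get svc -o json` formatting.
swap_nodeport() {
  local old=$1 new=$2
  # Group 1 keeps the key and spacing; group 2 keeps whatever follows the
  # number (comma or end of line), so only an exact match on OLD is replaced.
  sed -E "s/(\"nodePort\": *)${old}([^0-9]|\$)/\1${new}\2/g"
}
```

Used in place of the plain `sed` call, this becomes `kubectl get svc/pachd -o json | swap_nodeport 30650 30649 | kubectl apply -f -`, and `swap_nodeport 30649 30650` reverses it in step 4.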

#### 2. Extract a Pachyderm backup with the --no-objects flag

This step and the following step, [3. Clone your object store bucket](#3-clone-your-object-store-bucket), can be run simultaneously.

Use the `pachctl extract` command to create the backup:

`pachctl extract --no-objects > path/to/your/backup/file`

You can also use the `-u` or `--url` flag to write the backup directly to an object store:

`pachctl extract --no-objects --url s3://...`

Note that this S3 bucket is different from the S3 bucket you will create to clone your object store.
It is merely a bucket you allocated to hold the Pachyderm backup without objects.
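
If you write the backup to a local file, a small wrapper can stop early when `pachctl extract` fails or produces an empty file. This is a sketch; the `extract_backup` name and the timestamped file naming are assumptions, not part of Pachyderm.

```shell
#!/usr/bin/env bash
# Sketch: run `pachctl extract --no-objects` into a timestamped file and fail
# loudly if the command errors or the output is empty.
# PACHCTL can point at another binary (or a stub in tests); defaults to pachctl.
extract_backup() {
  local dir=${1:-.} file
  file="$dir/pachyderm-backup-$(date -u +%Y%m%dT%H%M%SZ).bak"
  if ! "${PACHCTL:-pachctl}" extract --no-objects > "$file"; then
    echo "pachctl extract failed; partial output left in $file" >&2
    return 1
  fi
  if [ ! -s "$file" ]; then
    echo "pachctl extract produced an empty file: $file" >&2
    return 1
  fi
  # Print the path so callers can capture it, e.g. backup=$(extract_backup /backups)
  echo "$file"
}
```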

#### 3. Clone your object store bucket

This step and the prior step,
[2. Extract a Pachyderm backup with the --no-objects flag](#2-extract-a-pachyderm-backup-with-the-no-objects-flag),
can be run simultaneously.
Run the command that clones a bucket in your object store.

Below is an example that uses the Amazon Web Services CLI to clone one bucket to another,
[taken from the documentation for that command](https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html).
Similar commands are available for [Google Cloud](https://cloud.google.com/storage/docs/gsutil/commands/cp)
and [Azure Blob Storage](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux?toc=%2fazure%2fstorage%2ffiles%2ftoc.json).

`aws s3 sync s3://mybucket s3://mybucket2`
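
After the sync finishes, you may want a quick sanity check that the clone holds as many objects as the source. The sketch below compares the `Total Objects:` line that `aws s3 ls --recursive --summarize` prints for each bucket. It is a coarse check that does not compare object contents, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare object counts of two S3 buckets using
# `aws s3 ls --recursive --summarize`, whose summary includes "Total Objects: N".
# AWS can point at a stub in tests; defaults to the aws CLI.

# Prints the object count reported for the given s3:// URL (empty on failure).
count_objects() {
  "${AWS:-aws}" s3 ls "$1" --recursive --summarize |
    awk '$1 == "Total" && $2 == "Objects:" { print $3 }'
}

# Returns 0 if both buckets report the same, non-empty object count.
same_object_count() {
  local src dst
  src=$(count_objects "$1")
  dst=$(count_objects "$2")
  echo "source: ${src:-?} objects, clone: ${dst:-?} objects"
  [ -n "$src" ] && [ "$src" = "$dst" ]
}
```

For the example above, that would be `same_object_count s3://mybucket s3://mybucket2`.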

#### 4. Restart all pipeline and data loading ops

Once the backup and clone operations are complete,
restart all paused pipelines and data loading operations,
setting a checkpoint for the restarted operations that you can use in step [7. Load transactional data from checkpoint into new cluster](#7-load-transactional-data-from-checkpoint-into-new-cluster), below.
See [Loading data from other sources into Pachyderm](../backup_restore/#loading-data-from-other-sources-into-pachyderm) to understand why designing this checkpoint into your data loading systems is important.
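
How you record a checkpoint depends entirely on your data loading system. As one hypothetical illustration, you could write the UTC time at which loading resumes to a file just before restarting your loaders, so that step 7 can replay everything that arrived after it. The file name and timestamp format are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: record the moment data loading resumes as an ISO 8601 UTC timestamp.
# The default file name is a hypothetical choice; store the checkpoint
# wherever your loaders (and step 7) can read it.
record_checkpoint() {
  local file=${1:-migration-checkpoint.txt}
  date -u +%Y-%m-%dT%H:%M:%SZ > "$file"
  cat "$file"
}
```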

From the directed acyclic graphs (DAGs) that define your Pachyderm cluster,
start each pipeline.
You can either run the `start pipeline` command manually for each pipeline,
or use the shell script shown below.

`pachctl start pipeline <pipeline-name>`

You can confirm that each pipeline has started by using the `list pipeline` command:

`pachctl list pipeline`

The following shell script runs `start pipeline` on all pipelines.
You may need to install some of the utilities it uses, such as `jq`, on your system.

```shell
pachctl list pipeline --raw \
  | jq -r '.pipeline.name' \
  | xargs -P3 -I{} pachctl start pipeline {}
```

If you used the port-changing technique
[above](#1-pause-all-pipeline-and-data-loading-operations)
to stop all data loading into Pachyderm from outside processes,
change the ports back.

```shell
# Once you have restarted all Pachyderm pipelines, such as with this command,
# $ pachctl list pipeline --raw \
#   | jq -r '.pipeline.name' \
#   | xargs -P3 -I{} pachctl start pipeline {}

# all pipelines in your cluster should be running again. To restart all data loading
# processes, we're going to change the pachd Kubernetes service so that
# it accepts traffic on port 30650 again (instead of 30649).
#
# Back up the Pachyderm service spec, in case you need to restore it quickly
$ kubectl get svc/pachd -o json > pachd_service_backup_30649.json

# Modify the service to accept traffic on port 30650 again
$ kubectl get svc/pachd -o json | sed 's/30649/30650/g' | kubectl apply -f -

# Modify your environment so that *you* can talk to pachd on the old port
$ pachctl config update context $(pachctl config get active-context) --pachd-address=<cluster ip>:30650

# Make sure you can talk to pachd (if not, firewall rules are a common culprit)
$ pachctl version
COMPONENT           VERSION
pachctl             1.7.11
pachd               1.7.11
```
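
If you want to check the `pachctl version` output in a script, for example to confirm that the client reached `pachd` at all and that both report the same version, you could parse the two rows. This sketch assumes the two-column `COMPONENT VERSION` layout shown above; the helper name is hypothetical, and requiring an exact version match is a conservative choice rather than a documented rule.

```shell
#!/usr/bin/env bash
# Sketch: verify that `pachctl version` reports a pachd version and that it
# matches the pachctl client version. Assumes the COMPONENT/VERSION layout.
# PACHCTL can point at a stub in tests; defaults to pachctl.
check_versions() {
  local out client server
  out=$("${PACHCTL:-pachctl}" version) || { echo "pachctl version failed" >&2; return 1; }
  client=$(printf '%s\n' "$out" | awk '$1 == "pachctl" { print $2 }')
  server=$(printf '%s\n' "$out" | awk '$1 == "pachd" { print $2 }')
  if [ -z "$server" ]; then
    echo "no pachd version reported; check the address and firewall rules" >&2
    return 1
  fi
  if [ "$client" != "$server" ]; then
    echo "version mismatch: pachctl $client, pachd $server" >&2
    return 1
  fi
  echo "pachctl and pachd both at $server"
}
```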

Your old Pachyderm cluster can keep operating while you create the migrated one.
For this to work, your data loading operations must follow the design criteria in "[Loading data from other sources into Pachyderm](../backup_restore/#loading-data-from-other-sources-into-pachyderm)".

#### 5. Deploy a 1.X Pachyderm cluster with cloned bucket

Create a Pachyderm cluster that uses the bucket you cloned in [3. Clone your object store bucket](#3-clone-your-object-store-bucket).

Bring up this new Pachyderm cluster in a different namespace from the old one.
The steps below let you check whether there was a problem with the extracted data,
in which case steps [2](#2-extract-a-pachyderm-backup-with-the-no-objects-flag) and
[3](#3-clone-your-object-store-bucket) need to be run again.
Once your new cluster is up and you're connected to it, go on to the next step.

Note that you may need to modify the Kubernetes ingress for the Pachyderm deployment in the new namespace to avoid port conflicts within the same Kubernetes cluster.
Consult your Kubernetes administrator for information on avoiding ingress conflicts,
or check with us in your Pachyderm support channel if you need help.

_Important: Use the_ `kubectl config current-context` _command to confirm that you're talking to the correct Kubernetes cluster configuration for the new cluster._
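
Because both the old and the new cluster may be reachable from your workstation, a guard that aborts when the current `kubectl` context is not the one you expect can keep later steps from running against the wrong cluster. `kubectl config current-context` is a standard `kubectl` command; the `require_context` helper and the context names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: abort unless kubectl's current context matches the expected one.
# KUBECTL can point at a stub in tests; defaults to kubectl.
require_context() {
  local expected=$1 current
  current=$("${KUBECTL:-kubectl}" config current-context) || return 1
  if [ "$current" != "$expected" ]; then
    echo "current context is '$current', expected '$expected'" >&2
    return 1
  fi
  echo "using context $current"
}
```

For example, `require_context my-new-cluster && pachctl restore < path/to/your/backup/file` only restores when the expected context is active. Note that `pachctl` keeps its own active context as well, so this check complements, rather than replaces, confirming where `pachctl` is pointed.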

#### 6. Restore the new 1.X Pachyderm cluster from your backup

Using the Pachyderm cluster you deployed in the previous step, [5. Deploy a 1.X Pachyderm cluster with cloned bucket](#5-deploy-a-1x-pachyderm-cluster-with-cloned-bucket), run `pachctl restore` with the backup you created in [2. Extract a Pachyderm backup with the --no-objects flag](#2-extract-a-pachyderm-backup-with-the-no-objects-flag).

!!! note "Important"
    Use the `kubectl config current-context` command to confirm that you're
    talking to the correct Kubernetes cluster configuration.

`pachctl restore < path/to/your/backup/file`

You can also use the `-u` or `--url` flag to read the backup directly from the object store you placed it in:

`pachctl restore --url s3://...`

Note that this S3 bucket is different from the S3 bucket you cloned above.
It is merely a bucket you allocated to hold the Pachyderm backup without objects.
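
Since the restore targets the freshly deployed cluster, one hedged pre-flight check is to confirm that the target has no repos yet, so that you do not restore on top of existing data by mistake. The sketch assumes that `pachctl list repo --raw` prints nothing when no repos exist; the helper name is hypothetical, and it only covers restoring from a local file, not the `--url` form.

```shell
#!/usr/bin/env bash
# Sketch: refuse to restore unless the target cluster has no repos yet.
# Assumption: `pachctl list repo --raw` prints nothing for an empty cluster.
# PACHCTL can point at a stub in tests; defaults to pachctl.
restore_into_empty_cluster() {
  local backup=$1 repos
  repos=$("${PACHCTL:-pachctl}" list repo --raw) || return 1
  if [ -n "$repos" ]; then
    echo "target cluster already has repos; not restoring" >&2
    return 1
  fi
  "${PACHCTL:-pachctl}" restore < "$backup"
}
```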

#### 7. Load transactional data from checkpoint into new cluster

Configure an instance of your data loading systems to point at the new, upgraded Pachyderm cluster
and play back transactions from the checkpoint you established in [4. Restart all pipeline and data loading ops](#4-restart-all-pipeline-and-data-loading-ops).
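
What playing back looks like depends on your loaders. As a hypothetical example, suppose each loader appends a manifest line `<ISO-8601 UTC timestamp> <repo> <path-in-repo> <local-file>` for every file it writes, and your checkpoint is a timestamp in the same format. Timestamps in one fixed ISO 8601 UTC format sort correctly as strings, so a string comparison against the checkpoint selects the entries to replay, and each one can be re-sent with `pachctl put file <repo>@<branch>:<path> -f <file>`. The manifest format, the `master` branch, and the helper names are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch: replay files that a (hypothetical) loader recorded after a checkpoint.
# Hypothetical manifest format, one whitespace-separated entry per line:
#   <ISO-8601 UTC timestamp> <repo> <path-in-repo> <local-file>
# Paths containing spaces are not supported by this simple format.
# PACHCTL can point at a stub in tests; defaults to pachctl.

# Prints manifest entries whose timestamp is at or after the checkpoint.
# ($1 "") forces awk to compare the timestamps as strings.
entries_since() {
  awk -v cp="$1" 'NF >= 4 && ($1 "") >= cp'
}

# Re-sends each selected entry to the master branch with `pachctl put file`.
# Entries are read on file descriptor 3 so pachctl cannot consume them via stdin.
replay_since() {
  local checkpoint=$1 manifest=$2 repo path file
  while read -r _ repo path file <&3; do
    "${PACHCTL:-pachctl}" put file "${repo}@master:${path}" -f "$file" || return 1
  done 3< <(entries_since "$checkpoint" < "$manifest")
}
```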

Perform any necessary reconfiguration of data loading or unloading operations.

Confirm that the data output is as expected and that the new cluster is operating as expected.

#### 8. Disable the old cluster

Once you've confirmed that the new cluster is operating, you can disable the old cluster.
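
How you disable the old cluster depends on whether you want to keep it around, as the note about 1.7 migrations suggests you might. One hedged option for suspending it rather than deleting it is to scale the old `pachd` deployment to zero replicas in the old namespace with `kubectl scale`, and scale it back up if you need it again. The namespace is a placeholder, resuming assumes the deployment originally ran one replica, and this sketch does not touch etcd or any pipeline worker pods.

```shell
#!/usr/bin/env bash
# Sketch: suspend (not delete) an old Pachyderm cluster by scaling its pachd
# deployment to zero replicas; scale back to 1 to bring the API back.
# KUBECTL can point at a stub in tests; defaults to kubectl.
set_pachd_replicas() {
  local namespace=$1 replicas=$2
  "${KUBECTL:-kubectl}" scale deployment pachd --replicas="$replicas" --namespace "$namespace"
}

suspend_old_cluster() { set_pachd_replicas "$1" 0; }
# Assumes the original deployment ran a single pachd replica.
resume_old_cluster() { set_pachd_replicas "$1" 1; }
```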

#### 9. Reconfigure new cluster as necessary

You may also need to reconfigure:

- data loading operations from Pachyderm to processes outside of it, so that they work as expected
- Kubernetes ingress and port changes made to avoid conflicts with the old cluster