github.com/pachyderm/pachyderm@v1.13.4/doc/docs/1.11.x/deploy-manage/manage/backup_restore.md (about) 1 # Backup Your Cluster 2 3 Pachyderm provides the `pachctl extract` and `pachctl restore` commands to 4 back up and restore the state of a Pachyderm cluster. 5 6 The `pachctl extract` command requires that all pipeline and data loading 7 activity into Pachyderm stop before the extract occurs. This enables 8 Pachyderm to create a consistent, point-in-time backup. 9 10 Extract and restore commands are used to migrate between minor 11 and major releases of Pachyderm. In addition, there are a few design 12 points and operational techniques that data engineers should take 13 into consideration when creating complex pachyderm deployments to 14 minimize disruptions to production pipelines. 15 16 Backing up Pachyderm involves the persistent volume (PV) that 17 `etcd` uses for administrative data and the object store bucket that 18 holds Pachyderm's actual data. 19 Restoring involves populating that PV and object store with data to 20 recreate a Pachyderm cluster. 21 22 ## Before You Begin 23 24 Before you begin, you need to pause all the pipelines and data operations 25 that run in your cluster. You can do so either by running a multi-line 26 shell script or by running the `pachctl stop pipeline` command for each 27 pipeline individually. 28 29 If you decide to use a shell script below, you need to have `jq` and 30 `xargs` installed on your system. Also, you might need to install 31 the `watch` and `kubectl` commands on your system, and configure 32 `kubectl` to point at the cluster that Pachyderm is running in. 33 34 To stop a running pipeline, complete the following steps: 35 36 1. Pause each pipeline individually by repeatedly running the single 37 `pachctl` command or by running a script: 38 39 ```pachctl tab="Command" 40 pachctl stop pipeline <pipeline-name> 41 ``` 42 43 ```shell tab="Script" 44 pachctl list pipeline --raw \ 45 | jq -r '.pipeline.name' \ 46 | xargs -P3 -n1 -I{} pachctl stop pipeline {} 47 ``` 48 49 1. Optionally, run the `watch` command to monitor the pods 50 terminating: 51 52 ```shell 53 watch -n 5 kubectl get pods 54 ``` 55 56 1. Confirm that pipelines are paused: 57 58 ```shell 59 pachctl list pipeline 60 ``` 61 62 ### Pause External Data Loading Operations 63 64 **Input repositories** or **input repos** in pachyderm are 65 repositories created with the `pachctl create repo` command. 66 They are designed to be the repos at the top of a directed 67 acyclic graph of pipelines. Pipelines have their own output 68 repos associated with them. These repos are different from 69 input repos. 70 71 If you have any processes external to Pachyderm 72 that put data into input repos using any supported method, 73 such as the Pachyderm APIs, `pachctl put file`, or other, 74 you need to pause those processes. 75 76 When an external system writes data into Pachyderm 77 input repos, you need to provide ways of *pausing* 78 output while queueing any data output 79 requests to be output when the systems are *resumed*. 80 This allows all Pachyderm processing to be stopped while 81 the extract takes place. 82 83 In addition, it is desirable for systems that load data 84 into Pachyderm have a mechanism for replaying a queue 85 from any checkpoint in time. 86 This is useful when doing migrations from one release 87 to another, where you want to minimize downtime 88 of a production Pachyderm system. After an extract, 89 the old system is kept running with the checkpoint 90 established while a duplicate, upgraded Pachyderm 91 cluster is being migrated with duplicated data. 92 Transactions that occur while the migrated, 93 upgraded cluster is being brought up are not lost. 94 95 If you are not using any external way of pausing input 96 from internal systems, you can use the following commands to stop 97 all data loading into Pachyderm from outside processes. 98 To stop all data loading processes, you need to modify 99 the `pachd` Kubernetes service so that it only accepts 100 traffic on port 30649 instead of the usual 30650. This way, 101 any background users and services that send requests to 102 your Pachyderm cluster while `pachctl extract` is 103 running will not interfere with the process. Use this port switching 104 technique to minimize downtime during the migration. 105 106 To pause external data loading operations, complete the 107 following steps: 108 109 1. Verify that all Pachyderm pipelines are paused: 110 111 ```shell 112 pachctl list pipeline 113 ``` 114 115 1. For safery, save the Pachyderm service spec in a `json`: 116 117 ```shell 118 kubectl get svc/pachd -o json >pach_service_backup_30650.json 119 ``` 120 121 1. Modify the `pachd` service to accept traffic on port 30649: 122 123 ```shell 124 kubectl get svc/pachd -o json | sed 's/30650/30649/g' | kubectl apply -f - 125 ``` 126 127 Most likely, you will need to modify your cloud provider's firewall 128 rules to allow traffic on this port. 129 130 Depending on your deployment, you might need to switch 131 additional ports: 132 133 1. Back up the `etcd` and dashboard manifests: 134 135 ```shell 136 kubectl get svc/etcd -o json >etcd_svc_backup_32379.json 137 kubectl get svc/dash -o json >dash_svc_backup_30080.json 138 ``` 139 140 1. Switch the `etcd` and dashboard manifests: 141 142 ```shell 143 kubectl get svc/pachd -o json | sed 's/30651/30648/g' | kubectl apply -f - 144 kubectl get svc/pachd -o json | sed 's/30652/30647/g' | kubectl apply -f - 145 kubectl get svc/pachd -o json | sed 's/30654/30646/g' | kubectl apply -f - 146 kubectl get svc/pachd -o json | sed 's/30655/30644/g' | kubectl apply -f - 147 kubectl get svc/etcd -o json | sed 's/32379/32378/g' | kubectl apply -f - 148 kubectl get svc/dash -o json | sed 's/30080/30079/g' | kubectl apply -f - 149 kubectl get svc/dash -o json | sed 's/30081/30078/g' | kubectl apply -f - 150 kubectl get svc/pachd -o json | sed 's/30600/30611/g' | kubectl apply -f - 151 ``` 152 153 1. Modify your environment so that you can access `pachd` on this new 154 port 155 156 ```shell 157 pachctl config update context `pachctl config get active-context` --pachd-address=<cluster ip>:30649 158 ``` 159 160 1. Verify that you can talk to `pachd`: (if not, firewall rules are a common culprit) 161 162 ```shell 163 pachctl version 164 ``` 165 166 **System Response:** 167 168 ``` 169 COMPONENT VERSION 170 pachctl {{ config.pach_latest_version }} 171 pachd {{ config.pach_latest_version }} 172 ``` 173 174 ??? note "pause-pipelines.sh" 175 Alternatively, you can run **Steps 1 - 3** by using the following script: 176 177 ```shell 178 #!/bin/bash 179 # Stop all pipelines: 180 pachctl list pipeline --raw \ 181 | jq -r '.pipeline.name' \ 182 | xargs -P3 -n1 -I{} pachctl stop pipeline {} 183 184 # Backup the Pachyderm services specs, in case you need to restore them: 185 kubectl get svc/pachd -o json >pach_service_backup_30650.json 186 kubectl get svc/etcd -o json >etcd_svc_backup_32379.json 187 kubectl get svc/dash -o json >dash_svc_backup_30080.json 188 189 # Modify all ports of all the Pachyderm service to avoid collissions 190 # with the migration cluster: 191 # Modify the pachd API endpoint to run on 30649: 192 kubectl get svc/pachd -o json | sed 's/30650/30649/g' | kubectl apply -f - 193 # Modify the pachd trace port to run on 30648: 194 kubectl get svc/pachd -o json | sed 's/30651/30648/g' | kubectl apply -f - 195 # Modify the pachd api-over-http port to run on 30647: 196 kubectl get svc/pachd -o json | sed 's/30652/30647/g' | kubectl apply -f - 197 # Modify the pachd saml authentication port to run on 30646: 198 kubectl get svc/pachd -o json | sed 's/30654/30646/g' | kubectl apply -f - 199 # Modify the pachd git api callback port to run on 30644: 200 kubectl get svc/pachd -o json | sed 's/30655/30644/g' | kubectl apply -f - 201 # Modify the etcd client port to run on 32378: 202 kubectl get svc/etcd -o json | sed 's/32379/32378/g' | kubectl apply -f - 203 # Modify the dashboard ports to run on 30079 and 30078: 204 kubectl get svc/dash -o json | sed 's/30080/30079/g' | kubectl apply -f - 205 kubectl get svc/dash -o json | sed 's/30081/30078/g' | kubectl apply -f - 206 # Modify the pachd s3 port to run on 30611: 207 kubectl get svc/pachd -o json | sed 's/30600/30611/g' | kubectl apply -f - 208 ``` 209 210 ## Back up Your Pachyderm Cluster 211 212 After you pause all pipelines and external data operations, 213 you can use the `pachctl extract` command to back up your data. 214 You can use `pachctl extract` alone or in combination with 215 cloning or snapshotting services offered by your cloud provider. 216 217 The backup includes the following: 218 219 * Your data that is typically stored in an object store 220 * Information about Pachyderm primitives, such as pipelines, repositories, 221 commits, provenance and so on. This information is stored in etcd. 222 223 You can back up everything to one local file or you can back up 224 Pachyderm primitives to a local file and use object store's 225 capabilities to clone the data stored in object store buckets. 226 The latter is preferred for large volumes of data and minimizing 227 the downtime during the upgrade. Use the 228 `--no-objects` flag to separate backups. 229 230 In addition, you can extract your partial or full backup into a 231 separate S3 bucket. The bucket must have the same permissions policy as 232 the one you have configured when you originally deployed Pachyderm. 233 234 To back up your Pachyderm cluster, run one of the following commands: 235 236 * To create a partial back up of metadata-only, run: 237 238 ```shell 239 pachctl extract --no-objects > path/to/your/backup/file 240 ``` 241 242 * If you want to save this partial backup in an object store by using the 243 `--url` flag, run: 244 245 ```shell 246 pachctl extract --no-objects --url s3://... 247 ``` 248 249 * To back up everything in one local file: 250 251 ```shell 252 pachctl extract > path/to/your/backup/file 253 ``` 254 255 Similarly, this backup can be saved in an object store with the `--url` 256 flag. 257 258 ## Using your Cloud Provider's Clone and Snapshot Services 259 260 Follow your cloud provider's recommendation 261 for backing up persistent volumes and object stores. Here are some pointers to the relevant documentation: 262 263 * Creating a snapshot of persistent volumes: 264 265 - [Creating snapshots of GCE persistent volumes](https://cloud.google.com/compute/docs/disks/create-snapshots) 266 - [Creating snapshots of Elastic Block Store (EBS) volumes](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-snapshot.html) 267 - [Creating snapshots of Azure Virtual Hard Disk volumes](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/snapshot-copy-managed-disk) 268 269 For on-premises Kubernetes deployments, check the vendor documentation for 270 your PV implementation on backing up and restoring. 271 272 * Cloning object stores: 273 274 - [Using AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html) 275 - [Using gsutil](https://cloud.google.com/storage/docs/gsutil/commands/cp) 276 - [Using azcopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux?toc=%2fazure%2fstorage%2ffiles%2ftoc.json). 277 278 For on-premises Kubernetes deployments, check the vendor documentation 279 for your on-premises object store for details on backing up and 280 restoring a bucket. 281 282 # Restore your Cluster from a Backup: 283 284 After you backup your cluster, you can restore it by using the 285 `pachctl restore` command. Typically, you would deploy a new Pachyderm cluster 286 either in another Kubernetes namespace or in a completely separate Kubernetes cluster. 287 288 To restore your Cluster from a Backup, run the following command: 289 290 * If you have backed up your cluster to a local file:, run: 291 292 ```shell 293 pachctl restore < path/to/your/backup/file 294 ``` 295 296 * If you have backed up your cluster to an object store, run: 297 298 ```shell 299 pachctl restore --url s3://<path-to-backup>> 300 ``` 301 302 !!! note "See Also:" 303 - [Migrate Your Cluster](../migrations/)