# Use Transactions

!!! note "TL;DR"
    Use transactions to run multiple Pachyderm commands
    simultaneously in one job run.

A transaction is a Pachyderm operation that enables you to create
a collection of Pachyderm commands and execute them concurrently.
Regular Pachyderm operations that are not in a transaction are
executed one after another. However, when you need
to run multiple commands at the same time, you can use transactions.
This functionality is particularly useful for pipelines with multiple
inputs. If you need to update two or more input repos, you might not want
a pipeline job for each state change. You can issue a transaction
to start commits in each of the input repos, which creates a single
downstream commit in the pipeline repo. After the transaction, you
can put files and finish the commits at will, and the pipeline job
will run once all the input commits have been finished.

## Use Cases

Pachyderm users incorporate transactions into their own workflows and find
unique ways to benefit from this feature, whether in a small
research team or an enterprise-grade machine learning workflow.

Below are examples of the most common ways of using transactions.

### Commit to Separate Repositories Simultaneously

For example, you have a Pachyderm pipeline with two input
repositories. One repository includes training data and the
other `parameters` for your machine learning pipeline. If you need
to run specific data against specific parameters, you need to
run your pipeline against specific commits in both repositories.
To achieve this, you need to commit to these repositories
simultaneously.

If you use a regular Pachyderm workflow, the data is uploaded sequentially,
each time triggering a separate job instead of one job with both commits
of new data. One `put file` operation commits changes to
the data repository and the other updates the parameters repository.
The following animation shows the standard Pachyderm workflow without
a transaction:



In Pachyderm, a pipeline starts as soon as a new commit lands in
a repository. In the diagram above, as soon as `commit 1` is added
to the `data` repository, Pachyderm runs a job for `commit 1` and
`commit 0` in the `parameters` repository. You can also see
that Pachyderm runs the second job and processes `commit 1`
from the `data` repository with `commit 1` in the `parameters`
repository. In some cases, this is a perfectly acceptable solution.
But if your job takes many hours and you are only interested in the
result of the pipeline run with `commit 1` from both repositories,
this approach does not work.

With transactions, you can ensure that only one job triggers with
both the new `data` and `parameters`. The following animation
demonstrates how transactions work:



The transaction ensures that a single job runs for the two commits
that were started within the transaction.
While Pachyderm supports some workflows where you can get the
same effect by having both data and parameters in the same repo,
separating them and using transactions is often much more efficient for
organizational and performance reasons.

### Switching from Staging to Master Simultaneously

If you are using [deferred processing](../../concepts/advanced-concepts/deferred_processing/)
in your repositories because you want to commit your changes frequently
without triggering jobs every time, then transactions can help you
manage deferred processing with multiple inputs. You commit your
changes to the staging branch and,
when needed, switch the `HEAD` of your master branch to a commit in the
staging branch. To do this simultaneously, you can use transactions.

For example, you have two repositories, `data` and `parameters`, both
of which have a `master` and a `staging` branch. You commit your
changes to the staging branch while your pipeline is subscribed to the
master branch. To switch these branches simultaneously, you can
use transactions like this:

```shell
pachctl start transaction
```

**System Response:**

```shell
Started new transaction: 0d6f0bc3-37a0-4936-96e3-82034a2a2055
pachctl create branch data@master --head staging
Added to transaction: 0d6f0bc3-37a0-4936-96e3-82034a2a2055
pachctl create branch parameters@master --head staging
Added to transaction: 0d6f0bc3-37a0-4936-96e3-82034a2a2055
pachctl finish transaction
Completed transaction with 2 requests: 0d6f0bc3-37a0-4936-96e3-82034a2a2055
```

When you finish the transaction, the `master` branch in both repositories
switches to the `staging` commit at the same time, which triggers one job
to process those commits together.

## Start and Finish Transactions

To start a transaction, run the following command:

```shell
pachctl start transaction
```

**System Response:**

```shell
Started new transaction: 7a81eab5-e6c6-430a-a5c0-1deb06852ca5
```

This command generates a transaction object in the cluster and saves
its ID in the local Pachyderm configuration file. By default, this file
is stored at `~/.pachyderm/config.json`.

!!! example
    ```json hl_lines="9"
    {
      "user_id": "b4fe4317-be21-4836-824f-6661c68b8fba",
      "v2": {
        "active_context": "local-2",
        "contexts": {
          "default": {},
          "local-2": {
            "source": 3,
            "active_transaction": "7a81eab5-e6c6-430a-a5c0-1deb06852ca5",
            "cluster_name": "minikube",
            "auth_info": "minikube",
            "namespace": "default"
          },
    ```

After you start a transaction, you can add supported commands, such
as `pachctl create repo`, `pachctl create branch`, and so on, to the
transaction. All commands that are performed in a transaction are
queued up and not executed against the actual cluster until you finish
the transaction. When you finish the transaction, all queued commands
are executed atomically.

To finish a transaction, run:

```shell
pachctl finish transaction
```

**System Response:**

```shell
Completed transaction with 1 requests: 7a81eab5-e6c6-430a-a5c0-1deb06852ca5
```

## Other Transaction Commands

Other supporting commands for transactions include the following:

| Command      | Description |
| ------------ | ----------- |
| `pachctl list transaction` | List all unfinished transactions available in the Pachyderm cluster. |
| `pachctl stop transaction` | Remove the currently active transaction from the local Pachyderm config file. The transaction remains in the Pachyderm cluster and can be resumed later. |
| `pachctl resume transaction` | Set an already-existing transaction as the active transaction in the local Pachyderm config file. |
| `pachctl delete transaction` | Delete a transaction from the Pachyderm cluster. |
| `pachctl inspect transaction` | Provide detailed information about an existing transaction, including which operations it will perform. By default, displays information about the current transaction. If you specify a transaction ID, displays information about the corresponding transaction. |

## Supported Operations

While there is a transaction object in the Pachyderm configuration
file, all supported API requests append the request to the
transaction instead of running directly. These supported commands include:

```shell
create repo
delete repo
start commit
finish commit
delete commit
create branch
delete branch
```

Each time you add a command to a transaction, Pachyderm validates the
transaction against the current state of the cluster metadata and obtains
any return values, which is important for commands such as
`start commit`. If validation fails for any reason, Pachyderm does
not add the operation to the transaction. If the transaction has been
invalidated by a change in the cluster state, you must delete the transaction
and start over, taking into account the new state of the cluster.
From a command-line perspective, these commands work identically within
a transaction as without. The only differences are that your changes are
not applied until you run `finish transaction`, and that Pachyderm logs
a message to `stderr` to indicate that the command was placed
in a transaction rather than run directly.

## Multiple Opened Transactions

Some systems have a notion of *nested* transactions, in which you
open transactions within an already opened transaction. In such systems, the
operations added to the subsequent transactions are not executed
until all the nested transactions and the main transaction are closed.

Pachyderm does not support such behavior. Instead, when you open a
transaction, the transaction ID is written to the Pachyderm configuration
file. If you begin another transaction while the first one is open, Pachyderm
returns an error.

Every time you add a command to a transaction,
Pachyderm creates a blueprint of the commit and verifies that the
command is valid. However, one transaction can invalidate another.
In this case, the transaction that is closed first takes precedence
over the other. For example, if two transactions create a repository
with the same name, the one that is executed first results in the
creation of the repository, and the other results in an error.

!!! tip
    While you cannot use `pachctl put file` in a transaction, you can
    start a commit within a transaction, finish the transaction,
    then put as many files as you need, and then finish your commit.
    Your changes will only be applied in one batch when you close
    the commit.

To get a better understanding of how transactions work in practice, try
[Use Transactions with Hyperparameter Tuning](https://github.com/pachyderm/pachyderm/tree/master/examples/transactions/).
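
The workflow described in the tip above can be sketched as a small shell function. This is a minimal sketch rather than an official recipe: `update_inputs` is a hypothetical helper name, the two file arguments are placeholders, and the sketch assumes that the `data` and `parameters` repositories already exist, that `pachctl` is connected to your cluster, and that no other transaction is currently active.

```shell
# Hypothetical helper (not part of pachctl): update both input repos so that
# the downstream pipeline runs a single job.
#   $1 - local file to upload to the `data` repo
#   $2 - local file to upload to the `parameters` repo
update_inputs() {
  # Start one commit per input repo inside a single transaction, so that
  # both commits are created together when the transaction finishes.
  pachctl start transaction &&
    pachctl start commit data@master &&
    pachctl start commit parameters@master &&
    pachctl finish transaction || return 1

  # `put file` is not supported inside a transaction, so the uploads happen
  # afterwards, into the commits that the transaction opened.
  pachctl put file data@master:/"$(basename "$1")" -f "$1" &&
    pachctl put file parameters@master:/"$(basename "$2")" -f "$2" || return 1

  # The pipeline job runs once both input commits are finished.
  pachctl finish commit data@master &&
    pachctl finish commit parameters@master
}
```

You could then call it as `update_inputs ./train.csv ./params.json`, where both paths are placeholders for your own files. The function stops at the first failing step, which leaves any open commits or an unfinished transaction in place for you to inspect.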