# Export Your Data From Pachyderm

After you build a pipeline, you probably want to see the
results that the pipeline has produced. Every commit into an
input repository results in a corresponding commit into an
output repository.

To access the results of a pipeline, you can use one of the
following methods:

* By running the `pachctl get file` command. This
  command returns the contents of the specified file.<br>
  To get the list of files in a repo, you should first
  run the `pachctl list file` command.<br>
  See [Export Your Data with `pachctl`](#export-your-data-with-pachctl).

* By configuring the pipeline. A pipeline can push or expose
  output data to external sources. You can configure the following
  data exporting methods in a Pachyderm pipeline:

    * An `egress` property. The `egress` property enables you to
      export your data to an external datastore, such as Amazon S3,
      Google Cloud Storage, and others.<br>
      See [Export Your Data with `egress`](#export-your-data-with-egress).

    * A service. A Pachyderm service exposes the results of the
      pipeline processing on a specific port in the form of a dashboard
      or a similar endpoint.<br>
      See [Service](../concepts/pipeline-concepts/pipeline/service.md).

    * Your own code. Because a pipeline is a Docker container that
      runs your code, you can egress your data to any data source,
      even to those that the `egress` field does not support, by
      connecting to that source from within your code.

* By using the S3 gateway. Pachyderm Enterprise users can reuse
  their existing tools and libraries that work with object storage
  to export their data through the S3 gateway.<br>
  See [Using the S3 Gateway](../../deploy-manage/manage/s3gateway/).
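The "your own code" option above can be sketched as follows. This is a minimal, hypothetical example, not part of Pachyderm itself: it assumes a Python pipeline image with the `boto3` library installed, AWS credentials available to the container, and an illustrative bucket name `my-export-bucket`. The `object_key` and `export_results` helpers are names made up for this sketch.

```python
import os

# Hypothetical destination; replace with your own bucket name.
BUCKET = "my-export-bucket"

def object_key(path, base="/pfs/out"):
    """Map a file the pipeline wrote under `base` to its S3 object key."""
    return os.path.relpath(path, base)

def export_results(base="/pfs/out"):
    """Upload everything under the pipeline's output directory to S3.

    A Pachyderm pipeline writes its results under /pfs/out; here the
    user code also copies them to an external bucket by itself.
    """
    import boto3  # AWS SDK for Python; assumed to be in the pipeline image

    s3 = boto3.client("s3")
    for root, _dirs, files in os.walk(base):
        for name in files:
            path = os.path.join(root, name)
            s3.upload_file(path, BUCKET, object_key(path, base))
```

Because the upload runs inside your transform, it happens as part of the job itself, unlike `egress`, which runs after your code finishes.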
## Export Your Data with `pachctl`

The `pachctl get file` command enables you to get the contents
of a file in a Pachyderm repository. You need to know the file
path to specify it in the command.

To export your data with `pachctl`:

1. Get the list of files in the repository:

    ```shell
    pachctl list file <repo>@<branch>
    ```

    **Example:**

    ```shell
    pachctl list file data@master
    ```

    **System Response:**

    ```shell
    NAME           TYPE SIZE
    /user_data.csv file 750B
    ```

1. Get the contents of a specific file:

    ```shell
    pachctl get file <repo>@<branch>:<path/to/file>
    ```

    **Example:**

    ```shell
    pachctl get file data@master:user_data.csv
    ```

    **System Response:**

    ```shell
    1,cyukhtin0@stumbleupon.com,144.155.176.12
    2,csisneros1@over-blog.com,26.119.26.5
    3,jeye2@instagram.com,13.165.230.106
    4,rnollet3@hexun.com,58.52.147.83
    5,bposkitt4@irs.gov,51.247.120.167
    6,vvenmore5@hubpages.com,161.189.245.212
    7,lcoyte6@ask.com,56.13.147.134
    8,atuke7@psu.edu,78.178.247.163
    9,nmorrell8@howstuffworks.com,28.172.10.170
    10,afynn9@google.com.au,166.14.112.65
    ```

Also, you can view the parent, grandparent, and any earlier
revision of a commit by using the caret (`^`) symbol with a number
that corresponds to an ancestor in sequence:

* To view the parent of a commit:

    1. List files in the parent commit:

        ```shell
        pachctl list file <repo>@<branch-or-commit>^
        ```

    1. Get the contents of a file:

        ```shell
        pachctl get file <repo>@<branch-or-commit>^:<path/to/file>
        ```

* To view the `<n>`th parent of a commit:
    1. List files in that parent commit:

        ```shell
        pachctl list file <repo>@<branch-or-commit>^<n>
        ```

        **Example:**

        ```shell
        pachctl list file data@master^4
        ```

        **System Response:**

        ```shell
        NAME           TYPE SIZE
        /user_data.csv file 375B
        ```

    1. Get the contents of a file:

        ```shell
        pachctl get file <repo>@<branch-or-commit>^<n>:<path/to/file>
        ```

        **Example:**

        ```shell
        pachctl get file data@master^4:user_data.csv
        ```

You can specify any number in the `^<n>` notation. If the file
exists in that commit, Pachyderm returns it. If the file
does not exist in that revision, Pachyderm displays the following
message:

```shell
pachctl get file <repo>@<branch-or-commit>^<n>:<path/to/file>
```

**System Response:**

```shell
file "<path/to/file>" not found
```

## Export Your Data with `egress`

The `egress` field in the Pachyderm [pipeline specification](../reference/pipeline_spec.md)
enables you to push the results of a pipeline to an
external datastore such as Amazon S3, Google Cloud Storage, or
Azure Blob Storage. After the user code has finished running, but
before the job is marked as successful, Pachyderm pushes the data
to the specified destination.

You can specify the following `egress` protocols for the
corresponding storage:

| Cloud Platform | Protocol | Description |
| -------------- | -------- | ----------- |
| Google Cloud <br>Storage | `gs://` | GCP provides the `gsutil` utility to access GCP storage resources <br>from a CLI. This utility uses the `gs://` prefix to access these resources. <br>**Example:**<br>`gs://gs-bucket/gs-dir` |
| Amazon S3 | `s3://` | The Amazon S3 storage protocol requires you to specify an `s3://`<br>prefix before the address of an Amazon resource. A valid address must <br>include an endpoint and a bucket, and, optionally, a directory in your <br>Amazon storage. <br>**Example:**<br>`s3://s3-endpoint/s3-bucket/s3-dir` |
| Azure Blob <br>Storage | `wasb://` | Microsoft Windows Azure Storage Blob (WASB) is the default <br>filesystem that Azure HDInsight uses. To output your <br>data to Azure Blob Storage, use the `wasb://` prefix, the container name, <br>and your storage account in the path to your directory. <br>**Example:**<br>`wasb://default-container@storage-account/az-dir` |

!!! example
    ```json
    "egress": {
        "URL": "s3://bucket/dir"
    },
    ```
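For context, the `egress` stanza sits at the top level of a pipeline specification, alongside fields such as `pipeline`, `input`, and `transform`. The following sketch assumes a hypothetical pipeline named `export-stats` that reads from a repo named `stats`; only the `egress` field itself comes from this page, the rest is illustrative:

```json
{
  "pipeline": {
    "name": "export-stats"
  },
  "input": {
    "pfs": {
      "repo": "stats",
      "glob": "/*"
    }
  },
  "transform": {
    "cmd": [ "/bin/sh" ],
    "stdin": [ "cp -r /pfs/stats/* /pfs/out/" ]
  },
  "egress": {
    "URL": "s3://bucket/dir"
  }
}
```

After each successful job, Pachyderm copies the contents of the output commit to `s3://bucket/dir` before marking the job as successful.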