# Export Your Data From Pachyderm

After you build a pipeline, you probably want to see the
results that the pipeline has produced. Every commit into an
input repository results in a corresponding commit into an
output repository.

To access the results of a pipeline, you can use one of the
following methods:

* By running the `pachctl get file` command. This
command returns the contents of the specified file.<br>
To get the list of files in a repo, first
run the `pachctl list file` command.
See [Export Your Data with `pachctl`](#export-your-data-with-pachctl).<br>

* By configuring the pipeline. A pipeline can push or expose
output data to external sources. You can configure the following
data exporting methods in a Pachyderm pipeline:

    * An `egress` property enables you to export your data to
    an external datastore, such as Amazon S3,
    Google Cloud Storage, and others.<br>
    See [Export Your Data with `egress`](#export-your-data-with-egress).<br>

    * A service. A Pachyderm service exposes the results of the
    pipeline processing on a specific port in the form of a dashboard
    or similar endpoint.<br>
    See [Service](../concepts/pipeline-concepts/pipeline/service.md).<br>

    * Your own code that connects to an external data source.
    Because a pipeline is a Docker container that runs your code,
    you can egress your data to any data source, even to those that the
    `egress` field does not support, by connecting to that source from
    within your code.

* By using the S3 gateway. Pachyderm Enterprise users can reuse
their existing tools and libraries that work with object stores
to export their data with the S3 gateway.<br>
See [Using the S3 Gateway](./s3gateway.md).
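The code-level egress option above can be sketched in a few lines. This is a
hedged example, not a Pachyderm API: it assumes a Python pipeline image with
`boto3` installed, and the bucket name, prefix, and function names are
placeholders you would replace with your own.

```python
import os

def plan_uploads(out_dir, prefix):
    """Map every file under the pipeline's output directory to a
    (local_path, object_key) pair, mirroring the directory layout.
    Both the function and the key scheme are illustrative."""
    uploads = []
    for root, _, files in os.walk(out_dir):
        for name in files:
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, out_dir)
            uploads.append((local_path, f"{prefix}/{rel}"))
    return sorted(uploads)

def egress_to_s3(out_dir="/pfs/out", bucket="my-bucket", prefix="results"):
    """Upload pipeline output to an external S3 bucket from inside the
    pipeline container. Assumes boto3 is baked into the pipeline image."""
    import boto3  # hypothetical dependency of this example image
    s3 = boto3.client("s3")
    for local_path, key in plan_uploads(out_dir, prefix):
        s3.upload_file(local_path, bucket, key)
```

You would call `egress_to_s3()` at the end of your pipeline code, after all
results have been written under `/pfs/out`; unlike the `egress` field, this
runs as part of your user code, so upload failures fail the job like any
other code error.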
## Export Your Data with `pachctl`

The `pachctl get file` command enables you to get the contents
of a file in a Pachyderm repository. You need to know the file
path to specify it in the command.

To export your data with `pachctl`:

1. Get the list of files in the repository:

    ```shell
    $ pachctl list file <repo>@<branch>
    ```

    **Example:**

    ```shell
    $ pachctl list file data@master
    NAME           TYPE SIZE
    /user_data.csv file 750B
    ```

1. Get the contents of a specific file:

    ```shell
    $ pachctl get file <repo>@<branch>:<path/to/file>
    ```

    **Example:**

    ```shell
    $ pachctl get file data@master:user_data.csv
    1,cyukhtin0@stumbleupon.com,144.155.176.12
    2,csisneros1@over-blog.com,26.119.26.5
    3,jeye2@instagram.com,13.165.230.106
    4,rnollet3@hexun.com,58.52.147.83
    5,bposkitt4@irs.gov,51.247.120.167
    6,vvenmore5@hubpages.com,161.189.245.212
    7,lcoyte6@ask.com,56.13.147.134
    8,atuke7@psu.edu,78.178.247.163
    9,nmorrell8@howstuffworks.com,28.172.10.170
    10,afynn9@google.com.au,166.14.112.65
    ```

You can also view the parent, grandparent, and any earlier
revision by using the caret (`^`) symbol with a number that
corresponds to the position of the ancestor in the sequence:

* To view the parent of a commit:

    1. List the files in the parent commit:

        ```shell
        $ pachctl list file <repo>@<branch-or-commit>^:<path/to/file>
        ```

    1. Get the contents of a file:

        ```shell
        $ pachctl get file <repo>@<branch-or-commit>^:<path/to/file>
        ```

* To view the `<n>`th parent of a commit:
    1. List the files in the parent commit:

        ```shell
        $ pachctl list file <repo>@<branch-or-commit>^<n>:<path/to/file>
        ```

        **Example:**

        ```shell
        NAME           TYPE SIZE
        /user_data.csv file 375B
        ```

    1. Get the contents of a file:

        ```shell
        $ pachctl get file <repo>@<branch-or-commit>^<n>:<path/to/file>
        ```

        **Example:**

        ```shell
        $ pachctl get file data@master^4:user_data.csv
        ```

You can specify any number in the `^<n>` notation. If the file
exists in that commit, Pachyderm returns it. If the file
does not exist in that revision, Pachyderm displays the following
message:

```shell
$ pachctl get file <repo>@<branch-or-commit>^<n>:<path/to/file>
file "<path/to/file>" not found
```

## Export Your Data with `egress`

The `egress` field in the Pachyderm [pipeline specification](../reference/pipeline_spec.md)
enables you to push the results of a pipeline to an
external datastore such as Amazon S3, Google Cloud Storage, or
Azure Blob Storage. After the user code has finished running, but
before the job is marked as successful, Pachyderm pushes the data
to the specified destination.

You can specify the following `egress` protocols for the
corresponding storage:

| Cloud Platform | Protocol | Description |
| -------------- | -------- | ----------- |
| Google Cloud <br>Storage | `gs://` | Google Cloud provides the `gsutil` utility to access GCP storage resources <br>from a CLI. This utility uses the `gs://` prefix to access these resources. <br>**Example:**<br> `gs://gs-bucket/gs-dir` |
| Amazon S3 | `s3://` | The Amazon S3 storage protocol requires you to specify an `s3://`<br>prefix before the address of an Amazon resource. A valid address must <br>include an endpoint and a bucket, and, optionally, a directory in your <br>Amazon storage. <br>**Example:**<br> `s3://s3-endpoint/s3-bucket/s3-dir` |
| Azure Blob <br>Storage | `wasb://` | Microsoft Windows Azure Storage Blob (WASB) is the default Azure <br>filesystem for `HDInsight`. To output your <br>data to Azure Blob Storage, use the `wasb://` prefix, the container name, <br>and your storage account in the path to your directory. <br>**Example:**<br>`wasb://default-container@storage-account/az-dir` |

!!! example
    ```json
    "egress": {
        "URL": "s3://bucket/dir"
    },
    ```
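A quick way to catch a malformed `egress` URL before submitting a pipeline
spec is to check its protocol against the table above. The following sketch
is illustrative only; the function name is hypothetical, and Pachyderm
performs its own validation server-side.

```python
from urllib.parse import urlparse

# Protocols listed in the egress table above.
SUPPORTED_SCHEMES = {"gs", "s3", "wasb"}

def check_egress_url(url):
    """Return the scheme of an egress URL if it uses a supported
    protocol; otherwise raise ValueError."""
    scheme = urlparse(url).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported egress protocol: {scheme!r}")
    return scheme
```

For example, `check_egress_url("s3://bucket/dir")` returns `"s3"`, while a
`ftp://` URL raises an error, which is cheaper to discover than a failed job.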