# Supported Operations

The Pachyderm S3 gateway supports the following operations:

* Create buckets: Creates a repo and branch.
* Delete buckets: Deletes a branch or a repo with all branches.
* List buckets: Lists all branches on all repos as S3 buckets.
* Write objects: Atomically overwrites a file on a branch.
* Remove objects: Atomically removes a file on a branch.
* List objects: Lists the files in the HEAD of a branch.
* Get objects: Gets file contents on a branch.

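The examples later in this document address each branch as a bucket named `<branch>.<repo>`, for example `master.raw_data`. As a minimal sketch of that mapping, the following shell helpers convert between the two forms. The function names are hypothetical and not part of Pachyderm, and the sketch assumes that the branch name itself contains no dot.

```shell
# Hypothetical helpers (not part of Pachyderm or any S3 client).
# Assumption: the branch name contains no "." so the first dot
# separates the branch from the repo.

# bucket_name REPO BRANCH -> prints "<branch>.<repo>"
bucket_name() { printf '%s.%s\n' "$2" "$1"; }

# bucket_branch BUCKET -> prints the part before the first dot
bucket_branch() { printf '%s\n' "${1%%.*}"; }

# bucket_repo BUCKET -> prints the part after the first dot
bucket_repo() { printf '%s\n' "${1#*.}"; }

bucket_name raw_data master      # prints: master.raw_data
bucket_branch stats.split        # prints: stats
bucket_repo stats.split          # prints: split
```
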
## List Filesystem Objects

If you have configured your S3 client correctly, you can list the
filesystem objects in your Pachyderm repositories by running
your S3 client's `ls` command.
To list filesystem objects, complete the following steps:

1. Verify that your S3 client can access all of your Pachyderm repositories:

   * If you are using MinIO, type:

     ```shell
     mc ls local
     ```

     **System Response:**

     ```
     [2019-07-12 15:09:50 PDT]      0B master.train/
     [2019-07-12 14:58:50 PDT]      0B master.pre_process/
     [2019-07-12 14:58:09 PDT]      0B master.split/
     [2019-07-12 14:58:09 PDT]      0B stats.split/
     [2019-07-12 14:36:27 PDT]      0B master.raw_data/
     ```

   * If you are using AWS, type:

     ```shell
     aws --endpoint-url http://localhost:30600/ s3 ls
     ```

     **System Response:**

     ```
     2019-07-12 15:09:50 master.train
     2019-07-12 14:58:50 master.pre_process
     2019-07-12 14:58:09 master.split
     2019-07-12 14:58:09 stats.split
     2019-07-12 14:36:27 master.raw_data
     ```

1. List the contents of a branch in a repository:

   * If you are using MinIO, type:

     ```shell
     mc ls local/master.raw_data
     ```

     **System Response:**

     ```
     [2019-07-19 12:11:37 PDT]  2.6MiB github_issues_medium.csv
     ```

   * If you are using AWS, type:

     ```shell
     aws --endpoint-url http://localhost:30600/ s3 ls s3://master.raw_data
     ```

     **System Response:**

     ```
     2019-07-26 11:22:23    2685061 github_issues_medium.csv
     ```

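Every AWS CLI call in this document repeats the same `--endpoint-url` option. One way to avoid the repetition is a small wrapper function. This is a sketch only: the `pach_s3` name and the `PACH_S3_ENDPOINT` variable are assumptions made for illustration, and the default endpoint is the one used in the examples above.

```shell
# Hypothetical wrapper; pach_s3 and PACH_S3_ENDPOINT are illustrative names.
# The default matches the endpoint used in this document's examples.
PACH_S3_ENDPOINT="${PACH_S3_ENDPOINT:-http://localhost:30600/}"

# Forward all arguments to `aws s3`, adding the gateway endpoint.
pach_s3() {
  aws --endpoint-url "$PACH_S3_ENDPOINT" s3 "$@"
}

# Usage (requires the AWS CLI and a reachable S3 gateway):
#   pach_s3 ls
#   pach_s3 ls s3://master.raw_data
```
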
## Create an S3 Bucket

You can create an S3 bucket in Pachyderm by using the AWS CLI or
the MinIO client.
Each S3 bucket that you create corresponds to a branch in a
Pachyderm repository.

To create an S3 bucket, complete the following steps:

1. Run the command for your S3 client to create a new
   S3 bucket, which is a repository with a branch in Pachyderm:

   * If you are using MinIO, type:

     ```shell
     mc mb local/master.test
     ```

     **System Response:**

     ```
     Bucket created successfully `local/master.test`.
     ```

   * If you are using AWS, type:

     ```shell
     aws --endpoint-url http://localhost:30600/ s3 mb s3://master.test
     ```

     **System Response:**

     ```
     make_bucket: master.test
     ```

1. Verify that the S3 bucket has been successfully created:

   * If you are using MinIO, type:

     ```shell
     mc ls local
     ```

     **System Response:**

     ```
     [2019-07-18 13:32:44 PDT]      0B master.test/
     [2019-07-12 15:09:50 PDT]      0B master.train/
     [2019-07-12 14:58:50 PDT]      0B master.pre_process/
     [2019-07-12 14:58:09 PDT]      0B master.split/
     [2019-07-12 14:58:09 PDT]      0B stats.split/
     [2019-07-12 14:36:27 PDT]      0B master.raw_data/
     ```

   * If you are using AWS, type:

     ```shell
     aws --endpoint-url http://localhost:30600/ s3 ls
     ```

     **System Response:**

     ```
     2019-07-26 11:35:28 master.test
     2019-07-12 14:58:50 master.pre_process
     2019-07-12 14:58:09 master.split
     2019-07-12 14:58:09 stats.split
     2019-07-12 14:36:27 master.raw_data
     ```

   * You can also use the `pachctl list repo` command to view the
     list of repositories:

     ```shell
     pachctl list repo
     ```

     **System Response:**

     ```
     NAME               CREATED                    SIZE (MASTER)
     test               About an hour ago          0B
     train              6 days ago                 68.57MiB
     pre_process        6 days ago                 1.18MiB
     split              6 days ago                 1.019MiB
     raw_data           6 days ago                 2.561MiB
     ```

     You should see the newly created repository in this list.

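The verification step can also be scripted. The following sketch reads a bucket listing in the `aws s3 ls` format shown above from standard input and succeeds only if the named bucket is present. The `bucket_listed` function is a hypothetical helper, and it assumes that the bucket name is the third whitespace-separated field of each line.

```shell
# Hypothetical helper: succeed if BUCKET appears in an `aws s3 ls`
# bucket listing read from stdin.
# Assumption: each line is "<date> <time> <bucket>".
bucket_listed() {
  awk -v bucket="$1" '$3 == bucket { found = 1 } END { exit !found }'
}

# Usage (requires the AWS CLI and a reachable S3 gateway):
#   aws --endpoint-url http://localhost:30600/ s3 ls | bucket_listed master.test \
#     && echo "master.test exists"
```
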
## Delete an S3 Bucket

You can delete an S3 bucket in Pachyderm by running the corresponding
command for your S3 client. The bucket must be completely empty.

To remove an S3 bucket, run one of the following commands:

* If you are using MinIO, type:

  ```shell
  mc rb local/master.test
  ```

  **System Response:**

  ```
  Removed `local/master.test` successfully.
  ```

* If you are using AWS, type:

  ```shell
  aws --endpoint-url http://localhost:30600/ s3 rb s3://master.test
  ```

  **System Response:**

  ```
  remove_bucket: master.test
  ```

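Because the bucket must be empty before it can be deleted, a script may want to check the bucket's contents before attempting removal. A minimal sketch, in which `listing_is_empty` is a hypothetical helper that treats any non-blank line of `ls` output as an object:

```shell
# Hypothetical helper: succeed if the object listing on stdin has no entries.
# Assumption: an empty bucket produces no output from the `ls` command.
listing_is_empty() {
  ! grep -q '[^[:space:]]'
}

# Usage (requires the AWS CLI and a reachable S3 gateway):
#   if aws --endpoint-url http://localhost:30600/ s3 ls s3://master.test | listing_is_empty; then
#     aws --endpoint-url http://localhost:30600/ s3 rb s3://master.test
#   fi
```
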
## Upload and Download File Objects

For input repositories at the top of your DAG, you can both add files
to and download files from the repository.

Not all the repositories that you see in the results of the `ls` command are
input repositories that can be written to. Some of them might be read-only
output repos. Check your pipeline specification to verify which
repositories are the input repos.

To add a file to a repository, complete the following steps:

1. Run the `cp` command for your S3 client:

   * If you are using MinIO, type:

     ```shell
     mc cp test.csv local/master.raw_data/test.csv
     ```

     **System Response:**

     ```
     test.csv:                  62 B / 62 B  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  100.00% 206 B/s 0s
     ```

   * If you are using AWS, type:

     ```shell
     aws --endpoint-url http://localhost:30600/ s3 cp test.csv s3://master.raw_data
     ```

     **System Response:**

     ```
     upload: ./test.csv to s3://master.raw_data/test.csv
     ```

   These commands add the `test.csv` file to the `master` branch in
   the `raw_data` repository. `raw_data` is an input repository.

1. Check that the file was added:

   * If you are using MinIO, type:

     ```shell
     mc ls local/master.raw_data
     ```

     **System Response:**

     ```
     [2019-07-19 12:11:37 PDT]  2.6MiB github_issues_medium.csv
     [2019-07-19 12:11:37 PDT]     62B test.csv
     ```

   * If you are using AWS, type:

     ```shell
     aws --endpoint-url http://localhost:30600/ s3 ls s3://master.raw_data/
     ```

     **System Response:**

     ```
     2019-07-19 12:11:37  2685061 github_issues_medium.csv
     2019-07-19 12:11:37       62 test.csv
     ```

1. Download a file from the repository to the
   current directory by running one of the following commands:

   * If you are using MinIO, type:

     ```shell
     mc cp local/master.raw_data/github_issues_medium.csv .
     ```

     **System Response:**

     ```
     ...hub_issues_medium.csv:  2.56 MiB / 2.56 MiB  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 100.00% 1.26 MiB/s 2s
     ```

   * If you are using AWS, type:

     ```shell
     aws --endpoint-url http://localhost:30600/ s3 cp s3://master.raw_data/test.csv .
     ```

     **System Response:**

     ```
     download: s3://master.raw_data/test.csv to ./test.csv
     ```

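To confirm that an upload and a later download returned the same bytes, you can compare the original file with the downloaded copy. The sketch below uses the standard `cmp` utility. The `same_content` function and the `downloaded.csv` file name are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper: report whether two local files are byte-identical.
same_content() {
  if cmp -s "$1" "$2"; then
    echo "match"
  else
    echo "differ"
  fi
}

# Usage (requires the AWS CLI and a reachable S3 gateway):
#   aws --endpoint-url http://localhost:30600/ s3 cp s3://master.raw_data/test.csv downloaded.csv
#   same_content test.csv downloaded.csv
```
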
## Remove a File Object

You can delete a file in the `HEAD` of a Pachyderm branch by using the
MinIO or AWS command-line interface:

1. List the files in the input repository:

   * If you are using MinIO, type:

     ```shell
     mc ls local/master.raw_data/
     ```

     **System Response:**

     ```
     [2019-07-19 12:11:37 PDT]  2.6MiB github_issues_medium.csv
     [2019-07-19 12:11:37 PDT]     62B test.csv
     ```

   * If you are using AWS, type:

     ```shell
     aws --endpoint-url http://localhost:30600/ s3 ls s3://master.raw_data
     ```

     **System Response:**

     ```
     2019-07-19 12:11:37    2685061 github_issues_medium.csv
     2019-07-19 12:11:37         62 test.csv
     ```

1. Delete a file from the repository. For example:

   * If you are using MinIO, type:

     ```shell
     mc rm local/master.raw_data/test.csv
     ```

     **System Response:**

     ```
     Removing `local/master.raw_data/test.csv`.
     ```

   * If you are using AWS, type:

     ```shell
     aws --endpoint-url http://localhost:30600/ s3 rm s3://master.raw_data/test.csv
     ```

     **System Response:**

     ```
     delete: s3://master.raw_data/test.csv
     ```