# File

In Pachyderm, a file is a Unix filesystem object, either a regular file
or a directory, that stores data. Unlike source code version-control
systems, which are best suited to plain text files, Pachyderm can store
any type of file, including binary files. Data scientists often work with
comma-separated values (CSV), JavaScript Object Notation (JSON), images,
and other plain text and binary file formats. Pachyderm supports all file
sizes and formats and applies storage optimization techniques, such as
deduplication, in the background.

To upload files to a Pachyderm repository, run the `pachctl put file`
command. This command accepts both individual files and directories.
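
As a minimal sketch of the two cases, the commands below upload a single file and then a directory. The `images` repository, the `A.csv` file, and the `data/` directory are placeholder names. The `pach` wrapper is a hypothetical helper, not part of Pachyderm: it only prints each command when `pachctl` is not installed, so the snippet can be tried without a cluster. The `-r` flag is assumed to be what enables recursive directory uploads in your version; confirm with `pachctl put file --help`.

```shell
#!/bin/sh
# Placeholder names: the images repo, A.csv, and data/ are assumptions.

# Hypothetical helper: run pachctl if it is installed,
# otherwise just print the command that would have run.
pach() {
  if command -v pachctl >/dev/null 2>&1; then
    pachctl "$@"
  else
    echo "pachctl $*"
  fi
}

# Upload a single file to the master branch of the images repository.
pach put file images@master -f A.csv

# Upload a directory and everything under it (-r means recursive).
pach put file -r images@master -f data/
```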

## File Processing Strategies

Pachyderm provides the following file processing strategies:

**Appending files**
:   By default, when you put a file into a Pachyderm repository and a
    file with the same name already exists in the repo, Pachyderm appends
    the new data to the existing file.
    For example, suppose that you have an `A.csv` file in a repository. If
    you upload the same file to that repository again, Pachyderm *appends*
    the data to the existing file, so `A.csv` ends up twice its original
    size.

!!! example

    1. View the list of files:

       ```shell
       pachctl list file images@master
       ```

       **System Response:**

       ```shell
       NAME   TYPE SIZE
       /A.csv file 258B
       ```

    1. Add the `A.csv` file once again:

       ```shell
       pachctl put file images@master -f A.csv
       ```

    1. Verify that the file has doubled in size:

       ```shell
       pachctl list file images@master
       ```

       **System Response:**

       ```shell
       NAME   TYPE SIZE
       /A.csv file 516B
       ```
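
The size arithmetic in this example can be reproduced without a cluster. The following sketch is only a local analogy: a scratch directory stands in for the repository, ordinary `>>` redirection stands in for the default `pachctl put file` behavior, and the 258-byte `A.csv` is generated filler rather than real data. It does not show how Pachyderm stores data internally.

```shell
#!/bin/sh
# Local analogy only: no pachctl commands are run here.
work=$(mktemp -d)
mkdir -p "$work/repo"
src="$work/A.csv"        # local file to "upload"
dst="$work/repo/A.csv"   # stand-in for /A.csv in the repository

# Generate 258 bytes of filler to match the size in the example.
head -c 258 /dev/zero | tr '\0' 'x' > "$src"

# First put: the target does not exist yet, so it is created.
cat "$src" >> "$dst"
first=$(( $(wc -c < "$dst") ))

# Second put under the same name: the data is appended.
cat "$src" >> "$dst"
second=$(( $(wc -c < "$dst") ))

echo "after first put:  $first bytes"    # 258
echo "after second put: $second bytes"   # 516

rm -rf "$work"
```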

**Overwriting files**
:   When you enable overwrite mode with the `--overwrite` flag, or its
    short form `-o`, the uploaded file replaces the existing file instead
    of being appended to it. For example, suppose that you have an `A.csv`
    file in the `images` repository. If you upload the same file to that
    repository with the `--overwrite` flag, Pachyderm *overwrites* the
    whole file, so its size stays the same.

!!! example

    1. View the list of files:

       ```shell
       pachctl list file images@master
       ```

       **System Response:**

       ```shell
       NAME   TYPE SIZE
       /A.csv file 258B
       ```

    1. Add the `A.csv` file once again:

       ```shell
       pachctl put file --overwrite images@master -f A.csv
       ```

    1. Check the file size:

       ```shell
       pachctl list file images@master
       ```

       **System Response:**

       ```shell
       NAME   TYPE SIZE
       /A.csv file 258B
       ```
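
Because the example uploads an identical file, the size is the same before and after. The difference between the two strategies is easier to see when the new upload has a different size. The sketch below is a local analogy, not Pachyderm itself: `put` and `repo_size` are hypothetical helpers, a scratch directory stands in for the repository, and the file contents are generated filler.

```shell
#!/bin/sh
# Local analogy only: no pachctl commands are run here.
work=$(mktemp -d)
mkdir -p "$work/repo"

# put MODE SRC NAME (hypothetical helper)
#   append    -> add SRC to the end of repo/NAME (the default strategy)
#   overwrite -> replace repo/NAME with SRC (like --overwrite or -o)
put() {
  case "$1" in
    append)    cat "$2" >> "$work/repo/$3" ;;
    overwrite) cat "$2" >  "$work/repo/$3" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# repo_size NAME: print the size of repo/NAME in bytes.
repo_size() { echo $(( $(wc -c < "$work/repo/$1") )); }

# Two versions of the same file: 258 bytes, then a shorter 100-byte revision.
head -c 258 /dev/zero | tr '\0' 'a' > "$work/v1.csv"
head -c 100 /dev/zero | tr '\0' 'b' > "$work/v2.csv"

put append "$work/v1.csv" A.csv
put append "$work/v2.csv" A.csv
appended=$(repo_size A.csv)       # 258 + 100 = 358

put overwrite "$work/v2.csv" A.csv
overwritten=$(repo_size A.csv)    # only the new 100 bytes remain

echo "append:    $appended bytes"
echo "overwrite: $overwritten bytes"

rm -rf "$work"
```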