# File

A file is a Unix filesystem object, either a directory or a regular
file, that stores data. Unlike source code version-control systems,
which are best suited to plain text files, Pachyderm can store any
type of file, including binary files. Data scientists often work with
comma-separated values (CSV), JavaScript Object Notation (JSON),
images, and other plain text and binary file formats. Pachyderm
supports all file sizes and formats and applies storage optimization
techniques, such as deduplication, in the background.

To upload files to a Pachyderm repository, run the
`pachctl put file` command. It accepts both individual files and
directories.
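
The two upload forms might look like the following minimal sketch. It assumes a repository named `images` with a `master` branch and a hypothetical local directory `./data`. The `-r` (recursive) flag for directories is an assumption that this page does not cover, so confirm it with `pachctl put file --help` for your version. So that the snippet can be tried without a cluster, it prints the commands instead of running them when `pachctl` is not on the `PATH`.

```shell
# Print commands instead of running them if pachctl is not installed
# (a convenience for this sketch, not part of Pachyderm).
if ! command -v pachctl >/dev/null 2>&1; then
  pachctl() { echo "pachctl $*"; }
fi

# Upload a single local file to the master branch of the images repo.
pachctl put file images@master -f A.csv

# Upload a directory; -r is assumed to walk it recursively
# (./data is a hypothetical path).
pachctl put file -r images@master -f ./data
```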

## File Processing Strategies

Pachyderm provides the following file processing strategies:

**Appending files**
:   By default, when you put a file into a Pachyderm repository and a
    file with the same name already exists in the repo, Pachyderm appends
    the new data to the existing file.
    For example, suppose that you have an `A.csv` file in a repository.
    If you upload the same file to that repository again, Pachyderm
    *appends* its data to the existing file, so `A.csv` ends up twice
    its original size.

!!! example

    1. View the list of files:

       ```shell
       $ pachctl list file images@master
       NAME   TYPE SIZE
       /A.csv file 258B
       ```

    1. Add the `A.csv` file once again:

       ```shell
       $ pachctl put file images@master -f A.csv
       ```

    1. Verify that the file has doubled in size:

       ```shell
       $ pachctl list file images@master
       NAME   TYPE SIZE
       /A.csv file 516B
       ```

**Overwriting files**
:   When you enable overwrite mode with the `--overwrite` flag
    (or `-o`), the uploaded file replaces the existing file instead of
    being appended to it. For example, suppose that you have an `A.csv`
    file in the `images` repository. If you upload the same file to that
    repository with the `--overwrite` flag, Pachyderm *overwrites* the
    whole file, so its size does not change.

!!! example

    1. View the list of files:

       ```shell
       $ pachctl list file images@master
       NAME   TYPE SIZE
       /A.csv file 258B
       ```

    1. Add the `A.csv` file once again:

       ```shell
       $ pachctl put file --overwrite images@master -f A.csv
       ```

    1. Check the file size:

       ```shell
       $ pachctl list file images@master
       NAME   TYPE SIZE
       /A.csv file 258B
       ```
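
The size arithmetic behind the two strategies can be reproduced locally with ordinary shell tools. The sketch below is only an analogy: a temporary directory stands in for the repository, and `put_local` and `stored_size` are hypothetical helpers, not Pachyderm commands. It does not reflect how Pachyderm stores data internally.

```shell
# Local analogy only: a plain directory stands in for the repo, and the
# hypothetical put_local helper is not a Pachyderm command.
repo="$(mktemp -d)"
src="$(mktemp)"
printf 'id,value\n1,10\n2,20\n' > "$src"   # 19 bytes of sample CSV

put_local() {
  # $1 is "append" (the default strategy) or "overwrite".
  if [ "$1" = "overwrite" ]; then
    cat "$src" > "$repo/A.csv"     # replace the stored contents
  else
    cat "$src" >> "$repo/A.csv"    # add the new data to the end
  fi
}

# Report the stored file's size in bytes, normalized to a bare integer.
stored_size() { echo $(( $(wc -c < "$repo/A.csv") )); }

put_local append;    first=$(stored_size)     # initial upload
put_local append;    appended=$(stored_size)  # same file again, default mode
put_local overwrite; replaced=$(stored_size)  # same file again, overwrite mode

echo "first=$first appended=$appended replaced=$replaced"
# prints: first=19 appended=38 replaced=19
```

As in the `pachctl` examples above, the second default upload doubles the stored size, while the overwrite upload returns it to the size of a single copy.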