# File

A file is a Unix filesystem object, either a directory or a file,
that stores data. Unlike source code version-control systems, which
are best suited to plain text files, Pachyderm can store any type of
file, including binary files. Data scientists often work with
comma-separated values (CSV), JavaScript Object Notation (JSON),
images, and other plain text and binary file formats. Pachyderm
supports all file sizes and formats and applies storage optimization
techniques, such as deduplication, in the background.

To upload files to a Pachyderm repository, run the
`pachctl put file` command. This command accepts both files and
directories.

## File Processing Strategies

Pachyderm provides the following file processing strategies:

**Appending files**
:   By default, when you put a file into a Pachyderm repository and a
    file by the same name already exists in the repo, Pachyderm appends
    the new data to the existing file.
    For example, suppose that you have an `A.csv` file in the `images`
    repository. If you upload the same file to that repository again,
    Pachyderm *appends* the data to the existing file, so the resulting
    `A.csv` is twice its original size.

    !!! example

        1. View the list of files:

           ```shell
           $ pachctl list file images@master
           NAME   TYPE SIZE
           /A.csv file 258B
           ```

        1. Add the `A.csv` file once again:

           ```shell
           $ pachctl put file images@master -f A.csv
           ```

        1. Verify that the file has doubled in size:

           ```shell
           $ pachctl list file images@master
           NAME   TYPE SIZE
           /A.csv file 516B
           ```

**Overwriting files**
:   When you enable the overwrite mode by using the `--overwrite`
    flag or `-o`, the new file replaces the existing file instead of
    being appended to it. For example, suppose that you have an `A.csv`
    file in the `images` repository. If you upload the same file to
    that repository with the `--overwrite` flag, Pachyderm *overwrites*
    the whole file.

    !!! example

        1. View the list of files:

           ```shell
           $ pachctl list file images@master
           NAME   TYPE SIZE
           /A.csv file 258B
           ```

        1. Add the `A.csv` file once again:

           ```shell
           $ pachctl put file --overwrite images@master -f A.csv
           ```

        1. Check the file size:

           ```shell
           $ pachctl list file images@master
           NAME   TYPE SIZE
           /A.csv file 258B
           ```
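
The difference between the two strategies can be sketched with ordinary local files. This is only an analogy and makes no claim about how Pachyderm stores data internally: no `pachctl` command is run, and `repo_A.csv` is a hypothetical stand-in for `/A.csv` in a repository. Appending behaves like `>>`, and `--overwrite` behaves like `>`.

```shell
# Local analogy only: plain files stand in for a Pachyderm repo.
set -eu

workdir=$(mktemp -d)
src="$workdir/A.csv"              # the file being "uploaded"
repo_file="$workdir/repo_A.csv"   # hypothetical stand-in for /A.csv in the repo

printf 'id,name\n1,alpha\n2,beta\n' > "$src"

# First upload: no file by that name exists yet.
cat "$src" > "$repo_file"
size_initial=$(( $(wc -c < "$repo_file") ))

# Default behavior: the new data is appended to the existing file.
cat "$src" >> "$repo_file"
size_appended=$(( $(wc -c < "$repo_file") ))

# Overwrite behavior: the existing content is replaced.
cat "$src" > "$repo_file"
size_overwritten=$(( $(wc -c < "$repo_file") ))

echo "initial=$size_initial appended=$size_appended overwritten=$size_overwritten"
rm -r "$workdir"
```

In this sketch the appended size is exactly twice the initial size, and the overwritten size equals the initial size, which mirrors the `258B`, `516B`, and `258B` values in the examples above.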