# File

A file is a Unix filesystem object, either a directory or a regular
file, that stores data. Unlike source code version-control systems,
which are best suited to storing plain text files, Pachyderm can store
any type of file, including binary files. Data scientists often work with
comma-separated values (CSV), JavaScript Object Notation (JSON),
images, and other plain text and binary file
formats. Pachyderm supports all file sizes and formats and applies
storage optimization techniques, such as deduplication, in the
background.

To upload your files to a Pachyderm repository, run the
`pachctl put file` command. This command can put both files and
directories into a Pachyderm repository.

## File Processing Strategies

Pachyderm provides the following file processing strategies:

**Appending files**
:   By default, when you put a file into a Pachyderm repository and a
    file by the same name already exists in the repo, Pachyderm appends
    the new data to the existing file.
    For example, you have an `A.csv` file in a repository. If you upload the
    same file to that repository, Pachyderm *appends* the data to the existing
    file, so `A.csv` ends up twice its original size.

    !!! example

        1. View the list of files:

           ```shell
           pachctl list file images@master
           ```

           **System Response:**

           ```shell
           NAME   TYPE SIZE
           /A.csv file 258B
           ```

        1. Add the `A.csv` file once again:

           ```shell
           pachctl put file images@master -f A.csv
           ```

        1. Verify that the file has doubled in size:

           ```shell
           pachctl list file images@master
           ```

           **System Response:**

           ```shell
           NAME   TYPE SIZE
           /A.csv file 516B
           ```

**Overwriting files**
:   When you enable the overwrite mode by using the `--overwrite`
    flag or `-o`, the uploaded file replaces the existing file instead of
    being appended to it. For example, you have an `A.csv` file in the
    `images` repository. If you upload the same file to that repository with the
    `--overwrite` flag, Pachyderm *overwrites* the whole file, so its size
    stays the same.

    !!! example

        1. View the list of files:

           ```shell
           pachctl list file images@master
           ```

           **System Response:**

           ```shell
           NAME   TYPE SIZE
           /A.csv file 258B
           ```

        1. Add the `A.csv` file once again:

           ```shell
           pachctl put file --overwrite images@master -f A.csv
           ```

        1. Check the file size:

           ```shell
           pachctl list file images@master
           ```

           **System Response:**

           ```shell
           NAME   TYPE SIZE
           /A.csv file 258B
           ```
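
The difference between the two strategies can be reproduced locally with
plain shell redirection. The following is a minimal sketch of the size
arithmetic only. It does not call Pachyderm, and the temporary directory
and the `appended.csv` and `overwritten.csv` file names are hypothetical,
chosen for illustration. The 258-byte stand-in file mirrors the size shown
in the examples above.

```shell
# Local analogy only: plain files and shell redirection, no Pachyderm involved.
workdir="$(mktemp -d)"

# Build a 258-byte stand-in for A.csv (258 'x' characters, no newline).
head -c 258 /dev/zero | tr '\0' 'x' > "$workdir/A.csv"
original_size=$(wc -c < "$workdir/A.csv" | tr -d ' ')

# Append: the new data is added to the end of the existing file.
cp "$workdir/A.csv" "$workdir/appended.csv"
cat "$workdir/A.csv" >> "$workdir/appended.csv"
appended_size=$(wc -c < "$workdir/appended.csv" | tr -d ' ')

# Overwrite: the new data replaces the existing file's contents.
cp "$workdir/A.csv" "$workdir/overwritten.csv"
cat "$workdir/A.csv" > "$workdir/overwritten.csv"
overwritten_size=$(wc -c < "$workdir/overwritten.csv" | tr -d ' ')

echo "original=${original_size}B appended=${appended_size}B overwritten=${overwritten_size}B"
rm -rf "$workdir"
```

Here `>>` plays the role of the default append behavior and `>` plays the
role of `--overwrite`: the appended copy is 516 bytes, while the
overwritten copy remains 258 bytes.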