## pachctl garbage-collect

Garbage collect unused data.

### Synopsis

Garbage collect unused data.

When a file, commit, or repo is deleted, the data is not immediately removed from
the underlying storage system (e.g. S3) for performance and architectural
reasons. This is similar to deleting a file on your computer: the file is not
necessarily wiped from disk immediately.

To actually remove the data, you need to invoke garbage collection manually
with "pachctl garbage-collect".

Currently, "pachctl garbage-collect" can only be started when no pipelines are
running. You also need to ensure that no "put file" operation is in progress.
Garbage collection puts the cluster into a read-only mode in which no new jobs
can be created and no data can be added.

Pachyderm's garbage collection uses bloom filters to index live objects. Because
a bloom filter can report false positives (but never false negatives), some dead
objects may erroneously be treated as live and therefore not be deleted during
garbage collection. The probability of this happening grows with the number of
objects you have; with the default settings, it starts to become likely at
around 10M objects. To lower the error rate and make garbage collection more
thorough, you can increase the amount of memory used for the bloom filters with
the --memory flag. The default value is 10MB.

```
pachctl garbage-collect [flags]
```

### Options

```
  -h, --help            help for garbage-collect
  -m, --memory string   The amount of memory to use during garbage collection. Default is 10MB. (default "0")
```

### Options inherited from parent commands

```
      --no-color   Turn off colors.
  -v, --verbose    Output verbose logs
```
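The bloom-filter trade-off described in the Synopsis can be estimated with the standard false-positive approximation p ≈ (1 - e^(-kn/m))^k, where m is the filter size in bits, n is the number of live objects indexed, and k is the number of hash functions. Here p is roughly the chance that any single dead object is wrongly kept. The sketch below is a rough, non-authoritative estimate: it assumes a textbook bloom filter using the optimal k = (m/n) ln 2 and treats 10MB as 10 MiB. Pachyderm's actual hashing and sizing are not described in this reference and may differ, and `bloom_fp_rate` is a hypothetical helper, not part of pachctl.

```shell
#!/bin/sh
# Hypothetical helper (not part of pachctl): estimates the false-positive
# rate of a textbook bloom filter. Assumes the optimal number of hash
# functions, k = (m/n) * ln 2, which may not match Pachyderm's internals.
#   $1 = filter size in MB (assumed to mean MiB, 1048576 bytes)
#   $2 = number of live objects inserted into the filter (must be > 0)
bloom_fp_rate() {
  awk -v mb="$1" -v n="$2" 'BEGIN {
    m = mb * 1048576 * 8            # filter size in bits
    k = int(m / n * log(2) + 0.5)   # optimal hash count, rounded
    if (k < 1) k = 1
    p = (1 - exp(-k * n / m)) ^ k   # standard false-positive approximation
    printf "%.3g\n", p
  }'
}

# Compare the default 10MB with a tenfold larger filter for 10M live objects.
bloom_fp_rate 10 10000000
bloom_fp_rate 100 10000000
```

Under these assumptions, 10M live objects with the default 10MB give a false-positive rate of roughly 2 percent, so if many objects are dead, some of them are likely to survive a run. A tenfold larger filter drives the estimate close to zero, which is why raising the --memory flag makes garbage collection more thorough.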