# Changelog

## 1.13.4
- Adds support for retrying the download of a partially retrieved file with `pachctl get file --retry` (#6702)
- Fixes a bug that ignored containers’ default working directories when Docker is not used (#6662)
- Fixes a bug with multiple Pachyderm deployments in the same cluster (#6656)
- Fixes a bug that did not set the IDE namespace, and adds a deploy option, `--namespace`, to specify the namespace to deploy to (#6448)
- Fixes a couple of bugs with multipart S3 uploads (#6447)

## 1.13.3
- Adds support for listing files at a commit via the S3 gateway (#6293)
- Fixes a bug that would crash pachd when writing a file larger than the requested memory (#6281)
- Fixes a bug where pipelines could not be updated or deleted due to revoked auth tokens (#6276)
- Fixes a bug that prevented the collection of metrics (#6266)
- Fixes a bug where workers did not check whether metrics were enabled or disabled (#6225)

## 1.13.2

- Fixes a bug that caused the pipeline master to block after losing its connection to etcd (#6042)
- Fixes a bug that caused initialization to fail if pachd was not run as root (#6065)
- Fixes a bug that caused pipelines to fail to run after a few hours of operation (#6083)

## 1.13.1

- Fixes a bug that would fail the enterprise check when autoscaling is enabled (#6008)
- Increases the maximum size of an object that can be uploaded to the S3 gateway in a single request. This is the recommended workaround for issues with multipart uploads to output repos (#6005)
- Fixes a bug that would cause high scheduling latency for goroutines (#5973)
- Fixes a bug that would limit the number of writes handled in the S3 gateway (#5956)

## 1.13.0
Deprecation notice: The following pachctl deploy flags are deprecated and will be removed in a future release. Deprecated flags: dash-image, dashboard-only, no-dashboard, expose-object-api, storage-v2, shards, no-rbac, no-guaranteed, static-etcd-volume, disable-ssl, max-upload-parts, no-verify-ssl, obj-log-options, part-size, retries, reverse, timeout, upload-acl

- [Security] Adds required authentication for various API calls (#5582) (#5577) (#5575)
- Adds a new flag, `--status-only`, to improve the performance of the `list datum` command (#5935)
- Fixes a bug with recursive put file from pachctl and improves the performance of put file in general (#5922)
- Shortens `prometheus-metrics` to `prom-metrics` to meet a length limitation (#5912)
- Adds version labels to Pachyderm Docker images (#5909)
- Allows pipelines that do not skip datums, for trigger, deployment, or other side-effecting pipelines (#5871) (#5920)
- Adds libgl to the Python build image (#5855)
- Adds deprecation warnings for uncommonly used pachctl deploy flags (#5848)
- Fixes a bug that caused Pachyderm worker pods to crash with a stack trace (#5842)
- Fixes a bug that caused pachd to crash with a stack trace when under heavy load (#5831)
- Improves ListPipeline performance when it returns many pipelines (#5830)
- Fixes a bug that caused pachd to crash on some incorrect glob patterns (#5812)
- Adds support to `fsck` for fixing provenance relationships not mirrored by subvenance relationships and vice versa (#5782)
- Fixes a bug that caused the `since` field to not propagate to Loki for some `logs` calls (#5777)
- Fixes a bug that caused a panic in GetLogs when `since` was not specified (#5769)
- Switches to an inode generation scheme to work around the reserved inode issues that prevented `pachctl mount` from succeeding (#5766)
- Fixes a bug that would cause a job merge to hang when the job output metadata is not cached in the cluster (#5754)
- Fixes a bug that caused pachd to crash when collecting metrics (#5752)
- Improves the performance of file downloads and egress (#5744)
- Adds support for autoscaling pipelines, which use resources more efficiently and mitigate stragglers (#5738) (#5923)
- Fixes an issue where datums were ordered by size, causing workers to process large datums together. We have changed how work is distributed, so straggling datums will be much less common (#5738)
- Fixes a bug that caused pachctl commands to hang when metrics were disabled (#5724)
- Captures previous logs in debug dump (#5723)
- Adds support for additional metrics (#5713)
- Fixes a bug that caused a crash when the email-verified claim is not set by the OIDC provider (#5709)
- Fixes a bug that prevented the InitContainer from initializing if pipelines were already running (#5701)
- Adds support for services without ports set (#5691)
- Fixes a bug that caused intermittent pachd crashes (#5690)
- Fixes a bug where S3 gateway requests did not return objects whose paths have a leading slash (#5679)
- Fixes a bug that caused update-pipeline to time out in some cases (#5661)
- Adds support for showing progress bars during downloads (#5654)
- Adds support for exposing Prometheus metrics ports (#5646)
- Fixes a bug that could deadlock in listfile (#5638)
- Adds support for capturing commit and job info in debug dump (#5619)
- Improves the performance of file uploads in spout pipelines (#5613)
- Improves the performance of reading output repo metadata (#5609)
- Improves the performance for repos with a large number of files (#5600)
- Fixes a bug that prevented the creation of build pipelines when auth is enabled (#5594)
- Fixes several issues with logging, specifically with the Loki backend. Adds support for getting logs since a particular time (#5438)

## 1.12.5
Deprecation notice: The Vault plugin is deprecated and will be removed from the code in a future release.

- Switches to an inode generation scheme to work around the reserved inode issues that prevented `pachctl mount` from succeeding (#5766)
- Fixes a bug that caused a panic in GetLogs when `since` was not specified (#5769)
- Fixes a bug that caused the `since` field to not propagate to Loki for some `logs` calls (#5777)
- Adds support to `fsck` for fixing provenance relationships not mirrored by subvenance relationships and vice versa (#5782)
- Fixes a bug that caused pachd to crash on some incorrect glob patterns (#5812)
- Improves ListPipeline performance when it returns many pipelines (#5830)

## 1.12.4

- Captures previous logs in debug dump (#5723)
- Fixes a bug that caused pachctl commands to hang when metrics were disabled (#5724)
- Improves the performance of file downloads and egress (#5744)
- Fixes a bug that caused pachd to crash when collecting metrics (#5752)
- Fixes a bug that would cause a job merge to hang when the job output metadata is not cached in the cluster (#5754)

## 1.12.3
- Fixes a bug where S3 gateway requests did not return objects whose paths have a leading slash (#5679)
- Fixes a bug that caused intermittent pachd crashes (#5690)
- Adds support for services without ports set (#5691)
- Fixes a bug that prevented the InitContainer from initializing if pipelines were already running (#5701)
- Fixes a bug that caused a crash when the email-verified claim is not set by the OIDC provider (#5709)
- Adds support for additional metrics (#5713)

## 1.12.2

- Fixes several issues with logging, specifically with the Loki backend. Adds support for getting logs since a particular time (#5438)
- Fixes a bug that could deadlock in listfile (#5638)
- Adds support for exposing Prometheus metrics ports (#5646)
- Adds support for showing progress bars during downloads (#5654)
- Fixes a bug that caused update-pipeline to time out in some cases (#5661)

## 1.12.1
- [Security] Adds required authentication for various API calls (#5582) (#5577) (#5575)
- Fixes a bug that prevented the creation of build pipelines when auth is enabled (#5594)
- Improves the performance for repos with a large number of files (#5600)
- Improves the performance of reading output repo metadata (#5609)
- Improves the performance of file uploads in spout pipelines (#5613)
- Adds support for capturing commit and job info in debug dump (#5619)

## 1.12.0
- Fixes a race condition that updated a job's state after it had finished (#5099)
- Fixes a bug that would prevent successful initialization (#5128)
- Changes the `debug dump` command to capture debug info from pachd and all worker pods by default. Debug info includes logs, goroutines, profiles, and specs (#5128)
- Adds support for grouping datums in pipelines, similar to grouping in SQL (#5147) (#5484)
- Adds support for passing the enterprise key via stdin (#5162)
- Changes create/update pipeline to warn users about using the “latest” tag for images (#5164)
- Fixes a bug that prevented progress counts from being updated. In addition, progress counts now update more granularly in `inspect job` (#5173)
- Fixes a bug that would cause certain kinds of jobs to pick an incorrect commit if there were multiple commits on the same branch in the provenance (#5189)
- Fixes a bug that returned an error when listing commits reached the user-specified limit (#5190)
- Fixes a bug that mistagged user log messages for spouts and services as master log messages (#5191)
- Fixes `create_python_pipeline` in the Python client library when auth is enabled (#5193)
- Fixes a bug that failed to execute a pipeline if the build pipeline does not produce any wheels (#5196)
- Fixes a bug that would immediately cancel job egress (#5201)
- Fixes a bug that did not correctly port-forward the OIDC port (#5214)
- ACLs now support an "allClusterUsers" principal (#5222)
- Pipelines can now associate triggers with their inputs that define conditions that must be met for the pipeline to run (#5225) (#5483) (#5538)
- Fixes a bug that caused the `run cron <pipeline>` command to fail if multiple cron inputs were specified (#5227)
- Allows configuration of the default SAML and OIDC server ports (#5230)
- Improves the reliability of handling streams in spouts (#5237)
- Fixes a goroutine leak (#5263)
- Fixes a race condition that prevented a standby pipeline from transitioning out of the crashing state (#5273)
- Adds alias support for cloud provider deployments: aws, azure, gcp (#5278)
- Fixes a bug that did not correctly set the provenance when specified in the `run pipeline` command (#5291)
- Authenticate now accepts an OIDC ID token with an appropriate audience (#5292)
- Fixes a bug that could cause a get file request to fail when the request falls on a certain boundary condition (#5302)
- Fixes a bug that caused a connection failure when DNS was not configured properly (#5303)
- Fixes multiple error log messages being emitted when processing `list pipeline` (#5304)
- Adds support for the OIDC `groups` claim for syncing user group membership (#5308)
- Adds support for outer joins to include files that have no match (#5309)
- Fixes a bug that could leave a stats commit open when a stats-enabled pipeline was updated with the `--reprocess` option. This bug also prevented new jobs from being created (#5314)
- Improves error handling when pipeline info cannot be fully initialized due to transient failures or user errors (#5322)
- Fixes a bug that did not stop a job before deleting it when `delete job` was called (#5324)
- Fixes a family of bugs that did not properly clean up temporary artifacts from a job (#5332)
- Adds a deploy option to enable verbose logging in the S3 client (#5341)
- Moves some noisy log messages to the DEBUG level (#5344)
- Adds support for filtering by state in `list job` and `list pipeline` (#5355) (#5351)
- Fixes a bug that could sometimes leave a pipeline in the STANDBY state (#5363)
- Fixes a bug that caused incorrect datums to be processed due to trailing slashes in joins (#5367)
- Changes the metric reporting interval to 60 minutes (#5369)
- Fixes a family of bugs in handling pipeline state transitions. The change resolves several issues: pipelines getting stuck in the STARTING state if Kubernetes is unavailable; pipelines in the STANDBY state that could not be deleted and recreated; and jobs occasionally getting stuck in the CRASHING state (#5387) (#5273) (#5356)
- Fixes a bug that leaked a revoked pipeline token object (#5389) (#5400)
- Adds support for passing a pipeline spec to `list datum`, which allows you to list datums for a pipeline without creating it (#5394)
- Adds support for displaying when a job is in the egress state (#5395)
- Adds a new implementation of spouts that uses pachctl. Spouts that use named pipes will be deprecated in a future release (#5398) (#5528)
- Fixes a bug that caused extra data to be written to small job artifact files in some cases (#5401)
- Fixes a bug that caused workers to attempt to read certain job artifacts before they were fully written (#5401)
- Pipelines are now always created/updated in a transaction (#5431)
- Fixes a bug that prevented deleting a directory under certain conditions (#5449)
- Adds a `--split-txn` option to the `pachctl delete pipeline` and `pachctl delete repo` commands for deployments with a very large number of commits and job history (#5461)
- Fixes a bug that caused object uploads to fail when a single gRPC message was larger than 20MB (#5468)
- Fixes a bug that prevented the debug dump command from running when auth is enabled (#5471)
- Adds pipeline triggers (#5483) (#5538)
- Adds support for extracting and restoring data from clusters with authentication enabled (#5494) (#5532)
- Fixes a bug that prevented the creation of some build pipelines with auth enabled (#5523)
- Updates crewjam/saml to 0.45 to fix vulnerabilities in the SAML auth provider (#5527)

## 1.11.0

Deprecation notice: Support for S3V2 signatures is deprecated in 1.11.0 and will reach end-of-life in 1.12.0. Users who are using S3V4-capable storage should make sure their deployment is using the supported storage backend by redeploying without the `--isS3V2` flag. If you need help, please reach out to Pachyderm support.

- Adds support for running multiple jobs in parallel in a single pipeline (#4572)
- Adds support for logging stack traces when a request encounters an error (#4681)
- Adds the first release of the Pachyderm IDE (#4732) (#4790) (#4838)
- Adds support for displaying a progress bar during `pachctl put file` (#4745)
- Adds support for a writable `pachctl mount`, which checkpoints data back into PFS when it is unmounted (#4772)
- Adds support for configuring the metrics endpoint via the METRICS_ENDPOINT environment variable (#4793)
- Adds an "exit" command to the pachctl shell (#4802)
- Adds a `--compress` option to `pachctl put file`, which GZIP-compresses the upload stream (#4814)
- Adds a `--put-file-concurrency-limit` option to the `pachctl put file` command to limit upload parallelism, which limits the memory footprint in pachd to avoid OOM conditions (#4827)
- Adds support for periodically reloading TLS certs (#4835)
- Adds a new pipeline state, "crashing", which pipelines enter when they encounter Kubernetes errors. Pipelines in this state have a human-readable "Reason" that explains why they are crashing. Pipelines also now expose the number of pods that are up and responsive. Both values can be seen with `inspect pipeline` (#4922)
- Allows etcd volumes to be expanded. (Special thanks to @mattrobenolt.) (#4925)
- Adds experimental support for using Loki as a logging backend rather than k8s. Enable it with the `LOKI_LOGGING` feature flag to pachd (#4946)
- Adds support for copy object in the S3 gateway (#4972)
- Adds a new cluster-admin role, "FS", which grants access to all repos but not to other admin-only endpoints (#4975) (#5103)
- Surfaces image pull errors in pipeline sidecar containers (#4979)
- Adds support for colorizing the level in `pachctl logs` (Special thanks to @farhaanbukhsh) (#4996)
- Adds configurable resource limits to the storage sidecar and sets default resource limits for the init container (#4999)
- Adds support for user sign-in by authenticating with an OIDC provider (#5005)
- Adds error handling when starting a transaction while another one is pending (Special thanks to @farhaanbukhsh) (#5010)
- Adds support for using TLS (if enabled) for downloading files over HTTP (#5023)
- Adds an option for specifying the Kubernetes service account to use in worker pods (#5056)
- Adds build steps for pipelines (#5064)
- Adds a dockerized version of `pachctl`, available on Docker Hub (#5073) (#5079)
- Adds support for configuring Go's GC percentage (#5089)
- Propagates feature flags to the sidecar (#4718)
- Routes all object store access through the sidecar (#4741)
- Improves support for disparate S3 client behaviors, including numerous compatibility improvements in the S3 gateway (#4902)
- Changes debug dump to collect sidecar goroutines (#4954)
- Fixes a bug that would cause spouts to lose data when spouts are rapidly opened and closed (#4693) (#4910)
- Fixes a bug that allowed spouts with inputs (#4747)
- Fixes a bug that prevented access to the S3 gateway when other workers are running in a different namespace than the Pachyderm namespace (#4753)
- Fixes a bug that did not delete the Kubernetes service when a pipeline was restarted due to updates (#4782)
- Fixes a bug that created messages larger than the expected size, which could fail some operations with a `grpc: received message larger than max` error (#4819)
- Fixes a bug that caused an EOF error in get file requests when using the Azure Blob Storage client (#4824)
- Fixes a bug that caused a restore operation to fail when the extract operation captured commits in certain failed/incomplete states (#4839)
- Fixes a bug that caused garbage collection to fail for standby pipelines (#4860)
- Fixes a bug that did not use the native DNS resolver in the pachctl client, which could prevent pachd access over VPNs (#4876)
- Fixes a bug that caused `pachctl list datum <running job>` to return the error "output commit not finished" on pipelines with stats enabled (#4886)
- Fixes a bug causing a resource leak in pachd when certain protocol errors occur in PutFile (#4908)
- Fixes a bug where downloading files over HTTP didn't work with authorization enabled (#4930)
- Fixes a family of issues that caused workers to wait indefinitely on etcd after a pod eviction (#4947) (#4948) (#4959)
- Fixes a bug that did not set environment variables for service pipelines (#5009)
- Fixes a bug where users got an error if they ran `pachctl debug pprof` but didn't have Go installed on their machine (#5022)
- Fixes a bug that caused the metadata of a spout pipeline's spec commit to grow without bound (#5050)
- Fixes a bug that caused the metadata in commit info to not get carried between an extract and a restore operation (#5052)
- Fixes a bug that caused crashes when creating pipelines with certain invalid parameters (#5054)
- Fixes a bug that caused a "dash compatibility file not found" error (#5063)
- Moves the etcd image from Quay.io to Docker Hub (#4899)
- Updates the dash version to 0.5.48, the latest published version (#4756)

## 1.10.0

- Changes the Pachyderm license from Apache 2.0 to the Pachyderm Community License
- Changes how resources are applied to pipeline containers (#4675)
- Changes the GitHook and Prometheus ports (#4537)
- Changes how S3 credentials passed to the S3 gateway are handled when auth is disabled (#4585)
- Adds support for the `zsh` shell (#4494)
- Allows only critical servers to start up with `--require-critical-servers-only` (#4536)
- Improves job logging (#4538)
- Supports copying files from output repos to input repos (#4475)
- Changes the `flush job` CLI to support streaming output with the `--raw` option (#4569)
- Removes the cluster ID check (#4532)
- Adds annotations and labels to the top-level pipeline spec (#4608) (NOTE: If your pipeline spec specifies `service.annotations`, we recommend that you follow the upgrade path and manually update your pipeline specs to include annotations under the new metadata tag)
- Adds support for S3 inputs & outputs in pipeline specs (#4605, #4660)
- Adds a new interactive Pachyderm Shell, which provides an easier way to interact with pachctl, including advanced auto-completion support (#4485, #4557)
- Adds support for creating secrets through Pachyderm (#4483)
- Adds support for disabling the commit progress indicator to reduce load on etcd (#4696)
- Fixes a bug that ignored the EDITOR environment variable (#4672)
- Fixes a bug that caused failures when restoring from v1.8.x to v1.9.x+ (#4662)
- Fixes a bug that, under specific conditions, resulted in missing output data when a job resumed processing (#4656)
- Fixes a bug that caused errors when specifying a branch name as the provenance of a new commit (#4657)
- Fixes a bug that left a stats commit open under some failure conditions during run pipeline (#4637)
- Fixes a bug that resulted in a stuck merge process when some commits were left in an unfinished state (#4595)
- Fixes a bug that ignored the cron pipeline overwrite value when `run cron` was called from the command line (#4517)
- Fixes a bug that caused the `edit pipeline` command to open an empty file (#4526)
- Fixes a bug where some unfinished commit finish times displayed the Unix epoch time (#4539)
- Fixes a family of bugs and edge conditions with spout markers (#4487)
- Fixes a bug that caused a crash in the `diff file` command (#4601)
- Fixes a bug that caused a crash when `run pipeline` is executed with stats enabled (#4615)
- Fixes a bug that, under specific conditions, incorrectly skipped duplicate datums in a union (#4691)
- Fixes a bug that ignored the logging level set in the environment variable (#4706)

## 1.9.12

- New configuration for deployments (exposed through pachctl deploy flags):
  - Only require critical servers to start up and run without error (`--require-critical-servers-only`). (#4512)
- Improved job logging. (#4523)
- Fixes a bug where some unfinished commit finish times displayed the Unix epoch time. (#4524)
- Fixes a bug with edit pipeline. (#4530)
- Removed cluster ID check. (#4534)
- Fixes a bug with spout markers. (#4487)

## 1.9.11

- New configuration for deployments (exposed through pachctl deploy flags):
  - Object storage upload concurrency limit (`--upload-concurrency-limit`). (#4393)
- Various configuration improvements. (#4442)
- Fixes a bug that would cause workers to segfault. (#4459)
- Upgrades Pachyderm to Go 1.13.5. (#4472)
- New configuration for Amazon and custom deployments (exposed through pachctl deploy amazon/custom flags):
  - Disabling SSL (`--disable-ssl`) (#4473)
  - Skipping certificate verification (`--no-verify-ssl`) (#4473)
- Further improves the logging and error reporting during pachd startup. (#4486)
- Removes the pprof HTTP server from pachd (debugging should happen through the debug API). (#4496)
- Removes k8s API access from the worker code. (#4498)

## 1.9.10

- Fixes a bug that caused `pachctl` to connect to the wrong cluster (#4416)
- Fixes a bug that caused a hashtree resource leak under certain conditions (#4420)
- Fixes a family of minor bugs found through static code analysis (#4410)
- Fixes a family of bugs that caused pachd to panic when it processed invalid arguments (#4391)
- Fixes a family of bugs that caused deploy YAML to fail (#4290)
- Uses standard Go modules instead of the old vendor directory (#4323)
- Adds additional logging during pachd startup (#4447)
- Adds a CLI command, `run cron <pipeline>`, to manually trigger a cron pipeline (#4419)
- Improves the performance of join datum processing (#4441)
- Open-sources the Pachyderm S3 gateway, which allows applications to interact with PFS storage (#4399)

## 1.9.9

- Adds support for spout markers to keep track of metadata during spout processing. (#4224)
- Updates the GPT-2 example to use a GPU. (#4325)
- Fixes a bug that did not extract all the pipeline fields. (#4204)
- Fixes a bug that did not retry a previously skipped datum when pipeline specs are updated. (#4310)
- Fixes a family of bugs that caused Docker image builds to fail with the `create pipeline --build` command. (#4319)
- Fixes a bug that did not prompt users if auto-derivation of Docker credentials failed. (#4319)
- Tracks commit progress through the DAG. (#4203)
- Changes the CLI syntax for `run pipeline` to accept a `--job` option to re-run a job. (#4267)
- Changes the CLI syntax for `inspect` to accept a branch option. (#4293)
- Changes the CLI output for `list repo` and `list pipeline` to show descriptions. (#4368)
- Changes the CLI output for `list commit` to show progress and description, and removes the parent and duration output. (#4368)

## 1.9.8

- Fixes a bug that prevented the `--reprocess` flag in `edit pipeline` from working. (#4232)
- Changes the CLI syntax for `run pipeline` to accept commit branch pairs. (#4262)
- Fixes a bug that caused `pachctl logs --follow` to exit immediately. (#4259)
- Fixes a bug that caused joins to sometimes miss pairs that matched. (#4256)
- Fixes a bug that prevented Pachyderm from deploying on Kubernetes 1.6 without modifying manifests. (#4242)
- Fixes a family of bugs that could cause output and stats commits to remain open and block later jobs. (#4215)

## 1.9.7

- Fixes a bug that prevented pachctl from connecting to clusters with TLS enabled. (#4167)

## 1.9.6

- Fixes a bug which would cause jobs to report success despite datum failures. (#4158)
- Fixes a bug which prevented disk resource requests in pipelines from working. (#4157)
- Fixes a bug which caused `pachctl fsck --fix` to exit with an error and not complete the fix. (#4155)
- Pachctl contexts now have support for importing Kubernetes contexts. (#4152)
- Fixes a bug which caused Spouts to create invalid provenance. (#4145)
- Fixes a bug which allowed creation, but not deletion, of pipelines with invalid names. (#4133)
- Fixes a bug which caused ListTag to fail with WriteHeader already called. (#4132)
- Increases the max transaction operations and max request bytes values for etcd's deployment. (#4121)
- Fixes a bug that caused `run pipeline` to crash pachd. (#4109)
- `pachctl deploy amazon` now exposes several new S3 connection options. (#4107)
- Re-adds the `--namespace` flag to `port forward`. (#4105)
- Removes an unused field, `Batch`, from the pipeline spec. (#4104)

## 1.9.5

- Fixes a bug that caused the Salt field to be stripped from restored pipelines. (#4086)
- Fixes a bug that caused datums to fail with `io: read/write on closed pipe`. (#4085)
- Fixes a bug that prevented reading logs from running jobs with stats enabled. (#4083)
- Fixes a bug that prevented putting files into output commits via the S3 gateway. (#4076)

## 1.9.4

- Fixes a bug (#4053) which made it impossible to read files written to output commits with `put file`. (#4055)
- Adds a flag, `--fix`, to `pachctl fsck`, which will fix some of the issues that it detects. (#4052)
- Fixes a bug (#3879) which caused `pachctl debug dump` to hit max message size issues. (#4015)
- The Microsoft Azure Blob Storage client has been upgraded to the most recent version. (#4000)
- Extract now correctly extracts the `pod_patch` and `pod_spec` for pipelines. (#3964, thanks to @mrene)
- S3Gateway now has support for multi-part uploads. (#3903)
- S3Gateway now has support for multi-deletes. (#4004)
- S3Gateway now has support for auth. (#3937)

## 1.9.3

- Fixes a bug that caused the Azure driver to lock up when there were too many active requests. (#3970)
- Increases the max message size for etcd. This should eliminate errors that would appear with large etcd requests, such as those created when deleting repos and pipelines. (#3958)
- Fixes several bugs that would cause commits not to be finished when jobs encountered errors, which would lead to pipelines getting stuck. (#3951)

## 1.9.2

- Fixes a bug that broke Pachyderm on OpenShift. (#3935, thanks to @jiangytcn)
- Fixes a bug that caused pachctl to crash when deleting a transaction while no active transaction was set. (#3929)
- Fixes a bug that broke provenance when deleting a repo or pipeline. (#3925)

## 1.9.1

- Pachyderm now uses Go modules. (#3870)
- `pachctl diff file` now diffs content, similar to `git diff`. (#3866)
- It's now possible to create spout services as ingress endpoints. (#3829)
- Pachyderm now supports contexts as a way to access multiple clusters. (#3786)
- Fixes a bug that caused `pachctl put file --overwrite` to fail when reading from stdin. (#3882)
- Fixes a bug that caused jobs from run pipeline to succeed when they should fail. (#3872)
- Fixes a bug that caused workers to get stuck in a crashloop. (#3858)
- Fixes a bug that caused pachd to panic when a pipeline had no transform. (#3866)

   381  ## 1.9.0
   382  
   383  - `pachctl` now has a new, more consistent syntax that's more in line with other container clis such as `kubectl`. (#3617)
   384  - Pachyderm now exposes an s3 interface to the data stored in pfs. (#3411, #3432, #3508)
   385  - Pachyderm now supports transactional PFS operations. (#3658)
   386  - The `--history` flag has been extended to `list job` and `list pipeline` (in addition to `list file`.) (#3692)
   387  - The ancestry syntax for accessing branches (`master^`) has been extended to include forward references i.e. `master.1`. (#3692)
   388  - You can now define service annotations and service type in your pipeline specs. (#3755, thanks to @cfga and @DanielMorales9)
   389  - You can now define error handlers for your pipelines. (#3611)
   390  - Pachyderm has a new command, `fsck`, which checks PFS for corruption issues. (#3691)
   391  - Pachyderm has a new command, `run pipeline`, which allows you to manually trigger a pipeline on a set of commits. (#3642)
   392  - Commits now store the original branch that they were created on. (#3583)
   393  - Pachyderm now exposes tracing via Jaeger. (#3541)
   394  - Fixes several issues that could lead to object store corruption, particularly on alternative object stores. (#3797)
   395  - Fixes several issues that could cause pipelines to get hung under heavy load. (#3788)
   396  - Fixes an issue that caused jobs downstream from jobs that output nothing to fail. (#3787)
   397  - Fixes a bug that prevented stats from being toggled on after a pipeline had already run. (#3744)
   398  - Fixes a bug that caused `pachctl` to crash in `list commit`. (#3699)
   399  - Fixes a bug that caused provenance to get corrupted on `delete commit`. (#3696)
   400  - A few minor bugs in the output and erroring behavior of `list file` have been fixed. (#3601, #3596)
   401  - Preflight object store tests have been revamped and their error output made less confusing. (#3592)
   402  - A bug that caused stopping a pipeline to create a new job has been fixed. (#3585)
   403  - Fixes a bug that caused pachd to panic if the `input` field of a pipeline was nil. (#3580)
   404  - The performance of `list job` has been greatly improved. (#3557)
   405  - `atom` inputs have been removed; use `pfs` inputs instead. (#3639)
   406  - The `ADDRESS` env var for connecting to pachd has been removed; use `PACHD_ADDRESS` instead. (#3638)
   407  
   408  ## 1.8.8
   409  
   410  - Fixes a bug that caused pipelines to recompute everything when they were restored. (#4079)
   411  
   412  ## 1.8.7
   413  
   414  - Make the 'put file' directory traversal change backwards compatible for legacy branches (#3707)
   415  - Several fixes to provenance (#3734):
   416      - Force provenance to be transitively closed
   417      - Propagate all affected branches on deleteCommit
   418      - Fix weird two branches with one commit bugs
   419  - Added a new fsck utility for PFS (#3734)
   420  - Make stats somewhat toggleable (#3758)
   421  - Example of spouts using kafka (#3752)
   422  - Refactor/fix some of the PFS upload steps (#3750)
   423  
   424  ## 1.8.6
   425  
   426  - The semantics of Cron inputs have changed slightly: each tick will now be a separate file unless the `Overwrite` flag is set to true, which restores the old behavior. The name of the emitted file is now the timestamp that triggered the cron, rather than a static filename. Pipelines that use cron will need to be updated to work in 1.8.6. See [the docs](https://docs-archive.pachyderm.com/en/v1.8.6/reference/pipeline_spec.html#cron-input) for more info. (#3509)
   427  - 1.8.6 contains alpha support for a new kind of pipeline, spouts, which take no inputs and run continuously outputting (or spouting) data. Documentation and an example of spout usage will be in a future release. (#3531)
   428  - New debug commands have been added to `pachctl` to easily profile running Pachyderm clusters: `debug-profile`, `debug-binary`, and `debug-pprof`. See the docs for these commands for more information. (#3559)
   429  - The performance of `list-job` has been greatly improved. (#3557)
   430  - `pachctl undeploy` now asks for confirmation in all cases. (#3535)
   431  - Logging has been unified and made less verbose. (#3532)
   432  - Bogus output with the `--raw` flag has been removed. (#3523, thanks to @mdaniel)
   433  - Fixes a bug in `list-file --history` that would cause it to fail with too many files. (#3516)
   434  - `pachctl deploy` is more liberal in what it accepts for bucket names. (#3506)
   435  - `pachctl` now respects Kubernetes auth when port-forwarding. (#3504)
   436  - Output repos now report non-zero sizes; the size reported is that of the HEAD commit of the master branch. (#3475)
   437  - Pachyderm will no longer mutate custom image names when there's no registry. (#3487, thanks to @mdaniel)
   438  - Fixes a bug that caused `pod_patch` and `pod_spec` to be reapplied over themselves. (#3484, thanks to @mdaniel)
   439  
   440  ## 1.8.5
   441  - Adds a new shuffle step, which should improve merge performance on certain workloads.
   442  
   443  ## 1.8.4
   444  
   445  - Azure Blob Storage block size has been changed to 4MB to avoid "object body too large" errors. (#3464)
   446  - Fixed a bug in `--no-metrics` and `--no-port-forwarding`. (#3462)
   447  - Fixes a bug that caused `list-job` to panic if the `Reason` field was too short. (#3453)
   448  
   449  ## 1.8.3
   450  
   451  - `--push-images` on `create-pipeline` has been replaced with `--build` which builds and pushes docker images. (#3370)
   452  - Fixed a bug that would cause malformed config files to panic pachctl. (#3336)
   453  - Port-forwarding will now happen automatically when commands are run. (#3340)
   454  - Fix a bug where `create-pipeline` accepted names that Kubernetes considers invalid. (#3344)
   455  - Fix a bug where put-file would respond `master not found` for an open commit. (#3184)
   456  - Fix a bug where jobs with stats enabled and no datums would never close their stats commit. (#3355)
   457  - Pipelines now reject file paths with unprintable UTF-8 characters. (#3356)
   458  - Fixed a bug in the Azure driver that caused it to choke on large files. (#3378)
   459  - Fixed a bug that caused pipelines to go into a loop and log excessively when they were stopped. (#3397)
   460  - `ADDRESS` has been renamed to `PACHD_ADDRESS` to be less generic. `ADDRESS` will still work for the remainder of the 1.8.x series of releases. (#3415)
   461  - The `pod_spec` field in pipelines has been revamped to use JSON Merge Patch (RFC 7386). Additionally, a new field, `pod_patch`, has been added to the pipeline spec; it is similar to `pod_spec` but uses JSON Patch (RFC 6902) instead. (#3427)
   462  - Pachyderm developer names should no longer appear in backtraces. (#3436)
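The merge semantics that `pod_spec` follows are defined by RFC 7386: objects merge recursively, a `null` in the patch deletes a key, and anything else (including arrays) replaces the target value. The Go sketch below implements that algorithm with `encoding/json`. It is an illustration of the RFC, not Pachyderm's code, and the pod fields in `main` are made-up examples rather than what pachd actually generates.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies an RFC 7386 JSON Merge Patch to target and returns the
// result. Values use the generic shapes produced by encoding/json:
// map[string]any, []any, string, float64, bool, and nil.
func mergePatch(target, patch any) any {
	patchObj, ok := patch.(map[string]any)
	if !ok {
		// Scalars, arrays, and null replace the target value wholesale.
		return patch
	}
	targetObj, ok := target.(map[string]any)
	if !ok {
		// Patching a non-object with an object starts from an empty object.
		targetObj = map[string]any{}
	}
	for name, value := range patchObj {
		if value == nil {
			// A JSON null in the patch removes the member from the target.
			delete(targetObj, name)
			continue
		}
		targetObj[name] = mergePatch(targetObj[name], value)
	}
	return targetObj
}

// applyMergePatch decodes two JSON documents, merges them, and re-encodes the result.
func applyMergePatch(targetJSON, patchJSON string) (string, error) {
	var target, patch any
	if err := json.Unmarshal([]byte(targetJSON), &target); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(patchJSON), &patch); err != nil {
		return "", err
	}
	out, err := json.Marshal(mergePatch(target, patch))
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Hypothetical worker pod fields and a hypothetical user-supplied pod_spec.
	base := `{"nodeSelector":{"disk":"ssd"},"priorityClassName":"low","restartPolicy":"Always"}`
	podSpec := `{"nodeSelector":{"gpu":"true"},"priorityClassName":null}`
	out, err := applyMergePatch(base, podSpec)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// {"nodeSelector":{"disk":"ssd","gpu":"true"},"restartPolicy":"Always"}
}
```

Note the contrast with `pod_patch`: a JSON Patch (RFC 6902) is an explicit list of operations such as `add` and `remove`, which can express changes a merge patch cannot, like removing a single element from an array.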
   463  
   464  ## 1.8.2
   465  
   466  - Updated support for GPUs (through device plugins).
   467  
   468  ## 1.8.1
   469  
   470  - Adds support for viewing file history via the `--history` flag to `list-file` (#3277, #3299).
   471  - Adds a new job state, `merging` which indicates that a job has finished processing everything and is merging the results together (#3261).
   472  - Fixes a bug that prevented S3 `put-file` from working (#3273).
   473  - `atom` inputs have been renamed to `pfs` inputs. They behave the same; `atom` still works but is deprecated and will be removed in 1.9.0 (#3258).
   474  - Removed `message` and `description` from `put-file`; they don't work with the new multi `put-file` features and weren't commonly used enough to reimplement. For similar functionality, use `start-commit` (#3251).
   475  
   476  ## 1.8.0
   477  
   478  - Completely rewritten hashtree backend that provides massive performance boosts.
   479  - Single sign-on Auth via Okta.
   480  - Support for groups and robot users.
   481  - Support for splitting file formats with headers and footers such as SQL and CSV.
   482  
   483  ## 1.7.10
   484  
   485  - Adds `put-file --split` support for SQL dumps. (#3064)
   486  - Adds support for headers and footers for data types passed to `--split` such as CSV and the above mentioned SQL. (#3064)
   487  - Adds support for accessing previous versions of pipelines using the same syntax as is used with commits. For example, `pachctl inspect-pipeline foo^` will give the previous version of `foo`. (#3159)
   488  - Adds support in pipelines for additional Kubernetes primitives on workers, including node selectors, priority class, and storage requests and limits. Additionally, there is now a field in the pipeline spec, `pod_spec`, that allows you to set any field on the pod using JSON. (#3169)
   489  
   490  ## 1.7.9
   491  
   492  - Moves garbage collection over to a bloom filter based indexing method. This
   493  greatly decreases the amount of memory that garbage collection requires, at the
   494  cost of a small probability of not deleting objects that should be deleted. Garbage
   495  collection can be made more accurate by using more memory with the flag
   496  `--memory` passed to `pachctl garbage-collect`. (#3161)
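The trade-off described above is the standard Bloom filter property: false positives are possible, false negatives are not. Assuming the filter holds the set of referenced objects, a false positive makes an unreferenced object look referenced, so it survives collection; a larger bit array lowers that probability. The Go sketch below illustrates the idea only; the object names, bit counts, and double-hashing scheme are assumptions, not Pachyderm's implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a minimal Bloom filter: m bits (one bool each, for clarity)
// probed at k indexes derived from one 64-bit FNV-1a hash via double hashing.
type bloomFilter struct {
	bits []bool
	k    int
}

func newBloomFilter(m, k int) *bloomFilter {
	if m < 1 {
		m = 1
	}
	return &bloomFilter{bits: make([]bool, m), k: k}
}

func (b *bloomFilter) indexes(key string) []int {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, (sum>>32)|1
	idx := make([]int, b.k)
	for i := range idx {
		idx[i] = int((h1 + uint64(i)*h2) % uint64(len(b.bits)))
	}
	return idx
}

func (b *bloomFilter) add(key string) {
	for _, i := range b.indexes(key) {
		b.bits[i] = true
	}
}

// mayContain is false only when key was definitely never added.
func (b *bloomFilter) mayContain(key string) bool {
	for _, i := range b.indexes(key) {
		if !b.bits[i] {
			return false
		}
	}
	return true
}

// collect returns the objects that are safe to delete: those the filter says
// are definitely unreferenced. A false positive only keeps garbage around;
// a referenced object is never reported as deletable.
func collect(objects, referenced []string, m, k int) []string {
	f := newBloomFilter(m, k)
	for _, r := range referenced {
		f.add(r)
	}
	var garbage []string
	for _, o := range objects {
		if !f.mayContain(o) {
			garbage = append(garbage, o)
		}
	}
	return garbage
}

func main() {
	var objects, referenced []string
	for i := 0; i < 10000; i++ {
		name := fmt.Sprintf("object-%d", i)
		objects = append(objects, name)
		if i%2 == 0 {
			referenced = append(referenced, name)
		}
	}
	unreferenced := len(objects) - len(referenced)
	// More bits (more memory) means fewer false positives, so more garbage is found.
	for _, m := range []int{8 << 10, 64 << 10, 512 << 10} {
		deleted := len(collect(objects, referenced, m, 4))
		fmt.Printf("m=%d bits: deleted %d of %d unreferenced objects\n", m, deleted, unreferenced)
	}
}
```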
   497  
   498  ## 1.7.8
   499  
   500  - Fixes multiple issues that could cause jobs to hang when they encountered intermittent errors such as network hiccups. (#3155)
   501  
   502  ## 1.7.7
   503  
   504  - Greatly improves the performance of the pfs FUSE implementation. Performance should be close to that of `pachctl get-file`. The only trade-off is that the new implementation uses disk space to cache file contents. (#3140)
   505  
   506  ## 1.7.6
   507  
   508  - Pachyderm's FUSE support (`pachctl mount`) has been rewritten. (#3088)
   509  - `put-file` requests that put files from multiple sources (`-i` or `-r`) now create a single commit. (#3118)
   510  - Fixes a bug that caused `put-file` to throw spurious warnings about URL formatted paths. (#3117)
   511  - Several fixes have been made to the user account that user code runs as, so that services such as Jupyter work out of the box. (#3085)
   512  - `pachctl` now has `auth set-config` and `auth get-config` commands. (#3095)
   513  
   514  ## 1.7.5
   515  
   516  - Workers no longer run privileged containers. (#3031) To achieve this, a few modifications had to be made to the `/pfs` directory that may impact some user code. Directories under `/pfs` are now symlinks to directories; previously they were bind mounts (which require that the container be privileged). Furthermore, there's now a hidden directory under `/pfs` called `.scratch`, which contains the directories that the symlinks under `/pfs` point to.
   517  - The number of times datums are retried is now configurable. (#3033)
   518  - Fixed a bug that could cause Kubernetes errors to prevent pipelines from coming up permanently. (#3043, #3005)
   519  - Robot users can now modify admins. (#3049)
   520  - Fixed a bug that could permanently lock robot-only admins out of the cluster. (#3050)
   521  - Fixed a couple of bugs (#3045, #3046) that occurred when a pipeline was rapidly updated several times. (#3054)
   522  - `restore` now propagates user credentials, allowing it to work on clusters with auth turned on. (#3057)
   523  - Adds a `debug-dump` command which dumps running goroutines from the cluster. (#3078)
   524  - `pachd` now prints a full goroutine dump if it encounters an error. (#3103)
   525  
   526  ## 1.7.4
   527  
   528  - Fixes a bug that prevented image pull secrets from propagating through `pachctl deploy`. (#2956, thanks to @jkinkead)
   529  - Fixes a bug that made `get-file` fail on empty files. (#2960)
   530  - `ListFile` and `GlobFile` now return results sorted lexicographically. (#2972)
   531  - Fixes a bug that caused `Extract` to crash. (#2973)
   532  - Fixes a bug that caused pachd to crash when given a pipeline without a name field. (#2974)
   533  - Adds dial options to the Go client's connect methods. (#2978)
   534  - `pachctl get-logs` now accepts `-p` as a synonym for `--pipeline`. (#3009, special thanks to @jdelfino)
   535  - Fixes a bug that caused connections to leak in the vault plugin. (#3016)
   536  - Fixes a bug that caused incremental pipelines that are downstream from other pipelines to not run incrementally. (#3023)
   537  - Updates monitoring deployments to use the latest versions of Influx, Prometheus and Grafana. (#3026)
   538  - Fixes a bug that caused `update-pipeline` to modify jobs that had already run. (#3028)
   539  
   540  ## 1.7.3
   541  
   542  - Fixes an issue that caused etcd deployment to fail when using a StatefulSet. (#2929, #2937)
   543  - Fixes an issue that prevented pipelines from starting up. (#2949)
   544  
   545  ## 1.7.2
   546  
   547  - Pachyderm now exposes metrics via Prometheus. (#2856)
   548  - File commands now all support globbing syntax. I.e. you can do pachctl list-file ... foo/*. (#2870)
   549  - garbage-collect is now safer and less error prone. (#2912)
   550  - put-file no longer requires starting (or finishing) a commit. Similar to put-file -c, but serverside. (#2890)
   551  - pachctl deploy --dry-run can now output YAML as well as JSON. Special thanks to @jkinkead. (#2872)
   552  - Requirements on pipeline container images have been removed. (#2897)
   553  - Pachyderm no longer requires privileged pods. (#2887)
   554  - Fixes several issues that prevented deleting objects in degraded states. (#2912)
   555  - Fixes bugs that could cause stats branches to not be cleaned up. (#2855)
   556  - Fixes 2 bugs related to auth services not coming up completely. (#2843)
   557  - Fixes a bug that prevented pachctl deploy storage amazon from working. (#2863)
   558  - Fixes a class of bugs that occurred due to misuse of our collections package. (#2865)
   559  - Fixes a bug that caused list-job to delete old jobs if you weren't logged in. (#2879)
   560  - Fixes a bug that caused put-file --split to create too many goroutines. (#2906)
   561  - Fixes a bug that prevented deploying to AWS using an IAM role. (#2913)
   562  - Pachyderm now deploys and uses the latest version of etcd. (#2914)
   563  
   564  ## 1.7.1
   565  
   566  - Introduces a new model for scaling up and down pipeline workers. [Read more](http://docs.pachyderm.io/en/latest/reference/pipeline_spec.html#standby-optional).
   567  - It's now possible to run Pachyderm without workers needing access to the docker socket. (#2813)
   568  - Fixes a bug that caused stats enabled pipelines to get stuck in a restart loop if they were deleted and recreated. (#2816)
   569  - Fixes a bug that broke logging due to removing newlines between log messages. (#2852)
   570  - Fixes a bug that caused pachd to segfault when etcd didn't come up properly. (#2840)
   571  - Fixes a bug that would cause jobs to occasionally fail with a "broken pipe" error. (#2832)
   572  - `pachctl version` now supports the `--raw` flag like other `pachctl` commands. (#2817)
   573  - Fixes a bug that caused `max_queue_size` to be ignored in pipelines. (#2818)
   574  
   575  ## 1.7.0
   576  
   577  - Implements a new algorithm for triggering jobs in response to new commits.
   578  - Pachyderm now tracks subvenance, the inverse of provenance.
   579  - Branches now track provenance and subvenance.
   580  - Restrictions on `delete-commit` have been removed; you can now delete any input commit and the DAG will repair itself appropriately.
   581  - Pachyderm workers no longer use long running grpc requests to schedule work, they use an etcd based queue instead. This solves a number of bugs we had with larger jobs.
   582  - You can now back up and restore your cluster with `extract` and `restore`.
   583  - Pipelines now support timeouts, both for the job as a whole or for individual datums.
   584  - You can now follow job logs with `-f`.
   585  - Support for Kubernetes RBAC.
   586  - Docker images with entrypoints can now be run; to do this, don't specify a `cmd`.
   587  - `pachctl` now has bash completion, including for values stored within it (run `pachctl completion` to install it).
   588  - pachctl deploy now has a --namespace flag to deploy to a specific namespace.
   589  - You can no longer commit directly to output repos; doing so caused a number of problems with the internal model that were tough to recover from.
   590  
   591  ## 1.6.10
   592  
   593  - Fixes a bug in extract that prevented some migrations from completing.
   594  
   595  ## 1.6.9
   596  
   597  - Adds admin commands extract and restore.
   598  
   599  ## 1.6.8
   600  
   601  - Fixed an issue that could cause output data to get doubled. (#2644)
   602  - Fix / add filtering of jobs in list-job by input commits. (#2642)
   603  - Extends bash completion to cover values as well as keywords. (#2617)
   604  - Adds better validation of file paths. (#2627)
   605  
   606  ## 1.6.7
   607  
   608  - Support for Google Service Accounts
   609  - RBAC support
   610  - Follow and tail logs
   611  - Expose public IP for githook service
   612  - Handle 100k+ files in a single commit, which allows users to more easily manage and version millions of files.
   613  - Fix datum status in the UI
   614  
   615  ## 1.6.6
   616  
   617  - Users can now specify k8s resource limits on a pipeline
   618  - Users can specify a `datum_timeout` and `job_timeout` on a pipeline
   619  - Minio S3V2 support
   620  - New worker model (to eliminate long running grpc calls)
   621  
   622  ## 1.6.5
   623  
   624  - Adds support for Kubernetes 1.8
   625  - Fixes a bug that caused jobs with small numbers of datums not to use available nodes for processing. #2480.
   626  
   627  ## 1.6.4
   628  
   629  ## 1.6.3
   630  
   631  - Fixes a bug that corrupted large files ingressed from object stores. #2405
   632  - Fixes a migration bug that could get pipelines stuck in a crash loop
   633  - Fixes an issue with pipelines processing old data #2469
   634  - Fixes a bug that allowed paused pipelines to restart themselves.
   635  
   636  ## 1.6.2
   637  
   638  - Changes default memory settings so that Pachyderm works on Minikube out of the box.
   639  - Implements streaming versions of `ListFile` and `GlobFile` which prevents crashing on larger datasets.
   640  - Fixes a race condition with `GetLogs`
   641  
   642  ## 1.6.1
   643  
   644  - Adds support for private registries. (#2360)
   645  - Fixes a bug that prevent cloud front deployments from working. (#2381)
   646  - Fixes a failure that code arise while watching k8s resources. (#2382)
   647  - Uses k8s' Guaranteed QoS for etcd and pachd. (#2368)
   648  
   649  ## 1.6.0
   650  
   651  New Features:
   652  
   653  - Cron Inputs
   654  - Access Control Model
   655  - Advanced Statistic tracking for jobs
   656  - Extended UI
   657  
   658  ## 1.5.3
   659  
   660  Bug Fixes:
   661  
   662  - Fix an issue that prevented deployment on GCE #2139
   663  - Fix an issue that could cause jobs to hang due to lockups with bind mounts. #2178
   664  - FromCommit in pipelines is now exclusive and able to be used with branch names as well as commit ids. #2180
   665  - Egress was broken for certain object stores, this should be fixed now. #2156
   666  
   667  New Features:
   668  
   669  - Union inputs can now be given the same name, making union much more ergonomic. #2174
   670  - PutFile now has an `--overwrite` flag which overwrites the previous version of the file rather than appending. #2142
   671  - We've introduced a new type of input, `Cron`, which can be used to trigger pipelines based on time. #2150.
   672  
   673  ## 1.5.1 / 1.5.2
   674  
   675  ### Bug Fixes
   676  
   677  * A pipeline can get stuck after repeated worker failures.  (#2064)
   678  * `pachctl port-forward` can leave an orphaned process after it exits.  (#2098)
   679  * `alpine`-based pipelines fail to load input data.  (#2118)
   680  * Logs are written to the object store even when stats is not enabled, slowing down the pipeline unnecessarily.  (#2119)
   681  
   682  ### Features / Improvements
   683  
   684  * Pipelines now support the “stats” feature.  See the [docs](https://docs-archive.pachyderm.com/en/latest/reference/pipeline_spec.html#enable-stats-optional) for details.  (#1998)
   685  * Pipeline cache size is now configurable.  See the [docs](https://docs-archive.pachyderm.com/en/latest/reference/pipeline_spec.html#cache-size-optional) for details.  (#2033)
   686  * `pachctl update-pipeline` now **only** processes new input data with the new code; the old input data is not re-processed. To re-process all data, use the `--reprocess` flag.  See the [docs](http://docs-archive.pachyderm.com/en/latest/how-tos/updating_pipelines.html) for details.  (#2034)
   687  * Pipeline workers now support “pipelining”, meaning that they start downloading the next datums while processing the current datum, thereby improving overall throughput.  (#2057)
   688  * The `scaleDownThreshold` feature has been improved such that when a pipeline is scaled down, the remaining worker only takes up minimal system resources.  (#2091)
   689  
   690  ## 1.5.0
   691  
   692  ### Bug Fixes
   693  
   694  * Downstream repos' provenance is not updated properly when `update-pipeline` changes the inputs for a pipeline. (#1958)
   695  * `pachctl version` blocks when pachctl doesn't have Internet connectivity. (#1971)
   696  * `incremental` misbehaves when files are deeply nested. (#1974)
   697  * An `incremental` pipeline blocks if there's provenance among its inputs. (#2002)
   698  * PPS fails to create subsequent pipelines if any pipeline failed to be created. (#2004)
   699  * Pipelines sometimes reprocess datums that have already been processed. (#2008)
   700  * Putting files into open commits fails silently. (#2014)
   701  * Pipelines with inputs that use different branch names fail to create jobs. (#2015)
   702  * `get-logs` returns incomplete logs.  (#2019)
   703  
   704  ### Features
   705  
   706  * You can now use `get-file` and `list-file` on open commits. (#1943)
   707  
   708  ## 1.4.8
   709  
   710  ### Bug Fixes
   711  
   712  - Fixes bugs that caused us to swamp etcd with traffic.
   713  - Fixes a bug that could cause corruption in pipeline output.
   714  
   715  ### Features
   716  
   717  - Re-adds incremental processing mode
   718  - Adds `DiffFile` which is similar in function to `git diff`
   719  - Adds the ability to use CloudFront as a caching layer for additional scalability on AWS.
   720  - `DeletePipeline` now allows you to delete the output repos as well.
   721  - `DeletePipeline` and `DeleteRepo` now support a `--all` flag
   722  
   723  ### Removed Features
   724  
   725  - Removes one-off jobs; they were a rarely used feature, and the same behavior can be replicated with pipelines
   726  
   727  ## 1.4.7
   728  
   729  ### Bug fixes
   730  
   731  * [Copy elision](http://docs-archive.pachyderm.com/en/latest/managing_pachyderm/data_management.html#shuffling-files) does not work for directories. (#1803)
   732  * Deleting a file in a closed commit fails silently. (#1804)
   733  * Pachyderm has trouble processing large files. (#1819)
   734  * etcd uses an unexpectedly large amount of space. (#1824)
   735  * `pachctl mount` prints lots of benign FUSE errors. (#1840)
   736  
   737  ### New features
   738  
   739  * `create-repo` and `create-pipeline` now accept the `--description` flag, which creates the repo/pipeline with a "description" field.  You can then see the description via `inspect-repo/inspect-pipeline`. (#1805)
   740  * Pachyderm now supports garbage collection, i.e. removing data that's no longer referenced anywhere.  See the [docs](http://docs-archive.pachyderm.com/en/latest/managing_pachyderm/data_management.html#garbage-collection) for details. (#1826)
   741  * Pachyderm now has GPU support!  See the [docs](http://docs-archive.pachyderm.com/en/latest/managing_pachyderm/sharing_gpu_resources.html#without-configuration) for details. (#1835)
   742  * Most commands in `pachctl` now support the `--raw` flag, which prints the raw JSON data as opposed to pretty-printing.  For instance, `pachctl inspect-pipeline --raw` would print something akin to a pipeline spec. (#1839)
   743  * `pachctl` now supports `delete-commit`, which allows for deleting a commit that's not been finished.  This is useful when you have added the wrong data in a commit and you want to start over.
   744  * The web UI has added a file viewer, which allows for viewing PFS file content in the browser.
   745  
   746  ## 1.4.6
   747  
   748  ### Bug fixes
   749  
   750  * `get-logs` returns errors along the lines of `Invalid character…`. (#1741)
   751  * etcd is not properly namespaced. (#1751)
   752  * A job might get stuck if it uses `cp -r` with lazy files. (#1757)
   753  * Pachyderm can use a huge amount of memory, especially when it processes a large number of files. (#1762)
   754  * etcd returns `database space exceeded` errors after the cluster has been running for a while. (#1771)
   755  * Jobs crashing might eventually lead to disk space being exhausted. (#1772)
   756  * `port-forward` uses wrong port for UI websocket requests to remote clusters (#1754)
   757  * Pipelines can end up with no running workers when the cluster is under heavy load. (#1788)
   758  * API calls can start returning `context deadline exceeded` when the cluster is under heavy load. (#1796)
   759  
   760  ### New features / improvements
   761  
   762  * Union input: a pipeline can now take the union of inputs, in addition to the cross-product of them.  Note that the old `inputs` field in the pipeline spec has been deprecated in favor of the new `input` field.  See the [pipeline spec](http://pachyderm.readthedocs.io/en/latest/reference/pipeline_spec.html#input-required) for details. (#1665)
   763  * Copy elision: a pipeline that shuffles files can now be made more efficient by simply outputting symlinks to input files.  See the [docs on shuffling files](http://pachyderm.readthedocs.io/en/latest/reference/best_practices.html#shuffling-files) for details. (#1791)
   764  * `pachctl glob-file`: ever wonder if your glob pattern actually works?  Wonder no more.  You can now use `pachctl glob-file` to see the files that match a given glob pattern. (#1795)
   765  * Workers no longer send/receive data through pachd.  As a result, pachd is a lot more responsive and stable even when there are many ongoing jobs.  (#1742)
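The difference between the two ways of combining inputs comes down to how datums are formed: a cross product gives every worker run one datum from each input, while a union processes each input's datums separately. The Go sketch below shows the counting consequence; the `datum` type and the file paths are hypothetical, and each input is assumed to have already been divided into datums (by its glob pattern), which this sketch does not model.

```go
package main

import "fmt"

// datum is the set of input paths a worker sees together in one run of user code.
type datum []string

// cross combines inputs as a cross product: each result holds one datum from every input.
func cross(inputs ...[]datum) []datum {
	result := []datum{{}}
	for _, in := range inputs {
		var next []datum
		for _, prefix := range result {
			for _, d := range in {
				// Copy the prefix so results never share a backing array.
				combined := append(append(datum{}, prefix...), d...)
				next = append(next, combined)
			}
		}
		result = next
	}
	return result
}

// union combines inputs by processing each input's datums on their own.
func union(inputs ...[]datum) []datum {
	var result []datum
	for _, in := range inputs {
		result = append(result, in...)
	}
	return result
}

func main() {
	images := []datum{{"/pfs/images/a.png"}, {"/pfs/images/b.png"}}
	models := []datum{{"/pfs/models/v1"}, {"/pfs/models/v2"}, {"/pfs/models/v3"}}
	fmt.Println("cross:", len(cross(images, models)), "datums") // 2 x 3 = 6
	fmt.Println("union:", len(union(images, models)), "datums") // 2 + 3 = 5
}
```

In the cross product an empty input yields no datums at all, whereas a union still processes the datums of its non-empty inputs.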
   766  
   767  ## 1.4.5
   768  
   769  ### Bug fixes
   770  
   771  * Fix a bug where pachd may crash after creating/updating a pipeline that has many input commits. (#1678)
   772  * Rules for determining when input data is re-processed are made more intuitive.  Before, if you update a pipeline without updating the `transform`, the input data is not re-processed.  Now, different pipelines or different versions of pipelines always re-process data, even if they have the same `transform`. (#1685)
   773  * Fix several issues with jobs getting stuck. (#1717)
   774  * Fix several issues with lazy pipelines getting stuck. (#1721)
   775  * Fix an issue with Minio deployment that results in job crash loop. (#1723)
   776  * Fix an issue where a job can crash if it outputs a large number of files. (#1724)
   777  * Fix an issue that causes intermittent gRPC errors. (#1727)
   778  
   779  ### New features
   780  
   781  * Pachyderm now ships with a web UI!  To deploy a new Pachyderm cluster with the UI, use `pachctl deploy <arguments> --dashboard`.  To deploy the UI onto an existing cluster, use `pachctl deploy <arguments> --dashboard-only`.  To access the UI, simply `pachctl port-forward`, then go to `localhost:38080`.  Note that the web UI is currently in alpha; expect bugs and significant changes.
   782  * You can now specify the amount of resources (i.e. CPU & memory) used by Pachyderm and etcd.  See `pachctl deploy --help` for details. (#1676)
   783  * You can now specify the amount of resources (i.e. CPU & memory) used by your pipelines. (#1683)
   784  
   785  ## 1.4.4
   786  
   787  ### Bug fixes
   788  
   789  * A job can fail to restart when encountering an internal error.
   790  * A deployment with multiple pachd nodes can get stalled jobs.
   791  * `delete-pipeline` is supposed to have the `--delete-jobs` flag but doesn't.
   792  * `delete-pipeline` can fail if there are many jobs in the pipeline.
   793  * `update-pipeline` can fail if the original pipeline has not outputted any commits.
   794  * pachd can crash if etcd is flaky.
   795  * pachd memory can be easily exhausted on GCE deployments.
   796  * If a pipeline is created with multiple input commits already present, all jobs spawn and run in parallel.  After the fix, jobs always run serially.
   797  
   798  ### Features
   799  
   800  * Pachyderm now supports auto-scaling: a pipeline's worker pods can be terminated automatically when the pipeline has been idle for a configurable amount of time.  See the `scaleDownThreshold` field of the [pipeline spec](http://docs-archive.pachyderm.com/en/latest/reference/pipeline_spec.html#standby-optional) for details.
   801  * The processing of a datum can be restarted manually via `restart-datum`.
   802  * Workers' statuses are now exposed through `inspect-job`.
   803  * A job can be stopped manually via `stop-job`.
   804  
   805  ## 1.4.3
   806  
   807  ### Bug fixes
   808  
   809  * Pipelines with multiple inputs process only a subset of data.
   810  * Workers may fall into a crash loop under certain circumstances. (#1606)
   811  
   812  ### New features
   813  
   814  * `list-job` and `inspect-job` now display a job's progress, i.e. they display the number of datums processed thus far, and the total number of datums.
   815  * `delete-pipeline` now accepts an option (`--delete-jobs`) that deletes all jobs in the pipeline. (#1540)
   816  * Azure deployments now support dynamic provisioning of volumes.
   817  
   818  ## 1.4.2
   819  
   820  ### Bug fixes
   821  
   822  * Certain network failures may cause a job to be stuck in the `running` state forever.
   823  * A job might get triggered even if one of its inputs is empty.
   824  * Listing or getting files from an empty output commit results in `node "" not found` error.
   825  * Jobs are not labeled as `failure` even when the user code has failed.
   826  * Running jobs do not resume when pachd restarts.
   827  * `put-file --recursive` can fail when there are a large number of files.
   828  * minio-based deployments are broken.
   829  
   830  ### Features
   831  
   832  * `pachctl list-job` and `pachctl inspect-job` now display the number of times each job has restarted.
   833  * `pachctl list-job` now displays the pipeline of a job even if the job hasn't completed.
   834  
   835  ## 1.4.1
   836  
   837  ### Bug fixes
   838  
   839  * Getting files from GCE results in errors.
   840  * A pipeline that has multiple inputs might place data into the wrong `/pfs` directories.
   841  * `pachctl put-file --split` errors when splitting to a large number of files.
   842  * Pipeline names do not allow underscores.
   843  * `egress` does not work with a pipeline that outputs a large number of files.
   844  * Deleting nonexistent files returns errors.
   845  * A job might try to process datums even if the job has been terminated.
   846  * A job doesn't exit after it has encountered a failure.
   847  * Azure backend returns an error if it writes to an object that already exists.
   848  
   849  ### New features
   850  
   851  * `pachctl get-file` now supports the `--recursive` flag, which can be used to download directories.
   852  * `pachctl get-logs` now outputs unstructured logs by default.  To see structured/annotated logs, use the `--raw` flag.
   853  
   854  ## 1.4.0
   855  
   856  Features/improvements:
   857  
   858  - Correct processing of modifications and deletions.  In prior versions, Pachyderm pipelines could only process data additions; data that was removed or modified was effectively ignored.  In 1.4, when certain input data is removed (or modified), downstream pipelines know to remove (or modify) the output that was produced as a result of processing that input data.
   859  
   860  As a consequence of this change, a user can now fix a pipeline that has processed erroneous data by simply making a new commit that fixes that data, as opposed to having to create a new pipeline.

- Vastly improved performance for metadata operations (e.g. `list-file`, `inspect-file`). In prior versions, metadata operations on a commit that is N levels deep took O(N) time. In 1.4, metadata operations are always O(1), regardless of the depth of the commit.

- A new way to specify how input data is partitioned. Instead of the two flags `partition` and `incrementality`, partitioning is now specified with a single `glob` pattern. See the [glob doc](http://docs-archive.pachyderm.com/en/latest/reference/pipeline_spec.html#the-input-glob-pattern) for details.

- Flexible branch management. In prior versions, branches were fixed, in that a commit always stayed on the same branch, and a branch always referred to the same series of commits. In 1.4, branches are modeled similarly to Git's tags; they can be created, deleted, and renamed independently of commits.

- Simplified commit states. In prior versions, commits could be in many states, including `started`, `finished`, `cancelled`, and `archived`. In particular, `cancelled` and `archived` had confusing semantics that routinely tripped up users. In 1.4, `cancelled` and `archived` have been removed.

- Flexible pipeline updates. In prior versions, pipeline updates were all-or-nothing. That is, an updated pipeline either processed all commits from scratch, or it processed only new commits. In 1.4, it's possible to have the updated pipeline start processing from any given commit.

- Reduced cluster resource consumption. In prior versions, each Pachyderm job spawned a Kubernetes job, which in turn spawned N pods, where N is the user-specified parallelism. In 1.4, all jobs from a pipeline share the same N pods. As a result, a cluster running 1.4 is likely to spawn far fewer pods and use fewer resources in total.

- Simplified deployment dependencies. In prior versions, Pachyderm depended on RethinkDB and etcd to function. In 1.4, Pachyderm no longer depends on RethinkDB.

- Dynamic volume provisioning. GCE and AWS users (Azure support is coming soon) no longer have to manually provision persistent volumes for deploying Pachyderm. `pachctl deploy` can now provision persistent volumes dynamically.
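
To make the glob-based partitioning concrete, the following Python sketch groups file paths into datums: every path prefix that matches the glob becomes one datum, and all files beneath it are processed together. With `/` the whole repo is a single datum; with `/*` each top-level file or directory is its own datum. The function name `group_into_datums` is hypothetical, and the sketch assumes segment-by-segment matching with the standard-library `fnmatch` module while ignoring extended patterns such as `**`, so it approximates, rather than reproduces, the semantics in the glob doc.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def group_into_datums(paths: List[str], glob: str) -> Dict[str, List[str]]:
    """Hypothetical sketch: map each matching path prefix to the files under it.

    Matching is done one path segment at a time with fnmatchcase, so `*`
    never crosses a `/`. Files shallower than the pattern match no datum.
    """
    pattern = [seg for seg in glob.split("/") if seg]
    datums: Dict[str, List[str]] = {}
    for path in paths:
        segments = [seg for seg in path.split("/") if seg]
        if len(segments) < len(pattern):
            continue  # too shallow to match every pattern segment
        prefix = segments[: len(pattern)]
        if all(fnmatchcase(s, p) for s, p in zip(prefix, pattern)):
            datum = "/" + "/".join(prefix)
            datums.setdefault(datum, []).append(path)
    return datums


if __name__ == "__main__":
    files = ["/a/1.csv", "/a/2.csv", "/b/3.csv", "/c.csv"]
    print(group_into_datums(files, "/"))   # one datum holding every file
    print(group_into_datums(files, "/*"))  # one datum per top-level entry
```

In general, a finer glob such as `/*/*` yields more, smaller datums, which gives the pipeline more independent units of work to distribute.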

Removed features:

A handful of APIs have been removed because they no longer make sense in 1.4. They include:

- ForkCommit (no longer necessary given the new branch APIs)
- ArchiveCommit (the `archived` commit state has been removed)
- ArchiveAll (same as above)
- DeleteCommit (the original implementation of DeleteCommit was very limiting: only open head commits could be removed. An improved version of DeleteCommit is coming soon.)
- SquashCommit (was only necessary due to the way PPS worked in prior versions)
- ReplayCommit (same as above)

## 1.3.0

Features:

- Embedded Applications - Our “service” enhancement allows you to embed applications, like Jupyter, dashboards, etc., within Pachyderm, access versioned data from within the applications, and expose the applications externally.
- Pre-Fetched Input Data - Typical Pachyderm pipelines will see a many-fold end-to-end speedup thanks to prefetching of input data.
- Put Files via Object Store URLs - You can now use “put-file” with s3://, gcs://, and as:// URLs.
- Update your Pipeline code easily - You can now call “create-pipeline” or “update-pipeline” with the “--push-images” flag to re-run your pipeline on the same data with new images.
- Support for all Docker images - It is no longer necessary to include anything Pachyderm-specific in your custom Docker images, so you can use any Docker image you like (with a couple of very small caveats).
- Cloud Deployment with a single command for Amazon / Google / Microsoft / a local cluster - via `pachctl deploy ...`
- Migration support for all Pachyderm data from version `1.2.2` through the latest, `1.3.0`
- High Availability upgrade to RethinkDB, which is now deployed as a PetSet
- Upgraded fault tolerance via a new PPS job subscription model
- Removed redundancy in log messages, making logs substantially smaller
- Garbage collection of completed jobs
- Support for deleting a commit
- Added user metrics (and an opt-out mechanism) to anonymously track usage, so we can discover new bottlenecks
- Upgrade to k8s 1.4.6

## 1.2.0

Features:

- PFS has been rewritten to be more reliable and optimizable
- PFS now has a much simpler name scheme for commits (e.g. `master/10`)
- PFS now supports merging; there are two types of merge: Squash and Replay
- Caching has been added to several of the higher-cost parts of PFS
- UpdatePipeline, which allows you to modify an existing pipeline
- Transforms now have an Env section for specifying environment variables
- ArchiveCommit, which allows you to hide commits from ListCommit while keeping them present and readable
- ArchiveAll, which archives all data
- PutFile can now take a URL in place of a local file, put multiple files, and start/finish its own commits
- Incremental Pipelines now allow more control over what data is shown
- `pachctl deploy` is now the recommended way to deploy a cluster
- `pachctl port-forward` should be a much more reliable way to get your local machine talking to pachd
- `pachctl mount` will recover if it loses and regains contact with pachd
- `pachctl unmount` has been added; it can be used to unmount a single mount or all of them with `-a`
- Benchmarks have been added
- pprof support has been added to pachd
- Parallelization can now be set as a factor of cluster size
- `pachctl put-file` has two new flags, `-c` and `-i`, that make it more usable
- Minikube is now the recommended way to deploy locally

Content:

- Our developer portal is now available at: https://docs.pachyderm.com/latest/
- We've added a quick way for people to reach us on Slack at: http://slack.pachyderm.io
- OpenCV example

## 1.1.0

Features:

- Data Provenance, which tracks the flow of data as it's analyzed
- FlushCommit, which tracks commits forward to the downstream results computed from them
- DeleteAll, which restores the cluster to factory settings
- More featureful data partitioning (map, reduce and global methods)
- Explicit incrementality
- Better support for dynamic membership (nodes leaving and entering the cluster)
- Commit IDs are now present as env vars for jobs
- Deletes and reads now work during job execution
- `pachctl inspect-*` now returns much more information about the inspected objects
- PipelineInfos now contain a count of job outcomes for the pipeline
- Fixes to Pachyderm and bazil.org/fuse to support writing a larger number of files
- Jobs now report their end times as well as their start times
- Jobs have a pulling state for when the container is being pulled
- `put-file` now accepts a `-f` flag for easier puts
- Cluster restarts now work, even if Kubernetes is restarted as well
- Support for JSON and binary delimiters in data chunking
- Manifests now reference a specific Pachyderm container version, making deployment more bulletproof
- Readiness checks for pachd, which make deployment more bulletproof
- Kubernetes jobs are now created in the same namespace pachd is deployed in
- Support for pipeline DAGs that aren't transitive reductions
- Appending to files now works in jobs; from shell scripts you can use `>>`
- Network traffic to object stores is reduced by taking advantage of content addressability
- Transforms now have a `Debug` field, which turns on debug logging for the job
- `pachctl` can now be installed via Homebrew on macOS or apt on Ubuntu
- ListJob now orders jobs by creation time
- OpenShift Origin is now supported as a deployment platform

Content:

- Webscraper example
- Neural net example with TensorFlow
- Wordcount example

Bug fixes:

- False positive on running pipelines
- Makefile bulletproofing to make sure things are installed when they're needed
- Races within the FUSE driver
- In 1.0 it was possible to get duplicate job IDs; that should be fixed now
- Pipelines could get stuck in the pulling state after being recreated several times
- Map jobs no longer return when sharded unless the files are actually empty
- The FUSE driver no longer encounters a bounds error during execution
- Pipelines no longer get stuck in restarting state when the cluster is restarted
- Failed jobs were being marked failed too early, resulting in a race condition
- Jobs could get stuck in running when they had failed
- Pachd could panic due to membership changes
- Starting a commit with a nonexistent parent now errors instead of silently failing
- Previously, pachd nodes would crash when a watched repo was deleted
- Jobs now get recreated if you delete and recreate a pipeline
- Getting files from nonexistent commits gives a nicer error message
- RunPipeline would fail to create a new job if the pipeline had already run
- FUSE no longer chokes if a commit is closed after the mount happened
- GCE/AWS backends have been made a lot more reliable

Tests:

From 1.0.0 to 1.1.0 we've gone from 70 tests to 120, a 71% increase.

## 1.0.0 (5/4/2016)

1.0.0 is the first generally available release of Pachyderm.
It's a complete rewrite of the 0.* series of releases, sharing no code with them.
The following major architectural changes have happened since 0.*:

- All network communication and serialization is done using protocol buffers and gRPC.
- BTRFS has been removed; Pachyderm is instead built on object storage, and S3 and GCS are currently supported.
- Everything in Pachyderm is now scheduled on Kubernetes; this includes Pachyderm services and user jobs.
- We now have several access methods: `pachctl` from the command line, our Go client within your own code, and the FUSE filesystem layer.