# ProxyFS Release Notes

## 1.17.0 (October 21, 2020)

### Bug Fixes:

Resolved lack of randomness in pfs_middleware when selecting ProxyFS instance to
query to learn which ProxyFS instance is currently serving a Volume. If the first
ProxyFS instance selected just happens to be unreachable, a 503 status could be
returned by Swift Proxy to the HTTP Client. No amount of retries would previously
select a different ProxyFS instance. Now, the list is uniquely randomized on each
retry.

Resolved metadata defragmentation RESTful API (/meta-defrag) implementation's
inability to defragment the last 1% of metadata.

### Features:

Added support for building Docker Image for SAIO.

Added support for inserting PFSAgent instances into a Docker Image.

Enabled use of fractional percentage ranges in metadata defragmentation RESTful
API (/meta-defrag?range=...-...). Since Volume metadata changes are frozen during
a defragmentation operation, this is useful when the total amount of metadata is
very large.

Enabled multiple file defragmentation RESTful API (/defrag/\<filepath\>) invocations
to proceed in parallel. Previously only one such would execute at a time.

## 1.16.9 (August 7, 2020)

### Bug Fixes:

With liveness checker enabled on multiple nodes, a race condition
could be hit in the "doFollower()" logic of the RaFT-lite cluster
formed among the ProxyFS instances resulting in a crash.

PFSAgent had a variety of errors returning stale file sizes and,
sometimes, content.

### Notes:

Picked up support for Swift 2.25.0.5 in test VMs/containers.

## 1.16.8 (July 28, 2020)

### Bug Fixes:

Prevent premature invalidation of dirty file inode caching in
PFSAgent leading to subsequent incorrect (stale) read results.

Correct reporting of file size when a non-empty file is first
accessed by PFSAgent.

### Features:

Log rotation events are not logged.

Memory fragmentation has been improved particularly in calls to
dump a configuration (package conf's ConfMap).

## 1.16.7 (July 16, 2020)

### Bug Fixes:

Avoid PFSAgent crashes when the lack of leases means it doesn't know
that the underlying file inode has been removed when it is lazily
flushing write data to it.

Provide a cap on remaining unbounded memory growth due to caching of
extent maps for files in PFSAgent. The new tunable that limits this
memory consumption is [Agent]ExtentMapEntryLimit.

## 1.16.6 (July 13, 2020)

### Bug Fixes:

Fixed memory leak in PFSAgent where file stats not recently accessed
were cached indefinitely.

Fixed race condition in retryrpc path (PFSAgent->ProxyFS) that could
be exposed when connections need to be reestablished and requests
retransmitted.

Updated pfs-swift-load benchmarking tool to issue flushes as the
file is closed so as not to overstate write performance for short
tests.

### Features:

Added full /debug/pprof support in both ProxyFS and PFSAgent on their
standard embedded HTTP servers.

## 1.16.4 (July 7, 2020)

### Bug Fixes:

Opening a file with the O_TRUNC option in a PFSAgent-mounted file
system would fail to truncate a pre-existing file.

## 1.16.3 (July 2, 2020)

### Bug Fixes:

Memory leak in ProxyFS when PFSAgent is in use has been resolved.

PFSAgent's Swift Auth PlugIn now may be located in the provisioned
SAIO search path of both the root and vagrant users.

### Features:

Detection of available space on each Swift device (Account, Container,
and Object) is now enabled. If any device utilization crosses either of
two thresholds, access to all Volumes will be restricted.
The first threshold prevents writes to files in the file system. Since there are
many more operations that actually consume device space (even deletes),
exceeding the second threshold converts all Volumes to be read-only.
The "/liveness" URL (in the embedded HTTPServer in ProxyFS) will now
also report the disk space utilization of each Swift device.

In addition to the one-node SAIO VM already present, a new three-node
SAIT ("Swift All In Three") set of VMs is now provided. This enables
testing ProxyFS Cluster functionality.

### Notes:

This release also includes initial work in support of the Lease Management
feature that will ultimately enable multiple PFSAgent instances to safely
share read/write access to a common ProxyFS Volume. The new RPC is simply
called "Lease" (i.e. Server.RpcLease) but remains non-functional at this
point. Such functionality also includes an "upcall" mechanism such that
ProxyFS may inform a PFSAgent instance of important events requiring its
response (e.g. a conflicting Lease is needed by another PFSAgent instance,
or the Volume is leaving and requires the PFSAgent to unmount it).

## 1.16.1 (May 26, 2020)

### Bug Fixes:

Stale versions of PFSAgent were able to generate a connection storm against
ProxyFS resulting in exploding the number of open files for the ProxyFS
process. This update prevents that condition and generates log entries to
indicate which PFSAgent clients are out of date.

If one ProxyFS peer is stuck handling a SIGHUP, a subsequent ProxyFS instance
could indefinitely block in its SIGHUP simply waiting for the first instance
to resolve its issue (or be terminated). At most, this condition can now delay
the second ProxyFS by only one second (see Cluster.MaxRequestDuration).
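
The one-second cap described above amounts to a bounded wait. The following is a minimal Python sketch of that idea only (ProxyFS itself is written in Go). The helper name `wait_for_peer` and the parameter `max_request_duration`, which stands in for `Cluster.MaxRequestDuration`, are hypothetical and are not taken from the ProxyFS source.

```python
import threading


def wait_for_peer(peer_done: threading.Event, max_request_duration: float = 1.0) -> bool:
    """Wait for a peer to finish, but never longer than max_request_duration seconds.

    Hypothetical helper: returns True if the peer finished in time,
    False if the wait was abandoned after the timeout.
    """
    return peer_done.wait(timeout=max_request_duration)


# A peer that never signals completion models the "stuck in SIGHUP" case.
stuck_peer = threading.Event()
finished_in_time = wait_for_peer(stuck_peer, max_request_duration=0.05)

# A peer that has already finished lets the caller continue immediately.
healthy_peer = threading.Event()
healthy_peer.set()
healthy_in_time = wait_for_peer(healthy_peer, max_request_duration=0.05)
```

With a bounded wait, a stuck peer costs the caller at most the configured duration instead of blocking it indefinitely.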

## 1.16.0 (May 12, 2020)

### Features:

Added support for a pluggable Authentication mechanism in
PFSAgent. See the README.md files in both pfsagentd/ and
pfsagentd/pfsagentd-swift-auth-plugin/ for details.

Updated PFSAgent to tune mount initialization to optimize parallel requests.

Object API will create directories and files readable and writeable
by anybody from the file side.

Added performance stats and config reporting to PFSAgent HTTPServer.

User Agent used in Swift API calls is now customized to identify ProxyFS
and PFSAgent specifically.

### Bug Fixes:

Added missing write flow control to PFSAgent to limit RAM footprint
and avoid overrunning Swift.

Fixed incorrect flushing during PFSAgent dismount.

### Notes:

See new CONFIGURING.md to learn about configuring ProxyFS.

## 1.15.5 (March 8, 2020)

### Bug Fixes:

Corrected a B+Tree balance violation (package sortedmap) that could
result in worse than the theoretical worst-case metadata tree height.

Previously, if a file was `moved` over the top of another file, the
file being replaced would have its `LinkCount` reduced by one and,
yet, if the `LinkCount` reached zero would not be erased from the
volume. As such, the Inode and any of its referenced LogSegments
would remain in Swift unreferenced. This has now been corrected.

### Notes:

The fix to the B+Tree imbalance bug includes a new pfs-fsck failure
being reported for currently imbalanced metadata trees. New volumes
created after the fix has been applied will not suffer from this
reported corruption that is, otherwise, only a performance impact.

The fix that cleans up an Inode when its `LinkCount` reaches zero,
once applied, will only correctly clean up Inodes deleted after
that point.
Previously unlinked Inodes will, unfortunately,
remain in Swift unreferenced.

## 1.15.4.5 (March 2, 2020)

### Notes:

Added a new SwiftRetryDelayVariance setting to PFSAgent to perform
HTTP Retries at varying backoff delays. This avoids certain lock-step
pathological cases when overrunning Swift.

## 1.15.4.3 (February 24, 2020)

Fixed hangs in PFSAgent hit under heavy write loads.

Fixed stale file size reported by stat() for PFSAgent files that
have been expanded via `truncate --size`.

## 1.15.4 (February 14, 2020)

### Bug Fixes:

Fixed crash in ProxyFS when pruning acknowledged RPC replies.

## 1.15.3 (February 13, 2020)

### Bug Fixes:

A number of hang conditions in PFSAgent when servicing large numbers
of small file reads and writes have now been addressed.

A new mechanism used by PFSAgent to GET and PUT file data objects
(LogSegments) avoids the 404 Not Found responses as well as the
performance impact of unnecessarily invalidating the Swift memcache.
Swift version 2.24.0.2 is now minimally required.

Buffering RPC responses from ProxyFS to PFSAgent previously could
exhaust memory in extreme circumstances. This has now been addressed
by implementing aggressive pruning of this buffer on the ProxyFS end.

## 1.15.2 (February 5, 2020)

### Bug Fixes:

PFSAgent needing to retry GETs would possibly crash requiring a
restart. Retries should be expected from time to time.

PFSAgent GETs and PUTs to a Swift Proxy bypass the normal
translation performed by ProxyFS to gain direct access to
the LogSegments containing file data. This previously often
caused 404 Not Found responses due to overwhelming the memcached
and Swift Container servers for a variety of reasons. This update
avoids this behavior entirely.

### Notes:

The aforementioned 404 Not Found fix required an update to
Swift itself. Hence, in order for PFSAgent to operate at
all, Swift must be at least at version 2.24 that includes
a new mechanism by which PFSAgent requests the Swift Proxy
to bypass consulting ProxyFS.

## 1.15.1 (January 30, 2020)

### Bug Fixes:

Overly short [Agent]SwiftTimeout settings resulted in PUT timeouts
from PFSAgent to Swift that were fatal. The new suggested timeout
moves from 20s to 10m to avoid this issue.

Very short [Agent]MaxFlushTime settings (e.g. 100ms) resulted in
a lockup of writes arriving for a file that was in the middle of
time-triggered flushes. For the previous suggested timeout of 10s,
this was very rarely an issue because either clients would flush
their file writes long before this or they would easily exceed
MaxFlushSize (suggested value of 10MiB) data triggering a flush.
By lowering MaxFlushTime, time-triggered flushes would expose the
condition. This has now been resolved.

### Issues:

As mentioned above, [Agent]MaxFlushTime settings can be set very
low (e.g. 100ms). At such short durations, performance on small
file workloads dramatically improves. It is not yet well understood
why the system doesn't respond more gracefully to such small
file workloads, so PFSAgent users should make note to lower this
value in their previous configurations that probably set MaxFlushTime
to 10s. Indeed, with the aforementioned fix for handling short
MaxFlushTime settings, the new suggested value is now 200ms.

Continuing from 1.15.0, PFSAgent currently cannot run successfully on
macOS due to some as yet unresolved support in package fission.

## 1.15.0 (January 23, 2020)

### Features:

PFSAgent now uses a new path to ProxyFS for metadata operations that
bypasses the hop through a Swift Proxy process.
This new connection
is long lived and secured by TLS. Should this connection drop, it will
be reestablished in such a way that issued metadata requests are only
executed once.

PFSAgent now utilizes a new "fission" package enabling multithreaded
upcall servicing wherever possible. Linux will still, under some
circumstances, serialize what it deems potentially conflicting
operations.

### Bug Fixes:

Fixed various issues in the PFSAgent write path that resulted in an
incoherent ExtentMap describing a file's contents in Swift such that a
subsequent Read could read invalid data. This also affected files that
are truncated or later extended.

### Issues:

PFSAgent currently cannot run successfully on macOS due to some as yet
unresolved support in package fission.

### Notes:

To expose the TLS Port utilized by PFSAgent, [JSONRPCServer]RetryRPCPort
specifies the port on the PublicIPAddr of the node hosting ProxyFS. This
port (on each Swift Proxy node running ProxyFS) must be made accessible
to any entity running PFSAgent.

## 1.14.2.1 (December 6, 2019)

### Bug Fixes:

Memory leak in PFSAgent's read cache is fixed. This bug would present whenever
a file is larger than the read cache line size and a read was issued to the
portion of the file beyond the cache line size (assuming the file was written
sequentially to one or more LogSegments larger than cache line size).

Note that an Out-Of-Memory ("OOM") condition is still entirely possible with
PFSAgent deployed as the only limits to its read cache memory consumption are
the ReadCacheLineSize and ReadCacheLineCount configuration parameters.

## 1.14.2 (December 2, 2019)

### Bug Fixes:

Allow `chmod` to work on a directory via PFSAgent.

## 1.14.1 (November 26, 2019)

### Features:

Added an online FSCK tool (pfs-fsck).
Note that the tool will not "stop the world".

Add bucketstat counters to measure all checkpoint operations and how
long individual parts of checkpoint processing take. Add some bucketstat
counters to measure extent map lookups and updates for Read() and Write()
operations and their analogous object operations. Add bucketstat counters
to measure B+Tree flush operations.

Added pfs_middleware configuration data to /info resource.

### Bug Fixes:

Significantly improve the performance of concurrent sync operations by
batching checkpoint operations. If multiple threads request a checkpoint,
only perform one checkpoint instead of one for each request.

Significantly improve the performance of the FUSE mountpoint by treating
a Flush() operation as a no-op, which it is. It does not imply any sort
of persistence guarantees.

Pick up sortedmap.TouchItem() fix in 1.6.1 (glide update).

Fix a few bugs in confgen.

## 1.13.4 (October 30, 2019)

### Bug Fixes:

Changes to runway environment for pfsagent mount points and allow users
to easily enable/disable core dumps.

Fixes to confgen search for template files.

## 1.13.0 (October 28, 2019)

### Features:

Add confgen tool to generate SMB, VIP, NFS and FUSE configuration files
from a proxyfs configuration.

### Bug Fixes:

Fix a bug in ProxyFS in retry of chunked put operations that caused
a panic.

Sundry PFSAgent bug fixes.

## 1.12.2 (September 19, 2019)

### Bug Fixes:

Removed an unnecessary checkpoint performed before each PFSAgent
RpcWrote operation that is generated as each LogSegment is PUT
to Swift. The prior behavior put a strain on the checkpointing
system when a large set of small files are uploaded via PFSAgent
exposed FUSE mount points.
Note that explicit flushes (fsync()
or fdatasync() calls) will still trigger a checkpoint so that
ProxyFS/PFSAgent can honor the request faithfully.

## 1.12.1 (September 12, 2019)

### Bug Fixes:

The "mount retry" fix was actually misnamed. What it actually does is
attempt to issue multiple Checkpoint HEAD requests and achieve agreement
on what is returned by a majority quorum. In each such HEAD request,
a retry logic pre-existed where anything other than a "200 OK" would
be retried... hence the overloading of the term "retry". Anyway, in the
case where `mkproxyfs` is being run to format a previously empty Swift
Account, the retry logic could take a very long time (e.g. *minutes*) to
give up. Thus, by performing each of those Checkpoint HEAD requests *in
series*, the formatting process could take an excessive amount of time
just coming to the decision that the Swift Account needs to be formatted.
This fix simply performs those quorum Checkpoint HEAD requests (that each
will be retried a number of times) *in parallel* thus returning the total
time for the format back to what it was prior to the 1.12.0 "mount retry"
fix.

## 1.12.0 (September 11, 2019)

### Features:

Added an online FSCK tool (pfs-fsck). Note that the tool will not "stop the
world". As such, it is possible for it to report false positives for missing
objects (both metadata and file data). Re-running the tool will typically
note those false positives were false and move on... but perhaps find more.
As such, the tool is really only reliably able to avoid announcing false
positives on an otherwise idle volume.

Implemented support for a "Recycle Bin" mechanism where objects are simply
marked as being in the Recycle Bin rather than aggressively deleted.
If an
attempt is made to later access the object, a log message will report this
and statistics will be bumped to indicate the condition that would have been
a file system corruption had the object actually been deleted. Note that this
feature is normally disabled.

Layout Report has been streamlined to, by default, report the "running count"
of the metrics rather than actually counting them (which would involve paging
in all the metadata...a potentially very time consuming activity that must
proceed while the "world is stopped"). It is still possible to do the brute
force "count"... and note any discrepancies with the "running count" previously
reported.

The pfs-swift-load tool has been enhanced to support arbitrarily deep paths.
This is provided to demonstrate specifically the impact of file path length
on SMB performance.

PFSAgent, like ProxyFS itself, now supports (at least) an HTTP query to report
the running version. This may also be used to confirm that PFSAgent is up and
operating on a particular Volume.

### Bug Fixes:

Addressed stale/cached stat results for (FUSE and) NFS mount points by shortening
the timeout...with the tradeoff that this could increase overhead of certain
metadata querying operations somewhat.

Added support for HTTPS in ProxyFS Agent. Previously, Swift Proxies behind a
TLS-terminating Load Balancer would result in an attempt to use non-HTTPS
connections following authentication via the Swift API.

Added volume mount retry logic to handle the theoretical case where a stale
Checkpoint Header is returned during the Volume Mount process.

### Notes:

Added logging of each Checkpoint Header operation. This amounts to a log message
generated once every ten seconds per Volume typically...though clients issuing
FLUSH/SYNC operations may accelerate this.

In addition, periodic stats are logged as well (in addition to the limited set
regularly reported earlier). Default interval is 10 minutes per Volume.

## 1.11.2 (July 28, 2019)

### Notes:

This is a small delta from 1.11.1 to temporarily disable deletion of thought-to-be
unreferenced objects in the checkpoint container. A working theory of one such
issue is that an object holding metadata for a volume was inadvertently thought to
no longer be referenced. As such, the object was scheduled for deletion. Upon a
subsequent re-mount, the object could not be found and the re-mount failed.

A new (temporary) boolean setting in the FSGlobals section titled MetadataRecycleBin
will, for now, default to TRUE and, instead of issuing DELETEs on objects in the
checkpoint container thought to now be unreferenced, a new header will be applied
to them titled RecycleBin (with a value of true).

## 1.11.1 (June 28, 2019)

### Features:

Storing Volume Checkpoints in an ETCD instance is now supported.
This feature is enabled by setting an optional key to true along with other
keys specifying the endpoints of the ETCD instance and the name of the key to
use for each Volume's Checkpoint.

### Bug Fixes:

Object PUTs over existing file system directories behavior has been corrected.
For instance, now a PUT over an empty directory will replace it.

## 1.11.0 (June 13, 2019)

### Features:

Adapted to latest (2.21.0.4) version of Swift.

Alpha version of PFSAgent is now available. This tool presents a FUSE mount point
of a Volume served by a ProxyFS instance by using the new PROXYFS HTTP Method of
(pfs_middleware in) Swift.

### Bug Fixes:

The COALESCE HTTP method used by S3 Multi-Part Upload had a number of deficiencies.
Most visibly, when COALESCE was issued to overwrite an existing object, metadata
was not applied correctly.

Objects formed via COALESCE now have a correctly set non-MD5 ETAG.

GET issued to a Container specifying a marker now returns the objects in the
Container following the marker. Previously, if the marker indicated an Object
in the Container's directory (i.e. rather than a subdirectory of the Container),
the list of following Objects would be empty.

SnapShotPolicy changes are now picked up during SIGHUP.

### Notes:

ProxyFS typically should not expect errors coming from Swift. Indeed, the only
expected errors are during Volume/Account formatting as the "mkproxyfs" tool
attempts to verify that the underlying Account is, indeed, pristine. But other
unexpected errors were logged along with being retried. This release ensures
that any payload returned by Swift with bad HTTP Status is also logged.

## 1.10.0 (March 12, 2019)

### Bug Fixes:

A number of Swift and S3 operations are necessarily path-based. Hence, although
the RESTful Object-based APIs purport to be atomic, this is actually impossible
to honor in the presence of File-based access. In attempts to retain the atomicity
of the RESTful Object-based APIs, several deadlock conditions were unaddressed.
One such deadlock addressed in both 1.8.0.6 and 1.9.2/1.9.5 involved the contention
between the COALESCE method that references an unbounded number of file paths with
other path and Inode-based APIs. This release addresses in a much more encompassing
way all the contentions that could arise within and amongst all of the path and
Inode-based APIs.

The SnapShotPolicy feature introduced in 1.8.0 was inadvertently disabled in 1.9.5
due to the conversion to the newly introduced transitions package mechanism. As such,
only explicitly created SnapShots would be created.
This release restores the ability
to schedule SnapShots via a per-Volume SnapShotPolicy.

### Notes:

Lock tracking capabilities have been significantly enhanced, providing
several mechanisms with which both deadlocks and unusually high latency conditions
can be examined. When enabled, this performant instrumentation clearly reports
what contentions are at the root of the observed condition. It covers both the
various Mutexes that serialize data structure access and the per-Inode locks
that serialize client operations.

A modern file system such as ProxyFS is obliged to support nearly unbounded
parameters governing nearly every enumerated aspect. To clients, this includes
support for things like extremely large files as well as a huge number of such
files. Internally, management of extremely fragmented files is also a demanding
requirement. Despite this, constrained compute environments should also be supported.
This release introduces support, in particular, for the 32-bit architecture of
Arm7L-based systems.

### Issues:

While not a new issue, the release focused on exposing the inherent incompatibility
between path-based RESTful Object APIs and File Access. Indeed, such issues are
impossible to fully accommodate. In addition, a key feature of the S3 API is support
for so-called multi-part uploads. This is accomplished by clients uploading each
part - perhaps simultaneously - to unique Objects. Once all parts have been uploaded,
a Multi-Part Put Complete operation is performed that requests that, logically, all
of the parts are combined to form the resultant single Object. Support for this final
step is implemented by means of a new COALESCE HTTP Method effectively added to the
OpenStack Swift API.
Unfortunately, this is where the "impedance mismatch" arises: the
hierarchical nature of the File System clashes with the "flat" nature of an
Object API such as S3 (and, for that matter, OpenStack Swift).

The key issue is how to represent a Directory in the File System hierarchy. At this
point, a Container (or Bucket) listing (via GET) will report both Objects (Files)
and Directories. This is in conflict with an Object-only system that lacks any sort
of Directory Inode concept. Indeed, several typical client operations are confused
by the presence of Directories in the Container/Bucket listing (not the least of
which is the widely used BOTO Python library used in S3 access). This conflict
remains in the current release.

## 1.9.5 (February 13, 2019)

### Features:

Made StatVfs() responses configurable. Note that values
for space (total, free, available) continue to be set
to artificial values... but at least they are now settable.

### Bug Fixes:

Re-worked the logic for various path-based operations (i.e.
operations invoked via Swift or S3 APIs) that could result in
a deadlock when combined with file-based operations. These
have now largely been resolved (see Issues section for
details on what is not).

Modified pfsconfjson{|packed} to auto-upgrade supplied
.conf files to report the VolumeGroup-translated form
(if up-conversion would be done by ProxyFS itself).

COALESCE method, when targeted at an existing Object,
would previously fail to recover the overwritten Object's
space.

Various pre-VolumeGroup->VolumeGroup auto-upgrade patterns
could result in false reporting of a Volume being moved to
a VolumeGroup where it already exists in response to SIGHUP.
Just restarting ProxyFS would not see this issue.

### Issues:

While much work was completed towards avoiding deadlock
situations resulting from path-based (i.e. Swift/S3 API)
operations, the work is not yet complete. A GET on a
Container/Bucket that recurses could still result in a
deadlock but the bug fix for this case largely closes
that window. A PUT also has the potential for another
deadlock situation that is equally very unlikely. No test
case has been able to expose these remaining deadlocks
so they remain theoretical.

## 1.9.2 (January 18, 2019)

### Bug Fixes:

Resolved race condition when simultaneous first references to
an `Inode` are executed resulting in a lock blocking any further
access to the `Inode`. This condition was frequently seen when
attempting a multi-part upload via the S3 or Swift HTTP APIs
as it would be typical/expected that the uploading of all the
parts of a new `file` would begin roughly at the same time.

It is now possible to perform builds and unit tests on the
same node where an active ProxyFS session is in operation.
Previously, identical TCP and UDP Ports were being used by
default leading to bind() failures.

### Features:

Updated to leverage Golang 1.11.4 features.

Added support for `X-Object-Sysmeta-Container-Update-Override-Etag`.

Added support for `end-marker` query params.

Added support for fetching a `ReadPlan` for a given file
via the HTTP interface. The `ReadPlan` may then be used to
HEAD or GET the individual `LogSegments` that, when stitched
together, represent the contents of the file.

Liveness monitoring is now active among all ProxyFS instances and
visible via JSON response to an HTTP Query on the embedded
HTTP Server's Port for `/liveness`.
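
The `ReadPlan` feature described above rests on a simple idea: a file's contents are the concatenation of byte ranges taken from individual `LogSegments`. The following is a minimal Python sketch of that stitching step under stated assumptions. The `ReadPlanStep` field names, the `stitch` helper, and the in-memory `segments` store are all hypothetical illustrations and do not reflect the actual format returned by the ProxyFS HTTP interface.

```python
from typing import Callable, Dict, List, NamedTuple


class ReadPlanStep(NamedTuple):
    """One step of a hypothetical read plan (field names are assumptions)."""
    object_path: str  # which LogSegment holds the bytes
    offset: int       # starting byte within that LogSegment
    length: int       # number of bytes this step contributes


def stitch(plan: List[ReadPlanStep], fetch: Callable[[str, int, int], bytes]) -> bytes:
    """Concatenate, in order, the byte range named by each step of the plan."""
    return b"".join(fetch(s.object_path, s.offset, s.length) for s in plan)


# In-memory stand-in for LogSegments stored in Swift (illustrative data only).
segments: Dict[str, bytes] = {
    "seg/0001": b"hello world",
    "seg/0002": b"XXProxyFS",
}


def fetch_from_memory(path: str, offset: int, length: int) -> bytes:
    """Stand-in for a ranged GET against a single LogSegment."""
    return segments[path][offset:offset + length]


plan = [ReadPlanStep("seg/0001", 0, 6), ReadPlanStep("seg/0002", 2, 7)]
contents = stitch(plan, fetch_from_memory)
```

In a real client the `fetch` callable would issue a ranged GET per step; keeping it as a parameter lets the stitching logic be exercised without a Swift cluster.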

Added support for VolumeGroups where the set of Volumes in a
VolumeGroup are what is assigned to a Peer/Node rather than
each individual Volume.

## 1.8.0.7 (January 28, 2019)

### Features:

The response to fsstat(1), statfs(2), and statvfs(3) returns
capacities that were hardcoded (e.g. Total Space of 1TiB). While
it is currently not possible for such reporting to represent
actual capacities, new config values are available to adjust the
reported values. In addition, the defaults have been increased
(e.g. Total Space is now reported as 100 TiB).

## 1.8.0.6 (January 23, 2019)

### Bug Fixes:

A race condition triggered by e.g. multi-part uploads could
render the targeted portion of a file system indefinitely
locked requiring a restart of the proxyfsd daemon. This release
prevents this race condition.

## 1.8.0.5 (November 27, 2018)

### Bug Fixes:

Fix a bug that caused proxyfsd to leak "connection slots" from
the ChunkedConnectionPool or the NonChunkedConnectionPool if
the noauth proxy was not running when proxyfs tried to open a
new connection (in swiftclient.acquireChunkedConnection() and
swiftclient.acquireNonChunkedConnection()). This could happen during
a reload of the noauth proxy when a new configuration is pushed.

## 1.8.0.4 (November 12, 2018)

### Bug Fixes:

Fix a bug that caused proxyfsd to exit if the SwiftStack controller
deleted a file system using its two step process of first inactivating
it (one reconfig event) and then deleting it (second reconfig event).
In that case, the UnregisterForEvents() would be called twice and it
would call logger.Fatalf() to complain, causing Samba (smbd) to exit
and making the SMB client unhappy.

## 1.8.0.3 (November 9, 2018)

### Bug Fixes:

Fix a bug that caused proxyfsd to exit if the noauth proxy server
restarted while file system i/o to Swift was in progress. In particular
this could happen during a reconfig triggered by a controller config push.

## 1.8.0.2 (October 19, 2018)

### Bug Fixes:

Fix a bug in the snapshot code that generated snapshot names that
contained a colon which Windows SMB clients find hard to cope with.

## 1.8.0.1 (October 6, 2018)

### Bug Fixes:

Fix a bug introduced in 1.8.0 that triggered a NULL pointer dereference
if an HTTP GET request specified a byte range without an ending offset.

## 1.8.0 (September 30, 2018)

### Features:

Add filesystem snapshots which are created and destroyed based on policies
that are applied to each file system. Snapshots are accessible via the
newly created "/.snapshot/<snapshot_name>" directory in the root of each
filesystem.

Add bucketized statistics for package swiftclient, fs, and headhunter.
The statistics can be queried via the built-in web server, using
the URL "//localhost:15346:/stats" in runway environments and
"//<private_IPaddr>:1534:/stats" in stand alone environments, where
<private_IPaddr> is the private IP address used for the backend network.

Note: the bucketized statistics API, output format, and URL are unstable
and changing quickly. They will be different in the next release.

Change config file format (for the *.conf files) so that user defined
names for:

* Volume
* PhysicalContainerLayout
* FlowControl
* Peer

are now preceded by one of the strings "Volume:",
"PhysicalContainerLayout:", "FlowControl:", or "Peer:", as appropriate,
to ensure that names are unique.

Initial work on a "liveness detector" to determine whether
ProxyFS/Samba/NFS are currently up and serving file requests.
This is part of an HA solution that we are developing.

### Bug Fixes:

Fix a bug that could cause data corruption if a file write sent via
the FUSE or NFS interface failed on the first PUT attempt and had to be
retried, in which case the PUT could be retried with incorrect data for
the file. This bug was exacerbated by the next bug, which could cause
PUT requests to exceed the 60 sec server timeout deadline.

Fix a bug where "log segments" containing data written to files
were not closed and flushed after the 10 sec deadline (a bug in
inFlightFileInodeDataFlusher() that could extend the PUT request beyond
the Swift server's timeout deadline of 60 sec).

Fix a bug where ProxyFS would move to a new container for file
log segments too quickly ("MaxObjectsPerContainer" was not being checked
against correctly).

Ensure that ProxyFS will not accept requests via the FUSE interface
while it's re-reading its configuration (could lead to corruption).

Add additional unit tests for sequential writes.

Improvements to the mock Swift testing environment, ramswift, to free
memory when ramswift is restarted within a test.

Update generatedfiles Makefile target to make files that are now
necessary.

Reworked proxyfsd daemon startup logic to avoid possible race conditions
during startup.

Reworked ramswift daemon startup logic to avoid race conditions sometimes
hit when running tests.

### Notes

* With the advent of Golang v1.11, new support for WebAssembly has arrived. To build for a
WebAssembly target, setting GOOS=js and GOARCH=wasm is required. Unfortunately, with the arrival
of these two new values ("js" and "wasm"), Go source files ending with _js.go, in particular,
will only be compiled if GOOS=js has been set. Previously (Golang v1.10 and prior), a file
ending in _js.go would always be included.
This release captures a name change to various
static files generated in package httpserver by adding an underscore ("_") just before ".go"
to avoid this new Golang behavior.

* Enhance the swiftclient chunked put unit tests to try and cover
concurrency and many more failure/retry scenarios. Add a new config file
variable in the `SwiftClient` section, `ChecksumChunkedPutChunks`, which
defaults to `false`. If set to `true` then data cached in a chunked put
connection has a checksum computed when it is sent and checked frequently
on subsequent operations.

## 1.7 (September 6, 2018)

### Bug Fixes:

* Fix panic in inode cache discard thread.
* Rework flush logic to fix deadlocks when connections are exhausted.

## 1.6.4 (July 24, 2018)

### Features:

* Added support for configuring the NoAuth Swift Proxy to an IP Address other than a default of (IPv4) localhost (127.0.0.1). Note that the default of localhost may be removed in the future, so users should take care to specify the NoAuth Swift Proxy IP Address in their configurations.
* Added support for the "delimiter=" option for Swift API GET requests. This, along with the "prefix=" option, enables viewing of the directory hierarchy inside a Container in the same way one would view a file system.
* Added support for SnapShots. These are invoked via the RESTful API exposed by the embedded HTTP Server inside each proxyfsd instance. Note that this API is only reachable via the PrivateIPAddr and performs no authentication/authorization. You can find the new RESTful methods underneath the /Volume/<volumeName> resource (along with FSCK, Scrub, and LayoutMap). Both JSON (textual) and formatted HTML are available.
* Added support for viewing a FileInode's ExtentMap via this same RESTful API underneath the /Volume/<volumeName> resource (adjacent to the above-mentioned SnapShot resource).
* RPC Timeouts from pfs_middleware to proxyfsd have new defaults and are controllable via two distinct parameters optionally specified in the [filter:pfs] section of the proxy-server.conf:

> rpc_finder_timeout
> specified in (floating point) seconds
> defaults to 3.0
> applies when searching for a proxyfsd instance to ask where a particular Swift Account is being served

> rpc_timeout
> specified in (floating point) seconds
> defaults to 30.0
> applies to requests to the specific proxyfsd instance serving a particular BiModal Swift Account

### Bug Fixes:

* SMB clients making multiple mounts to the same Samba/ProxyFS instance could encounter all sessions/mounts being closed when requesting any one of them to terminate. This has now been corrected.
* Previously, a cache of Inode structures did not support eviction. As a result, a very large number of Inodes accessed since a ProxyFS instance was started could exhaust memory. To address this, a new background thread discards non-dirty Inodes from the Inode Cache. The behavior of the Inode Cache eviction thread is tuned by:
> MaxBytesInodeCache - defaults to 10485760 (10 MiB)
> InodeCacheEvictInterval - defaults to 1s (disabled if 0s)

### Known Issues:

* As of this version, the metadata format has been updated from V2 to V3 in support of the SnapShot functionality. Unfortunately there is no going back. Once a V2 volume is mounted, it is immediately upgraded to V3 despite not (yet) having any SnapShots declared.

## 1.5.3 (April 3, 2018)

Ignore SIGPIPE, SIGCHLD, and some other signals that were causing
proxyfsd to exit when it shouldn't.

## 1.5.2

There is no release 1.5.2.

## 1.5.1 (March 30, 2018)

Partially botched 1.5.0 release packaging.

## 1.5.0 (March 30, 2018)

Move to go version 1.10.

Significant improvements to fsck performance and what it validates.
Fsck now detects and cleans up unreferenced objects in the checkpoint
container. Fix a bug in fsck that caused sparse files to be flagged
corrupt and deleted. Complementary to fsck, add "scrub" jobs that
validate the object maps (extent maps) for files.

Fix a bug that caused B+Tree nodes to become quite large, which had
a significant performance impact on large file systems.

Prettify the HTTP pages generated by proxyfs.

## 1.4.1 (March 6, 2018)

Fix a bug in the B+Tree code that caused old objects in the
`.__checkpoint__` container to become unreferenced instead of deleted.

Support Travis continuous integration testing on GitHub.

## 1.3.0 (February 16, 2018)

## 1.2.0 (January 30, 2018)

### Bug Fixes:

* Support for hidden SMB Shares now available

### Notes:

* Development environment now enhanced with pinned versions of dependencies
* Preliminary work for supporting an RPO of Zero in place but inactive

## 1.1.1 (January 2, 2018)

### Bug Fixes:

* Submodule vfs now no longer depends upon pre-installed submodule jrpcclient

## 1.1.0 (January 2, 2018)

### Notes:

* ProxyFS now built with a standard Makefile (obsoleting regression_test.py)

## 1.0.3 (December 4, 2017)

### Bug Fixes:

* Fix cross-container DLO authorization

### Known Issues:

* Metadata Recovery Point Objective ("RPO") is non-zero (except for file flush operations)

## 1.0.2 (December 4, 2017)

### Bug Fixes:

* Segmentation fault while handling SIGHUP during log rotation and volume migration

### Known Issues:

* Metadata Recovery Point Objective ("RPO") is non-zero (except for file flush operations)

## 1.0.1 (December 1, 2017)

### Features:

* Added support for "async" flush in SMB (allows multiple simultaneous flushes)

### Bug Fixes:

* Above "async" flush resolves SMB 2 and above write issues with "strict sync = yes" setting in smb.conf

### Known Issues:

* Metadata Recovery Point Objective ("RPO") is non-zero (except for file flush operations)

## 1.0.0 (November 29, 2017)

### Features:

* Source is now available on GitHub
* Added support for S3 Multi-part Uploads
* Formatting of a Volume File System now made explicit with `mkproxyfs` tool
* Volumes may be added and removed (via SIGHUP) without restarting ProxyFS
* Configuration files now allow identical section names for different types of sections
* Support added for a distinct Storage Policy for metadata
* New RESTful API added for FSCK management via HTTP

### Bug Fixes:

* Recover trapped resources in Samba when ProxyFS halts
* Fixed memory leaks and slow performance during RoboCopy
* Resolved unbounded memory consumption as file systems grow
* Fix for ctime not being updated during various operations
* Fix for missing first file in a directory if it would sort before "."
* Specification of a Storage Policy for file data now honored
* Corruption following metadata checkpoint failures now halted

### Known Issues:

* Metadata Recovery Point Objective ("RPO") is non-zero (except for file flush operations)

## 0.55.0 (October 30, 2017)

### Features:

* Caching of metadata in RAM now configurable
* Samba parameters now specified via the standard /etc/samba/smb.conf mechanism

### Bug Fixes:

* Fixed memory leaks in readdir() APIs issued via SMB
* Fixed metadata on objects set via Swift/S3 API

### Known Issues:

* Named Streams are disabled in SMB (enabling this is TBD)
* Upgrading metadata checkpointing from V1 to V2 experiences process hangs in some cases

## 0.54.1 (October 10, 2017)

### Features:

* Updates to HTTP COALESCE Method
* Improved flushing of affected Swift connections during SIGHUP (reload)
* Improved dataflow during high number of unflushed open file traffic

### Bug Fixes:

* Resolved memory leaks in Samba processes during heavy Robocopy activity
* Resolved potential deadlock for unflushed files that are removed
* Hardened error handling between Samba & ProxyFS processes

### Known Issues:

* Named Streams are disabled in SMB (enabling this is TBD)
* Upgrading metadata checkpointing from V1 to V2 experiences process hangs in some cases

## 0.54.0 (October 3, 2017)

### Features:

* Improved Object ETag MD5 handling
* Object SLO uploads converted to COALESCE'd Objects/Files

### Bug Fixes:

* Non BiModal Accounts remain accessible even when no ProxyFS nodes are available

### Known Issues:

* Named Streams are disabled in SMB (enabling this is TBD)
* Upgrading metadata checkpointing from V1 to V2 experiences process hangs in some cases

## 0.53.0.3 (September 29, 2017)

### Features:

* Added statistics logging

### Bug Fixes:

* Fixed BiModal IP Address reporting following SIGHUP reload
* Fixed issue with large transfers causing Swift API errors

### Known Issues:

* Named Streams are disabled in SMB (enabling this is TBD)
* Upgrading metadata checkpointing from V1 to V2 experiences process hangs in some cases

## 0.53.0.2 (September 19, 2017)

Note: This was just a re-tagging of 0.53.0.1

### Known Issues:

* Named Streams are disabled in SMB (enabling this is TBD)
* Upgrading metadata checkpointing from V1 to V2 experiences process hangs in some cases

## 0.53.0.1 (September 15, 2017)

### Features:

* Added support for Samba version 4.6

### Bug Fixes:

* Fixed memory leak in smbd resulting from a closed TCP connection to proxyfsd

### Known Issues:

* Named Streams are disabled in SMB (enabling this is TBD)
* Upgrading metadata checkpointing from V1 to V2 experiences process hangs in some cases

## 0.53.0 (September 11, 2017)

### Features:

* Added availability improvements for ProxyFS Swift clusters to continue when a ProxyFS node is down
* Significantly improved logging during startup and shutdown
* New `mkproxyfs` tool now available to format Volumes (Swift Accounts)

### Bug Fixes:

* Embedded HTTP Server now reports current configuration once startup/restart (SIGHUP) completes
* HTTP Head on ProxyFS-hosted Objects now returns proper HTTPStatus
* Resolved incomplete file locking semantics for SMB
* Resolved issue where a file being written is deleted before its data has been flushed
* Corrected behavior of readdir() enabling callers to bound the size of the returned list
* Corrected permissions checking & metadata updating
* Resolved NFS (FUSE) issue where the
underlying file system state failed to reset during restart
* Resolved SMB (smbd) memory leak resulting from unmount/remount sequence

### Known Issues:

* SMB (smbd) memory leaks resulting from restarting the ProxyFS process (proxyfsd) underneath it
* Named Streams are disabled in SMB (enabling this is TBD)
* Upgrading metadata checkpointing from V1 to V2 experiences process hangs in some cases

## 0.52.0 (August 21, 2017)

### Features:

* Support for disabling volumes added

### Bug Fixes:

* Fixed missing flushing of modified files leading to zero-length files

### Known Issues:

* Named Streams are disabled in SMB (enabling this is TBD)
* Upgrading metadata checkpointing from V1 to V2 experiences process hangs in some cases

## 0.51.2 (August 15, 2017)

### Features:

* Improved metadata checkpointing mechanism (V2) performs optimized garbage collection

### Bug Fixes:

* Fixed clean-up of FUSE (and NFS) mount point upon restart after failure
* Fixed memory leaks in smbd for readdir(), getxattr(), chdir() and list xattr
* Fixed race condition between time-based flushes and on-going write traffic
* Fixed multi-threaded socket management code in resolving DNS names
* Fixed missing support for file names containing special characters

### Known Issues:

* Named Streams are disabled in SMB (enabling this is TBD)
* Upgrading metadata checkpointing from V1 to V2 experiences process hangs in some cases

## 0.51.1 (August 3, 2017)

### Features:

* Enhanced tolerance for intermittent Swift errors
* Read Cache now consumes a configurable percentage of available memory
* Flow Controls now get a weighted fraction of total Read Cache memory
* Configuration reload now supported via SIGHUP signal

### Bug Fixes:

* Fixed
embedded HTTP Server handling of "empty" URLs
* Removed memory leaks in SMB handling
* Resolved potential corruption when actively written files are flushed

### Known Issues:

* Memory leak in SMB directory reading and extended attribute reading
* Process restart may leave NFS mount point in a hung state
* Named Streams are disabled in SMB (enabling this is TBD)