[![GoDoc](https://img.shields.io/badge/godoc-reference-blue.svg)](https://pkg.go.dev/github.com/charlievieth/fastwalk)
[![Test fastwalk on macOS](https://github.com/charlievieth/fastwalk/actions/workflows/macos.yml/badge.svg)](https://github.com/charlievieth/fastwalk/actions/workflows/macos.yml)
[![Test fastwalk on Linux](https://github.com/charlievieth/fastwalk/actions/workflows/linux.yml/badge.svg)](https://github.com/charlievieth/fastwalk/actions/workflows/linux.yml)
[![Test fastwalk on Windows](https://github.com/charlievieth/fastwalk/actions/workflows/windows.yml/badge.svg)](https://github.com/charlievieth/fastwalk/actions/workflows/windows.yml)

# fastwalk

Fast parallel directory traversal for Go.

Package fastwalk provides a fast parallel version of [`filepath.WalkDir`](https://pkg.go.dev/path/filepath@go1.17.7#WalkDir)
that is \~2x faster on macOS, \~4x faster on Linux, and \~6x faster on Windows.
On macOS and Linux it also allocates roughly 30-50% less memory and makes
roughly 20-25% fewer allocations (on Windows it currently allocates more, see
[Benchmarks](#benchmarks)).
Additionally, it is \~3-6x faster than [godirwalk](https://github.com/karrick/godirwalk),
depending on the OS.

Inspired by and based on [golang.org/x/tools/internal/fastwalk](https://pkg.go.dev/golang.org/x/tools@v0.1.9/internal/fastwalk).

## Features

* Fast: multiple goroutines stat the filesystem and call the
  [`fs.WalkDirFunc`](https://pkg.go.dev/io/fs#WalkDirFunc) callback concurrently
* Safe symbolic link traversal ([`Config.Follow`](https://pkg.go.dev/github.com/charlievieth/fastwalk#Config))
* Same behavior and callback signature as [`filepath.WalkDir`](https://pkg.go.dev/path/filepath@go1.17.7#WalkDir)
* Wrapper functions are provided to ignore duplicate files and directories:
  [`IgnoreDuplicateFiles()`](https://pkg.go.dev/github.com/charlievieth/fastwalk#IgnoreDuplicateFiles)
  and
  [`IgnoreDuplicateDirs()`](https://pkg.go.dev/github.com/charlievieth/fastwalk#IgnoreDuplicateDirs)
* Extensively tested on macOS, Linux, and Windows

## Usage

Usage is the same as [`filepath.WalkDir`](https://pkg.go.dev/path/filepath@go1.17.7#WalkDir),
but the [`walkFn`](https://pkg.go.dev/io/fs#WalkDirFunc)
argument to [`fastwalk.Walk`](https://pkg.go.dev/github.com/charlievieth/fastwalk#Walk)
must be safe for concurrent use.

Examples can be found in the [examples](./examples) directory.

<!-- TODO: this example is large move it to an examples folder -->

The example below is a very simple version of the POSIX
[find](https://pubs.opengroup.org/onlinepubs/007904975/utilities/find.html) utility:
```go
// fwfind is an example program that is similar to POSIX find,
// but faster and worse (it's an example).
package main

import (
	"flag"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"

	"github.com/charlievieth/fastwalk"
)

const usageMsg = `Usage: %[1]s [-L] [-name PATTERN] [PATH...]:

%[1]s is a poor replacement for the POSIX find utility

`

func main() {
	flag.Usage = func() {
		fmt.Fprintf(os.Stdout, usageMsg, filepath.Base(os.Args[0]))
		flag.PrintDefaults()
	}
	pattern := flag.String("name", "", "Pattern to match file names against.")
	followLinks := flag.Bool("L", false, "Follow symbolic links")
	flag.Parse()

	// If no paths are provided default to the current directory: "."
	args := flag.Args()
	if len(args) == 0 {
		args = append(args, ".")
	}

	// Follow links if the "-L" flag is provided
	conf := fastwalk.Config{
		Follow: *followLinks,
	}

	// walkFn is called concurrently from multiple goroutines.
	walkFn := func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", path, err)
			return nil // returning the error stops iteration
		}
		if *pattern != "" {
			if ok, err := filepath.Match(*pattern, d.Name()); !ok {
				// invalid pattern (err != nil) or name does not match
				return err
			}
		}
		_, err = fmt.Println(path)
		return err
	}
	for _, root := range args {
		if err := fastwalk.Walk(&conf, root, walkFn); err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", root, err)
			os.Exit(1)
		}
	}
}
```

## Benchmarks

The benchmarks below were run with `go1.17.6` and can be regenerated with the `bench_comp` make target:
```sh
$ make bench_comp
```

### Darwin

**Hardware:**
```
goos: darwin
goarch: arm64
cpu: Apple M1 Max
```

#### [`filepath.WalkDir`](https://pkg.go.dev/path/filepath@go1.17.7#WalkDir) vs. [`fastwalk.Walk()`](https://pkg.go.dev/github.com/charlievieth/fastwalk#Walk):
```
              filepath       fastwalk       delta
time/op       27.9ms ± 1%    13.0ms ± 1%    -53.33%
alloc/op      4.33MB ± 0%    2.14MB ± 0%    -50.55%
allocs/op     50.9k ± 0%     37.7k ± 0%     -26.01%
```

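The `delta` column is the relative change from the baseline to `fastwalk`, and the "\~Nx faster" figures in the introduction are the ratio of the two `time/op` values. A small sketch of that arithmetic, using the rounded `time/op` values from the table above (the helper names are illustrative only):

```go
package main

import "fmt"

// deltaPercent returns the relative change from before to after, in percent.
// Negative values mean after is smaller (faster, or fewer allocations).
func deltaPercent(before, after float64) float64 {
	return (after - before) / before * 100
}

// speedup returns how many times faster after is than before.
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	// time/op in ms, from the Darwin filepath.WalkDir vs. fastwalk.Walk table.
	const filepathMs, fastwalkMs = 27.9, 13.0
	fmt.Printf("delta: %.2f%%\n", deltaPercent(filepathMs, fastwalkMs))
	fmt.Printf("speedup: %.2fx\n", speedup(filepathMs, fastwalkMs))
}
```

From the rounded values this gives about -53.4% and 2.15x. The table's -53.33% differs slightly, presumably because it was computed from the unrounded measurements.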
#### [`godirwalk.Walk()`](https://pkg.go.dev/github.com/karrick/godirwalk@v1.16.1#Walk) vs. [`fastwalk.Walk()`](https://pkg.go.dev/github.com/charlievieth/fastwalk#Walk):
```
              godirwalk      fastwalk       delta
time/op       58.5ms ± 3%    18.0ms ± 2%    -69.30%
alloc/op      25.3MB ± 0%    2.1MB ± 0%     -91.55%
allocs/op     57.6k ± 0%     37.7k ± 0%     -34.59%
```

### Linux

**Hardware:**
```
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
drive: Samsung SSD 970 PRO 1TB
```

#### [`filepath.WalkDir`](https://pkg.go.dev/path/filepath@go1.17.7#WalkDir) vs. [`fastwalk.Walk()`](https://pkg.go.dev/github.com/charlievieth/fastwalk#Walk):

```
              filepath       fastwalk       delta
time/op       10.1ms ± 2%    2.8ms ± 2%     -72.83%
alloc/op      2.44MB ± 0%    1.70MB ± 0%    -30.46%
allocs/op     47.2k ± 0%     36.9k ± 0%     -21.80%
```

#### [`godirwalk.Walk()`](https://pkg.go.dev/github.com/karrick/godirwalk@v1.16.1#Walk) vs. [`fastwalk.Walk()`](https://pkg.go.dev/github.com/charlievieth/fastwalk#Walk):

```
              godirwalk      fastwalk       delta
time/op       13.7ms ±16%    2.8ms ± 2%     -79.88%
alloc/op      7.48MB ± 0%    1.70MB ± 0%    -77.34%
allocs/op     53.8k ± 0%     36.9k ± 0%     -31.38%
```

### Windows

**Hardware:**
```
goos: windows
goarch: amd64
pkg: github.com/charlievieth/fastwalk
cpu: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
```

#### [`filepath.WalkDir`](https://pkg.go.dev/path/filepath@go1.17.7#WalkDir) vs. [`fastwalk.Walk()`](https://pkg.go.dev/github.com/charlievieth/fastwalk#Walk):

```
              filepath       fastwalk       delta
time/op       88.0ms ± 1%    14.6ms ± 1%    -83.47%
alloc/op      5.68MB ± 0%    6.76MB ± 0%    +19.01%
allocs/op     69.6k ± 0%     90.4k ± 0%     +29.87%
```

#### [`godirwalk.Walk()`](https://pkg.go.dev/github.com/karrick/godirwalk@v1.16.1#Walk) vs. [`fastwalk.Walk()`](https://pkg.go.dev/github.com/charlievieth/fastwalk#Walk):

```
              godirwalk      fastwalk       delta
time/op       87.4ms ± 1%    14.6ms ± 1%    -83.34%
alloc/op      6.14MB ± 0%    6.76MB ± 0%    +10.24%
allocs/op     100k ± 0%      90k ± 0%       -9.59%
```

## Darwin: getdirentries64

The `nogetdirentries` build tag can be used to prevent `fastwalk` from using
and linking to the non-public `__getdirentries64` syscall. This is required
if an app using `fastwalk` is to be distributed via Apple's App Store (see
https://github.com/golang/go/issues/30933 for more details). When
`__getdirentries64` is disabled, `fastwalk` uses `readdir_r` instead,
which is what the Go standard library uses for
[`os.ReadDir`](https://pkg.go.dev/os#ReadDir) and is \~10% slower than
`__getdirentries64`
([benchmarks](https://github.com/charlievieth/fastwalk/blob/2e6a1b8a1ce88e578279e6e631b2129f7144ec87/fastwalk_darwin_test.go#L19-L57)).

Example of how to build your program and test that it is not linked to `__getdirentries64`:
```sh
# NOTE: the following only applies to darwin (aka macOS)

# Build binary that imports fastwalk without linking to __getdirentries64.
$ go build -tags nogetdirentries -o YOUR_BINARY
# Test that __getdirentries64 is not linked (this should print no output).
$ ! otool -dyld_info YOUR_BINARY | grep -F getdirentries64
```

There is also a script, [scripts/links2getdirentries.bash](scripts/links2getdirentries.bash),
that can be used to check if a program binary links to getdirentries.