github.com/panmari/cuckoofilter@v1.0.7-0.20231223155748-763d1d471ee8/README.md (about)

# Cuckoo Filter

[![GitHub go.mod Go version of a Go module](https://img.shields.io/github/go-mod/go-version/panmari/cuckoofilter.svg)](https://github.com/panmari/cuckoofilter)
[![GoDoc](https://godoc.org/github.com/panmari/cuckoofilter?status.svg)](https://godoc.org/github.com/panmari/cuckoofilter)
[![GoReportCard](https://goreportcard.com/badge/github.com/panmari/cuckoofilter)](https://goreportcard.com/report/github.com/panmari/cuckoofilter)

A well-tuned, production-ready cuckoo filter that performs best in class at low false positive rates (around 0.01%). For details, see the [full evaluation](https://panmari.github.io/2020/10/09/probabilistic-filter-golang.html).

## Background

A cuckoo filter is a Bloom filter replacement for approximate set-membership queries. While Bloom filters are well-known, space-efficient data structures for answering queries like "is item x in the set?", they do not support deletion. Their variants that enable deletion (like counting Bloom filters) usually require much more space.

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint instead of the key itself. Cuckoo hash tables can be highly compact, so a cuckoo filter can use less space than a conventional Bloom filter in applications that require low false positive rates (< 3%).

["Cuckoo Filter: Practically Better Than Bloom" by Bin Fan, Dave Andersen, Michael Kaminsky and Michael Mitzenmacher](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

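To make the "hash table of fingerprints" idea concrete, here is a minimal sketch of the partial-key cuckoo hashing scheme described in the paper, where the second bucket index is derived from the first index and the fingerprint alone. It is an illustration under stated assumptions, not this repository's code: the FNV hash, the table size `numBuckets`, the reserved zero fingerprint, and all function names are hypothetical choices made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// numBuckets is a hypothetical table size. It must be a power of two so
// that masking with numBuckets-1 keeps indices in range.
const numBuckets = 1 << 10

// hash64 is a stand-in hash function (FNV-1a), chosen only because it is
// in the standard library.
func hash64(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// fingerprint takes 16 bits of the hash. Zero is remapped to 1 because this
// sketch assumes zero marks an empty slot.
func fingerprint(h uint64) uint16 {
	fp := uint16(h >> 48)
	if fp == 0 {
		fp = 1
	}
	return fp
}

// altIndex computes the other candidate bucket from the current index and
// the fingerprint only, so an entry can be relocated without the original key.
func altIndex(i uint64, fp uint16) uint64 {
	return (i ^ hash64([]byte{byte(fp), byte(fp >> 8)})) & (numBuckets - 1)
}

// candidateBuckets returns both bucket indices and the fingerprint for an item.
func candidateBuckets(item []byte) (i1, i2 uint64, fp uint16) {
	h := hash64(item)
	fp = fingerprint(h)
	i1 = h & (numBuckets - 1)
	i2 = altIndex(i1, fp)
	return i1, i2, fp
}

func main() {
	i1, i2, fp := candidateBuckets([]byte("pizza"))
	fmt.Println(i1, i2, fp)
	// Applying altIndex to either index yields the other one.
	fmt.Println(altIndex(i2, fp) == i1)
}
```

Because the XOR is its own inverse, `altIndex` maps each of the two indices to the other, which is what allows stored fingerprints to be moved between buckets during insertion and found again during deletion.
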
## Implementation details

The paper cited above leaves several parameters open. In this implementation:

1. Every element has 2 possible bucket indices
2. Buckets have a static size of 4 fingerprints
3. Fingerprints have a static size of 16 bits

The authors suggest 1 and 2 as the optimum. The choice of 3 comes down to the desired false positive rate. Given a target false positive rate `r` and a bucket size `b`, they suggest choosing the fingerprint size `f` using

    f >= log2(2b/r) bits

With the 16-bit fingerprints used in this repository, you can expect `r ~= 0.0001`.
[Other implementations](https://github.com/seiflotfy/cuckoofilter) use 8-bit fingerprints, which corresponds to a false positive rate of `r ~= 0.03`.

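The arithmetic behind these two numbers can be checked with a short sketch. It evaluates the formula above in both directions; the function names are hypothetical and not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// minFingerprintBits returns the smallest whole number of bits f that
// satisfies f >= log2(2b/r) for bucket size b and target rate r.
func minFingerprintBits(bucketSize int, targetRate float64) int {
	return int(math.Ceil(math.Log2(2 * float64(bucketSize) / targetRate)))
}

// falsePositiveBound inverts the formula: r = 2b / 2^f.
func falsePositiveBound(bucketSize int, fingerprintBits uint) float64 {
	return 2 * float64(bucketSize) / float64(uint64(1)<<fingerprintBits)
}

func main() {
	fmt.Println(falsePositiveBound(4, 16)) // 8/65536, roughly 0.00012
	fmt.Println(falsePositiveBound(4, 8))  // 8/256, roughly 0.031
	fmt.Println(minFingerprintBits(4, 0.001))
}
```

With `b = 4`, 16 bits give `8/65536` and 8 bits give `8/256`, which round to the `0.0001` and `0.03` quoted above.
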
## Example usage

```golang
import (
	"fmt"

	cuckoo "github.com/panmari/cuckoofilter"
)

func Example() {
	cf := cuckoo.NewFilter(1000)

	cf.Insert([]byte("pizza"))
	cf.Insert([]byte("tacos"))
	cf.Insert([]byte("tacos")) // Re-insertion is possible.

	fmt.Println(cf.Lookup([]byte("pizza")))
	fmt.Println(cf.Lookup([]byte("missing")))

	cf.Reset()
	fmt.Println(cf.Lookup([]byte("pizza")))
	// Output:
	// true
	// false
	// false
}
```

For more examples, see [the example tests](https://github.com/panmari/cuckoofilter/blob/master/example_test.go).
Operations on a filter are not thread-safe by default.
See [this example](example_threadsafe_test.go) for using the filter concurrently.
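
One common way to share a filter between goroutines is to guard every call with a mutex. The sketch below shows that pattern; it is not necessarily what the linked example does. The `membership` interface and `lockedFilter` type are hypothetical names, the interface assumes that `Insert` and `Lookup` both return `bool`, and `exactSet` is only a stand-in so the sketch runs without importing this module.

```go
package main

import (
	"fmt"
	"sync"
)

// membership lists the two filter methods used in this sketch. It is a
// hypothetical interface, assuming both methods return bool.
type membership interface {
	Insert(data []byte) bool
	Lookup(data []byte) bool
}

// lockedFilter serializes all access to the wrapped filter.
type lockedFilter struct {
	mu sync.Mutex
	f  membership
}

func (l *lockedFilter) Insert(data []byte) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.f.Insert(data)
}

func (l *lockedFilter) Lookup(data []byte) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.f.Lookup(data)
}

// exactSet is a map-backed stand-in for a real filter, used only to keep
// the sketch self-contained.
type exactSet map[string]struct{}

func (s exactSet) Insert(data []byte) bool { s[string(data)] = struct{}{}; return true }
func (s exactSet) Lookup(data []byte) bool { _, ok := s[string(data)]; return ok }

func main() {
	lf := &lockedFilter{f: exactSet{}}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			lf.Insert([]byte(fmt.Sprintf("item-%d", i)))
		}(i)
	}
	wg.Wait()

	fmt.Println(lf.Lookup([]byte("item-3"))) // true
}
```

A plain `sync.Mutex` is used rather than `sync.RWMutex` because the sketch does not assume that `Lookup` leaves the filter unmodified.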