Bloom filters
-------------

[![Master Build Status](https://secure.travis-ci.org/willf/bloom.png?branch=master)](https://travis-ci.org/willf/bloom?branch=master)
[![Coverage Status](https://coveralls.io/repos/github/willf/bloom/badge.svg?branch=master)](https://coveralls.io/github/willf/bloom?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/willf/bloom)](https://goreportcard.com/report/github.com/willf/bloom)
[![GoDoc](https://godoc.org/github.com/willf/bloom?status.svg)](http://godoc.org/github.com/willf/bloom)

A Bloom filter is a representation of a set of _n_ items, where the main
requirement is to answer membership queries; _i.e._, whether an item is a
member of the set.

A Bloom filter has two parameters: _m_, the size of the filter in bits (typically a reasonably large multiple of the cardinality of the set to represent), and _k_, the number of hashing functions applied to elements of the set. (The actual hashing functions are important, too, but they are not a parameter of this implementation.) A Bloom filter is backed by a [BitSet](https://github.com/willf/bitset); a key is represented in the filter by setting the bits at each value of the hashing functions (modulo _m_). Set membership is checked by _testing_ whether the bits at each value of the hashing functions (again, modulo _m_) are set. If they all are, the item is reported as being in the set. If the item is actually in the set, a Bloom filter will never fail to report it (the true positive rate is 1.0), but it is susceptible to false positives. The art is to choose _k_ and _m_ correctly.

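To make the set-and-test mechanics concrete, here is a minimal sketch of the idea using only the standard library. It is not this package's implementation: `toyFilter` is a hypothetical type, it stores one `bool` per bit instead of using a BitSet, and it derives its _k_ positions from two FNV hashes (an assumption made for the example) rather than murmurhash.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// toyFilter is a hypothetical, illustration-only Bloom filter.
type toyFilter struct {
	m    uint   // number of bits in the filter
	k    uint   // number of hash positions per key
	bits []bool // one bool per bit, for readability rather than compactness
}

func newToyFilter(m, k uint) *toyFilter {
	return &toyFilter{m: m, k: k, bits: make([]bool, m)}
}

// locations derives k bit positions (modulo m) from two FNV hashes
// combined as a + i*b. This is an assumption of the sketch.
func (f *toyFilter) locations(key []byte) []uint {
	h1 := fnv.New64a()
	h1.Write(key)
	a := h1.Sum64()

	h2 := fnv.New64()
	h2.Write(key)
	b := h2.Sum64() | 1 // force a non-zero step

	locs := make([]uint, f.k)
	for i := uint(0); i < f.k; i++ {
		locs[i] = uint((a + uint64(i)*b) % uint64(f.m))
	}
	return locs
}

// Add sets the bit at every position derived from key.
func (f *toyFilter) Add(key []byte) {
	for _, loc := range f.locations(key) {
		f.bits[loc] = true
	}
}

// Test reports whether every position derived from key is set.
func (f *toyFilter) Test(key []byte) bool {
	for _, loc := range f.locations(key) {
		if !f.bits[loc] {
			return false
		}
	}
	return true
}

func main() {
	f := newToyFilter(20000, 5)
	f.Add([]byte("Love"))
	fmt.Println(f.Test([]byte("Love"))) // true: an added key always tests positive
	fmt.Println(f.Test([]byte("Hate"))) // almost certainly false, but a false positive is possible
}
```

Because `Test` only checks bits, a key that was never added can still test positive if other keys happened to set all of its positions; that is the false positive case described above.
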
In this implementation, the hashing function used is [murmurhash](https://github.com/spaolacci/murmur3), a non-cryptographic hashing function.

This implementation accepts keys for setting and testing as `[]byte`. Thus, to
add a string item, `"Love"`:

    n := uint(1000)
    filter := bloom.New(20*n, 5) // m = 20 bits per item, k = 5 hash functions
    filter.Add([]byte("Love"))

Similarly, to test if `"Love"` is in the filter:

    if filter.Test([]byte("Love")) {
        // "Love" is probably in the set
    }

For numeric data, I recommend that you look into the `encoding/binary` package. For example, to add a `uint32` to the filter:

    i := uint32(100)
    n1 := make([]byte, 4)
    binary.BigEndian.PutUint32(n1, i)
    filter.Add(n1)

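The same encoding step can be wrapped in small helpers so that keys are always built the same way for both adding and testing. The sketch below uses only `encoding/binary`; the helper names `uint32Key` and `uint64Key` are hypothetical and not part of this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uint32Key returns the 4-byte big-endian encoding of i.
// Hypothetical helper for illustration; not part of the bloom package.
func uint32Key(i uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, i)
	return b
}

// uint64Key returns the 8-byte big-endian encoding of i.
// Hypothetical helper for illustration; not part of the bloom package.
func uint64Key(i uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, i)
	return b
}

func main() {
	fmt.Println(uint32Key(100)) // [0 0 0 100]
	fmt.Println(uint64Key(100)) // [0 0 0 0 0 0 0 100]
}
```

Note that the two helpers produce different byte slices for the same number, so a value added as a `uint32` key would not be found when tested as a `uint64` key. Pick one width and byte order and use it consistently.
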
Finally, there is a method to estimate the false positive rate of a particular
Bloom filter for a set of size _n_:

    if filter.EstimateFalsePositiveRate(1000) > 0.001 {
        // the estimated false positive rate exceeds 0.1%
    }

Given the particular hashing scheme, it's best to be empirical about this. Note
that estimating the FP rate will clear the Bloom filter.

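As a point of comparison for the empirical estimate, the textbook approximation for the false positive rate of a Bloom filter with _m_ bits, _k_ hashing functions, and _n_ inserted items is (1 - e^(-kn/m))^k. The sketch below computes it; it assumes ideal, independent hashing functions, which a real hashing scheme only approximates, and `theoreticalFPRate` is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// theoreticalFPRate approximates the false positive rate of a Bloom filter
// with m bits and k hashing functions after n insertions, using
// (1 - e^(-k*n/m))^k. It assumes ideal, independent hashing functions.
// Hypothetical helper for illustration; not part of the bloom package.
func theoreticalFPRate(m, k, n uint) float64 {
	exponent := -float64(k) * float64(n) / float64(m)
	return math.Pow(1-math.Exp(exponent), float64(k))
}

func main() {
	// Same parameters as the earlier example: m = 20*1000 bits, k = 5, n = 1000.
	fmt.Printf("%.6f\n", theoreticalFPRate(20000, 5, 1000))
}
```

For these parameters the approximation comes out at roughly 0.0005. Treat it as a sanity check only; the measured rate from `EstimateFalsePositiveRate` reflects the actual hashing scheme.
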
Discussion here: [Bloom filter](https://groups.google.com/d/topic/golang-nuts/6MktecKi1bE/discussion)

Godoc documentation: https://godoc.org/github.com/willf/bloom

## Installation

```bash
go get -u github.com/willf/bloom
```

## Contributing

If you wish to contribute to this project, please branch and issue a pull request against master ("[GitHub Flow](https://guides.github.com/introduction/flow/)").

This project includes a Makefile that allows you to test and build the project with simple commands.
To see all available options:
```bash
make help
```

## Running all tests

Before committing the code, please check that it passes all tests (note: this will install some dependencies):
```bash
make deps
make qa
```