# ahocorasick

The fastest Golang implementation of the Aho-Corasick algorithm for string-searching.

## Usage

```bash
go get github.com/AlexanderZh/ahocorasick@v0.1.8
```

[Documentation](https://godoc.org/github.com/RRethy/ahocorasick)

```go
// Compile a matcher from byte-slice patterns.
m := CompileByteSlices([][]byte{
  []byte("he"),
  []byte("she"),
  []byte("his"),
  []byte("hers"),
  []byte("she"), // duplicate pattern; reported once in the result below
})
m.FindAllByteSlice([]byte("ushers")) // => { "she" 1 }, { "he" 2 }, { "hers" 2 }

// Or compile from strings (separate variable, since m is already declared).
m2 := CompileStrings([]string{
  "he",
  "she",
  "his",
  "hers",
  "she",
})
m2.FindAllString("ushers") // => { "she" 1 }, { "he" 2 }, { "hers" 2 }
```
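
Each match in the comments above pairs a pattern with the byte offset at which it starts in the text. To make that result shape concrete, the stdlib-only sketch below finds the same pairs by brute force. It does not use this library or its algorithm; `hit` and `naiveFindAll` are hypothetical names used only here, and the sketch assumes duplicate patterns should be reported once, as the output above suggests.

```go
package main

import (
	"fmt"
	"strings"
)

// hit is a hypothetical result type for this sketch: a pattern and the
// byte offset where it starts in the text.
type hit struct {
	Word  string
	Index int
}

// naiveFindAll checks every pattern at every byte offset. It is
// O(len(text) * total pattern length), so it is only a cross-check,
// not a substitute for an Aho-Corasick matcher.
func naiveFindAll(patterns []string, text string) []hit {
	// Drop empty and duplicate patterns so each match is reported once.
	seen := make(map[string]bool)
	var uniq []string
	for _, p := range patterns {
		if p != "" && !seen[p] {
			seen[p] = true
			uniq = append(uniq, p)
		}
	}

	var hits []hit
	for i := 0; i < len(text); i++ {
		for _, p := range uniq {
			if strings.HasPrefix(text[i:], p) {
				hits = append(hits, hit{Word: p, Index: i})
			}
		}
	}
	return hits
}

func main() {
	for _, h := range naiveFindAll([]string{"he", "she", "his", "hers", "she"}, "ushers") {
		fmt.Printf("{ %q %d }\n", h.Word, h.Index)
	}
}
```

This sketch orders results by start offset, which is not necessarily the order a real matcher reports them in.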

## Benchmarks

*macOS Mojave version 10.14.6*

*MacBook Pro (Retina, 13-inch, Early 2015)*

*Processor 3.1 GHz Intel Core i7*

```
$ git checkout d7354e5e7912add9c2c602aae74c508bca3b2f4d; go test -bench=Benchmark
```

The two basic operations are compiling the state machine from a slice of patterns (`Compile`) and using that state machine to find every occurrence of each pattern in a text (`FindAll`). Other implementations use different names for these operations.

| Operation | Input Size | rrethy/ahocorasick | [BobuSumisu/aho-corasick](https://github.com/BobuSumisu/aho-corasick) | [anknown/ahocorasick](https://github.com/anknown/ahocorasick) |
| - | - | - | - | - |
| Data structure | - | Double-Array Trie | LinkedList Trie | Double-Array Trie |
| - | - | - | - | - |
| `Compile` | 235886 patterns | **133 ms** | 214 ms | 1408 ms |
| `Compile` | 23589 patterns | **20 ms** | 50 ms | 137 ms |
| `Compile` | 2359 patterns | **3320 µs** | 11026 µs | 10506 µs |
| `Compile` | 236 patterns | **229 µs** | 1377 µs | 867 µs |
| `Compile` | 24 patterns | **43 µs** | 144 µs | 82 µs |
| - | - | - | - | - |
| `FindAll` | 3227439 bytes | **36 ms** | 38 ms | 116 ms |
| `FindAll` | 318647 bytes | **3641 µs** | 3764 µs | 11335 µs |
| `FindAll` | 31626 bytes | **359 µs** | 370 µs | 1103 µs |
| `FindAll` | 3657 bytes | **31 µs** | 40 µs | 131 µs |

**NOTE**: `FindAll` uses a state machine compiled from 2359 patterns.

**NOTE**: `FindAll` time does **not** include the `Compile` time for the state machine.
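
As a rough illustration of what the two benchmarked operations do, here is a minimal Aho-Corasick sketch built on a plain map-based trie rather than a double-array trie. It is not this library's implementation and makes no attempt to reproduce the timings above; the names `Match`, `Matcher`, `Compile`, and `FindAll` are hypothetical and chosen only for this example.

```go
package main

import "fmt"

// Match pairs a pattern with the byte offset where it starts in the text.
// (Hypothetical type for this sketch, not the library's own.)
type Match struct {
	Word  string
	Index int
}

// node is one automaton state, stored as a map-based trie node.
type node struct {
	next map[byte]int // goto transitions
	fail int          // longest proper suffix of this state that is also a trie prefix
	out  []string     // patterns ending here (own first, then inherited via fail)
}

// Matcher is a compiled automaton; state 0 is the root.
type Matcher struct {
	nodes []node
}

// Compile builds the trie, then fills in failure links breadth-first.
func Compile(patterns []string) *Matcher {
	m := &Matcher{nodes: []node{{next: map[byte]int{}}}}
	for _, p := range patterns {
		if p == "" {
			continue
		}
		s := 0
		for i := 0; i < len(p); i++ {
			t, ok := m.nodes[s].next[p[i]]
			if !ok {
				t = len(m.nodes)
				m.nodes = append(m.nodes, node{next: map[byte]int{}})
				m.nodes[s].next[p[i]] = t
			}
			s = t
		}
		if len(m.nodes[s].out) == 0 { // skip duplicate patterns
			m.nodes[s].out = append(m.nodes[s].out, p)
		}
	}

	// Depth-1 states keep fail == 0 (the root).
	var queue []int
	for _, t := range m.nodes[0].next {
		queue = append(queue, t)
	}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		for c, t := range m.nodes[s].next {
			// Walk the parent's failure chain until a state has a transition on c.
			for f := m.nodes[s].fail; ; f = m.nodes[f].fail {
				if u, ok := m.nodes[f].next[c]; ok {
					m.nodes[t].fail = u
					break
				}
				if f == 0 {
					break // no suffix matches; fail stays at the root
				}
			}
			// Inherit the patterns recognized by the failure state.
			m.nodes[t].out = append(m.nodes[t].out, m.nodes[m.nodes[t].fail].out...)
			queue = append(queue, t)
		}
	}
	return m
}

// FindAll scans the text once and reports every pattern occurrence.
func (m *Matcher) FindAll(text string) []Match {
	var matches []Match
	s := 0
	for i := 0; i < len(text); i++ {
		// Follow failure links until a transition on text[i] exists or we hit the root.
		for {
			if t, ok := m.nodes[s].next[text[i]]; ok {
				s = t
				break
			}
			if s == 0 {
				break
			}
			s = m.nodes[s].fail
		}
		for _, w := range m.nodes[s].out {
			matches = append(matches, Match{Word: w, Index: i - len(w) + 1})
		}
	}
	return matches
}

func main() {
	m := Compile([]string{"he", "she", "his", "hers", "she"})
	for _, match := range m.FindAll("ushers") {
		fmt.Printf("{ %q %d }\n", match.Word, match.Index)
	}
}
```

Splitting the work this way means the cost of building failure links is paid once in `Compile`, while `FindAll` makes a single pass over the text, which is why the two operations are timed separately above.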

### Reference Papers

[1] A. V. Aho, M. J. Corasick, "Efficient String Matching: An Aid to Bibliographic Search," Communications of the ACM, vol. 18, no. 6, pp. 333-340, June 1975.

[2] J. I. Aoe, "An Efficient Digital Search Algorithm by Using a Double-Array Structure," IEEE Transactions on Software Engineering, vol. 15, no. 9, pp. 1066-1077, September 1989.

[3] J. I. Aoe, K. Morimoto, T. Sato, "An Efficient Implementation of Trie Structures," Software - Practice and Experience, vol. 22, no. 9, pp. 695-721, September 1992.

## License

`MIT`