# ahocorasick

The fastest Golang implementation of the Aho-Corasick algorithm for string-searching.

## Usage

```bash
go get github.com/AlexanderZh/ahocorasick@v0.1.8
```

[Documentation](https://godoc.org/github.com/RRethy/ahocorasick)

```go
// Calls are shown unqualified; from another package, prefix them with the package name.

// Compile a matcher from byte-slice patterns, then search a byte slice.
m := CompileByteSlices([][]byte{
	[]byte("he"),
	[]byte("she"),
	[]byte("his"),
	[]byte("hers"),
	[]byte("she"),
})
m.FindAllByteSlice([]byte("ushers")) // => { "she" 1 }, { "he" 2 }, { "hers" 2 }

// The same patterns compiled from strings, then searched in a string.
ms := CompileStrings([]string{
	"he",
	"she",
	"his",
	"hers",
	"she",
})
ms.FindAllString("ushers") // => { "she" 1 }, { "he" 2 }, { "hers" 2 }
```

## Benchmarks

*macOS Mojave version 10.14.6*

*MacBook Pro (Retina, 13-inch, Early 2015)*

*Processor 3.1 GHz Intel Core i7*

```
$ git checkout d7354e5e7912add9c2c602aae74c508bca3b2f4d; go test -bench=Benchmark
```

The two basic operations are compiling the state machine from a list of patterns (`Compile`) and using that state machine to find every occurrence of each pattern in a text (`FindAll`). Other implementations use different names for these operations.
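
To make the two operations concrete, here is a minimal, self-contained sketch of an Aho-Corasick matcher. It is not this library's implementation: it assumes a plain map-based trie instead of a double-array trie, and the names `node`, `match`, `matcher`, `compile`, and `findAll` are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// node is one automaton state. Assumption: a map-based trie, not a double-array trie.
type node struct {
	next map[byte]int // goto transitions
	fail int          // failure link (0 is the root)
	out  []int        // indices of patterns that end at this state
}

// match is a hypothetical result type: the matched word and its start offset.
type match struct {
	Word  string
	Index int
}

type matcher struct {
	nodes    []node
	patterns []string
}

// compile builds the trie, then fills in failure links breadth-first.
func compile(patterns []string) *matcher {
	m := &matcher{nodes: []node{{next: map[byte]int{}}}, patterns: patterns}
	for i, p := range patterns {
		if p == "" {
			continue // empty patterns are ignored in this sketch
		}
		s := 0
		for j := 0; j < len(p); j++ {
			t, ok := m.nodes[s].next[p[j]]
			if !ok {
				t = len(m.nodes)
				m.nodes = append(m.nodes, node{next: map[byte]int{}})
				m.nodes[s].next[p[j]] = t
			}
			s = t
		}
		m.nodes[s].out = append(m.nodes[s].out, i)
	}
	var queue []int
	for _, t := range m.nodes[0].next {
		queue = append(queue, t) // depth-1 states keep fail == 0 (the root)
	}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		for c, t := range m.nodes[s].next {
			// Walk the parent's failure chain until a state with a c-transition is found.
			f := m.nodes[s].fail
			for f != 0 {
				if _, ok := m.nodes[f].next[c]; ok {
					break
				}
				f = m.nodes[f].fail
			}
			if u, ok := m.nodes[f].next[c]; ok {
				f = u
			}
			m.nodes[t].fail = f
			// Inherit the patterns recognized at the failure state.
			m.nodes[t].out = append(m.nodes[t].out, m.nodes[f].out...)
			queue = append(queue, t)
		}
	}
	return m
}

// findAll scans the text once and reports every pattern occurrence.
func (m *matcher) findAll(text string) []match {
	var found []match
	s := 0
	for i := 0; i < len(text); i++ {
		c := text[i]
		for s != 0 {
			if _, ok := m.nodes[s].next[c]; ok {
				break
			}
			s = m.nodes[s].fail
		}
		if t, ok := m.nodes[s].next[c]; ok {
			s = t
		}
		for _, p := range m.nodes[s].out {
			w := m.patterns[p]
			found = append(found, match{Word: w, Index: i - len(w) + 1})
		}
	}
	return found
}

func main() {
	m := compile([]string{"he", "she", "his", "hers"})
	fmt.Println(m.findAll("ushers")) // prints [{she 1} {he 2} {hers 2}]
}
```

All trie construction and failure-link work happens in `compile`, so `findAll` only follows transitions while scanning the text once. That split is why the two operations are timed separately.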

| Operation | Input Size | rrethy/ahocorasick | [BobuSumisu/aho-corasick](https://github.com/BobuSumisu/aho-corasick) | [anknown/ahocorasick](https://github.com/anknown/ahocorasick) |
| - | - | - | - | - |
| - | - | Double-Array Trie | LinkedList Trie | Double-Array Trie |
| - | - | - | - | - |
| `Compile` | 235886 patterns | **133 ms** | 214 ms | 1408 ms |
| `Compile` | 23589 patterns | **20 ms** | 50 ms | 137 ms |
| `Compile` | 2359 patterns | **3320 µs** | 11026 µs | 10506 µs |
| `Compile` | 236 patterns | **229 µs** | 1377 µs | 867 µs |
| `Compile` | 24 patterns | **43 µs** | 144 µs | 82 µs |
| - | - | - | - | - |
| `FindAll` | 3227439 bytes | **36 ms** | 38 ms | 116 ms |
| `FindAll` | 318647 bytes | **3641 µs** | 3764 µs | 11335 µs |
| `FindAll` | 31626 bytes | **359 µs** | 370 µs | 1103 µs |
| `FindAll` | 3657 bytes | **31 µs** | 40 µs | 131 µs |

**NOTE**: `FindAll` uses a state machine compiled from 2359 patterns.

**NOTE**: `FindAll` time does **not** include the `Compile` time for the state machine.

### Reference Papers

[1] A. V. Aho, M. J. Corasick, "Efficient String Matching: An Aid to Bibliographic Search," Communications of the ACM, vol. 18, no. 6, pp. 333-340, June 1975.

[2] J.I. Aoe, "An Efficient Digital Search Algorithm by Using a Double-Array Structure," IEEE Transactions on Software Engineering, vol. 15, no. 9, pp. 1066-1077, September 1989.

[3] J.I. Aoe, K. Morimoto, T. Sato, "An Efficient Implementation of Trie Structures," Software - Practice and Experience, vol. 22, no. 9, pp. 695-721, September 1992.

## License

`MIT`