# hashset

## Introduction
This package implements one foundational data structure: a set built on Go's built-in map. It provides:
- `Add(value int64)`: Adds the specified element to this set.
- `Contains(value int64) bool`: Returns true if this set contains the specified element.
- `Remove(value int64)`: Removes the specified element from this set.
- `Range(f func(value int64) bool)`: Calls f on each element of the set in turn until f returns false.
- `Len() int`: Returns the number of elements in this set.

We ran two experiments to measure the overall performance of the new hashset:
1. The type of the map value: empty struct vs. bool
2. The impact of checking whether the key exists before adding or removing an item

## Features
- The API of hashset is fully compatible with [skipset](https://github.com/zhangyunhao116/skipset/).
- Developers usually implement a set in Go by setting the value of the <key,value> pair to `bool` or `int`. However, our benchmark shows that using an empty struct is more space efficient and slightly more time efficient.


## When to use hashset
Hashset **does not** guarantee concurrency safety.
If you need a concurrency-safe set, use [skipset](https://github.com/bytedance/gopkg/tree/develop/collection/skipset).

## Quickstart
```go
package main

import (
	"fmt"

	"github.com/bytedance/gopkg/collection/hashset"
)

func main() {
	// Create an empty set of int values.
	l := hashset.NewInt()

	// Add a few elements.
	for _, v := range []int{10, 12, 15} {
		if l.Add(v) {
			fmt.Println("hashset add", v)
		}
	}

	// Test membership.
	if l.Contains(10) {
		fmt.Println("hashset contains 10")
	}

	// Visit every element; returning true continues the iteration.
	l.Range(func(value int) bool {
		fmt.Println("hashset range found ", value)
		return true
	})

	// Remove one element and report the remaining size.
	l.Remove(15)
	fmt.Printf("hashset contains %d items\r\n", l.Len())
}
```

## Benchmark
- go version: go1.15.10 linux/amd64
- CPU: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz (4C8T)
- OS: Debian 4.14.81.bm.15
- MEMORY: 16G

```
$ go test -run=None -bench=. -benchtime=1000000x -benchmem -count=10 -cpu=4 > 1000000x20x4.txt
$ benchstat 1000000x20x4.txt
name                             time/op
ValueAsBool-4                    301ns ± 7%
ValueAsEmptyStruct-4             300ns ± 7%
AddAfterContains-4               334ns ± 5%
AddWithoutContains-4             303ns ± 9%
RemoveAfterContains_Missing-4    177ns ± 4%
RemoveWithoutContains_Missing-4  176ns ± 7%
RemoveAfterContains_Hitting-4    205ns ± 2%
RemoveWithoutContains_Hitting-4  135ns ±16%

name                             alloc/op
ValueAsBool-4                    54.0B ± 0%
ValueAsEmptyStruct-4             49.0B ± 0%
AddAfterContains-4               49.0B ± 0%
AddWithoutContains-4             49.0B ± 0%
RemoveAfterContains_Missing-4    0.00B
RemoveWithoutContains_Missing-4  0.00B
RemoveAfterContains_Hitting-4    0.00B
RemoveWithoutContains_Hitting-4  0.00B
```
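
The design that both experiments favor (an empty-struct map value, and add/remove without a prior existence check) can be sketched roughly as follows. This is a minimal illustration only, not the package's actual source: the `int64Set` type and `newInt64Set` constructor are hypothetical names chosen for this example.

```go
package main

import "fmt"

// int64Set is a hypothetical illustration type, not the package's source.
// struct{} has size zero, so the map stores no per-value payload.
type int64Set map[int64]struct{}

// newInt64Set returns an empty set (hypothetical constructor).
func newInt64Set() int64Set { return make(int64Set) }

// Add inserts value. Assigning to an existing key simply overwrites it,
// so no Contains check is needed beforehand.
func (s int64Set) Add(value int64) { s[value] = struct{}{} }

// Contains reports whether value is in the set.
func (s int64Set) Contains(value int64) bool {
	_, ok := s[value]
	return ok
}

// Remove deletes value. The built-in delete is a no-op for a missing key,
// so no Contains check is needed here either.
func (s int64Set) Remove(value int64) { delete(s, value) }

// Range calls f for each element until f returns false.
// Map iteration order is unspecified.
func (s int64Set) Range(f func(value int64) bool) {
	for k := range s {
		if !f(k) {
			return
		}
	}
}

// Len returns the number of elements in the set.
func (s int64Set) Len() int { return len(s) }

func main() {
	s := newInt64Set()
	s.Add(10)
	s.Add(10)    // duplicate add leaves the set unchanged
	s.Add(12)
	s.Remove(99) // removing a missing element is harmless
	fmt.Println(s.Contains(10), s.Contains(99), s.Len()) // prints: true false 2
}
```

Like the package itself, this sketch is not safe for concurrent use; callers that share a set across goroutines would need their own synchronization.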