# hashset

## Introduction
This package implements a foundational data structure: a set built on Go's built-in map. The API, shown here for `int64` elements (the Quickstart uses the `int` variant), is:  
`Add(value int64)`: Adds the specified element to this set.  
`Contains(value int64) bool`: Returns true if this set contains the specified element.  
`Remove(value int64)`: Removes the specified element from this set.  
`Range(f func(value int64) bool)`: Calls f once for each element in the set, one at a time, and stops as soon as f returns false.  
`Len() int`: Returns the number of elements in this set.  
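
To make the idea of a map-backed set concrete, here is a minimal sketch of the operations listed above. The type name `int64Set` is hypothetical and the code is an illustration under that assumption, not the package's actual source.

```go
package main

import "fmt"

// int64Set is a hypothetical stand-in for a map-backed set:
// the keys are the elements and the values are zero-sized empty structs.
type int64Set map[int64]struct{}

// Add inserts value; adding an element that already exists changes nothing.
func (s int64Set) Add(value int64) { s[value] = struct{}{} }

// Contains reports whether value is in the set.
func (s int64Set) Contains(value int64) bool {
	_, ok := s[value]
	return ok
}

// Remove deletes value; removing a missing element is a no-op.
func (s int64Set) Remove(value int64) { delete(s, value) }

// Range calls f for each element until f returns false.
// Go does not specify map iteration order.
func (s int64Set) Range(f func(value int64) bool) {
	for v := range s {
		if !f(v) {
			return
		}
	}
}

// Len returns the number of elements.
func (s int64Set) Len() int { return len(s) }

func main() {
	s := int64Set{}
	s.Add(1)
	s.Add(2)
	s.Add(2) // duplicate, the set still holds two elements
	s.Remove(1)
	fmt.Println(s.Contains(2), s.Contains(1), s.Len()) // true false 1
}
```

Because the set is just a map underneath, every operation inherits the map's average constant-time behavior and its unspecified iteration order.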

We ran two experiments to measure the overall performance of the new hashset:  
1. the type chosen for the map value: empty struct vs. bool  
2. the impact of checking whether the key exists before adding or removing an item  
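
The second experiment can be pictured with the sketch below, which contrasts checking for the key first with writing or deleting unconditionally. The function names are hypothetical; they only echo the benchmark names in this README and are not the package's code.

```go
package main

import "fmt"

// addAfterContains looks the key up first and writes only when it is absent.
func addAfterContains(m map[int64]struct{}, v int64) {
	if _, ok := m[v]; !ok {
		m[v] = struct{}{}
	}
}

// addWithoutContains writes unconditionally; an existing key is overwritten.
func addWithoutContains(m map[int64]struct{}, v int64) {
	m[v] = struct{}{}
}

// removeAfterContains looks the key up first and deletes only when it is present.
func removeAfterContains(m map[int64]struct{}, v int64) {
	if _, ok := m[v]; ok {
		delete(m, v)
	}
}

// removeWithoutContains deletes unconditionally; delete on a missing key is a no-op.
func removeWithoutContains(m map[int64]struct{}, v int64) {
	delete(m, v)
}

func main() {
	a, b := map[int64]struct{}{}, map[int64]struct{}{}
	for _, v := range []int64{1, 2, 2, 3} {
		addAfterContains(a, v)
		addWithoutContains(b, v)
	}
	removeAfterContains(a, 3)
	removeWithoutContains(b, 3)
	removeAfterContains(a, 42)   // missing key
	removeWithoutContains(b, 42) // missing key
	fmt.Println(len(a), len(b)) // 2 2
}
```

Both styles leave the set in the same state, so the experiment only measures whether the extra lookup is worth its cost.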

## Features
- The API of hashset is fully compatible with [skipset](https://github.com/zhangyunhao116/skipset/).
- Developers usually implement a set in Go as a map whose value type is `bool` or `int`. Our benchmark shows that using an empty struct as the value is more space efficient (49.0 B vs. 54.0 B allocated per operation) and at best marginally faster (300 ns vs. 301 ns per operation, which is within the measurement noise).
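
As a rough illustration of why the value type matters, the sketch below builds both map shapes and prints the size of a single value of each type. The helper `valueSizes` and the variable names are hypothetical, and the per-value size is only one contributor to the allocation figures reported in the benchmark.

```go
package main

import (
	"fmt"
	"unsafe"
)

// valueSizes returns the size in bytes of a bool value and of an empty struct value.
func valueSizes() (boolSize, emptySize uintptr) {
	return unsafe.Sizeof(true), unsafe.Sizeof(struct{}{})
}

func main() {
	// Two hypothetical set representations that differ only in the value type.
	asBool := map[int64]bool{}
	asEmpty := map[int64]struct{}{}

	asBool[7] = true
	asEmpty[7] = struct{}{}

	// With bool values the lookup result is the answer;
	// with empty structs the "comma ok" form is needed.
	_, inEmpty := asEmpty[7]
	fmt.Println(asBool[7], inEmpty) // true true

	boolSize, emptySize := valueSizes()
	fmt.Println(boolSize, emptySize) // 1 0
}
```

An empty struct also removes the ambiguity of a `bool` set, where a key stored with the value `false` is present in the map yet reads as absent.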


## When to use hashset
Hashset does **not** guarantee concurrency safety. If you need a concurrency-safe set, use [skipset](https://github.com/bytedance/gopkg/tree/develop/collection/skipset) instead.

## Quickstart
```go
package main

import (
	"fmt"

	"github.com/bytedance/gopkg/collection/hashset"
)

func main() {
	// Create an empty set of int values.
	l := hashset.NewInt()

	// Add a few elements.
	for _, v := range []int{10, 12, 15} {
		if l.Add(v) {
			fmt.Println("hashset add", v)
		}
	}

	// Check membership.
	if l.Contains(10) {
		fmt.Println("hashset contains 10")
	}

	// Visit every element; returning true continues the iteration.
	l.Range(func(value int) bool {
		fmt.Println("hashset range found", value)
		return true
	})

	// Remove an element and report the remaining size.
	l.Remove(15)
	fmt.Printf("hashset contains %d items\n", l.Len())
}
```

## Benchmark
go version: go1.15.10 linux/amd64  
CPU: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz (4C8T)  
OS: Debian 4.14.81.bm.15  
MEMORY: 16G  

```
$ go test -run=None -bench=. -benchtime=1000000x -benchmem -count=10 -cpu=4 > 1000000x20x4.txt
$ benchstat 1000000x20x4.txt
name                             time/op
ValueAsBool-4                    301ns ± 7%
ValueAsEmptyStruct-4             300ns ± 7%
AddAfterContains-4               334ns ± 5%
AddWithoutContains-4             303ns ± 9%
RemoveAfterContains_Missing-4    177ns ± 4%
RemoveWithoutContains_Missing-4  176ns ± 7%
RemoveAfterContains_Hitting-4    205ns ± 2%
RemoveWithoutContains_Hitting-4  135ns ±16%

name                             alloc/op
ValueAsBool-4                    54.0B ± 0%
ValueAsEmptyStruct-4             49.0B ± 0%
AddAfterContains-4               49.0B ± 0%
AddWithoutContains-4             49.0B ± 0%
RemoveAfterContains_Missing-4    0.00B
RemoveWithoutContains_Missing-4  0.00B
RemoveAfterContains_Hitting-4    0.00B
RemoveWithoutContains_Hitting-4  0.00B
```