github.com/cockroachdb/swiss@v0.0.0-20240303172742-c161743eb608/map.go

// Copyright 2024 The Cockroach Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//	http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// Package swiss is a Go implementation of Swiss Tables as described in
// https://abseil.io/about/design/swisstables. See also:
// https://faultlore.com/blah/hashbrown-tldr/.
//
// Google's C++ implementation:
//
//	https://github.com/abseil/abseil-cpp/blob/master/absl/container/internal/raw_hash_set.h
//
// # Swiss Tables
//
// Swiss tables are hash tables that map keys to values, similar to Go's
// builtin map type. Swiss tables use open-addressing rather than chaining to
// handle collisions. If you're not familiar with open-addressing, see
// https://en.wikipedia.org/wiki/Open_addressing. A hybrid between linear and
// quadratic probing is used: linear probing within groups of small fixed
// size and quadratic probing at the group level. The key design choice of
// Swiss tables is the usage of a separate metadata array that stores 1 byte
// per slot in the table. 7 bits of this "control byte" are taken from
// hash(key) and the remaining bit is used to indicate whether the slot is
// empty, full, or deleted. The metadata array allows quick probes. The Google
// implementation of Swiss tables uses SIMD on x86 CPUs in order to quickly
// check 16 slots at a time for a match. NEON on arm64 CPUs apparently has too
// high latency, but the generic version is still able to compare 8 bytes at
// a time through bit tricks (SWAR, SIMD Within A Register).
//
// Google's Swiss Tables layout is N-1 slots, where N is a power of 2, and
// N+groupSize control bytes. The [N:N+groupSize] control bytes mirror the
// first groupSize control bytes so that probe operations at the end of the
// control bytes array do not have to perform additional checks. The
// separation of control bytes and slots implies 2 cache misses for a large
// map (larger than L2 cache size) or a cold map. The swiss.Map implementation
// differs from Google's layout: it groups together 8 control bytes and 8
// slots, which often results in 1 cache miss for a large or cold map rather
// than separate accesses for the controls and slots. The mirrored control
// bytes are no longer needed, and groups no longer start at an arbitrary slot
// index, but only at those that are multiples of 8.
//
// Probing is done by taking the top 57 bits of hash(key), modulo the number
// of groups, as the index into the groups slice and then performing a check
// of the groupSize control bytes within the group. Probing walks through
// groups in the table using quadratic probing until it finds a group that
// has at least one empty slot. See the comments on probeSeq for more details
// on the order in which groups are probed and the guarantee that every group
// is examined, which means that in the worst case probing will end when an
// empty slot is encountered (the map can never be 100% full).
//
// Deletion is performed using tombstones (ctrlDeleted) with an optimization
// that marks a slot as empty if we can prove that doing so does not violate
// the probing invariant that a group of full slots causes probing to
// continue. It is invalid to take a group of full slots and mark one as
// empty, as doing so would cause subsequent lookups to terminate at that
// group rather than continue to probe. We prove a slot was never part of a
// full group by checking whether any of the groupSize-1 neighbors to the
// left and right of the deleted slot are empty; if any are, the slot could
// never have been part of a full group.
//
// # Extendible Hashing
//
// The Swiss table design has a significant caveat: resizing of the table is
// done all at once rather than incrementally. This can cause long-tail
// latency blips in some use cases. To address this caveat, extendible hashing
// (https://en.wikipedia.org/wiki/Extendible_hashing) is applied on top of the
// Swiss table foundation. In extendible hashing, there is a top-level
// directory containing entries pointing to buckets. In swiss.Map each bucket
// is a Swiss table as described above.
//
// The high bits of hash(key) are used to index into the bucket directory
// which is effectively a trie. The number of bits used is the globalDepth,
// resulting in 2^globalDepth directory entries. Adjacent entries in the
// directory are allowed to point to the same bucket which enables resizing to
// be done incrementally, one bucket at a time. Each bucket has a localDepth
// which is less than or equal to the globalDepth. If the localDepth for a
// bucket equals the globalDepth then only a single directory entry points to
// the bucket. Otherwise, more than one directory entry points to the bucket.
//
// The diagram below shows one possible scenario for the directory and
// buckets. With a globalDepth of 2 the directory contains 4 entries. The
// first 2 entries point to the same bucket which has a localDepth of 1, while
// the last 2 entries point to different buckets.
//
//	dir(globalDepth=2)
//	+----+
//	| 00 | --\
//	+----+    +--> bucket[localDepth=1]
//	| 01 | --/
//	+----+
//	| 10 | ------> bucket[localDepth=2]
//	+----+
//	| 11 | ------> bucket[localDepth=2]
//	+----+
//
// The index into the directory is "hash(key) >> (64 - globalDepth)".
//
// When a bucket gets too large (specified by a configurable threshold) it is
// split. When a bucket is split its localDepth is incremented. If its
// localDepth is less than or equal to the globalDepth then the newly split
// bucket can be installed in the directory. If the bucket's localDepth is
// greater than the globalDepth then the globalDepth is incremented and the
// directory is reallocated at twice its current size. In the diagram above,
// consider what happens if the bucket at dir[3] is split:
//
//	dir(globalDepth=3)
//	+-----+
//	| 000 | --\
//	+-----+    \
//	| 001 | ----\
//	+-----+      +--> bucket[localDepth=1]
//	| 010 | ----/
//	+-----+    /
//	| 011 | --/
//	+-----+
//	| 100 | --\
//	+-----+    +----> bucket[localDepth=2]
//	| 101 | --/
//	+-----+
//	| 110 | --------> bucket[localDepth=3]
//	+-----+
//	| 111 | --------> bucket[localDepth=3]
//	+-----+
//
// Note that the diagram above is very unlikely with a good hash function as
// the buckets will tend to fill at a similar rate.
//
// The split operation redistributes the records in a bucket into two buckets.
// This is done by walking over the records in the bucket to be split,
// computing hash(key) and using localDepth to extract the bit which
// determines whether to leave the record in the current bucket or to move it
// to the new bucket.
//
// Maps containing only a single bucket are optimized to avoid the directory
// indexing, resulting in performance that is equivalent to a Swiss table
// without extendible hashing. A single bucket can be guaranteed by
// configuring a very large bucket size threshold via the
// WithMaxBucketCapacity option.
//
// In order to avoid a level of indirection when accessing a bucket, the
// bucket directory stores buckets by value rather than by pointer. Adjacent
// bucket[K,V]'s which are logically the same bucket share the bucket.groups
// slice and have the same values for bucket.{groupMask,localDepth,index}.
// The other fields of a bucket are only valid for buckets where
// &m.dir[bucket.index] == &bucket (i.e. the first bucket in the directory
// with the specified index). During Get operations, any of the buckets with
// the same index may be used for retrieval. During Put and Delete operations
// an additional indirection is performed, though in the common case this
// indirection targets the immediately preceding bucket in the directory and
// thus stays within the same cache line.
//
// # Implementation
//
// The implementation follows Google's Abseil implementation of Swiss Tables,
// and is heavily tuned, using unsafe and raw pointer arithmetic rather than
// Go slices to squeeze out every drop of performance. In order to support
// hashing of arbitrary keys, a hack is performed to extract the hash function
// from Go's implementation of map[K]struct{} by reaching into the internals
// of the type. (This might break in future versions of Go, but is likely
// fixable unless the Go runtime does something drastic.)
//
// # Performance
//
// A swiss.Map has similar or slightly better performance than Go's builtin
// map for small map sizes, and is much faster at large map sizes. See
// [README.md] for details.
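//
// # Example
//
// A minimal usage sketch (the keys and values below are illustrative only):
//
//	m := swiss.New[string, int](16)
//	m.Put("foo", 1)
//	m.Put("bar", 2)
//	if v, ok := m.Get("foo"); ok {
//		fmt.Println(v) // prints 1
//	}
//	m.Delete("bar")
//	fmt.Println(m.Len()) // prints 1
//	m.All(func(k string, v int) bool {
//		fmt.Println(k, v)
//		return true
//	})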
177 // 178 // [README.md] https://github.com/cockroachdb/swiss/blob/main/README.md 179 package swiss 180 181 import ( 182 "fmt" 183 "io" 184 "math" 185 "math/bits" 186 "strings" 187 "unsafe" 188 ) 189 190 const ( 191 groupSize = 8 192 maxAvgGroupLoad = 7 193 194 ctrlEmpty ctrl = 0b10000000 195 ctrlDeleted ctrl = 0b11111110 196 197 bitsetLSB = 0x0101010101010101 198 bitsetMSB = 0x8080808080808080 199 bitsetEmpty = bitsetLSB * uint64(ctrlEmpty) 200 bitsetDeleted = bitsetLSB * uint64(ctrlDeleted) 201 202 // The default maximum capacity a bucket is allowed to grow to before it 203 // will be split. 204 defaultMaxBucketCapacity uint32 = 4096 205 206 // ptrSize and shiftMask are used to optimize code generation for 207 // Map.bucket(), Map.bucketCount(), and bucketStep(). This technique was 208 // lifted from the Go runtime's runtime/map.go:bucketShift() routine. Note 209 // that ptrSize will be either 4 on 32-bit archs or 8 on 64-bit archs. 210 ptrSize = 4 << (^uintptr(0) >> 63) 211 ptrBits = ptrSize * 8 212 shiftMask = ptrSize*8 - 1 213 214 expectedBucketSize = ptrSize + 6*4 215 ) 216 217 // Don't add fields to the bucket unnecessarily. It is packed for efficiency so 218 // that we can fit 2 buckets into a 64-byte cache line on 64-bit architectures. 219 // This will cause a type error if the size of a bucket changes. 220 var _ [0]struct{} = [unsafe.Sizeof(bucket[int, int]{}) - expectedBucketSize]struct{}{} 221 222 // slot holds a key and value. 223 type slot[K comparable, V any] struct { 224 key K 225 value V 226 } 227 228 // Group holds groupSize control bytes and slots. 229 type Group[K comparable, V any] struct { 230 ctrls ctrlGroup 231 slots slotGroup[K, V] 232 } 233 234 // bucket implements Google's Swiss Tables hash table design. A Map is 235 // composed of 1 or more buckets that are addressed using extendible hashing. 236 type bucket[K comparable, V any] struct { 237 // groups is groupMask+1 in length and holds groupSize key/value slots and 238 // their control bytes. 239 groups unsafeSlice[Group[K, V]] 240 // groupMask is the number of groups minus 1 which is used to quickly 241 // compute i%N using a bitwise & operation. The groupMask only changes 242 // when a bucket is resized. 243 groupMask uint32 244 245 // Capacity, used, and growthLeft are only updated on mutation operations 246 // (Put, Delete). Read operations (Get) only access the groups and 247 // groupMask fields. 248 249 // The total number (always 2^N). Equal to `(groupMask+1)*groupSize` 250 // (unless the bucket is empty, when capacity is 0). 251 capacity uint32 252 // The number of filled slots (i.e. the number of elements in the bucket). 253 used uint32 254 // The number of slots we can still fill without needing to rehash. 255 // 256 // This is stored separately due to tombstones: we do not include 257 // tombstones in the growth capacity because we'd like to rehash when the 258 // table is filled with tombstones as otherwise probe sequences might get 259 // unacceptably long without triggering a rehash. 260 growthLeft uint32 261 262 // localDepth is the number of high bits from hash(key) used to generate 263 // an index for the global directory to locate this bucket. If localDepth 264 // is 0 this bucket is Map.bucket0. LocalDepth is only updated when a 265 // bucket splits. 266 localDepth uint32 267 // The index of the bucket within Map.dir. The buckets in 268 // Map.dir[index:index+2^(globalDepth-localDepth)] all share the same 269 // groups (and are logically the same bucket). 
Only the bucket at 270 // Map.dir[index] can be used for mutation operations (Put, Delete). The 271 // other buckets can be used for Get operations. Index is only updated 272 // when a bucket splits or the directory grows. 273 index uint32 274 } 275 276 // Map is an unordered map from keys to values with Put, Get, Delete, and All 277 // operations. Map is inspired by Google's Swiss Tables design as implemented 278 // in Abseil's flat_hash_map, combined with extendible hashing. By default, a 279 // Map[K,V] uses the same hash function as Go's builtin map[K]V, though a 280 // different hash function can be specified using the WithHash option. 281 // 282 // A Map is NOT goroutine-safe. 283 type Map[K comparable, V any] struct { 284 // The hash function to each keys of type K. The hash function is 285 // extracted from the Go runtime's implementation of map[K]struct{}. 286 hash hashFn 287 seed uintptr 288 // The allocator to use for the ctrls and slots slices. 289 allocator Allocator[K, V] 290 // bucket0 is always present and inlined in the Map to avoid a pointer 291 // indirection during the common case that the map contains a single 292 // bucket. bucket0 is also used during split operations as a temporary 293 // bucket to split into before the bucket is installed in the directory. 294 bucket0 bucket[K, V] 295 // The directory of buckets. See the comment on bucket.index for details 296 // on how the physical bucket values map to logical buckets. 297 dir unsafeSlice[bucket[K, V]] 298 // The number of filled slots across all buckets (i.e. the number of 299 // elements in the map). 300 used int 301 // globalShift is the number of bits to right shift a hash value to 302 // generate an index for the global directory. As a special case, if 303 // globalShift==0 then bucket0 is used and the directory is not accessed. 304 // Note that globalShift==(64-globalDepth). globalShift is used rather 305 // than globalDepth because the shifting is the more common operation than 306 // needing to compare globalDepth to a bucket's localDepth. 307 globalShift uint32 308 // The maximum capacity a bucket is allowed to grow to before it will be 309 // split. 310 maxBucketCapacity uint32 311 _ noCopy 312 } 313 314 func normalizeCapacity(capacity uint32) uint32 { 315 v := (uint32(1) << bits.Len32(uint32(capacity-1))) 316 if v != 0 { 317 return v 318 } 319 return uint32(1) << 31 320 } 321 322 // New constructs a new Map with the specified initial capacity. If 323 // initialCapacity is 0 the map will start out with zero capacity and will 324 // grow on the first insert. The zero value for a Map is not usable. 325 func New[K comparable, V any](initialCapacity int, options ...Option[K, V]) *Map[K, V] { 326 m := &Map[K, V]{} 327 m.Init(initialCapacity, options...) 328 return m 329 } 330 331 // Init initializes a Map with the specified initial capacity. If 332 // initialCapacity is 0 the map will start out with zero capacity and will 333 // grow on the first insert. The zero value for a Map is not usable and Init 334 // must be called before using the map. 335 // 336 // Init is intended for usage when a Map is embedded by value in another 337 // structure. 338 func (m *Map[K, V]) Init(initialCapacity int, options ...Option[K, V]) { 339 *m = Map[K, V]{ 340 hash: getRuntimeHasher[K](), 341 seed: uintptr(fastrand64()), 342 allocator: defaultAllocator[K, V]{}, 343 bucket0: bucket[K, V]{ 344 // The groups slice for bucket0 in an empty map points to a single 345 // group where the controls are all marked as empty. 
This 346 // simplifies the logic for probing in Get, Put, and Delete. The 347 // empty controls will never match a probe operation, and if 348 // insertion is performed growthLeft==0 will trigger a resize of 349 // the bucket. 350 groups: makeUnsafeSlice(unsafeConvertSlice[Group[K, V]](emptyCtrls[:])), 351 }, 352 maxBucketCapacity: defaultMaxBucketCapacity, 353 } 354 355 // Initialize the directory to point to bucket0. 356 m.dir = makeUnsafeSlice(unsafe.Slice(&m.bucket0, 1)) 357 358 for _, op := range options { 359 op.apply(m) 360 } 361 362 if m.maxBucketCapacity < groupSize { 363 m.maxBucketCapacity = groupSize 364 } 365 m.maxBucketCapacity = normalizeCapacity(m.maxBucketCapacity) 366 367 if initialCapacity > 0 { 368 // We consider initialCapacity to be an indication from the caller 369 // about the number of records the map should hold. The realized 370 // capacity of a map is 7/8 of the number of slots, so we set the 371 // target capacity to initialCapacity*8/7. 372 targetCapacity := uintptr((initialCapacity * groupSize) / maxAvgGroupLoad) 373 if targetCapacity <= uintptr(m.maxBucketCapacity) { 374 // Normalize targetCapacity to the smallest value of the form 2^k. 375 m.bucket0.init(m, normalizeCapacity(uint32(targetCapacity))) 376 } else { 377 // If targetCapacity is larger than maxBucketCapacity we need to 378 // size the directory appropriately. We'll size each bucket to 379 // maxBucketCapacity and create enough buckets to hold 380 // initialCapacity. 381 nBuckets := (targetCapacity + uintptr(m.maxBucketCapacity) - 1) / uintptr(m.maxBucketCapacity) 382 globalDepth := uint32(bits.Len32(uint32(nBuckets) - 1)) 383 m.growDirectory(globalDepth, 0 /* index */) 384 385 n := m.bucketCount() 386 for i := uint32(0); i < n; i++ { 387 b := m.dir.At(uintptr(i)) 388 b.init(m, m.maxBucketCapacity) 389 b.localDepth = globalDepth 390 b.index = i 391 } 392 393 m.checkInvariants() 394 } 395 } 396 397 m.buckets(0, func(b *bucket[K, V]) bool { 398 b.checkInvariants(m) 399 return true 400 }) 401 } 402 403 // Close closes the map, releasing any memory back to its configured 404 // allocator. It is unnecessary to close a map using the default allocator. It 405 // is invalid to use a Map after it has been closed, though Close itself is 406 // idempotent. 407 func (m *Map[K, V]) Close() { 408 m.buckets(0, func(b *bucket[K, V]) bool { 409 b.close(m.allocator) 410 return true 411 }) 412 413 m.allocator = nil 414 } 415 416 // Put inserts an entry into the map, overwriting an existing value if an 417 // entry with the same key already exists. 418 func (m *Map[K, V]) Put(key K, value V) { 419 // Put is find composed with uncheckedPut. We perform find to see if the 420 // key is already present. If it is, we're done and overwrite the existing 421 // value. If the value isn't present we perform an uncheckedPut which 422 // inserts an entry known not to be in the table (violating this 423 // requirement will cause the table to behave erratically). 424 h := m.hash(noescape(unsafe.Pointer(&key)), m.seed) 425 b := m.mutableBucket(h) 426 427 // NB: Unlike the abseil swiss table implementation which uses a common 428 // find routine for Get, Put, and Delete, we have to manually inline the 429 // find routine for performance. 
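	//
	// As an illustration only (no such helper exists in this package), the
	// manually inlined probing below is roughly equivalent to a shared find
	// routine of the form:
	//
	//	func (b *bucket[K, V]) find(h uintptr, key K) (*slot[K, V], bool) {
	//		seq := makeProbeSeq(h1(h), b.groupMask)
	//		for ; ; seq = seq.next() {
	//			g := b.groups.At(uintptr(seq.offset))
	//			for match := g.ctrls.matchH2(h2(h)); match != 0; match = match.removeFirst() {
	//				s := g.slots.At(match.first())
	//				if key == s.key {
	//					return s, true
	//				}
	//			}
	//			if g.ctrls.matchEmpty() != 0 {
	//				return nil, false
	//			}
	//		}
	//	}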
430 seq := makeProbeSeq(h1(h), b.groupMask) 431 startOffset := seq.offset 432 433 for ; ; seq = seq.next() { 434 g := b.groups.At(uintptr(seq.offset)) 435 match := g.ctrls.matchH2(h2(h)) 436 437 for match != 0 { 438 i := match.first() 439 slot := g.slots.At(i) 440 if key == slot.key { 441 slot.value = value 442 b.checkInvariants(m) 443 return 444 } 445 match = match.removeFirst() 446 } 447 448 match = g.ctrls.matchEmpty() 449 if match != 0 { 450 // Finding an empty slot means we've reached the end of the probe 451 // sequence. 452 453 // If there is room left to grow in the bucket and we're at the 454 // start of the probe sequence we can just insert the new entry. 455 if b.growthLeft > 0 && seq.offset == startOffset { 456 i := match.first() 457 slot := g.slots.At(i) 458 slot.key = key 459 slot.value = value 460 g.ctrls.Set(i, ctrl(h2(h))) 461 b.growthLeft-- 462 b.used++ 463 m.used++ 464 b.checkInvariants(m) 465 return 466 } 467 468 // Find the first empty or deleted slot in the key's probe 469 // sequence. 470 seq := makeProbeSeq(h1(h), b.groupMask) 471 for ; ; seq = seq.next() { 472 g := b.groups.At(uintptr(seq.offset)) 473 match = g.ctrls.matchEmptyOrDeleted() 474 if match != 0 { 475 i := match.first() 476 // If there is room left to grow in the table or the slot 477 // is deleted (and thus we're overwriting it and not 478 // changing growthLeft) we can insert the entry here. 479 // Otherwise we need to rehash the bucket. 480 if b.growthLeft > 0 || g.ctrls.Get(i) == ctrlDeleted { 481 slot := g.slots.At(i) 482 slot.key = key 483 slot.value = value 484 if g.ctrls.Get(i) == ctrlEmpty { 485 b.growthLeft-- 486 } 487 g.ctrls.Set(i, ctrl(h2(h))) 488 b.used++ 489 m.used++ 490 b.checkInvariants(m) 491 return 492 } 493 break 494 } 495 } 496 497 if invariants && b.growthLeft != 0 { 498 panic(fmt.Sprintf("invariant failed: growthLeft is unexpectedly non-zero: %d\n%#v", b.growthLeft, b)) 499 } 500 501 b.rehash(m) 502 503 // We may have split the bucket in which case we have to 504 // re-determine which bucket the key resides on. This 505 // determination is quick in comparison to rehashing, resizing, 506 // and splitting, so just always do it. 507 b = m.mutableBucket(h) 508 509 // Note that we don't have to restart the entire Put process as we 510 // know the key doesn't exist in the map. 511 b.uncheckedPut(h, key, value) 512 b.used++ 513 m.used++ 514 b.checkInvariants(m) 515 return 516 } 517 } 518 } 519 520 // Get retrieves the value from the map for the specified key, returning 521 // ok=false if the key is not present. 522 func (m *Map[K, V]) Get(key K) (value V, ok bool) { 523 h := m.hash(noescape(unsafe.Pointer(&key)), m.seed) 524 b := m.bucket(h) 525 526 // NB: Unlike the abseil swiss table implementation which uses a common 527 // find routine for Get, Put, and Delete, we have to manually inline the 528 // find routine for performance. 529 530 // To find the location of a key in the table, we compute hash(key). From 531 // h1(hash(key)) and the capacity, we construct a probeSeq that visits 532 // every group of slots in some interesting order. 533 // 534 // We walk through these indices. At each index, we select the entire group 535 // starting with that index and extract potential candidates: occupied slots 536 // with a control byte equal to h2(hash(key)). If we find an empty slot in the 537 // group, we stop and return an error. The key at candidate slot y is compared 538 // with key; if key == m.slots[y].key we are done and return y; otherwise we 539 // continue to the next probe index. 
Tombstones (ctrlDeleted) effectively 540 // behave like full slots that never match the value we're looking for. 541 // 542 // The h2 bits ensure when we compare a key we are likely to have actually 543 // found the object. That is, the chance is low that keys compare false. Thus, 544 // when we search for an object, we are unlikely to call == many times. This 545 // likelyhood can be analyzed as follows (assuming that h2 is a random enough 546 // hash function). 547 // 548 // Let's assume that there are k "wrong" objects that must be examined in a 549 // probe sequence. For example, when doing a find on an object that is in the 550 // table, k is the number of objects between the start of the probe sequence 551 // and the final found object (not including the final found object). The 552 // expected number of objects with an h2 match is then k/128. Measurements and 553 // analysis indicate that even at high load factors, k is less than 32, 554 // meaning that the number of false positive comparisons we must perform is 555 // less than 1/8 per find. 556 seq := makeProbeSeq(h1(h), b.groupMask) 557 for ; ; seq = seq.next() { 558 g := b.groups.At(uintptr(seq.offset)) 559 match := g.ctrls.matchH2(h2(h)) 560 561 for match != 0 { 562 i := match.first() 563 slot := g.slots.At(i) 564 if key == slot.key { 565 return slot.value, true 566 } 567 match = match.removeFirst() 568 } 569 570 match = g.ctrls.matchEmpty() 571 if match != 0 { 572 return value, false 573 } 574 } 575 } 576 577 // Delete deletes the entry corresponding to the specified key from the map. 578 // It is a noop to delete a non-existent key. 579 func (m *Map[K, V]) Delete(key K) { 580 // Delete is find composed with "deleted at": we perform find(key), and 581 // then delete at the resulting slot if found. 582 h := m.hash(noescape(unsafe.Pointer(&key)), m.seed) 583 b := m.mutableBucket(h) 584 585 // NB: Unlike the abseil swiss table implementation which uses a common 586 // find routine for Get, Put, and Delete, we have to manually inline the 587 // find routine for performance. 588 seq := makeProbeSeq(h1(h), b.groupMask) 589 for ; ; seq = seq.next() { 590 g := b.groups.At(uintptr(seq.offset)) 591 match := g.ctrls.matchH2(h2(h)) 592 593 for match != 0 { 594 i := match.first() 595 s := g.slots.At(i) 596 if key == s.key { 597 b.used-- 598 m.used-- 599 *s = slot[K, V]{} 600 601 // Only a full group can appear in the middle of a probe 602 // sequence (a group with at least one empty slot terminates 603 // probing). Once a group becomes full, it stays full until 604 // rehashing/resizing. So if the group isn't full now, we can 605 // simply remove the element. Otherwise, we create a tombstone 606 // to mark the slot as deleted. 607 if g.ctrls.matchEmpty() != 0 { 608 g.ctrls.Set(i, ctrlEmpty) 609 b.growthLeft++ 610 } else { 611 g.ctrls.Set(i, ctrlDeleted) 612 } 613 b.checkInvariants(m) 614 return 615 } 616 match = match.removeFirst() 617 } 618 619 match = g.ctrls.matchEmpty() 620 if match != 0 { 621 b.checkInvariants(m) 622 return 623 } 624 } 625 } 626 627 // Clear deletes all entries from the map resulting in an empty map. 
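//
// A minimal usage sketch (the key and value are illustrative only):
//
//	m := New[string, int](0)
//	m.Put("hello", 1)
//	m.Clear()
//	// m.Len() == 0; the buckets' backing memory is retained for reuse.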
628 func (m *Map[K, V]) Clear() { 629 m.buckets(0, func(b *bucket[K, V]) bool { 630 for i := uint32(0); i <= b.groupMask; i++ { 631 g := b.groups.At(uintptr(i)) 632 g.ctrls.SetEmpty() 633 for j := uint32(0); j < groupSize; j++ { 634 *g.slots.At(j) = slot[K, V]{} 635 } 636 } 637 638 b.used = 0 639 b.resetGrowthLeft() 640 return true 641 }) 642 643 // Reset the hash seed to make it more difficult for attackers to 644 // repeatedly trigger hash collisions. See issue 645 // https://github.com/golang/go/issues/25237. 646 m.seed = uintptr(fastrand64()) 647 m.used = 0 648 } 649 650 // All calls yield sequentially for each key and value present in the map. If 651 // yield returns false, range stops the iteration. The map can be mutated 652 // during iteration, though there is no guarantee that the mutations will be 653 // visible to the iteration. 654 // 655 // TODO(peter): The naming of All and its signature are meant to conform to 656 // the range-over-function Go proposal. When that proposal is accepted (which 657 // seems likely), we'll be able to iterate over the map by doing: 658 // 659 // for k, v := range m.All { 660 // fmt.Printf("%v: %v\n", k, v) 661 // } 662 // 663 // See https://github.com/golang/go/issues/61897. 664 func (m *Map[K, V]) All(yield func(key K, value V) bool) { 665 // Randomize iteration order by starting iteration at a random bucket and 666 // within each bucket at a random offset. 667 offset := uintptr(fastrand64()) 668 m.buckets(offset>>32, func(b *bucket[K, V]) bool { 669 if b.used == 0 { 670 return true 671 } 672 673 // Snapshot the groups, and groupMask so that iteration remains valid 674 // if the map is resized during iteration. 675 groups := b.groups 676 groupMask := b.groupMask 677 678 offset32 := uint32(offset) 679 for i := uint32(0); i <= groupMask; i++ { 680 g := groups.At(uintptr((i + offset32) & groupMask)) 681 // TODO(peter): Skip over groups that are composed of only empty 682 // or deleted slots using matchEmptyOrDeleted() and counting the 683 // number of bits set. 684 for j := uint32(0); j < groupSize; j++ { 685 k := (j + offset32) & (groupSize - 1) 686 // Match full entries which have a high-bit of zero. 687 if (g.ctrls.Get(k) & ctrlEmpty) != ctrlEmpty { 688 slot := g.slots.At(k) 689 if !yield(slot.key, slot.value) { 690 return false 691 } 692 } 693 } 694 } 695 return true 696 }) 697 } 698 699 // GoString implements the fmt.GoStringer interface which is used when 700 // formatting using the "%#v" format specifier. 701 func (m *Map[K, V]) GoString() string { 702 var buf strings.Builder 703 fmt.Fprintf(&buf, "used=%d global-depth=%d bucket-count=%d\n", m.used, m.globalDepth(), m.bucketCount()) 704 m.buckets(0, func(b *bucket[K, V]) bool { 705 fmt.Fprintf(&buf, "bucket %d (%p): local-depth=%d\n", b.index, b, b.localDepth) 706 b.goFormat(&buf) 707 return true 708 }) 709 return buf.String() 710 } 711 712 // Len returns the number of entries in the map. 713 func (m *Map[K, V]) Len() int { 714 return m.used 715 } 716 717 // capacity returns the total capacity of all map buckets. 718 func (m *Map[K, V]) capacity() int { 719 var capacity int 720 m.buckets(0, func(b *bucket[K, V]) bool { 721 capacity += int(b.capacity) 722 return true 723 }) 724 return capacity 725 } 726 727 // bucket returns the bucket corresponding to hash value h. 728 func (m *Map[K, V]) bucket(h uintptr) *bucket[K, V] { 729 // NB: It is faster to check for the single bucket case using a 730 // conditional than to index into the directory. 
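	// Illustrative arithmetic (assuming a 64-bit platform): with a
	// globalDepth of 2, globalShift is 64-2=62, so the directory index for a
	// hash h is h>>62, i.e. the top two bits of the hash select one of the 4
	// directory entries.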
731 if m.globalShift == 0 { 732 return &m.bucket0 733 } 734 // When shifting by a variable amount the Go compiler inserts overflow 735 // checks that the shift is less than the maximum allowed (32 or 64). 736 // Masking the shift amount allows overflow checks to be elided. 737 return m.dir.At(h >> (m.globalShift & shiftMask)) 738 } 739 740 func (m *Map[K, V]) mutableBucket(h uintptr) *bucket[K, V] { 741 // NB: It is faster to check for the single bucket case using a 742 // conditional than to to index into the directory. 743 if m.globalShift == 0 { 744 return &m.bucket0 745 } 746 // When shifting by a variable amount the Go compiler inserts overflow 747 // checks that the shift is less than the maximum allowed (32 or 64). 748 // Masking the shift amount allows overflow checks to be elided. 749 b := m.dir.At(h >> (m.globalShift & shiftMask)) 750 // The mutable bucket is the one located at m.dir[b.index]. This will 751 // usually be either the current bucket b, or the immediately preceding 752 // bucket which is usually in the same cache line. 753 return m.dir.At(uintptr(b.index)) 754 } 755 756 // buckets calls yield sequentially for each bucket in the map. If yield 757 // returns false, iteration stops. Offset specifies the bucket to start 758 // iteration at (used to randomize iteration order). 759 func (m *Map[K, V]) buckets(offset uintptr, yield func(b *bucket[K, V]) bool) { 760 b := m.dir.At(offset & uintptr(m.bucketCount()-1)) 761 // We iterate over the first bucket in a logical group of buckets (i.e. 762 // buckets which share bucket.groups). The first bucket has the accurate 763 // bucket.used field and those are also the buckets that are stepped 764 // through using bucketStep(). 765 b = m.dir.At(uintptr(b.index)) 766 767 // Loop termination is handled by remembering the start bucket index and 768 // exiting when it is reached again. Note that the startIndex needs to be 769 // adjusted to account for the directory growing during iteration (i.e. 770 // due to a mutation), so we remember the starting global depth as well in 771 // order to perform that adjustment. Whenever the directory grows by 772 // doubling, every existing bucket index will be doubled. 773 startIndex := b.index 774 startGlobalDepth := m.globalDepth() 775 776 for { 777 originalGlobalDepth := m.globalDepth() 778 originalLocalDepth := b.localDepth 779 originalIndex := b.index 780 781 if !yield(b) { 782 break 783 } 784 785 // The size of the directory can grow if the yield function mutates 786 // the map. We want to iterate over each bucket once, and if a bucket 787 // splits while we're iterating over it we want to skip over all of 788 // the buckets newly split from the one we're iterating over. We do 789 // this by snapshotting the bucket's local depth and using the 790 // snapshotted local depth to compute the bucket step. 791 // 792 // Note that b.index will also change if the directory grows. Consider 793 // the directory below with a globalDepth of 2 containing 4 buckets, 794 // each of which has a localDepth of 2. 795 // 796 // dir b.index b.localDepth 797 // +-----+---------+--------------+ 798 // | 00 | 0 | 2 | 799 // +-----+---------+--------------+ 800 // | 01 | 1 | 2 | 801 // +-----+---------+--------------+ 802 // | 10 | 2 | 2 | <--- iteration point 803 // +-----+---------+--------------+ 804 // | 11 | 3 | 2 | 805 // +-----+---------+--------------+ 806 // 807 // If the directory grows during iteration, the index of the bucket 808 // we're iterating over will change. 
If the bucket we're iterating 809 // over split, then the local depth will have increased. Notice how 810 // the bucket that was previously at index 1 now is at index 2 and is 811 // pointed to by 2 directory entries: 010 and 011. The bucket being 812 // iterated over which was previously at index 2 is now at index 4. 813 // Iteration within a bucket takes a snapshot of the controls and 814 // slots to make sure we don't miss keys during iteration or iterate 815 // over keys more than once. But we also need to take care of the case 816 // where the bucket we're iterating over splits. In this case, we need 817 // to skip over the bucket at index 5 which can be done by computing 818 // the bucketStep using the bucket's depth prior to calling yield 819 // which in this example will be 1<<(3-2)==2. 820 // 821 // dir b.index b.localDepth 822 // +-----+---------+--------------+ 823 // | 000 | 0 | 2 | 824 // +-----+ | | 825 // | 001 | | | 826 // +-----+---------+--------------+ 827 // | 010 | 2 | 2 | 828 // +-----+ | | 829 // | 011 | | | 830 // +-----+---------+--------------+ 831 // | 100 | 4 | 3 | 832 // +-----+---------+--------------+ 833 // | 101 | 5 | 3 | 834 // +-----+---------+--------------+ 835 // | 110 | 6 | 2 | 836 // +-----+ | | 837 // | 111 | | | 838 // +-----+---------+--------------+ 839 840 // After calling yield, b is no longer valid. We determine the next 841 // bucket to iterate over using the b.index we cached before calling 842 // yield and adjusting for any directory growth that happened during 843 // the yield call. 844 i := adjustBucketIndex(originalIndex, m.globalDepth(), originalGlobalDepth) 845 i += bucketStep(m.globalDepth(), originalLocalDepth) 846 i &= (m.bucketCount() - 1) 847 848 // Similar to the adjustment for b's index, we compute the starting 849 // bucket's new index accounting for directory growth. 850 adjustedStartIndex := adjustBucketIndex(startIndex, m.globalDepth(), startGlobalDepth) 851 if i == adjustedStartIndex { 852 break 853 } 854 855 b = m.dir.At(uintptr(i)) 856 } 857 } 858 859 // globalDepth returns the number of bits from the top of the hash to use for 860 // indexing in the buckets directory. 861 func (m *Map[K, V]) globalDepth() uint32 { 862 if m.globalShift == 0 { 863 return 0 864 } 865 return ptrBits - m.globalShift 866 } 867 868 // bucketCount returns the number of buckets in the buckets directory. 869 func (m *Map[K, V]) bucketCount() uint32 { 870 const shiftMask = 31 871 return uint32(1) << (m.globalDepth() & shiftMask) 872 } 873 874 // bucketStep is the number of buckets to step over in the buckets directory 875 // to reach the next different bucket. A bucket occupies 1 or more contiguous 876 // entries in the buckets directory specified by the range: 877 // 878 // [b.index:b.index+bucketStep(m.globalDepth(), b.localDepth)] 879 func bucketStep(globalDepth, localDepth uint32) uint32 { 880 const shiftMask = 31 881 return uint32(1) << ((globalDepth - localDepth) & shiftMask) 882 } 883 884 // adjustBucketIndex adjusts the index of a bucket to account for the growth 885 // of the directory where index was captured at originalGlobalDepth and we're 886 // computing where that index will reside in the directory at 887 // currentGlobalDepth. 
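// For example, a bucket captured at index 2 with originalGlobalDepth=2 maps
// to index 2*(1<<(3-2)) = 4 once the directory has grown to
// currentGlobalDepth=3, matching the directory-growth diagram above.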
888 func adjustBucketIndex(index, currentGlobalDepth, originalGlobalDepth uint32) uint32 { 889 return index * (1 << (currentGlobalDepth - originalGlobalDepth)) 890 } 891 892 // installBucket installs a bucket into the buckets directory, overwriting 893 // every index in the range of entries the bucket occupies. 894 func (m *Map[K, V]) installBucket(b *bucket[K, V]) *bucket[K, V] { 895 step := bucketStep(m.globalDepth(), b.localDepth) 896 for i := uint32(0); i < step; i++ { 897 *m.dir.At(uintptr(b.index + i)) = *b 898 } 899 return m.dir.At(uintptr(b.index)) 900 } 901 902 // growDirectory grows the directory slice to 1<<newGlobalDepth buckets. Grow 903 // directory returns the new index location for the bucket specified by index. 904 func (m *Map[K, V]) growDirectory(newGlobalDepth, index uint32) (newIndex uint32) { 905 if invariants && newGlobalDepth > 32 { 906 panic(fmt.Sprintf("invariant failed: expectedly large newGlobalDepth %d->%d", 907 m.globalDepth(), newGlobalDepth)) 908 } 909 910 newDir := makeUnsafeSlice(make([]bucket[K, V], 1<<newGlobalDepth)) 911 912 // NB: It would be more natural to use Map.buckets() here, but that 913 // routine uses b.index during iteration which we're mutating in the loop 914 // below. 915 916 lastIndex := uint32(math.MaxUint32) 917 setNewIndex := true 918 for i, j, n := uint32(0), uint32(0), m.bucketCount(); i < n; i++ { 919 b := m.dir.At(uintptr(i)) 920 if b.index == lastIndex { 921 continue 922 } 923 lastIndex = b.index 924 925 if b.index == index && setNewIndex { 926 newIndex = j 927 setNewIndex = false 928 } 929 b.index = j 930 step := bucketStep(newGlobalDepth, b.localDepth) 931 for k := uint32(0); k < step; k++ { 932 *newDir.At(uintptr(j + k)) = *b 933 } 934 j += step 935 } 936 937 // Zero out bucket0 if we're growing from 1 bucket (which uses bucket0) to 938 // more than 1 bucket. 939 if m.globalShift == 0 { 940 m.bucket0 = bucket[K, V]{} 941 } 942 m.dir = newDir 943 m.globalShift = ptrBits - newGlobalDepth 944 945 m.checkInvariants() 946 return newIndex 947 } 948 949 // checkInvariants verifies the internal consistency of the map's structure, 950 // checking conditions that should always be true for a correctly functioning 951 // map. If any of these invariants are violated, it panics, indicating a bug 952 // in the map implementation. 
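// The checks are: when globalShift is 0 the directory must point at bucket0
// and bucket0 must have localDepth 0; otherwise every directory entry must be
// non-nil, have localDepth <= globalDepth, and lie within the directory range
// implied by its index and localDepth.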
953 func (m *Map[K, V]) checkInvariants() { 954 if invariants { 955 if m.globalShift == 0 { 956 if m.dir.ptr != unsafe.Pointer(&m.bucket0) { 957 panic(fmt.Sprintf("directory (%p) does not point to bucket0 (%p)", m.dir.ptr, &m.bucket0)) 958 } 959 if m.bucket0.localDepth != 0 { 960 panic(fmt.Sprintf("expected local-depth=0, but found %d", m.bucket0.localDepth)) 961 } 962 } else { 963 for i, n := uint32(0), m.bucketCount(); i < n; i++ { 964 b := m.dir.At(uintptr(i)) 965 if b == nil { 966 panic(fmt.Sprintf("dir[%d]: nil bucket", i)) 967 } 968 if b.localDepth > m.globalDepth() { 969 panic(fmt.Sprintf("dir[%d]: local-depth=%d is greater than global-depth=%d", 970 i, b.localDepth, m.globalDepth())) 971 } 972 n := uint32(1) << (m.globalDepth() - b.localDepth) 973 if i < b.index || i >= b.index+n { 974 panic(fmt.Sprintf("dir[%d]: out of expected range [%d,%d)", i, b.index, b.index+n)) 975 } 976 } 977 } 978 } 979 } 980 981 func (b *bucket[K, V]) close(allocator Allocator[K, V]) { 982 if b.capacity > 0 { 983 allocator.Free(b.groups.Slice(0, uintptr(b.groupMask+1))) 984 b.capacity = 0 985 b.used = 0 986 } 987 b.groups = makeUnsafeSlice([]Group[K, V](nil)) 988 b.groupMask = 0 989 } 990 991 // tombstones returns the number of deleted (tombstone) entries in the bucket. 992 // A tombstone is a slot that has been deleted but is still considered 993 // occupied so as not to violate the probing invariant. 994 func (b *bucket[K, V]) tombstones() uint32 { 995 return (b.capacity*maxAvgGroupLoad)/groupSize - b.used - b.growthLeft 996 } 997 998 // uncheckedPut inserts an entry known not to be in the table. Used by Put 999 // after it has failed to find an existing entry to overwrite duration 1000 // insertion. 1001 func (b *bucket[K, V]) uncheckedPut(h uintptr, key K, value V) { 1002 if invariants && b.growthLeft == 0 { 1003 panic(fmt.Sprintf("invariant failed: growthLeft is unexpectedly 0\n%#v", b)) 1004 } 1005 1006 // Given key and its hash hash(key), to insert it, we construct a 1007 // probeSeq, and use it to find the first group with an unoccupied (empty 1008 // or deleted) slot. We place the key/value into the first such slot in 1009 // the group and mark it as full with key's H2. 1010 seq := makeProbeSeq(h1(h), b.groupMask) 1011 for ; ; seq = seq.next() { 1012 g := b.groups.At(uintptr(seq.offset)) 1013 match := g.ctrls.matchEmptyOrDeleted() 1014 if match != 0 { 1015 i := match.first() 1016 slot := g.slots.At(i) 1017 slot.key = key 1018 slot.value = value 1019 if g.ctrls.Get(i) == ctrlEmpty { 1020 b.growthLeft-- 1021 } 1022 g.ctrls.Set(i, ctrl(h2(h))) 1023 return 1024 } 1025 } 1026 } 1027 1028 func (b *bucket[K, V]) rehash(m *Map[K, V]) { 1029 // Rehash in place if we can recover >= 1/3 of the capacity. Note that 1030 // this heuristic differs from Abseil's and was experimentally determined 1031 // to balance performance on the PutDelete benchmark vs achieving a 1032 // reasonable load-factor. 1033 // 1034 // Abseil notes that in the worst case it takes ~4 Put/Delete pairs to 1035 // create a single tombstone. Rehashing in place is significantly faster 1036 // than resizing because the common case is that elements remain in their 1037 // current location. The performance of rehashInPlace is dominated by 1038 // recomputing the hash of every key. We know how much space we're going 1039 // to reclaim because every tombstone will be dropped and we're only 1040 // called if we've reached the thresold of capacity/8 empty slots. So the 1041 // number of tomstones is capacity*7/8 - used. 
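	//
	// Illustrative arithmetic: with capacity=64 the growth budget is
	// 64*7/8=56 slots; if used=34 and growthLeft=0 then tombstones() is
	// 56-34-0=22, which is >= 64/3, so we rehash in place rather than resize.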
1042 if b.capacity > groupSize && b.tombstones() >= b.capacity/3 { 1043 b.rehashInPlace(m) 1044 return 1045 } 1046 1047 // If the newCapacity is larger than the maxBucketCapacity split the 1048 // bucket instead of resizing. Each of the new buckets will be the same 1049 // size as the current bucket. 1050 newCapacity := 2 * b.capacity 1051 if newCapacity > m.maxBucketCapacity { 1052 b.split(m) 1053 return 1054 } 1055 1056 b.resize(m, newCapacity) 1057 } 1058 1059 func (b *bucket[K, V]) init(m *Map[K, V], newCapacity uint32) { 1060 if newCapacity < groupSize { 1061 newCapacity = groupSize 1062 } 1063 1064 if invariants && newCapacity&(newCapacity-1) != 0 { 1065 panic(fmt.Sprintf("invariant failed: bucket size %d is not a power of 2", newCapacity)) 1066 } 1067 1068 b.capacity = newCapacity 1069 b.groupMask = b.capacity/groupSize - 1 1070 b.groups = makeUnsafeSlice(m.allocator.Alloc(int(b.groupMask + 1))) 1071 1072 for i := uint32(0); i <= b.groupMask; i++ { 1073 g := b.groups.At(uintptr(i)) 1074 g.ctrls.SetEmpty() 1075 } 1076 1077 b.resetGrowthLeft() 1078 } 1079 1080 // resize the capacity of the table by allocating a bigger array and 1081 // uncheckedPutting each element of the table into the new array (we know that 1082 // no insertion here will Put an already-present value), and discard the old 1083 // backing array. 1084 func (b *bucket[K, V]) resize(m *Map[K, V], newCapacity uint32) { 1085 if invariants && b != m.dir.At(uintptr(b.index)) { 1086 panic(fmt.Sprintf("invariant failed: attempt to resize bucket %p, but it is not at Map.dir[%d/%p]", 1087 b, b.index, m.dir.At(uintptr(b.index)))) 1088 } 1089 1090 oldGroups := b.groups 1091 oldGroupMask := b.groupMask 1092 oldCapacity := b.capacity 1093 b.init(m, newCapacity) 1094 1095 if oldCapacity > 0 { 1096 for i := uint32(0); i <= oldGroupMask; i++ { 1097 g := oldGroups.At(uintptr(i)) 1098 for j := uint32(0); j < groupSize; j++ { 1099 if (g.ctrls.Get(j) & ctrlEmpty) == ctrlEmpty { 1100 continue 1101 } 1102 slot := g.slots.At(j) 1103 h := m.hash(noescape(unsafe.Pointer(&slot.key)), m.seed) 1104 b.uncheckedPut(h, slot.key, slot.value) 1105 } 1106 } 1107 1108 m.allocator.Free(oldGroups.Slice(0, uintptr(oldGroupMask+1))) 1109 } 1110 1111 b = m.installBucket(b) 1112 b.checkInvariants(m) 1113 } 1114 1115 // split divides the entries in a bucket between the receiver and a new bucket 1116 // of the same size, and then installs the new bucket into the buckets 1117 // directory, growing the buckets directory if necessary. 1118 func (b *bucket[K, V]) split(m *Map[K, V]) { 1119 if invariants && b != m.dir.At(uintptr(b.index)) { 1120 panic(fmt.Sprintf("invariant failed: attempt to split bucket %p, but it is not at Map.dir[%d/%p]", 1121 b, b.index, m.dir.At(uintptr(b.index)))) 1122 } 1123 1124 // Create the new bucket as a clone of the bucket being split. If we're 1125 // splitting bucket0 we need to allocate a *bucket[K, V] for scratch 1126 // space. Otherwise we use bucket0 as the scratch space. 1127 var newb *bucket[K, V] 1128 if m.globalShift == 0 { 1129 newb = &bucket[K, V]{} 1130 } else { 1131 newb = &m.bucket0 1132 } 1133 *newb = bucket[K, V]{ 1134 localDepth: b.localDepth, 1135 index: b.index, 1136 } 1137 newb.init(m, b.capacity) 1138 1139 // Divide the records between the 2 buckets (b and newb). This is done by 1140 // examining the new bit in the hash that will be added to the bucket 1141 // index. If that bit is 0 the record stays in bucket b. If that bit is 1 1142 // the record is moved to bucket newb. 
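	// Illustrative: on a 64-bit platform a bucket with localDepth=2 computes
	// mask as 1<<(64-3), so the third-highest bit of hash(key) decides which
	// of the two buckets a record lands in.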
We're relying on the bucket b 1143 // staying earlier in the directory than newb after the directory is 1144 // grown. 1145 mask := uintptr(1) << (ptrBits - (b.localDepth + 1)) 1146 for i := uint32(0); i <= b.groupMask; i++ { 1147 g := b.groups.At(uintptr(i)) 1148 for j := uint32(0); j < groupSize; j++ { 1149 if (g.ctrls.Get(j) & ctrlEmpty) == ctrlEmpty { 1150 continue 1151 } 1152 1153 s := g.slots.At(j) 1154 h := m.hash(noescape(unsafe.Pointer(&s.key)), m.seed) 1155 if (h & mask) == 0 { 1156 // Nothing to do, the record is staying in b. 1157 continue 1158 } 1159 1160 // Insert the record into newb. 1161 newb.uncheckedPut(h, s.key, s.value) 1162 newb.used++ 1163 1164 // Delete the record from b. 1165 if g.ctrls.matchEmpty() != 0 { 1166 g.ctrls.Set(j, ctrlEmpty) 1167 b.growthLeft++ 1168 } else { 1169 g.ctrls.Set(j, ctrlDeleted) 1170 } 1171 1172 *s = slot[K, V]{} 1173 b.used-- 1174 } 1175 } 1176 1177 if newb.used == 0 { 1178 // We didn't move any records to the new bucket. Either 1179 // maxBucketCapacity is too small and we got unlucky, or we have a 1180 // degenerate hash function (e.g. one that returns a constant in the 1181 // high bits). 1182 m.maxBucketCapacity = 2 * m.maxBucketCapacity 1183 newb.close(m.allocator) 1184 *newb = bucket[K, V]{} 1185 b.resize(m, 2*b.capacity) 1186 return 1187 } 1188 1189 if b.used == 0 { 1190 // We moved all of the records to the new bucket (note the two 1191 // conditions are equivalent and both are present merely for clarity). 1192 // Similar to the above, bump maxBucketCapacity and resize the bucket 1193 // rather than splitting. We'll replace the old bucket with the new 1194 // bucket in the directory. 1195 m.maxBucketCapacity = 2 * m.maxBucketCapacity 1196 b.close(m.allocator) 1197 newb = m.installBucket(newb) 1198 m.checkInvariants() 1199 newb.resize(m, 2*newb.capacity) 1200 return 1201 } 1202 1203 // We need to ensure bucket b, which we evacuated records from, has empty 1204 // slots as we may be inserting into it. We also want to drop any 1205 // tombstones that may have been left in bucket to ensure lookups for 1206 // non-existent keys don't have to traverse long probe chains. With a good 1207 // hash function, 50% of the entries in b should have been moved to newb, 1208 // so we should be able to drop tombstones corresponding to ~50% of the 1209 // entries. 1210 b.rehashInPlace(m) 1211 1212 // Grow the directory if necessary. 1213 if b.localDepth >= m.globalDepth() { 1214 // When the directory grows b will be invalidated. We pass in b's 1215 // index so that growDirectory will return the new index it resides 1216 // at. 1217 i := m.growDirectory(b.localDepth+1, b.index) 1218 b = m.dir.At(uintptr(i)) 1219 } 1220 1221 // Complete the split by incrementing the local depth for the 2 buckets 1222 // and installing the new bucket in the directory. 
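	//
	// Illustrative: if b.localDepth is becoming 3 while m.globalDepth() is
	// also 3, bucketStep is 1<<(3-3)=1 and newb is installed at b.index+1,
	// immediately after b in the directory.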
1223 b.localDepth++ 1224 m.installBucket(b) 1225 newb.localDepth = b.localDepth 1226 newb.index = b.index + bucketStep(m.globalDepth(), b.localDepth) 1227 m.installBucket(newb) 1228 *newb = bucket[K, V]{} 1229 1230 if invariants { 1231 m.checkInvariants() 1232 m.buckets(0, func(b *bucket[K, V]) bool { 1233 b.checkInvariants(m) 1234 return true 1235 }) 1236 } 1237 } 1238 1239 func (b *bucket[K, V]) rehashInPlace(m *Map[K, V]) { 1240 if invariants && b != m.dir.At(uintptr(b.index)) { 1241 panic(fmt.Sprintf("invariant failed: attempt to rehash bucket %p, but it is not at Map.dir[%d/%p]", 1242 b, b.index, m.dir.At(uintptr(b.index)))) 1243 } 1244 if b.capacity == 0 { 1245 return 1246 } 1247 1248 // We want to drop all of the deletes in place. We first walk over the 1249 // control bytes and mark every DELETED slot as EMPTY and every FULL slot 1250 // as DELETED. Marking the DELETED slots as EMPTY has effectively dropped 1251 // the tombstones, but we fouled up the probe invariant. Marking the FULL 1252 // slots as DELETED gives us a marker to locate the previously FULL slots. 1253 1254 // Mark all DELETED slots as EMPTY and all FULL slots as DELETED. 1255 for i := uint32(0); i <= b.groupMask; i++ { 1256 b.groups.At(uintptr(i)).ctrls.convertNonFullToEmptyAndFullToDeleted() 1257 } 1258 1259 // Now we walk over all of the DELETED slots (a.k.a. the previously FULL 1260 // slots). For each slot we find the first probe group we can place the 1261 // element in which reestablishes the probe invariant. Note that as this 1262 // loop proceeds we have the invariant that there are no DELETED slots in 1263 // the range [0, i). We may move the element at i to the range [0, i) if 1264 // that is where the first group with an empty slot in its probe chain 1265 // resides, but we never set a slot in [0, i) to DELETED. 1266 for i := uint32(0); i <= b.groupMask; i++ { 1267 g := b.groups.At(uintptr(i)) 1268 for j := uint32(0); j < groupSize; j++ { 1269 if g.ctrls.Get(j) != ctrlDeleted { 1270 continue 1271 } 1272 1273 s := g.slots.At(j) 1274 h := m.hash(noescape(unsafe.Pointer(&s.key)), m.seed) 1275 seq := makeProbeSeq(h1(h), b.groupMask) 1276 desiredOffset := seq.offset 1277 1278 var targetGroup *Group[K, V] 1279 var target uint32 1280 for ; ; seq = seq.next() { 1281 targetGroup = b.groups.At(uintptr(seq.offset)) 1282 if match := targetGroup.ctrls.matchEmptyOrDeleted(); match != 0 { 1283 target = match.first() 1284 break 1285 } 1286 } 1287 1288 switch { 1289 case i == desiredOffset: 1290 // If the target index falls within the first probe group 1291 // then we don't need to move the element as it already 1292 // falls in the best probe position. 1293 g.ctrls.Set(j, ctrl(h2(h))) 1294 1295 case targetGroup.ctrls.Get(target) == ctrlEmpty: 1296 // The target slot is empty. Transfer the element to the 1297 // empty slot and mark the slot at index i as empty. 1298 targetGroup.ctrls.Set(target, ctrl(h2(h))) 1299 *targetGroup.slots.At(target) = *s 1300 *s = slot[K, V]{} 1301 g.ctrls.Set(j, ctrlEmpty) 1302 1303 case targetGroup.ctrls.Get(target) == ctrlDeleted: 1304 // The slot at target has an element (i.e. it was FULL). 1305 // We're going to swap our current element with that 1306 // element and then repeat processing of index i which now 1307 // holds the element which was at target. 1308 targetGroup.ctrls.Set(target, ctrl(h2(h))) 1309 t := targetGroup.slots.At(target) 1310 *s, *t = *t, *s 1311 // Repeat processing of the j'th slot which now holds a 1312 // new key/value. 
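				// Decrementing j relies on the enclosing loop's j++ to
				// revisit this same slot index on the next iteration.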
1313 j-- 1314 1315 default: 1316 panic(fmt.Sprintf("ctrl at position %d (%02x) should be empty or deleted", 1317 target, targetGroup.ctrls.Get(target))) 1318 } 1319 } 1320 } 1321 1322 b.resetGrowthLeft() 1323 b.growthLeft -= b.used 1324 1325 b.checkInvariants(m) 1326 } 1327 1328 func (b *bucket[K, V]) resetGrowthLeft() { 1329 var growthLeft int 1330 if b.capacity <= groupSize { 1331 // If the map fits in a single group then we're able to fill all of 1332 // the slots except 1 (an empty slot is needed to terminate find 1333 // operations). 1334 growthLeft = int(b.capacity - 1) 1335 } else { 1336 growthLeft = int((b.capacity * maxAvgGroupLoad) / groupSize) 1337 } 1338 if growthLeft < 0 { 1339 growthLeft = 0 1340 } 1341 b.growthLeft = uint32(growthLeft) 1342 } 1343 1344 // TODO(peter): Should this be removed? It was useful for debugging a 1345 // performance problem with BenchmarkGetMiss. 1346 func (b *bucket[K, V]) fullGroups() uint32 { 1347 var full uint32 1348 for i := uint32(0); i <= b.groupMask; i++ { 1349 g := b.groups.At(uintptr(i)) 1350 if g.ctrls.matchEmpty() == 0 { 1351 full++ 1352 } 1353 } 1354 return full 1355 } 1356 1357 func (b *bucket[K, V]) checkInvariants(m *Map[K, V]) { 1358 if invariants { 1359 // For every non-empty slot, verify we can retrieve the key using Get. 1360 // Count the number of used and deleted slots. 1361 var used uint32 1362 var deleted uint32 1363 var empty uint32 1364 for i := uint32(0); i <= b.groupMask; i++ { 1365 g := b.groups.At(uintptr(i)) 1366 for j := uint32(0); j < groupSize; j++ { 1367 c := g.ctrls.Get(j) 1368 switch { 1369 case c == ctrlDeleted: 1370 deleted++ 1371 case c == ctrlEmpty: 1372 empty++ 1373 default: 1374 slot := g.slots.At(j) 1375 if _, ok := m.Get(slot.key); !ok { 1376 h := m.hash(noescape(unsafe.Pointer(&slot.key)), m.seed) 1377 panic(fmt.Sprintf("invariant failed: slot(%d/%d): %v not found [h2=%02x h1=%07x]\n%#v", 1378 i, j, slot.key, h2(h), h1(h), b)) 1379 } 1380 used++ 1381 } 1382 } 1383 } 1384 1385 if used != b.used { 1386 panic(fmt.Sprintf("invariant failed: found %d used slots, but used count is %d\n%#v", 1387 used, b.used, b)) 1388 } 1389 1390 growthLeft := (b.capacity*maxAvgGroupLoad)/groupSize - b.used - deleted 1391 if growthLeft != b.growthLeft { 1392 panic(fmt.Sprintf("invariant failed: found %d growthLeft, but expected %d\n%#v", 1393 b.growthLeft, growthLeft, b)) 1394 } 1395 if deleted != b.tombstones() { 1396 panic(fmt.Sprintf("invariant failed: found %d tombstones, but expected %d\n%#v", 1397 deleted, b.tombstones(), b)) 1398 } 1399 1400 if empty == 0 { 1401 panic(fmt.Sprintf("invariant failed: found no empty slots (violates probe invariant)\n%#v", b)) 1402 } 1403 } 1404 } 1405 1406 // GoString implements the fmt.GoStringer interface which is used when 1407 // formatting using the "%#v" format specifier. 
1408 func (b *bucket[K, V]) GoString() string { 1409 var buf strings.Builder 1410 b.goFormat(&buf) 1411 return buf.String() 1412 } 1413 1414 func (b *bucket[K, V]) goFormat(w io.Writer) { 1415 fmt.Fprintf(w, "capacity=%d used=%d growth-left=%d\n", b.capacity, b.used, b.growthLeft) 1416 for i := uint32(0); i <= b.groupMask; i++ { 1417 g := b.groups.At(uintptr(i)) 1418 fmt.Fprintf(w, " group %d\n", i) 1419 for j := uint32(0); j < groupSize; j++ { 1420 switch c := g.ctrls.Get(j); c { 1421 case ctrlEmpty: 1422 fmt.Fprintf(w, " %d: %02x [empty]\n", j, c) 1423 case ctrlDeleted: 1424 fmt.Fprintf(w, " %d: %02x [deleted]\n", j, c) 1425 default: 1426 slot := g.slots.At(j) 1427 fmt.Fprintf(w, " %d: %02x [%v:%v]\n", j, c, slot.key, slot.value) 1428 } 1429 } 1430 } 1431 } 1432 1433 // bitset represents a set of slots within a group. 1434 // 1435 // The underlying representation uses one byte per slot, where each byte is 1436 // either 0x80 if the slot is part of the set or 0x00 otherwise. This makes it 1437 // convenient to calculate for an entire group at once (e.g. see matchEmpty). 1438 type bitset uint64 1439 1440 // first assumes that only the MSB of each control byte can be set (e.g. bitset 1441 // is the result of matchEmpty or similar) and returns the relative index of the 1442 // first control byte in the group that has the MSB set. 1443 // 1444 // Returns 8 if the bitset is 0. 1445 // Returns groupSize if the bitset is empty. 1446 func (b bitset) first() uint32 { 1447 return uint32(bits.TrailingZeros64(uint64(b))) >> 3 1448 } 1449 1450 // removeFirst removes the first set bit (that is, resets the least significant set bit to 0). 1451 func (b bitset) removeFirst() bitset { 1452 return b & (b - 1) 1453 } 1454 1455 func (b bitset) String() string { 1456 var buf strings.Builder 1457 buf.Grow(groupSize) 1458 for i := 0; i < groupSize; i++ { 1459 if (b & (bitset(0x80) << (i << 3))) != 0 { 1460 buf.WriteString("1") 1461 } else { 1462 buf.WriteString("0") 1463 } 1464 } 1465 return buf.String() 1466 } 1467 1468 // Each slot in the hash table has a control byte which can have one of three 1469 // states: empty, deleted, and full. They have the following bit patterns: 1470 // 1471 // empty: 1 0 0 0 0 0 0 0 1472 // deleted: 1 1 1 1 1 1 1 0 1473 // full: 0 h h h h h h h // h represents the H1 hash bits 1474 type ctrl uint8 1475 1476 // ctrlGroup is a fixed size array of groupSize control bytes stored in a 1477 // uint64. 1478 type ctrlGroup uint64 1479 1480 // Get returns the i-th control byte. 1481 func (g *ctrlGroup) Get(i uint32) ctrl { 1482 return *(*ctrl)(unsafe.Add(unsafe.Pointer(g), i)) 1483 } 1484 1485 // Set sets the i-th control byte. 1486 func (g *ctrlGroup) Set(i uint32, c ctrl) { 1487 *(*ctrl)(unsafe.Add(unsafe.Pointer(g), i)) = c 1488 } 1489 1490 // SetEmpty sets all the control bytes to empty. 1491 func (g *ctrlGroup) SetEmpty() { 1492 *g = ctrlGroup(bitsetEmpty) 1493 } 1494 1495 // matchH2 returns the set of slots which are full and for which the 7-bit hash 1496 // matches the given value. May return false positives. 1497 func (g *ctrlGroup) matchH2(h uintptr) bitset { 1498 // NB: This generic matching routine produces false positive matches when 1499 // h is 2^N and the control bytes have a seq of 2^N followed by 2^N+1. For 1500 // example: if ctrls==0x0302 and h=02, we'll compute v as 0x0100. When we 1501 // subtract off 0x0101 the first 2 bytes we'll become 0xffff and both be 1502 // considered matches of h. The false positive matches are not a problem, 1503 // just a rare inefficiency. 
// matchEmpty returns the set of slots in the group that are empty.
func (g *ctrlGroup) matchEmpty() bitset {
	// An empty slot is   1000 0000
	// A deleted slot is  1111 1110
	// A full slot is     0??? ????
	//
	// A slot is empty iff bit 7 is set and bit 1 is not. We could select any
	// of the other bits here (e.g. v << 1 would also work).
	v := uint64(*g)
	return bitset((v &^ (v << 6)) & bitsetMSB)
}

// matchEmptyOrDeleted returns the set of slots in the group that are empty or
// deleted.
func (g *ctrlGroup) matchEmptyOrDeleted() bitset {
	// An empty slot is   1000 0000
	// A deleted slot is  1111 1110
	// A full slot is     0??? ????
	//
	// A slot is empty or deleted iff bit 7 is set and bit 0 is not.
	v := uint64(*g)
	return bitset((v &^ (v << 7)) & bitsetMSB)
}

// convertNonFullToEmptyAndFullToDeleted converts the non-full (empty or
// deleted) control bytes in a group to empty control bytes, and the control
// bytes indicating full slots to deleted control bytes.
func (g *ctrlGroup) convertNonFullToEmptyAndFullToDeleted() {
	// An empty slot is   1000 0000
	// A deleted slot is  1111 1110
	// A full slot is     0??? ????
	//
	// We select the MSB, invert, add 1 if the MSB was set, and zero out the
	// low bit.
	//
	//   - if the MSB was set (i.e. the slot was empty or deleted):
	//     v:             1000 0000
	//     ^v:            0111 1111
	//     ^v + (v >> 7): 1000 0000
	//     &^ bitsetLSB:  1000 0000 = empty slot.
	//
	//   - if the MSB was not set (i.e. the slot was full):
	//     v:             0000 0000
	//     ^v:            1111 1111
	//     ^v + (v >> 7): 1111 1111
	//     &^ bitsetLSB:  1111 1110 = deleted slot.
	v := uint64(*g) & bitsetMSB
	*g = ctrlGroup((^v + (v >> 7)) &^ bitsetLSB)
}

func (g *ctrlGroup) String() string {
	var buf strings.Builder
	buf.Grow(groupSize)
	for i := uint32(0); i < groupSize; i++ {
		fmt.Fprintf(&buf, "%02x ", g.Get(i))
	}
	return buf.String()
}
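// illustrativeCtrlConversionExample is not part of the original source. It is
// a sketch (added for exposition) that walks the control-byte helpers above
// over a concrete group: slot 0 full with H2=0x02, slot 1 deleted, and slots
// 2-7 empty.
func illustrativeCtrlConversionExample() {
	var g ctrlGroup
	g.SetEmpty()
	g.Set(0, ctrl(0x02))  // full: MSB clear, low 7 bits hold H2
	g.Set(1, ctrlDeleted) // tombstone

	fmt.Println(g.matchH2(0x02).first())         // 0: the full slot matches H2
	fmt.Println(g.matchEmpty().first())          // 2: first empty slot
	fmt.Println(g.matchEmptyOrDeleted().first()) // 1: the tombstone

	// Rehash-in-place preparation: non-full slots become empty and full
	// slots become deleted (marking their keys as needing rehashing).
	g.convertNonFullToEmptyAndFullToDeleted()
	fmt.Println(g.Get(0) == ctrlDeleted) // true
	fmt.Println(g.Get(1) == ctrlEmpty)   // true
}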
// slotGroup is a fixed size array of groupSize slots.
//
// The keys and values are stored interleaved in slots with a memory layout
// that looks like K/V/K/V/K/V/K/V.../K/V. An alternate layout would be to
// have parallel arrays of keys and values: K/K/K/K.../K/V/V/V/V.../V. The
// latter has better space utilization if you have something like uint64 keys
// and bool values, but it is measurably slower at large map sizes because the
// key and value for a slot end up in separate cache lines. Below shows this
// perf hit with a Map[uint64,uint64].
//
//	MapGetHit/swissMap/Int64/262144-10     9.31ns ± 3%  10.04ns ± 3%   +7.81%  (p=0.008 n=5+5)
//	MapGetHit/swissMap/Int64/524288-10     16.7ns ± 1%   18.2ns ± 3%   +8.98%  (p=0.008 n=5+5)
//	MapGetHit/swissMap/Int64/1048576-10    24.7ns ± 2%   27.6ns ± 0%  +11.74%  (p=0.008 n=5+5)
//	MapGetHit/swissMap/Int64/2097152-10    33.3ns ± 1%   37.7ns ± 1%  +13.12%  (p=0.008 n=5+5)
//	MapGetHit/swissMap/Int64/4194304-10    36.6ns ± 0%   43.0ns ± 1%  +17.37%  (p=0.008 n=5+5)
type slotGroup[K comparable, V any] struct {
	slots [groupSize]slot[K, V]
}

func (g *slotGroup[K, V]) At(i uint32) *slot[K, V] {
	return (*slot[K, V])(unsafe.Add(unsafe.Pointer(&g.slots[0]), uintptr(i)*unsafe.Sizeof(g.slots[0])))
}

// emptyCtrls is a singleton for a single empty groupSize set of controls.
var emptyCtrls = func() []ctrl {
	var v [groupSize]ctrl
	for i := uint32(0); i < groupSize; i++ {
		v[i] = ctrlEmpty
	}
	return v[:]
}()

// probeSeq maintains the state for a probe sequence that iterates through the
// groups in a bucket. The sequence is a triangular progression of the form
//
//	p(i) := (i^2 + i)/2 + hash (mod mask+1)
//
// The sequence effectively outputs the indexes of *groups*. The group
// machinery allows us to check an entire group with minimal branching.
//
// It turns out that this probe sequence visits every group exactly once if
// the number of groups is a power of two, since (i^2+i)/2 is a bijection in
// Z/(2^m). See https://en.wikipedia.org/wiki/Quadratic_probing
type probeSeq struct {
	mask   uint32
	offset uint32
	index  uint32
}

func makeProbeSeq(hash uintptr, mask uint32) probeSeq {
	return probeSeq{
		mask:   mask,
		offset: uint32(hash) & mask,
		index:  0,
	}
}

func (s probeSeq) next() probeSeq {
	s.index++
	s.offset = (s.offset + s.index) & s.mask
	return s
}

func (s probeSeq) String() string {
	return fmt.Sprintf("mask=%d offset=%d index=%d", s.mask, s.offset, s.index)
}

// h1 extracts the H1 portion of a hash: the upper 57 bits.
func h1(h uintptr) uintptr {
	return h >> 7
}

// h2 extracts the H2 portion of a hash: the 7 low bits not used for h1.
//
// These bits are used as an occupied control byte.
func h2(h uintptr) uintptr {
	return h & 0x7f
}
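// illustrativeProbeSeqExample is not part of the original source. It is a
// sketch (added for exposition) checking the claim above: with a power-of-two
// number of groups, the triangular probe sequence visits every group exactly
// once before repeating.
func illustrativeProbeSeqExample() {
	const numGroups = 8 // must be a power of two
	seen := make(map[uint32]struct{}, numGroups)
	s := makeProbeSeq(0x12345678, numGroups-1)
	for i := 0; i < numGroups; i++ {
		seen[s.offset] = struct{}{}
		s = s.next()
	}
	fmt.Println(len(seen) == numGroups) // true: all 8 groups were visited
}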
// noescape hides a pointer from escape analysis. noescape is the identity
// function but escape analysis doesn't think the output depends on the input.
// noescape is inlined and currently compiles down to zero instructions.
// USE CAREFULLY!
//
//go:nosplit
//go:nocheckptr
func noescape(p unsafe.Pointer) unsafe.Pointer {
	x := uintptr(p)
	return unsafe.Pointer(x ^ 0)
}

// unsafeSlice provides semi-ergonomic limited slice-like functionality
// without bounds checking for fixed sized slices.
type unsafeSlice[T any] struct {
	ptr unsafe.Pointer
}

func makeUnsafeSlice[T any](s []T) unsafeSlice[T] {
	return unsafeSlice[T]{ptr: unsafe.Pointer(unsafe.SliceData(s))}
}

// At returns a pointer to the element at index i.
//
// The go:nocheckptr declaration is needed to silence the runtime check in
// race builds that the memory for the returned *T is entirely contained
// within a single memory allocation. We are "safely" violating this
// requirement when accessing Groups.ctrls for the empty group. See
// unsafeConvertSlice for additional commentary.
//
//go:nocheckptr
func (s unsafeSlice[T]) At(i uintptr) *T {
	var t T
	return (*T)(unsafe.Add(s.ptr, unsafe.Sizeof(t)*i))
}

// Slice returns a Go slice akin to slice[start:end] for a Go builtin slice.
func (s unsafeSlice[T]) Slice(start, end uintptr) []T {
	return unsafe.Slice((*T)(s.ptr), end)[start:end]
}

// unsafeConvertSlice (unsafely) casts a []Src to a []Dest. The go:nocheckptr
// declaration is needed to silence the runtime check in race builds that the
// memory for the []Dest is entirely contained within a single memory
// allocation. We are "safely" violating this requirement when casting
// emptyCtrls (a []ctrl) to an empty group ([]*Group[K, V]). The reason this
// is safe is that we never access Group.slots because the controls are all
// marked as empty.
//
//go:nocheckptr
func unsafeConvertSlice[Dest any, Src any](s []Src) []Dest {
	return unsafe.Slice((*Dest)(unsafe.Pointer(unsafe.SliceData(s))), len(s))
}

// noCopy may be added to structs which must not be copied after the first
// use.
//
// See https://golang.org/issues/8005#issuecomment-190753527 for details.
//
// Note that it must not be embedded, due to the Lock and Unlock methods.
type noCopy struct{}

// Lock is a no-op used by the -copylocks checker from `go vet`.
func (*noCopy) Lock()   {}
func (*noCopy) Unlock() {}
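// illustrativeUnsafeSliceExample is not part of the original source. It is a
// sketch (added for exposition, with a hypothetical name) showing that, for
// in-bounds indexes, unsafeSlice.At is equivalent to taking the address of
// the corresponding element of the backing slice; the only difference is the
// elided bounds check.
func illustrativeUnsafeSliceExample() {
	s := []int{10, 20, 30}
	us := makeUnsafeSlice(s)
	fmt.Println(us.At(1) == &s[1])        // true: same address
	fmt.Println(*us.At(2) == s[2])        // true: same value
	fmt.Println(len(us.Slice(0, 3)) == 3) // true: reconstructs the slice
}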