github.com/twelsh-aw/go/src@v0.0.0-20230516233729-a56fe86a7c81/runtime/mgc.go

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Garbage collector (GC).
//
// The GC runs concurrently with mutator threads, is type accurate (aka precise), and allows multiple
// GC threads to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is
// non-generational and non-compacting. Allocation is done using size-segregated per-P allocation
// areas to minimize fragmentation while eliminating locks in the common case.
//
// The algorithm decomposes into several steps.
// This is a high level description of the algorithm being used. For an overview of GC a good
// place to start is Richard Jones' gchandbook.org.
//
// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
// 966-975.
// For journal-quality proofs that these steps are complete, correct, and terminate see
// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
//
// 1. GC performs sweep termination.
//
//    a. Stop the world. This causes all Ps to reach a GC safe-point.
//
//    b. Sweep any unswept spans. There will only be unswept spans if
//    this GC cycle was forced before the expected time.
//
// 2. GC performs the mark phase.
//
//    a. Prepare for the mark phase by setting gcphase to _GCmark
//    (from _GCoff), enabling the write barrier, enabling mutator
//    assists, and enqueueing root mark jobs. No objects may be
//    scanned until all Ps have enabled the write barrier, which is
//    accomplished using STW.
//
//    b. Start the world. From this point, GC work is done by mark
//    workers started by the scheduler and by assists performed as
//    part of allocation. The write barrier shades both the
//    overwritten pointer and the new pointer value for any pointer
//    writes (see mbarrier.go for details). Newly allocated objects
//    are immediately marked black.
//
//    c. GC performs root marking jobs. This includes scanning all
//    stacks, shading all globals, and shading any heap pointers in
//    off-heap runtime data structures. Scanning a stack stops a
//    goroutine, shades any pointers found on its stack, and then
//    resumes the goroutine.
//
//    d. GC drains the work queue of grey objects, scanning each grey
//    object to black and shading all pointers found in the object
//    (which in turn may add those pointers to the work queue).
//
//    e. Because GC work is spread across local caches, GC uses a
//    distributed termination algorithm to detect when there are no
//    more root marking jobs or grey objects (see gcMarkDone). At this
//    point, GC transitions to mark termination.
//
// 3. GC performs mark termination.
//
//    a. Stop the world.
//
//    b. Set gcphase to _GCmarktermination, and disable workers and
//    assists.
//
//    c. Perform housekeeping like flushing mcaches.
//
// 4. GC performs the sweep phase.
//
//    a. Prepare for the sweep phase by setting gcphase to _GCoff,
//    setting up sweep state and disabling the write barrier.
//
//    b. Start the world. From this point on, newly allocated objects
//    are white, and allocating sweeps spans before use if necessary.
//
//    c. GC does concurrent sweeping in the background and in response
//    to allocation. See description below.
//
// 5. When sufficient allocation has taken place, replay the sequence
// starting with 1 above. See discussion of GC rate below.

// Concurrent sweep.
//
// The sweep phase proceeds concurrently with normal program execution.
// The heap is swept span-by-span both lazily (when a goroutine needs another span)
// and concurrently in a background goroutine (this helps programs that are not CPU bound).
// At the end of STW mark termination all spans are marked as "needs sweeping".
//
// The background sweeper goroutine simply sweeps spans one-by-one.
//
// To avoid requesting more OS memory while there are unswept spans, when a
// goroutine needs another span, it first attempts to reclaim that much memory
// by sweeping. When a goroutine needs to allocate a new small-object span, it
// sweeps small-object spans for the same object size until it frees at least
// one object. When a goroutine needs to allocate a large-object span from the heap,
// it sweeps spans until it frees at least that many pages into the heap. There is
// one case where this may not suffice: if a goroutine sweeps and frees two
// nonadjacent one-page spans to the heap, it will allocate a new two-page
// span, but there can still be other one-page unswept spans which could be
// combined into a two-page span.
//
// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
// mark bits in the GC bitmap). During GC all mcaches are flushed into the central cache,
// so they are empty. When a goroutine grabs a new span into its mcache, it sweeps it.
// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
// The finalizer goroutine is kicked off only when all spans are swept.
// When the next GC starts, it sweeps all not-yet-swept spans (if any).

// GC rate.
// The next GC is after we've allocated an extra amount of memory proportional to
// the amount already in use. The proportion is controlled by the GOGC environment variable
// (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M
// (this mark is computed by the gcController.heapGoal method). This keeps the GC cost in
// linear proportion to the allocation cost. Adjusting GOGC just changes the linear constant
// (and also the amount of extra memory used).

// Oblets
//
// In order to prevent long pauses while scanning large objects and to
// improve parallelism, the garbage collector breaks up scan jobs for
// objects larger than maxObletBytes into "oblets" of at most
// maxObletBytes. When scanning encounters the beginning of a large
// object, it scans only the first oblet and enqueues the remaining
// oblets as new scan jobs.
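// As a minimal, editorial illustration of the GC rate rule above (not the
// pacer's actual computation, which also factors in stack and global scan
// work and GOMEMLIMIT; see gcController.heapGoal), the growth target under
// GOGC alone works out as follows:
//
//	package main
//
//	import "fmt"
//
//	func main() {
//		liveHeap := uint64(4 << 20)          // bytes live after the previous mark (4M)
//		gogc := uint64(100)                  // GOGC=100, the default
//		goal := liveHeap + liveHeap*gogc/100 // heap size at which the next cycle starts
//		fmt.Println(goal >> 20)              // prints 8: GC again near 8M
//	}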
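// The following is a toy, single-goroutine sketch of the tri-color marking
// and pointer shading described in step 2 of the overview above. It is an
// editorial illustration only: the object type and the shade and writePointer
// helpers are invented for this example and are not runtime code (the real
// barrier lives in mbarrier.go and the real drain loop in gcDrain).
//
//	package main
//
//	import "fmt"
//
//	type color int
//
//	const (
//		white color = iota // not yet reached by the GC
//		grey               // reached, but fields not yet scanned
//		black              // reached and fully scanned
//	)
//
//	type object struct {
//		name string
//		refs []*object
//		col  color
//	}
//
//	// shade greys a white object and queues it for scanning.
//	func shade(obj *object, work *[]*object) {
//		if obj != nil && obj.col == white {
//			obj.col = grey
//			*work = append(*work, obj)
//		}
//	}
//
//	// writePointer models the barrier for *slot = val: it shades both the
//	// overwritten value and the new value, so neither can be hidden from
//	// the concurrent marker.
//	func writePointer(slot **object, val *object, work *[]*object) {
//		shade(*slot, work)
//		shade(val, work)
//		*slot = val
//	}
//
//	func main() {
//		c := &object{name: "c"}
//		b := &object{name: "b", refs: []*object{c}}
//		root := &object{name: "root", refs: []*object{b}}
//
//		// Mark: shade the root, then drain grey objects to black.
//		var work []*object
//		shade(root, &work)
//		for len(work) > 0 {
//			obj := work[len(work)-1]
//			work = work[:len(work)-1]
//			for _, ref := range obj.refs {
//				shade(ref, &work)
//			}
//			obj.col = black
//		}
//		fmt.Println(root.col == black, b.col == black, c.col == black) // true true true
//
//		// A mutator store during marking goes through the barrier, so the
//		// stored object is shaded rather than staying white.
//		d := &object{name: "d"}
//		writePointer(&root.refs[0], d, &work)
//		fmt.Println(d.col == grey) // true
//	}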

package runtime

import (
	"internal/cpu"
	"runtime/internal/atomic"
	"unsafe"
)

const (
	_DebugGC         = 0
	_ConcurrentSweep = true
	_FinBlockSize    = 4 * 1024

	// debugScanConservative enables debug logging for stack
	// frames that are scanned conservatively.
	debugScanConservative = false

	// sweepMinHeapDistance is a lower bound on the heap distance
	// (in bytes) reserved for concurrent sweeping between GC
	// cycles.
	sweepMinHeapDistance = 1024 * 1024
)

func gcinit() {
	if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
		throw("size of Workbuf is suboptimal")
	}
	// No sweep on the first cycle.
	sweep.active.state.Store(sweepDrainedMask)

	// Initialize GC pacer state.
	// Use the environment variable GOGC for the initial gcPercent value.
	// Use the environment variable GOMEMLIMIT for the initial memoryLimit value.
	gcController.init(readGOGC(), readGOMEMLIMIT())

	work.startSema = 1
	work.markDoneSema = 1
	lockInit(&work.sweepWaiters.lock, lockRankSweepWaiters)
	lockInit(&work.assistQueue.lock, lockRankAssistQueue)
	lockInit(&work.wbufSpans.lock, lockRankWbufSpans)
}

// gcenable is called after the bulk of the runtime initialization,
// just before we're about to start letting user code run.
// It kicks off the background sweeper goroutine, the background
// scavenger goroutine, and enables GC.
func gcenable() {
	// Kick off sweeping and scavenging.
	c := make(chan int, 2)
	go bgsweep(c)
	go bgscavenge(c)
	<-c
	<-c
	memstats.enablegc = true // now that runtime is initialized, GC is okay
}

// Garbage collector phase.
// Indicates to write barrier and synchronization task to perform.
var gcphase uint32

// The compiler knows about this variable.
// If you change it, you must change builtin/runtime.go, too.
// If you change the first four bytes, you must also change the write
// barrier insertion code.
var writeBarrier struct {
	enabled bool    // compiler emits a check of this before calling write barrier
	pad     [3]byte // compiler uses 32-bit load for "enabled" field
	needed  bool    // identical to enabled, for now (TODO: dedup)
	alignme uint64  // guarantee alignment so that compiler can use a 32 or 64-bit load
}

// gcBlackenEnabled is 1 if mutator assists and background mark
// workers are allowed to blacken objects. This must only be set when
// gcphase == _GCmark.
var gcBlackenEnabled uint32

const (
	_GCoff             = iota // GC not running; sweeping in background, write barrier disabled
	_GCmark                   // GC marking roots and workbufs: allocate black, write barrier ENABLED
	_GCmarktermination        // GC mark termination: allocate black, P's help GC, write barrier ENABLED
)

//go:nosplit
func setGCPhase(x uint32) {
	atomic.Store(&gcphase, x)
	writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
	writeBarrier.enabled = writeBarrier.needed
}

// gcMarkWorkerMode represents the mode that a concurrent mark worker
// should operate in.
//
// Concurrent marking happens through four different mechanisms. One
// is mutator assists, which happen in response to allocations and are
// not scheduled. The other three are variations in the per-P mark
// workers and are distinguished by gcMarkWorkerMode.
225 type gcMarkWorkerMode int 226 227 const ( 228 // gcMarkWorkerNotWorker indicates that the next scheduled G is not 229 // starting work and the mode should be ignored. 230 gcMarkWorkerNotWorker gcMarkWorkerMode = iota 231 232 // gcMarkWorkerDedicatedMode indicates that the P of a mark 233 // worker is dedicated to running that mark worker. The mark 234 // worker should run without preemption. 235 gcMarkWorkerDedicatedMode 236 237 // gcMarkWorkerFractionalMode indicates that a P is currently 238 // running the "fractional" mark worker. The fractional worker 239 // is necessary when GOMAXPROCS*gcBackgroundUtilization is not 240 // an integer and using only dedicated workers would result in 241 // utilization too far from the target of gcBackgroundUtilization. 242 // The fractional worker should run until it is preempted and 243 // will be scheduled to pick up the fractional part of 244 // GOMAXPROCS*gcBackgroundUtilization. 245 gcMarkWorkerFractionalMode 246 247 // gcMarkWorkerIdleMode indicates that a P is running the mark 248 // worker because it has nothing else to do. The idle worker 249 // should run until it is preempted and account its time 250 // against gcController.idleMarkTime. 251 gcMarkWorkerIdleMode 252 ) 253 254 // gcMarkWorkerModeStrings are the strings labels of gcMarkWorkerModes 255 // to use in execution traces. 256 var gcMarkWorkerModeStrings = [...]string{ 257 "Not worker", 258 "GC (dedicated)", 259 "GC (fractional)", 260 "GC (idle)", 261 } 262 263 // pollFractionalWorkerExit reports whether a fractional mark worker 264 // should self-preempt. It assumes it is called from the fractional 265 // worker. 266 func pollFractionalWorkerExit() bool { 267 // This should be kept in sync with the fractional worker 268 // scheduler logic in findRunnableGCWorker. 269 now := nanotime() 270 delta := now - gcController.markStartTime 271 if delta <= 0 { 272 return true 273 } 274 p := getg().m.p.ptr() 275 selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime) 276 // Add some slack to the utilization goal so that the 277 // fractional worker isn't behind again the instant it exits. 278 return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal 279 } 280 281 var work workType 282 283 type workType struct { 284 full lfstack // lock-free list of full blocks workbuf 285 empty lfstack // lock-free list of empty blocks workbuf 286 pad0 cpu.CacheLinePad // prevents false-sharing between full/empty and nproc/nwait 287 288 wbufSpans struct { 289 lock mutex 290 // free is a list of spans dedicated to workbufs, but 291 // that don't currently contain any workbufs. 292 free mSpanList 293 // busy is a list of all spans containing workbufs on 294 // one of the workbuf lists. 295 busy mSpanList 296 } 297 298 // Restore 64-bit alignment on 32-bit. 299 _ uint32 300 301 // bytesMarked is the number of bytes marked this cycle. This 302 // includes bytes blackened in scanned objects, noscan objects 303 // that go straight to black, and permagrey objects scanned by 304 // markroot during the concurrent scan phase. This is updated 305 // atomically during the cycle. Updates may be batched 306 // arbitrarily, since the value is only read at the end of the 307 // cycle. 308 // 309 // Because of benign races during marking, this number may not 310 // be the exact number of marked bytes, but it should be very 311 // close. 312 // 313 // Put this field here because it needs 64-bit atomic access 314 // (and thus 8-byte alignment even on 32-bit architectures). 
315 bytesMarked uint64 316 317 markrootNext uint32 // next markroot job 318 markrootJobs uint32 // number of markroot jobs 319 320 nproc uint32 321 tstart int64 322 nwait uint32 323 324 // Number of roots of various root types. Set by gcMarkRootPrepare. 325 // 326 // nStackRoots == len(stackRoots), but we have nStackRoots for 327 // consistency. 328 nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int 329 330 // Base indexes of each root type. Set by gcMarkRootPrepare. 331 baseData, baseBSS, baseSpans, baseStacks, baseEnd uint32 332 333 // stackRoots is a snapshot of all of the Gs that existed 334 // before the beginning of concurrent marking. The backing 335 // store of this must not be modified because it might be 336 // shared with allgs. 337 stackRoots []*g 338 339 // Each type of GC state transition is protected by a lock. 340 // Since multiple threads can simultaneously detect the state 341 // transition condition, any thread that detects a transition 342 // condition must acquire the appropriate transition lock, 343 // re-check the transition condition and return if it no 344 // longer holds or perform the transition if it does. 345 // Likewise, any transition must invalidate the transition 346 // condition before releasing the lock. This ensures that each 347 // transition is performed by exactly one thread and threads 348 // that need the transition to happen block until it has 349 // happened. 350 // 351 // startSema protects the transition from "off" to mark or 352 // mark termination. 353 startSema uint32 354 // markDoneSema protects transitions from mark to mark termination. 355 markDoneSema uint32 356 357 bgMarkReady note // signal background mark worker has started 358 bgMarkDone uint32 // cas to 1 when at a background mark completion point 359 // Background mark completion signaling 360 361 // mode is the concurrency mode of the current GC cycle. 362 mode gcMode 363 364 // userForced indicates the current GC cycle was forced by an 365 // explicit user call. 366 userForced bool 367 368 // initialHeapLive is the value of gcController.heapLive at the 369 // beginning of this GC cycle. 370 initialHeapLive uint64 371 372 // assistQueue is a queue of assists that are blocked because 373 // there was neither enough credit to steal or enough work to 374 // do. 375 assistQueue struct { 376 lock mutex 377 q gQueue 378 } 379 380 // sweepWaiters is a list of blocked goroutines to wake when 381 // we transition from mark termination to sweep. 382 sweepWaiters struct { 383 lock mutex 384 list gList 385 } 386 387 // cycles is the number of completed GC cycles, where a GC 388 // cycle is sweep termination, mark, mark termination, and 389 // sweep. This differs from memstats.numgc, which is 390 // incremented at mark termination. 391 cycles atomic.Uint32 392 393 // Timing/utilization stats for this cycle. 394 stwprocs, maxprocs int32 395 tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start 396 397 pauseNS int64 // total STW time this cycle 398 pauseStart int64 // nanotime() of last STW 399 400 // debug.gctrace heap sizes for this cycle. 401 heap0, heap1, heap2 uint64 402 403 // Cumulative estimated CPU usage. 404 cpuStats 405 } 406 407 // GC runs a garbage collection and blocks the caller until the 408 // garbage collection is complete. It may also block the entire 409 // program. 410 func GC() { 411 // We consider a cycle to be: sweep termination, mark, mark 412 // termination, and sweep. 
This function shouldn't return 413 // until a full cycle has been completed, from beginning to 414 // end. Hence, we always want to finish up the current cycle 415 // and start a new one. That means: 416 // 417 // 1. In sweep termination, mark, or mark termination of cycle 418 // N, wait until mark termination N completes and transitions 419 // to sweep N. 420 // 421 // 2. In sweep N, help with sweep N. 422 // 423 // At this point we can begin a full cycle N+1. 424 // 425 // 3. Trigger cycle N+1 by starting sweep termination N+1. 426 // 427 // 4. Wait for mark termination N+1 to complete. 428 // 429 // 5. Help with sweep N+1 until it's done. 430 // 431 // This all has to be written to deal with the fact that the 432 // GC may move ahead on its own. For example, when we block 433 // until mark termination N, we may wake up in cycle N+2. 434 435 // Wait until the current sweep termination, mark, and mark 436 // termination complete. 437 n := work.cycles.Load() 438 gcWaitOnMark(n) 439 440 // We're now in sweep N or later. Trigger GC cycle N+1, which 441 // will first finish sweep N if necessary and then enter sweep 442 // termination N+1. 443 gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1}) 444 445 // Wait for mark termination N+1 to complete. 446 gcWaitOnMark(n + 1) 447 448 // Finish sweep N+1 before returning. We do this both to 449 // complete the cycle and because runtime.GC() is often used 450 // as part of tests and benchmarks to get the system into a 451 // relatively stable and isolated state. 452 for work.cycles.Load() == n+1 && sweepone() != ^uintptr(0) { 453 sweep.nbgsweep++ 454 Gosched() 455 } 456 457 // Callers may assume that the heap profile reflects the 458 // just-completed cycle when this returns (historically this 459 // happened because this was a STW GC), but right now the 460 // profile still reflects mark termination N, not N+1. 461 // 462 // As soon as all of the sweep frees from cycle N+1 are done, 463 // we can go ahead and publish the heap profile. 464 // 465 // First, wait for sweeping to finish. (We know there are no 466 // more spans on the sweep queue, but we may be concurrently 467 // sweeping spans, so we have to wait.) 468 for work.cycles.Load() == n+1 && !isSweepDone() { 469 Gosched() 470 } 471 472 // Now we're really done with sweeping, so we can publish the 473 // stable heap profile. Only do this if we haven't already hit 474 // another mark termination. 475 mp := acquirem() 476 cycle := work.cycles.Load() 477 if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) { 478 mProf_PostSweep() 479 } 480 releasem(mp) 481 } 482 483 // gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has 484 // already completed this mark phase, it returns immediately. 485 func gcWaitOnMark(n uint32) { 486 for { 487 // Disable phase transitions. 488 lock(&work.sweepWaiters.lock) 489 nMarks := work.cycles.Load() 490 if gcphase != _GCmark { 491 // We've already completed this cycle's mark. 492 nMarks++ 493 } 494 if nMarks > n { 495 // We're done. 496 unlock(&work.sweepWaiters.lock) 497 return 498 } 499 500 // Wait until sweep termination, mark, and mark 501 // termination of cycle N complete. 502 work.sweepWaiters.list.push(getg()) 503 goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceEvGoBlock, 1) 504 } 505 } 506 507 // gcMode indicates how concurrent a GC cycle should be. 
508 type gcMode int 509 510 const ( 511 gcBackgroundMode gcMode = iota // concurrent GC and sweep 512 gcForceMode // stop-the-world GC now, concurrent sweep 513 gcForceBlockMode // stop-the-world GC now and STW sweep (forced by user) 514 ) 515 516 // A gcTrigger is a predicate for starting a GC cycle. Specifically, 517 // it is an exit condition for the _GCoff phase. 518 type gcTrigger struct { 519 kind gcTriggerKind 520 now int64 // gcTriggerTime: current time 521 n uint32 // gcTriggerCycle: cycle number to start 522 } 523 524 type gcTriggerKind int 525 526 const ( 527 // gcTriggerHeap indicates that a cycle should be started when 528 // the heap size reaches the trigger heap size computed by the 529 // controller. 530 gcTriggerHeap gcTriggerKind = iota 531 532 // gcTriggerTime indicates that a cycle should be started when 533 // it's been more than forcegcperiod nanoseconds since the 534 // previous GC cycle. 535 gcTriggerTime 536 537 // gcTriggerCycle indicates that a cycle should be started if 538 // we have not yet started cycle number gcTrigger.n (relative 539 // to work.cycles). 540 gcTriggerCycle 541 ) 542 543 // test reports whether the trigger condition is satisfied, meaning 544 // that the exit condition for the _GCoff phase has been met. The exit 545 // condition should be tested when allocating. 546 func (t gcTrigger) test() bool { 547 if !memstats.enablegc || panicking.Load() != 0 || gcphase != _GCoff { 548 return false 549 } 550 switch t.kind { 551 case gcTriggerHeap: 552 // Non-atomic access to gcController.heapLive for performance. If 553 // we are going to trigger on this, this thread just 554 // atomically wrote gcController.heapLive anyway and we'll see our 555 // own write. 556 trigger, _ := gcController.trigger() 557 return gcController.heapLive.Load() >= trigger 558 case gcTriggerTime: 559 if gcController.gcPercent.Load() < 0 { 560 return false 561 } 562 lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime)) 563 return lastgc != 0 && t.now-lastgc > forcegcperiod 564 case gcTriggerCycle: 565 // t.n > work.cycles, but accounting for wraparound. 566 return int32(t.n-work.cycles.Load()) > 0 567 } 568 return true 569 } 570 571 // gcStart starts the GC. It transitions from _GCoff to _GCmark (if 572 // debug.gcstoptheworld == 0) or performs all of GC (if 573 // debug.gcstoptheworld != 0). 574 // 575 // This may return without performing this transition in some cases, 576 // such as when called on a system stack or with locks held. 577 func gcStart(trigger gcTrigger) { 578 // Since this is called from malloc and malloc is called in 579 // the guts of a number of libraries that might be holding 580 // locks, don't attempt to start GC in non-preemptible or 581 // potentially unstable situations. 582 mp := acquirem() 583 if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" { 584 releasem(mp) 585 return 586 } 587 releasem(mp) 588 mp = nil 589 590 // Pick up the remaining unswept/not being swept spans concurrently 591 // 592 // This shouldn't happen if we're being invoked in background 593 // mode since proportional sweep should have just finished 594 // sweeping everything, but rounding errors, etc, may leave a 595 // few spans unswept. In forced mode, this is necessary since 596 // GC can be forced at any point in the sweeping cycle. 597 // 598 // We check the transition condition continuously here in case 599 // this G gets delayed in to the next GC cycle. 
600 for trigger.test() && sweepone() != ^uintptr(0) { 601 sweep.nbgsweep++ 602 } 603 604 // Perform GC initialization and the sweep termination 605 // transition. 606 semacquire(&work.startSema) 607 // Re-check transition condition under transition lock. 608 if !trigger.test() { 609 semrelease(&work.startSema) 610 return 611 } 612 613 // In gcstoptheworld debug mode, upgrade the mode accordingly. 614 // We do this after re-checking the transition condition so 615 // that multiple goroutines that detect the heap trigger don't 616 // start multiple STW GCs. 617 mode := gcBackgroundMode 618 if debug.gcstoptheworld == 1 { 619 mode = gcForceMode 620 } else if debug.gcstoptheworld == 2 { 621 mode = gcForceBlockMode 622 } 623 624 // Ok, we're doing it! Stop everybody else 625 semacquire(&gcsema) 626 semacquire(&worldsema) 627 628 // For stats, check if this GC was forced by the user. 629 // Update it under gcsema to avoid gctrace getting wrong values. 630 work.userForced = trigger.kind == gcTriggerCycle 631 632 if traceEnabled() { 633 traceGCStart() 634 } 635 636 // Check that all Ps have finished deferred mcache flushes. 637 for _, p := range allp { 638 if fg := p.mcache.flushGen.Load(); fg != mheap_.sweepgen { 639 println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen) 640 throw("p mcache not flushed") 641 } 642 } 643 644 gcBgMarkStartWorkers() 645 646 systemstack(gcResetMarkState) 647 648 work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs 649 if work.stwprocs > ncpu { 650 // This is used to compute CPU time of the STW phases, 651 // so it can't be more than ncpu, even if GOMAXPROCS is. 652 work.stwprocs = ncpu 653 } 654 work.heap0 = gcController.heapLive.Load() 655 work.pauseNS = 0 656 work.mode = mode 657 658 now := nanotime() 659 work.tSweepTerm = now 660 work.pauseStart = now 661 if traceEnabled() { 662 traceGCSTWStart(1) 663 } 664 systemstack(stopTheWorldWithSema) 665 // Finish sweep before we start concurrent scan. 666 systemstack(func() { 667 finishsweep_m() 668 }) 669 670 // clearpools before we start the GC. If we wait they memory will not be 671 // reclaimed until the next GC cycle. 672 clearpools() 673 674 work.cycles.Add(1) 675 676 // Assists and workers can start the moment we start 677 // the world. 678 gcController.startCycle(now, int(gomaxprocs), trigger) 679 680 // Notify the CPU limiter that assists may begin. 681 gcCPULimiter.startGCTransition(true, now) 682 683 // In STW mode, disable scheduling of user Gs. This may also 684 // disable scheduling of this goroutine, so it may block as 685 // soon as we start the world again. 686 if mode != gcBackgroundMode { 687 schedEnableUser(false) 688 } 689 690 // Enter concurrent mark phase and enable 691 // write barriers. 692 // 693 // Because the world is stopped, all Ps will 694 // observe that write barriers are enabled by 695 // the time we start the world and begin 696 // scanning. 697 // 698 // Write barriers must be enabled before assists are 699 // enabled because they must be enabled before 700 // any non-leaf heap objects are marked. Since 701 // allocations are blocked until assists can 702 // happen, we want enable assists as early as 703 // possible. 704 setGCPhase(_GCmark) 705 706 gcBgMarkPrepare() // Must happen before assist enable. 707 gcMarkRootPrepare() 708 709 // Mark all active tinyalloc blocks. Since we're 710 // allocating from these, they need to be black like 711 // other allocations. 
The alternative is to blacken 712 // the tiny block on every allocation from it, which 713 // would slow down the tiny allocator. 714 gcMarkTinyAllocs() 715 716 // At this point all Ps have enabled the write 717 // barrier, thus maintaining the no white to 718 // black invariant. Enable mutator assists to 719 // put back-pressure on fast allocating 720 // mutators. 721 atomic.Store(&gcBlackenEnabled, 1) 722 723 // In STW mode, we could block the instant systemstack 724 // returns, so make sure we're not preemptible. 725 mp = acquirem() 726 727 // Concurrent mark. 728 systemstack(func() { 729 now = startTheWorldWithSema(traceEnabled()) 730 work.pauseNS += now - work.pauseStart 731 work.tMark = now 732 memstats.gcPauseDist.record(now - work.pauseStart) 733 734 // Release the CPU limiter. 735 gcCPULimiter.finishGCTransition(now) 736 }) 737 738 // Release the world sema before Gosched() in STW mode 739 // because we will need to reacquire it later but before 740 // this goroutine becomes runnable again, and we could 741 // self-deadlock otherwise. 742 semrelease(&worldsema) 743 releasem(mp) 744 745 // Make sure we block instead of returning to user code 746 // in STW mode. 747 if mode != gcBackgroundMode { 748 Gosched() 749 } 750 751 semrelease(&work.startSema) 752 } 753 754 // gcMarkDoneFlushed counts the number of P's with flushed work. 755 // 756 // Ideally this would be a captured local in gcMarkDone, but forEachP 757 // escapes its callback closure, so it can't capture anything. 758 // 759 // This is protected by markDoneSema. 760 var gcMarkDoneFlushed uint32 761 762 // gcMarkDone transitions the GC from mark to mark termination if all 763 // reachable objects have been marked (that is, there are no grey 764 // objects and can be no more in the future). Otherwise, it flushes 765 // all local work to the global queues where it can be discovered by 766 // other workers. 767 // 768 // This should be called when all local mark work has been drained and 769 // there are no remaining workers. Specifically, when 770 // 771 // work.nwait == work.nproc && !gcMarkWorkAvailable(p) 772 // 773 // The calling context must be preemptible. 774 // 775 // Flushing local work is important because idle Ps may have local 776 // work queued. This is the only way to make that work visible and 777 // drive GC to completion. 778 // 779 // It is explicitly okay to have write barriers in this function. If 780 // it does transition to mark termination, then all reachable objects 781 // have been marked, so the write barrier cannot shade any more 782 // objects. 783 func gcMarkDone() { 784 // Ensure only one thread is running the ragged barrier at a 785 // time. 786 semacquire(&work.markDoneSema) 787 788 top: 789 // Re-check transition condition under transition lock. 790 // 791 // It's critical that this checks the global work queues are 792 // empty before performing the ragged barrier. Otherwise, 793 // there could be global work that a P could take after the P 794 // has passed the ragged barrier. 795 if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) { 796 semrelease(&work.markDoneSema) 797 return 798 } 799 800 // forEachP needs worldsema to execute, and we'll need it to 801 // stop the world later, so acquire worldsema now. 802 semacquire(&worldsema) 803 804 // Flush all local buffers and collect flushedWork flags. 805 gcMarkDoneFlushed = 0 806 systemstack(func() { 807 gp := getg().m.curg 808 // Mark the user stack as preemptible so that it may be scanned. 
809 // Otherwise, our attempt to force all P's to a safepoint could 810 // result in a deadlock as we attempt to preempt a worker that's 811 // trying to preempt us (e.g. for a stack scan). 812 casGToWaiting(gp, _Grunning, waitReasonGCMarkTermination) 813 forEachP(func(pp *p) { 814 // Flush the write barrier buffer, since this may add 815 // work to the gcWork. 816 wbBufFlush1(pp) 817 818 // Flush the gcWork, since this may create global work 819 // and set the flushedWork flag. 820 // 821 // TODO(austin): Break up these workbufs to 822 // better distribute work. 823 pp.gcw.dispose() 824 // Collect the flushedWork flag. 825 if pp.gcw.flushedWork { 826 atomic.Xadd(&gcMarkDoneFlushed, 1) 827 pp.gcw.flushedWork = false 828 } 829 }) 830 casgstatus(gp, _Gwaiting, _Grunning) 831 }) 832 833 if gcMarkDoneFlushed != 0 { 834 // More grey objects were discovered since the 835 // previous termination check, so there may be more 836 // work to do. Keep going. It's possible the 837 // transition condition became true again during the 838 // ragged barrier, so re-check it. 839 semrelease(&worldsema) 840 goto top 841 } 842 843 // There was no global work, no local work, and no Ps 844 // communicated work since we took markDoneSema. Therefore 845 // there are no grey objects and no more objects can be 846 // shaded. Transition to mark termination. 847 now := nanotime() 848 work.tMarkTerm = now 849 work.pauseStart = now 850 getg().m.preemptoff = "gcing" 851 if traceEnabled() { 852 traceGCSTWStart(0) 853 } 854 systemstack(stopTheWorldWithSema) 855 // The gcphase is _GCmark, it will transition to _GCmarktermination 856 // below. The important thing is that the wb remains active until 857 // all marking is complete. This includes writes made by the GC. 858 859 // There is sometimes work left over when we enter mark termination due 860 // to write barriers performed after the completion barrier above. 861 // Detect this and resume concurrent mark. This is obviously 862 // unfortunate. 863 // 864 // See issue #27993 for details. 865 // 866 // Switch to the system stack to call wbBufFlush1, though in this case 867 // it doesn't matter because we're non-preemptible anyway. 868 restart := false 869 systemstack(func() { 870 for _, p := range allp { 871 wbBufFlush1(p) 872 if !p.gcw.empty() { 873 restart = true 874 break 875 } 876 } 877 }) 878 if restart { 879 getg().m.preemptoff = "" 880 systemstack(func() { 881 now := startTheWorldWithSema(traceEnabled()) 882 work.pauseNS += now - work.pauseStart 883 memstats.gcPauseDist.record(now - work.pauseStart) 884 }) 885 semrelease(&worldsema) 886 goto top 887 } 888 889 gcComputeStartingStackSize() 890 891 // Disable assists and background workers. We must do 892 // this before waking blocked assists. 893 atomic.Store(&gcBlackenEnabled, 0) 894 895 // Notify the CPU limiter that GC assists will now cease. 896 gcCPULimiter.startGCTransition(false, now) 897 898 // Wake all blocked assists. These will run when we 899 // start the world again. 900 gcWakeAllAssists() 901 902 // Likewise, release the transition lock. Blocked 903 // workers and assists will run when we start the 904 // world again. 905 semrelease(&work.markDoneSema) 906 907 // In STW mode, re-enable user goroutines. These will be 908 // queued to run after we start the world. 909 schedEnableUser(true) 910 911 // endCycle depends on all gcWork cache stats being flushed. 912 // The termination algorithm above ensured that up to 913 // allocations since the ragged barrier. 
914 gcController.endCycle(now, int(gomaxprocs), work.userForced) 915 916 // Perform mark termination. This will restart the world. 917 gcMarkTermination() 918 } 919 920 // World must be stopped and mark assists and background workers must be 921 // disabled. 922 func gcMarkTermination() { 923 // Start marktermination (write barrier remains enabled for now). 924 setGCPhase(_GCmarktermination) 925 926 work.heap1 = gcController.heapLive.Load() 927 startTime := nanotime() 928 929 mp := acquirem() 930 mp.preemptoff = "gcing" 931 mp.traceback = 2 932 curgp := mp.curg 933 casGToWaiting(curgp, _Grunning, waitReasonGarbageCollection) 934 935 // Run gc on the g0 stack. We do this so that the g stack 936 // we're currently running on will no longer change. Cuts 937 // the root set down a bit (g0 stacks are not scanned, and 938 // we don't need to scan gc's internal state). We also 939 // need to switch to g0 so we can shrink the stack. 940 systemstack(func() { 941 gcMark(startTime) 942 // Must return immediately. 943 // The outer function's stack may have moved 944 // during gcMark (it shrinks stacks, including the 945 // outer function's stack), so we must not refer 946 // to any of its variables. Return back to the 947 // non-system stack to pick up the new addresses 948 // before continuing. 949 }) 950 951 systemstack(func() { 952 work.heap2 = work.bytesMarked 953 if debug.gccheckmark > 0 { 954 // Run a full non-parallel, stop-the-world 955 // mark using checkmark bits, to check that we 956 // didn't forget to mark anything during the 957 // concurrent mark process. 958 startCheckmarks() 959 gcResetMarkState() 960 gcw := &getg().m.p.ptr().gcw 961 gcDrain(gcw, 0) 962 wbBufFlush1(getg().m.p.ptr()) 963 gcw.dispose() 964 endCheckmarks() 965 } 966 967 // marking is complete so we can turn the write barrier off 968 setGCPhase(_GCoff) 969 gcSweep(work.mode) 970 }) 971 972 mp.traceback = 0 973 casgstatus(curgp, _Gwaiting, _Grunning) 974 975 if traceEnabled() { 976 traceGCDone() 977 } 978 979 // all done 980 mp.preemptoff = "" 981 982 if gcphase != _GCoff { 983 throw("gc done but gcphase != _GCoff") 984 } 985 986 // Record heapInUse for scavenger. 987 memstats.lastHeapInUse = gcController.heapInUse.load() 988 989 // Update GC trigger and pacing, as well as downstream consumers 990 // of this pacing information, for the next cycle. 991 systemstack(gcControllerCommit) 992 993 // Update timing memstats 994 now := nanotime() 995 sec, nsec, _ := time_now() 996 unixNow := sec*1e9 + int64(nsec) 997 work.pauseNS += now - work.pauseStart 998 work.tEnd = now 999 memstats.gcPauseDist.record(now - work.pauseStart) 1000 atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user 1001 atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us 1002 memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS) 1003 memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow) 1004 memstats.pause_total_ns += uint64(work.pauseNS) 1005 1006 sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm) 1007 // We report idle marking time below, but omit it from the 1008 // overall utilization here since it's "free". 
1009 markAssistCpu := gcController.assistTime.Load() 1010 markDedicatedCpu := gcController.dedicatedMarkTime.Load() 1011 markFractionalCpu := gcController.fractionalMarkTime.Load() 1012 markIdleCpu := gcController.idleMarkTime.Load() 1013 markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm) 1014 scavAssistCpu := scavenge.assistTime.Load() 1015 scavBgCpu := scavenge.backgroundTime.Load() 1016 1017 // Update cumulative GC CPU stats. 1018 work.cpuStats.gcAssistTime += markAssistCpu 1019 work.cpuStats.gcDedicatedTime += markDedicatedCpu + markFractionalCpu 1020 work.cpuStats.gcIdleTime += markIdleCpu 1021 work.cpuStats.gcPauseTime += sweepTermCpu + markTermCpu 1022 work.cpuStats.gcTotalTime += sweepTermCpu + markAssistCpu + markDedicatedCpu + markFractionalCpu + markIdleCpu + markTermCpu 1023 1024 // Update cumulative scavenge CPU stats. 1025 work.cpuStats.scavengeAssistTime += scavAssistCpu 1026 work.cpuStats.scavengeBgTime += scavBgCpu 1027 work.cpuStats.scavengeTotalTime += scavAssistCpu + scavBgCpu 1028 1029 // Update total CPU. 1030 work.cpuStats.totalTime = sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs) 1031 work.cpuStats.idleTime += sched.idleTime.Load() 1032 1033 // Compute userTime. We compute this indirectly as everything that's not the above. 1034 // 1035 // Since time spent in _Pgcstop is covered by gcPauseTime, and time spent in _Pidle 1036 // is covered by idleTime, what we're left with is time spent in _Prunning and _Psyscall, 1037 // the latter of which is fine because the P will either go idle or get used for something 1038 // else via sysmon. Meanwhile if we subtract GC time from whatever's left, we get non-GC 1039 // _Prunning time. Note that this still leaves time spent in sweeping and in the scheduler, 1040 // but that's fine. The overwhelming majority of this time will be actual user time. 1041 work.cpuStats.userTime = work.cpuStats.totalTime - (work.cpuStats.gcTotalTime + 1042 work.cpuStats.scavengeTotalTime + work.cpuStats.idleTime) 1043 1044 // Compute overall GC CPU utilization. 1045 // Omit idle marking time from the overall utilization here since it's "free". 1046 memstats.gc_cpu_fraction = float64(work.cpuStats.gcTotalTime-work.cpuStats.gcIdleTime) / float64(work.cpuStats.totalTime) 1047 1048 // Reset assist time and background time stats. 1049 // 1050 // Do this now, instead of at the start of the next GC cycle, because 1051 // these two may keep accumulating even if the GC is not active. 1052 scavenge.assistTime.Store(0) 1053 scavenge.backgroundTime.Store(0) 1054 1055 // Reset idle time stat. 1056 sched.idleTime.Store(0) 1057 1058 // Reset sweep state. 1059 sweep.nbgsweep = 0 1060 sweep.npausesweep = 0 1061 1062 if work.userForced { 1063 memstats.numforcedgc++ 1064 } 1065 1066 // Bump GC cycle count and wake goroutines waiting on sweep. 1067 lock(&work.sweepWaiters.lock) 1068 memstats.numgc++ 1069 injectglist(&work.sweepWaiters.list) 1070 unlock(&work.sweepWaiters.lock) 1071 1072 // Increment the scavenge generation now. 1073 // 1074 // This moment represents peak heap in use because we're 1075 // about to start sweeping. 1076 mheap_.pages.scav.index.nextGen() 1077 1078 // Release the CPU limiter. 1079 gcCPULimiter.finishGCTransition(now) 1080 1081 // Finish the current heap profiling cycle and start a new 1082 // heap profiling cycle. We do this before starting the world 1083 // so events don't leak into the wrong cycle. 1084 mProf_NextCycle() 1085 1086 // There may be stale spans in mcaches that need to be swept. 
1087 // Those aren't tracked in any sweep lists, so we need to 1088 // count them against sweep completion until we ensure all 1089 // those spans have been forced out. 1090 sl := sweep.active.begin() 1091 if !sl.valid { 1092 throw("failed to set sweep barrier") 1093 } 1094 1095 systemstack(func() { startTheWorldWithSema(traceEnabled()) }) 1096 1097 // Flush the heap profile so we can start a new cycle next GC. 1098 // This is relatively expensive, so we don't do it with the 1099 // world stopped. 1100 mProf_Flush() 1101 1102 // Prepare workbufs for freeing by the sweeper. We do this 1103 // asynchronously because it can take non-trivial time. 1104 prepareFreeWorkbufs() 1105 1106 // Free stack spans. This must be done between GC cycles. 1107 systemstack(freeStackSpans) 1108 1109 // Ensure all mcaches are flushed. Each P will flush its own 1110 // mcache before allocating, but idle Ps may not. Since this 1111 // is necessary to sweep all spans, we need to ensure all 1112 // mcaches are flushed before we start the next GC cycle. 1113 // 1114 // While we're here, flush the page cache for idle Ps to avoid 1115 // having pages get stuck on them. These pages are hidden from 1116 // the scavenger, so in small idle heaps a significant amount 1117 // of additional memory might be held onto. 1118 systemstack(func() { 1119 forEachP(func(pp *p) { 1120 pp.mcache.prepareForSweep() 1121 if pp.status == _Pidle { 1122 systemstack(func() { 1123 lock(&mheap_.lock) 1124 pp.pcache.flush(&mheap_.pages) 1125 unlock(&mheap_.lock) 1126 }) 1127 } 1128 }) 1129 }) 1130 // Now that we've swept stale spans in mcaches, they don't 1131 // count against unswept spans. 1132 sweep.active.end(sl) 1133 1134 // Print gctrace before dropping worldsema. As soon as we drop 1135 // worldsema another cycle could start and smash the stats 1136 // we're trying to print. 1137 if debug.gctrace > 0 { 1138 util := int(memstats.gc_cpu_fraction * 100) 1139 1140 var sbuf [24]byte 1141 printlock() 1142 print("gc ", memstats.numgc, 1143 " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ", 1144 util, "%: ") 1145 prev := work.tSweepTerm 1146 for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} { 1147 if i != 0 { 1148 print("+") 1149 } 1150 print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev)))) 1151 prev = ns 1152 } 1153 print(" ms clock, ") 1154 for i, ns := range []int64{ 1155 sweepTermCpu, 1156 gcController.assistTime.Load(), 1157 gcController.dedicatedMarkTime.Load() + gcController.fractionalMarkTime.Load(), 1158 gcController.idleMarkTime.Load(), 1159 markTermCpu, 1160 } { 1161 if i == 2 || i == 3 { 1162 // Separate mark time components with /. 1163 print("/") 1164 } else if i != 0 { 1165 print("+") 1166 } 1167 print(string(fmtNSAsMS(sbuf[:], uint64(ns)))) 1168 } 1169 print(" ms cpu, ", 1170 work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ", 1171 gcController.lastHeapGoal>>20, " MB goal, ", 1172 gcController.lastStackScan.Load()>>20, " MB stacks, ", 1173 gcController.globalsScan.Load()>>20, " MB globals, ", 1174 work.maxprocs, " P") 1175 if work.userForced { 1176 print(" (forced)") 1177 } 1178 print("\n") 1179 printunlock() 1180 } 1181 1182 // Set any arena chunks that were deferred to fault. 
1183 lock(&userArenaState.lock) 1184 faultList := userArenaState.fault 1185 userArenaState.fault = nil 1186 unlock(&userArenaState.lock) 1187 for _, lc := range faultList { 1188 lc.mspan.setUserArenaChunkToFault() 1189 } 1190 1191 // Enable huge pages on some metadata if we cross a heap threshold. 1192 if gcController.heapGoal() > minHeapForMetadataHugePages { 1193 mheap_.enableMetadataHugePages() 1194 } 1195 1196 semrelease(&worldsema) 1197 semrelease(&gcsema) 1198 // Careful: another GC cycle may start now. 1199 1200 releasem(mp) 1201 mp = nil 1202 1203 // now that gc is done, kick off finalizer thread if needed 1204 if !concurrentSweep { 1205 // give the queued finalizers, if any, a chance to run 1206 Gosched() 1207 } 1208 } 1209 1210 // gcBgMarkStartWorkers prepares background mark worker goroutines. These 1211 // goroutines will not run until the mark phase, but they must be started while 1212 // the work is not stopped and from a regular G stack. The caller must hold 1213 // worldsema. 1214 func gcBgMarkStartWorkers() { 1215 // Background marking is performed by per-P G's. Ensure that each P has 1216 // a background GC G. 1217 // 1218 // Worker Gs don't exit if gomaxprocs is reduced. If it is raised 1219 // again, we can reuse the old workers; no need to create new workers. 1220 for gcBgMarkWorkerCount < gomaxprocs { 1221 go gcBgMarkWorker() 1222 1223 notetsleepg(&work.bgMarkReady, -1) 1224 noteclear(&work.bgMarkReady) 1225 // The worker is now guaranteed to be added to the pool before 1226 // its P's next findRunnableGCWorker. 1227 1228 gcBgMarkWorkerCount++ 1229 } 1230 } 1231 1232 // gcBgMarkPrepare sets up state for background marking. 1233 // Mutator assists must not yet be enabled. 1234 func gcBgMarkPrepare() { 1235 // Background marking will stop when the work queues are empty 1236 // and there are no more workers (note that, since this is 1237 // concurrent, this may be a transient state, but mark 1238 // termination will clean it up). Between background workers 1239 // and assists, we don't really know how many workers there 1240 // will be, so we pretend to have an arbitrarily large number 1241 // of workers, almost all of which are "waiting". While a 1242 // worker is working it decrements nwait. If nproc == nwait, 1243 // there are no workers. 1244 work.nproc = ^uint32(0) 1245 work.nwait = ^uint32(0) 1246 } 1247 1248 // gcBgMarkWorkerNode is an entry in the gcBgMarkWorkerPool. It points to a single 1249 // gcBgMarkWorker goroutine. 1250 type gcBgMarkWorkerNode struct { 1251 // Unused workers are managed in a lock-free stack. This field must be first. 1252 node lfnode 1253 1254 // The g of this worker. 1255 gp guintptr 1256 1257 // Release this m on park. This is used to communicate with the unlock 1258 // function, which cannot access the G's stack. It is unused outside of 1259 // gcBgMarkWorker(). 1260 m muintptr 1261 } 1262 1263 func gcBgMarkWorker() { 1264 gp := getg() 1265 1266 // We pass node to a gopark unlock function, so it can't be on 1267 // the stack (see gopark). Prevent deadlock from recursively 1268 // starting GC by disabling preemption. 1269 gp.m.preemptoff = "GC worker init" 1270 node := new(gcBgMarkWorkerNode) 1271 gp.m.preemptoff = "" 1272 1273 node.gp.set(gp) 1274 1275 node.m.set(acquirem()) 1276 notewakeup(&work.bgMarkReady) 1277 // After this point, the background mark worker is generally scheduled 1278 // cooperatively by gcController.findRunnableGCWorker. 
While performing 1279 // work on the P, preemption is disabled because we are working on 1280 // P-local work buffers. When the preempt flag is set, this puts itself 1281 // into _Gwaiting to be woken up by gcController.findRunnableGCWorker 1282 // at the appropriate time. 1283 // 1284 // When preemption is enabled (e.g., while in gcMarkDone), this worker 1285 // may be preempted and schedule as a _Grunnable G from a runq. That is 1286 // fine; it will eventually gopark again for further scheduling via 1287 // findRunnableGCWorker. 1288 // 1289 // Since we disable preemption before notifying bgMarkReady, we 1290 // guarantee that this G will be in the worker pool for the next 1291 // findRunnableGCWorker. This isn't strictly necessary, but it reduces 1292 // latency between _GCmark starting and the workers starting. 1293 1294 for { 1295 // Go to sleep until woken by 1296 // gcController.findRunnableGCWorker. 1297 gopark(func(g *g, nodep unsafe.Pointer) bool { 1298 node := (*gcBgMarkWorkerNode)(nodep) 1299 1300 if mp := node.m.ptr(); mp != nil { 1301 // The worker G is no longer running; release 1302 // the M. 1303 // 1304 // N.B. it is _safe_ to release the M as soon 1305 // as we are no longer performing P-local mark 1306 // work. 1307 // 1308 // However, since we cooperatively stop work 1309 // when gp.preempt is set, if we releasem in 1310 // the loop then the following call to gopark 1311 // would immediately preempt the G. This is 1312 // also safe, but inefficient: the G must 1313 // schedule again only to enter gopark and park 1314 // again. Thus, we defer the release until 1315 // after parking the G. 1316 releasem(mp) 1317 } 1318 1319 // Release this G to the pool. 1320 gcBgMarkWorkerPool.push(&node.node) 1321 // Note that at this point, the G may immediately be 1322 // rescheduled and may be running. 1323 return true 1324 }, unsafe.Pointer(node), waitReasonGCWorkerIdle, traceEvGoBlock, 0) 1325 1326 // Preemption must not occur here, or another G might see 1327 // p.gcMarkWorkerMode. 1328 1329 // Disable preemption so we can use the gcw. If the 1330 // scheduler wants to preempt us, we'll stop draining, 1331 // dispose the gcw, and then preempt. 1332 node.m.set(acquirem()) 1333 pp := gp.m.p.ptr() // P can't change with preemption disabled. 1334 1335 if gcBlackenEnabled == 0 { 1336 println("worker mode", pp.gcMarkWorkerMode) 1337 throw("gcBgMarkWorker: blackening not enabled") 1338 } 1339 1340 if pp.gcMarkWorkerMode == gcMarkWorkerNotWorker { 1341 throw("gcBgMarkWorker: mode not set") 1342 } 1343 1344 startTime := nanotime() 1345 pp.gcMarkWorkerStartTime = startTime 1346 var trackLimiterEvent bool 1347 if pp.gcMarkWorkerMode == gcMarkWorkerIdleMode { 1348 trackLimiterEvent = pp.limiterEvent.start(limiterEventIdleMarkWork, startTime) 1349 } 1350 1351 decnwait := atomic.Xadd(&work.nwait, -1) 1352 if decnwait == work.nproc { 1353 println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc) 1354 throw("work.nwait was > work.nproc") 1355 } 1356 1357 systemstack(func() { 1358 // Mark our goroutine preemptible so its stack 1359 // can be scanned. This lets two mark workers 1360 // scan each other (otherwise, they would 1361 // deadlock). We must not modify anything on 1362 // the G stack. However, stack shrinking is 1363 // disabled for mark workers, so it is safe to 1364 // read from the G stack. 
1365 casGToWaiting(gp, _Grunning, waitReasonGCWorkerActive) 1366 switch pp.gcMarkWorkerMode { 1367 default: 1368 throw("gcBgMarkWorker: unexpected gcMarkWorkerMode") 1369 case gcMarkWorkerDedicatedMode: 1370 gcDrain(&pp.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit) 1371 if gp.preempt { 1372 // We were preempted. This is 1373 // a useful signal to kick 1374 // everything out of the run 1375 // queue so it can run 1376 // somewhere else. 1377 if drainQ, n := runqdrain(pp); n > 0 { 1378 lock(&sched.lock) 1379 globrunqputbatch(&drainQ, int32(n)) 1380 unlock(&sched.lock) 1381 } 1382 } 1383 // Go back to draining, this time 1384 // without preemption. 1385 gcDrain(&pp.gcw, gcDrainFlushBgCredit) 1386 case gcMarkWorkerFractionalMode: 1387 gcDrain(&pp.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit) 1388 case gcMarkWorkerIdleMode: 1389 gcDrain(&pp.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit) 1390 } 1391 casgstatus(gp, _Gwaiting, _Grunning) 1392 }) 1393 1394 // Account for time and mark us as stopped. 1395 now := nanotime() 1396 duration := now - startTime 1397 gcController.markWorkerStop(pp.gcMarkWorkerMode, duration) 1398 if trackLimiterEvent { 1399 pp.limiterEvent.stop(limiterEventIdleMarkWork, now) 1400 } 1401 if pp.gcMarkWorkerMode == gcMarkWorkerFractionalMode { 1402 atomic.Xaddint64(&pp.gcFractionalMarkTime, duration) 1403 } 1404 1405 // Was this the last worker and did we run out 1406 // of work? 1407 incnwait := atomic.Xadd(&work.nwait, +1) 1408 if incnwait > work.nproc { 1409 println("runtime: p.gcMarkWorkerMode=", pp.gcMarkWorkerMode, 1410 "work.nwait=", incnwait, "work.nproc=", work.nproc) 1411 throw("work.nwait > work.nproc") 1412 } 1413 1414 // We'll releasem after this point and thus this P may run 1415 // something else. We must clear the worker mode to avoid 1416 // attributing the mode to a different (non-worker) G in 1417 // traceGoStart. 1418 pp.gcMarkWorkerMode = gcMarkWorkerNotWorker 1419 1420 // If this worker reached a background mark completion 1421 // point, signal the main GC goroutine. 1422 if incnwait == work.nproc && !gcMarkWorkAvailable(nil) { 1423 // We don't need the P-local buffers here, allow 1424 // preemption because we may schedule like a regular 1425 // goroutine in gcMarkDone (block on locks, etc). 1426 releasem(node.m.ptr()) 1427 node.m.set(nil) 1428 1429 gcMarkDone() 1430 } 1431 } 1432 } 1433 1434 // gcMarkWorkAvailable reports whether executing a mark worker 1435 // on p is potentially useful. p may be nil, in which case it only 1436 // checks the global sources of work. 1437 func gcMarkWorkAvailable(p *p) bool { 1438 if p != nil && !p.gcw.empty() { 1439 return true 1440 } 1441 if !work.full.empty() { 1442 return true // global work available 1443 } 1444 if work.markrootNext < work.markrootJobs { 1445 return true // root scan work available 1446 } 1447 return false 1448 } 1449 1450 // gcMark runs the mark (or, for concurrent GC, mark termination) 1451 // All gcWork caches must be empty. 1452 // STW is in effect at this point. 1453 func gcMark(startTime int64) { 1454 if debug.allocfreetrace > 0 { 1455 tracegc() 1456 } 1457 1458 if gcphase != _GCmarktermination { 1459 throw("in gcMark expecting to see gcphase as _GCmarktermination") 1460 } 1461 work.tstart = startTime 1462 1463 // Check that there's no marking work remaining. 
1464 if work.full != 0 || work.markrootNext < work.markrootJobs { 1465 print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n") 1466 panic("non-empty mark queue after concurrent mark") 1467 } 1468 1469 if debug.gccheckmark > 0 { 1470 // This is expensive when there's a large number of 1471 // Gs, so only do it if checkmark is also enabled. 1472 gcMarkRootCheck() 1473 } 1474 1475 // Drop allg snapshot. allgs may have grown, in which case 1476 // this is the only reference to the old backing store and 1477 // there's no need to keep it around. 1478 work.stackRoots = nil 1479 1480 // Clear out buffers and double-check that all gcWork caches 1481 // are empty. This should be ensured by gcMarkDone before we 1482 // enter mark termination. 1483 // 1484 // TODO: We could clear out buffers just before mark if this 1485 // has a non-negligible impact on STW time. 1486 for _, p := range allp { 1487 // The write barrier may have buffered pointers since 1488 // the gcMarkDone barrier. However, since the barrier 1489 // ensured all reachable objects were marked, all of 1490 // these must be pointers to black objects. Hence we 1491 // can just discard the write barrier buffer. 1492 if debug.gccheckmark > 0 { 1493 // For debugging, flush the buffer and make 1494 // sure it really was all marked. 1495 wbBufFlush1(p) 1496 } else { 1497 p.wbBuf.reset() 1498 } 1499 1500 gcw := &p.gcw 1501 if !gcw.empty() { 1502 printlock() 1503 print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork) 1504 if gcw.wbuf1 == nil { 1505 print(" wbuf1=<nil>") 1506 } else { 1507 print(" wbuf1.n=", gcw.wbuf1.nobj) 1508 } 1509 if gcw.wbuf2 == nil { 1510 print(" wbuf2=<nil>") 1511 } else { 1512 print(" wbuf2.n=", gcw.wbuf2.nobj) 1513 } 1514 print("\n") 1515 throw("P has cached GC work at end of mark termination") 1516 } 1517 // There may still be cached empty buffers, which we 1518 // need to flush since we're going to free them. Also, 1519 // there may be non-zero stats because we allocated 1520 // black after the gcMarkDone barrier. 1521 gcw.dispose() 1522 } 1523 1524 // Flush scanAlloc from each mcache since we're about to modify 1525 // heapScan directly. If we were to flush this later, then scanAlloc 1526 // might have incorrect information. 1527 // 1528 // Note that it's not important to retain this information; we know 1529 // exactly what heapScan is at this point via scanWork. 1530 for _, p := range allp { 1531 c := p.mcache 1532 if c == nil { 1533 continue 1534 } 1535 c.scanAlloc = 0 1536 } 1537 1538 // Reset controller state. 1539 gcController.resetLive(work.bytesMarked) 1540 } 1541 1542 // gcSweep must be called on the system stack because it acquires the heap 1543 // lock. See mheap for details. 1544 // 1545 // The world must be stopped. 1546 // 1547 //go:systemstack 1548 func gcSweep(mode gcMode) { 1549 assertWorldStopped() 1550 1551 if gcphase != _GCoff { 1552 throw("gcSweep being done but phase is not GCoff") 1553 } 1554 1555 lock(&mheap_.lock) 1556 mheap_.sweepgen += 2 1557 sweep.active.reset() 1558 mheap_.pagesSwept.Store(0) 1559 mheap_.sweepArenas = mheap_.allArenas 1560 mheap_.reclaimIndex.Store(0) 1561 mheap_.reclaimCredit.Store(0) 1562 unlock(&mheap_.lock) 1563 1564 sweep.centralIndex.clear() 1565 1566 if !_ConcurrentSweep || mode == gcForceBlockMode { 1567 // Special case synchronous sweep. 
// gcSweep must be called on the system stack because it acquires the heap
// lock. See mheap for details.
//
// The world must be stopped.
//
//go:systemstack
func gcSweep(mode gcMode) {
	assertWorldStopped()

	if gcphase != _GCoff {
		throw("gcSweep being done but phase is not GCoff")
	}

	lock(&mheap_.lock)
	mheap_.sweepgen += 2
	sweep.active.reset()
	mheap_.pagesSwept.Store(0)
	mheap_.sweepArenas = mheap_.allArenas
	mheap_.reclaimIndex.Store(0)
	mheap_.reclaimCredit.Store(0)
	unlock(&mheap_.lock)

	sweep.centralIndex.clear()

	if !_ConcurrentSweep || mode == gcForceBlockMode {
		// Special case synchronous sweep.
		// Record that no proportional sweeping has to happen.
		lock(&mheap_.lock)
		mheap_.sweepPagesPerByte = 0
		unlock(&mheap_.lock)
		// Sweep all spans eagerly.
		for sweepone() != ^uintptr(0) {
			sweep.npausesweep++
		}
		// Free workbufs eagerly.
		prepareFreeWorkbufs()
		for freeSomeWbufs(false) {
		}
		// All "free" events for this mark/sweep cycle have
		// now happened, so we can make this profile cycle
		// available immediately.
		mProf_NextCycle()
		mProf_Flush()
		return
	}

	// Background sweep.
	lock(&sweep.lock)
	if sweep.parked {
		sweep.parked = false
		ready(sweep.g, 0, true)
	}
	unlock(&sweep.lock)
}

// gcResetMarkState resets global state prior to marking (concurrent
// or STW) and resets the stack scan state of all Gs.
//
// This is safe to do without the world stopped because any Gs created
// during or after this will start out in the reset state.
//
// gcResetMarkState must be called on the system stack because it acquires
// the heap lock. See mheap for details.
//
//go:systemstack
func gcResetMarkState() {
	// This may be called during a concurrent phase, so lock to make sure
	// allgs doesn't change.
	forEachG(func(gp *g) {
		gp.gcscandone = false // set to true in gcphasework
		gp.gcAssistBytes = 0
	})

	// Clear page marks. This is just 1MB per 64GB of heap, so the
	// time here is pretty trivial.
	lock(&mheap_.lock)
	arenas := mheap_.allArenas
	unlock(&mheap_.lock)
	for _, ai := range arenas {
		ha := mheap_.arenas[ai.l1()][ai.l2()]
		for i := range ha.pageMarks {
			ha.pageMarks[i] = 0
		}
	}

	work.bytesMarked = 0
	work.initialHeapLive = gcController.heapLive.Load()
}

// Hooks for other packages

var poolcleanup func()
var boringCaches []unsafe.Pointer // for crypto/internal/boring

//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
func sync_runtime_registerPoolCleanup(f func()) {
	poolcleanup = f
}

//go:linkname boring_registerCache crypto/internal/boring/bcache.registerCache
func boring_registerCache(p unsafe.Pointer) {
	boringCaches = append(boringCaches, p)
}
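
// The poolcleanup hook registered above is invoked by clearpools (below) at
// the start of every GC cycle; it is the mechanism behind sync.Pool's
// documented property that pooled items may be released by the collector at
// any time. A small stand-alone illustration using only public APIs
// (sync.Pool keeps a one-cycle victim cache, so an item normally survives
// the first forced collection and disappears after the second):
//
//	package main
//
//	import (
//		"fmt"
//		"runtime"
//		"sync"
//	)
//
//	func main() {
//		var pool sync.Pool
//		pool.Put("cached value")
//
//		// Two forced cycles: the first moves the pool's contents to the
//		// victim cache, the second drops the victim cache.
//		runtime.GC()
//		runtime.GC()
//
//		if v := pool.Get(); v == nil {
//			fmt.Println("pool was emptied by the GC cleanup hook")
//		} else {
//			fmt.Println("still cached:", v)
//		}
//	}
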
func clearpools() {
	// clear sync.Pools
	if poolcleanup != nil {
		poolcleanup()
	}

	// clear boringcrypto caches
	for _, p := range boringCaches {
		atomicstorep(p, nil)
	}

	// Clear central sudog cache.
	// Leave per-P caches alone, they have strictly bounded size.
	// Disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	lock(&sched.sudoglock)
	var sg, sgnext *sudog
	for sg = sched.sudogcache; sg != nil; sg = sgnext {
		sgnext = sg.next
		sg.next = nil
	}
	sched.sudogcache = nil
	unlock(&sched.sudoglock)

	// Clear central defer pool.
	// Leave per-P pools alone, they have strictly bounded size.
	lock(&sched.deferlock)
	// disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	var d, dlink *_defer
	for d = sched.deferpool; d != nil; d = dlink {
		dlink = d.link
		d.link = nil
	}
	sched.deferpool = nil
	unlock(&sched.deferlock)
}

// Timing

// itoaDiv formats val/(10**dec) into buf.
func itoaDiv(buf []byte, val uint64, dec int) []byte {
	i := len(buf) - 1
	idec := i - dec
	for val >= 10 || i >= idec {
		buf[i] = byte(val%10 + '0')
		i--
		if i == idec {
			buf[i] = '.'
			i--
		}
		val /= 10
	}
	buf[i] = byte(val + '0')
	return buf[i:]
}

// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
func fmtNSAsMS(buf []byte, ns uint64) []byte {
	if ns >= 10e6 {
		// Format as whole milliseconds.
		return itoaDiv(buf, ns/1e6, 0)
	}
	// Format two digits of precision, with at most three decimal places.
	x := ns / 1e3
	if x == 0 {
		buf[0] = '0'
		return buf[:1]
	}
	dec := 3
	for x >= 100 {
		x /= 10
		dec--
	}
	return itoaDiv(buf, x, dec)
}

// Helpers for testing GC.

// gcTestMoveStackOnNextCall causes the stack to be moved on a call
// immediately following the call to this. It may not work correctly
// if any other work appears after this call (such as returning).
// Typically the following call should be marked go:noinline so it
// performs a stack check.
//
// In rare cases this may not cause the stack to move, specifically if
// there's a preemption between this call and the next.
func gcTestMoveStackOnNextCall() {
	gp := getg()
	gp.stackguard0 = stackForceMove
}

// gcTestIsReachable performs a GC and returns a bit set where bit i
// is set if ptrs[i] is reachable.
func gcTestIsReachable(ptrs ...unsafe.Pointer) (mask uint64) {
	// This takes the pointers as unsafe.Pointers in order to keep
	// them live long enough for us to attach specials. After
	// that, we drop our references to them.

	if len(ptrs) > 64 {
		panic("too many pointers for uint64 mask")
	}

	// Block GC while we attach specials and drop our references
	// to ptrs. Otherwise, if a GC is in progress, it could mark
	// them reachable via this function before we have a chance to
	// drop them.
	semacquire(&gcsema)

	// Create reachability specials for ptrs.
	specials := make([]*specialReachable, len(ptrs))
	for i, p := range ptrs {
		lock(&mheap_.speciallock)
		s := (*specialReachable)(mheap_.specialReachableAlloc.alloc())
		unlock(&mheap_.speciallock)
		s.special.kind = _KindSpecialReachable
		if !addspecial(p, &s.special) {
			throw("already have a reachable special (duplicate pointer?)")
		}
		specials[i] = s
		// Make sure we don't retain ptrs.
		ptrs[i] = nil
	}

	semrelease(&gcsema)

	// Force a full GC and sweep.
	GC()

	// Process specials.
	for i, s := range specials {
		if !s.done {
			printlock()
			println("runtime: object", i, "was not swept")
			throw("IsReachable failed")
		}
		if s.reachable {
			mask |= 1 << i
		}
		lock(&mheap_.speciallock)
		mheap_.specialReachableAlloc.free(unsafe.Pointer(s))
		unlock(&mheap_.speciallock)
	}

	return mask
}
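
// gcTestIsReachable is internal to the runtime (tests typically reach it
// through an exported wrapper in export_test.go). Outside the runtime, the
// closest observable analogue of a reachability check is a finalizer, which
// runs only once the collector has found its object unreachable. A rough
// stand-alone sketch of that pattern; it is timing-dependent, so real tests
// retry or use generous timeouts:
//
//	package main
//
//	import (
//		"fmt"
//		"runtime"
//		"time"
//	)
//
//	func main() {
//		freed := make(chan struct{})
//
//		obj := new([16]byte)
//		runtime.SetFinalizer(obj, func(*[16]byte) { close(freed) })
//		obj = nil // drop the only reference
//
//		// A couple of forced cycles: one to detect unreachability and
//		// queue the finalizer, another to give it a chance to run.
//		runtime.GC()
//		runtime.GC()
//
//		select {
//		case <-freed:
//			fmt.Println("object was unreachable: finalizer ran")
//		case <-time.After(time.Second):
//			fmt.Println("finalizer did not run (object still considered reachable)")
//		}
//	}
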
// gcTestPointerClass returns the category of what p points to, one of:
// "heap", "stack", "data", "bss", "other". This is useful for checking
// that a test is doing what it's intended to do.
//
// This is nosplit simply to avoid extra pointer shuffling that may
// complicate a test.
//
//go:nosplit
func gcTestPointerClass(p unsafe.Pointer) string {
	p2 := uintptr(noescape(p))
	gp := getg()
	if gp.stack.lo <= p2 && p2 < gp.stack.hi {
		return "stack"
	}
	if base, _, _ := findObject(p2, 0, 0); base != 0 {
		return "heap"
	}
	for _, datap := range activeModules() {
		if datap.data <= p2 && p2 < datap.edata || datap.noptrdata <= p2 && p2 < datap.enoptrdata {
			return "data"
		}
		if datap.bss <= p2 && p2 < datap.ebss || datap.noptrbss <= p2 && p2 <= datap.enoptrbss {
			return "bss"
		}
	}
	KeepAlive(p)
	return "other"
}
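
// The classes above mirror where the Go toolchain places data: package-level
// variables land in the binary's data/bss sections, locals whose addresses
// never escape stay on the goroutine stack, and escaping allocations come
// from the heap. A stand-alone program for observing those placement
// decisions via the compiler's escape-analysis output (building with
// go build -gcflags=-m prints, e.g., "moved to heap: x"):
//
//	package main
//
//	import "fmt"
//
//	// global's pointer slot is placed in the binary's data/bss sections.
//	var global *int
//
//	//go:noinline  // keep the example's escape decision independent of inlining
//	func escapes() *int {
//		x := 42
//		return &x // x is moved to the heap: its address outlives this frame
//	}
//
//	func main() {
//		local := 7 // stays on this goroutine's stack; its address never escapes
//		local++
//		global = escapes()
//		fmt.Println(*global, local)
//	}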