# Go internal ABI specification

Self-link: [go.dev/s/regabi](https://go.dev/s/regabi)

This document describes Go’s internal application binary interface
(ABI), known as ABIInternal.
Go's ABI defines the layout of data in memory and the conventions for
calling between Go functions.
This ABI is *unstable* and will change between Go versions.
If you’re writing assembly code, please instead refer to Go’s
[assembly documentation](/doc/asm.html), which describes Go’s stable
ABI, known as ABI0.

All functions defined in Go source follow ABIInternal.
However, ABIInternal and ABI0 functions are able to call each other
through transparent *ABI wrappers*, described in the [internal calling
convention proposal](https://golang.org/design/27539-internal-abi).

Go uses a common ABI design across all architectures.
We first describe the common ABI, and then cover per-architecture
specifics.

*Rationale*: For the reasoning behind using a common ABI across
architectures instead of the platform ABI, see the [register-based Go
calling convention proposal](https://golang.org/design/40724-register-calling).

## Memory layout

Go's built-in types have the following sizes and alignments.
Many, though not all, of these sizes are guaranteed by the [language
specification](/doc/go_spec.html#Size_and_alignment_guarantees).
Those that aren't guaranteed may change in future versions of Go (for
example, we've considered changing the alignment of int64 on 32-bit).

| Type                        | 64-bit |       | 32-bit |       |
|-----------------------------|--------|-------|--------|-------|
|                             | Size   | Align | Size   | Align |
| bool, uint8, int8           | 1      | 1     | 1      | 1     |
| uint16, int16               | 2      | 2     | 2      | 2     |
| uint32, int32               | 4      | 4     | 4      | 4     |
| uint64, int64               | 8      | 8     | 8      | 4     |
| int, uint                   | 8      | 8     | 4      | 4     |
| float32                     | 4      | 4     | 4      | 4     |
| float64                     | 8      | 8     | 8      | 4     |
| complex64                   | 8      | 4     | 8      | 4     |
| complex128                  | 16     | 8     | 16     | 4     |
| uintptr, *T, unsafe.Pointer | 8      | 8     | 4      | 4     |

The types `byte` and `rune` are aliases for `uint8` and `int32`,
respectively, and hence have the same size and alignment as these
types.

The layout of `map`, `chan`, and `func` types is equivalent to `*T`.

To describe the layout of the remaining composite types, we first
define the layout of a *sequence* S of N fields with types
t<sub>1</sub>, t<sub>2</sub>, ..., t<sub>N</sub>.
We define the byte offset at which each field begins relative to a
base address of 0, as well as the size and alignment of the sequence
as follows:

```
offset(S, i) = 0  if i = 1
             = align(offset(S, i-1) + sizeof(t_(i-1)), alignof(t_i))
alignof(S)   = 1  if N = 0
             = max(alignof(t_i) | 1 <= i <= N)
sizeof(S)    = 0  if N = 0
             = align(offset(S, N) + sizeof(t_N), alignof(S))
```

Where sizeof(T) and alignof(T) are the size and alignment of type T,
respectively, and align(x, y) rounds x up to a multiple of y.

The `interface{}` type is a sequence of 1. a pointer to the runtime type
description for the interface's dynamic type and 2. an `unsafe.Pointer`
data field.
Any other interface type (besides the empty interface) is a sequence
of 1. a pointer to the runtime "itab" that gives the method pointers and
the type of the data field and 2. an `unsafe.Pointer` data field.
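
As a rough illustration of the sequence layout rule defined above, the
following Go sketch computes field offsets, size, and alignment for a
sequence. The names `field`, `alignUp`, and `layout` are hypothetical
helpers invented for this example; they are not part of the compiler
or runtime.

```go
package main

import "fmt"

// field is a hypothetical description of one field in a sequence:
// only its size and alignment matter for layout.
type field struct {
	size, align uintptr
}

// alignUp rounds x up to a multiple of y, like align(x, y) in the text.
func alignUp(x, y uintptr) uintptr {
	return (x + y - 1) / y * y
}

// layout returns the offset of every field together with the size and
// alignment of the whole sequence.
func layout(fields []field) (offsets []uintptr, size, align uintptr) {
	align = 1 // alignof(S) = 1 if N = 0
	var off uintptr
	for _, f := range fields {
		off = alignUp(off, f.align) // offset(S, i)
		offsets = append(offsets, off)
		off += f.size
		if f.align > align {
			align = f.align
		}
	}
	return offsets, alignUp(off, align), align
}

func main() {
	// The sequence uint8, int64, uint16 using the 64-bit column of the table.
	fmt.Println(layout([]field{{1, 1}, {8, 8}, {2, 2}}))
	// The same sequence using the 32-bit column, where int64 has alignment 4.
	fmt.Println(layout([]field{{1, 1}, {8, 4}, {2, 2}}))
}
```

With the 64-bit values the fields land at offsets 0, 8, and 16 and the
sequence has size 24; with the 32-bit values they land at 0, 4, and 12
and the sequence has size 16.
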
An interface can be "direct" or "indirect" depending on the dynamic
type: a direct interface stores the value directly in the data field,
and an indirect interface stores a pointer to the value in the data
field.
An interface can only be direct if the value consists of a single
pointer word.

An array type `[N]T` is a sequence of N fields of type T.

The slice type `[]T` is a sequence of a `*[cap]T` pointer to the slice
backing store, an `int` giving the `len` of the slice, and an `int`
giving the `cap` of the slice.

The `string` type is a sequence of a `*[len]byte` pointer to the
string backing store, and an `int` giving the `len` of the string.

A struct type `struct { f1 t1; ...; fM tM }` is laid out as the
sequence t1, ..., tM, tP, where tP is either:

- Type `byte` if sizeof(tM) = 0 and any of sizeof(t*i*) ≠ 0.
- Empty (size 0 and align 1) otherwise.

The padding byte prevents creating a past-the-end pointer by taking
the address of the final, empty fM field.

Note that user-written assembly code should generally not depend on Go
type layout and should instead use the constants defined in
[`go_asm.h`](/doc/asm.html#data-offsets).

## Function call argument and result passing

Function calls pass arguments and results using a combination of the
stack and machine registers.
Each argument or result is passed either entirely in registers or
entirely on the stack.
Because access to registers is generally faster than access to the
stack, arguments and results are preferentially passed in registers.
However, any argument or result that contains a non-trivial array or
does not fit entirely in the remaining available registers is passed
on the stack.

Each architecture defines a sequence of integer registers and a
sequence of floating-point registers.
At a high level, arguments and results are recursively broken down
into values of base types and these base values are assigned to
registers from these sequences.

Arguments and results can share the same registers, but do not share
the same stack space.
Beyond the arguments and results passed on the stack, the caller also
reserves spill space on the stack for all register-based arguments
(but does not populate this space).

The receiver, arguments, and results of function or method F are
assigned to registers or the stack using the following algorithm:

1. Let NI and NFP be the length of integer and floating-point register
   sequences defined by the architecture.
   Let I and FP be 0; these are the indexes of the next integer and
   floating-point register.
   Let S, the type sequence defining the stack frame, be empty.
1. If F is a method, assign F’s receiver.
1. For each argument A of F, assign A.
1. Add a pointer-alignment field to S. This has size 0 and the same
   alignment as `uintptr`.
1. Reset I and FP to 0.
1. For each result R of F, assign R.
1. Add a pointer-alignment field to S.
1. For each register-assigned receiver and argument of F, let T be its
   type and add T to the stack sequence S.
   This is the argument's (or receiver's) spill space and will be
   uninitialized at the call.
1. Add a pointer-alignment field to S.

Assigning a receiver, argument, or result V of underlying type T works
as follows:

1. Remember I and FP.
1. If T has zero size, add T to the stack sequence S and return.
1. Try to register-assign V.
1. If step 3 failed, reset I and FP to the values from step 1, add T
   to the stack sequence S, and assign V to this field in S.

Register-assignment of a value V of underlying type T works as follows:

1. If T is a boolean or integral type that fits in an integer
   register, assign V to register I and increment I.
1. If T is an integral type that fits in two integer registers, assign
   the least significant and most significant halves of V to registers
   I and I+1, respectively, and increment I by 2.
1. If T is a floating-point type and can be represented without loss
   of precision in a floating-point register, assign V to register FP
   and increment FP.
1. If T is a complex type, recursively register-assign its real and
   imaginary parts.
1. If T is a pointer type, map type, channel type, or function type,
   assign V to register I and increment I.
1. If T is a string type, interface type, or slice type, recursively
   register-assign V’s components (2 for strings and interfaces, 3 for
   slices).
1. If T is a struct type, recursively register-assign each field of V.
1. If T is an array type of length 0, do nothing.
1. If T is an array type of length 1, recursively register-assign its
   one element.
1. If T is an array type of length > 1, fail.
1. If I > NI or FP > NFP, fail.
1. If any recursive assignment above fails, fail.

The above algorithm produces an assignment of each receiver, argument,
and result to registers or to a field in the stack sequence.
The final stack sequence looks like: stack-assigned receiver,
stack-assigned arguments, pointer-alignment, stack-assigned results,
pointer-alignment, spill space for each register-assigned argument,
pointer-alignment.
The following diagram shows what this stack frame looks like on the
stack, using the typical convention where address 0 is at the bottom:

    +------------------------------+
    |             . . .            |
    | 2nd reg argument spill space |
    | 1st reg argument spill space |
    | <pointer-sized alignment>    |
    |             . . .            |
    | 2nd stack-assigned result    |
    | 1st stack-assigned result    |
    | <pointer-sized alignment>    |
    |             . . .            |
    | 2nd stack-assigned argument  |
    | 1st stack-assigned argument  |
    | stack-assigned receiver      |
    +------------------------------+ ↓ lower addresses

To perform a call, the caller reserves space starting at the lowest
address in its stack frame for the call stack frame, stores arguments
in the registers and argument stack fields determined by the above
algorithm, and performs the call.
At the time of a call, spill space, result stack fields, and result
registers are left uninitialized.
Upon return, the callee must have stored results to all result
registers and result stack fields determined by the above algorithm.

There are no callee-save registers, so a call may overwrite any
register that doesn’t have a fixed meaning, including argument
registers.

### Example

Consider the function `func f(a1 uint8, a2 [2]uintptr, a3 uint8) (r1
struct { x uintptr; y [2]uintptr }, r2 string)` on a 64-bit
architecture with hypothetical integer registers R0–R9.

On entry, `a1` is assigned to `R0`, `a3` is assigned to `R1` and the
stack frame is laid out in the following sequence:

    a2      [2]uintptr
    r1.x    uintptr
    r1.y    [2]uintptr
    a1Spill uint8
    a3Spill uint8
    _       [6]uint8  // alignment padding

In the stack frame, only the `a2` field is initialized on entry; the
rest of the frame is left uninitialized.

On exit, `r2.base` is assigned to `R0`, `r2.len` is assigned to `R1`,
and `r1.x` and `r1.y` are initialized in the stack frame.

There are several things to note in this example.
First, `a2` and `r1` are stack-assigned because they contain arrays.
The other arguments and results are register-assigned.
Result `r2` is decomposed into its components, which are individually
register-assigned.
On the stack, the stack-assigned arguments appear at lower addresses
than the stack-assigned results, which appear at lower addresses than
the argument spill area.
Only arguments, not results, are assigned a spill area on the stack.

### Rationale

Each base value is assigned to its own register to optimize
construction and access.
An alternative would be to pack multiple sub-word values into
registers, or to simply map an argument's in-memory layout to
registers (this is common in C ABIs), but this typically adds cost to
pack and unpack these values.
Modern architectures have more than enough registers to pass all
arguments and results this way for nearly all functions (see the
appendix), so there’s little downside to spreading base values across
registers.

Arguments that can’t be fully assigned to registers are passed
entirely on the stack in case the callee takes the address of that
argument.
If an argument could be split across the stack and registers and the
callee took its address, it would need to be reconstructed in memory,
a process that would be proportional to the size of the argument.

Non-trivial arrays are always passed on the stack because indexing
into an array typically requires a computed offset, which generally
isn’t possible with registers.
Arrays in general are rare in function signatures (only 0.7% of
functions in the Go 1.15 standard library and 0.2% in kubelet).
We considered allowing array fields to be passed on the stack while
the rest of an argument’s fields are passed in registers, but this
creates the same problems as other large structs if the callee takes
the address of an argument, and would benefit <0.1% of functions in
kubelet (and even these very little).
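
To make the array rule and the all-or-nothing fallback concrete, here
is a minimal Go sketch of the assignment steps described earlier. It is
an illustration under simplifying assumptions, not the compiler's
implementation: the `typ`, `regs`, `regAssign`, and `assign` names are
hypothetical, every integer-like base value is assumed to fit one
register, and two-register integers and complex values are left out.

```go
package main

import "fmt"

// kind is a deliberately small, hypothetical type model.
type kind int

const (
	kInt    kind = iota // bool, integer, or pointer-shaped value (one integer register)
	kFloat              // floating-point value (one floating-point register)
	kStruct             // struct; strings, slices, and interfaces are modelled as structs
	kArray
)

type typ struct {
	kind   kind
	fields []*typ // kStruct only
	elem   *typ   // kArray only
	n      int    // kArray length
}

// regs holds I and FP (next free registers) and NI and NFP (sequence lengths).
type regs struct{ i, fp, ni, nfp int }

// zeroSized reports whether a value of type t occupies no memory.
func zeroSized(t *typ) bool {
	switch t.kind {
	case kStruct:
		for _, f := range t.fields {
			if !zeroSized(f) {
				return false
			}
		}
		return true
	case kArray:
		return t.n == 0 || zeroSized(t.elem)
	}
	return false
}

// regAssign follows the recursive register-assignment steps and reports
// whether the value fits in the remaining registers.
func (r *regs) regAssign(t *typ) bool {
	switch t.kind {
	case kInt:
		r.i++
	case kFloat:
		r.fp++
	case kStruct:
		for _, f := range t.fields {
			if !r.regAssign(f) {
				return false
			}
		}
	case kArray:
		if t.n > 1 {
			return false // non-trivial arrays always fail
		}
		if t.n == 1 && !r.regAssign(t.elem) {
			return false
		}
	}
	return r.i <= r.ni && r.fp <= r.nfp
}

// assign reports true for a register-assigned value and false for a
// stack-assigned one, restoring I and FP after a failed attempt.
func (r *regs) assign(t *typ) bool {
	saved := *r
	if !zeroSized(t) && r.regAssign(t) {
		return true
	}
	*r = saved
	return false
}

func main() {
	word := &typ{kind: kInt}
	pair := &typ{kind: kArray, elem: word, n: 2}           // [2]uintptr
	str := &typ{kind: kStruct, fields: []*typ{word, word}} // string: pointer and length
	r1 := &typ{kind: kStruct, fields: []*typ{word, pair}}  // struct { x uintptr; y [2]uintptr }

	args := &regs{ni: 10} // hypothetical R0–R9, as in the example above
	fmt.Println(args.assign(word), args.assign(pair), args.assign(word)) // a1, a2, a3

	res := &regs{ni: 10} // I and FP start again at 0 for results
	fmt.Println(res.assign(r1), res.assign(str)) // r1, r2
}
```

Run against the signature from the example, the sketch reports `a1` and
`a3` as register-assigned and `a2` as stack-assigned, then `r1` as
stack-assigned and `r2` as register-assigned, which agrees with the
assignment described in the example.
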

We make exceptions for 0 and 1-element arrays because these don’t
require computed offsets, and 1-element arrays are already decomposed
in the compiler’s SSA representation.

The ABI assignment algorithm above is equivalent to Go’s stack-based
ABI0 calling convention if there are zero architecture registers.
This is intended to ease the transition to the register-based internal
ABI and make it easy for the compiler to generate either calling
convention.
An architecture may still define register meanings that aren’t
compatible with ABI0, but these differences should be easy to account
for in the compiler.

The assignment algorithm assigns zero-sized values to the stack
(assignment step 2) in order to support ABI0-equivalence.
While these values take no space themselves, they do result in
alignment padding on the stack in ABI0.
Without this step, the internal ABI would register-assign zero-sized
values even on architectures that provide no argument registers
because they don't consume any registers, and hence not add alignment
padding to the stack.

The algorithm reserves spill space for arguments in the caller’s frame
so that the compiler can generate a stack growth path that spills into
this reserved space.
If the callee has to grow the stack, it may not be able to reserve
enough additional stack space in its own frame to spill these, which
is why it’s important that the caller do so.
These slots also act as the home location if these arguments need to
be spilled for any other reason, which simplifies traceback printing.

There are several options for how to lay out the argument spill space.
We chose to lay out each argument according to its type's usual memory
layout but to separate the spill space from the regular argument
space.
Using the usual memory layout simplifies the compiler because it
already understands this layout.
Also, if a function takes the address of a register-assigned argument,
the compiler must spill that argument to memory in its usual memory
layout and it's more convenient to use the argument spill space for
this purpose.

Alternatively, the spill space could be structured around argument
registers.
In this approach, the stack growth spill path would spill each
argument register to a register-sized stack word.
However, if the function takes the address of a register-assigned
argument, the compiler would have to reconstruct it in memory layout
elsewhere on the stack.

The spill space could also be interleaved with the stack-assigned
arguments so the arguments appear in order whether they are register-
or stack-assigned.
This would be close to ABI0, except that register-assigned arguments
would be uninitialized on the stack and there's no need to reserve
stack space for register-assigned results.
We expect separating the spill space to perform better because of
memory locality.
Separating the space is also potentially simpler for `reflect` calls
because this allows `reflect` to summarize the spill space as a single
number.
Finally, the long-term intent is to remove reserved spill slots
entirely – allowing most functions to be called without any stack
setup and easing the introduction of callee-save registers – and
separating the spill space makes that transition easier.

## Closures

A func value (e.g., `var x func()`) is a pointer to a closure object.
A closure object begins with a pointer-sized program counter
representing the entry point of the function, followed by zero or more
bytes containing the closed-over environment.

Closure calls follow the same conventions as static function and
method calls, with one addition. Each architecture specifies a
*closure context pointer* register and calls to closures store the
address of the closure object in the closure context pointer register
prior to the call.

## Software floating-point mode

In "softfloat" mode, the ABI simply treats the hardware as having zero
floating-point registers.
As a result, any arguments containing floating-point values will be
passed on the stack.

*Rationale*: Softfloat mode is about compatibility over performance
and is not commonly used.
Hence, we keep the ABI as simple as possible in this case, rather than
adding additional rules for passing floating-point values in integer
registers.

## Architecture specifics

This section describes per-architecture register mappings, as well as
other per-architecture special cases.

### amd64 architecture

The amd64 architecture uses the following sequence of 9 registers for
integer arguments and results:

    RAX, RBX, RCX, RDI, RSI, R8, R9, R10, R11

It uses X0 – X14 for floating-point arguments and results.

*Rationale*: These sequences are chosen from the available registers
to be relatively easy to remember.

Registers R12 and R13 are permanent scratch registers.
R15 is a scratch register except in dynamically linked binaries.

*Rationale*: Some operations such as stack growth and reflection calls
need dedicated scratch registers in order to manipulate call frames
without corrupting arguments or results.
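
As a small illustration of how the amd64 sequences listed above are
consumed, the following Go sketch names the register for each base
value of a call. The `place` helper and its string encoding (`i` for an
integer or pointer word, `f` for a floating-point value) are
assumptions made for this example, and the sketch simply reports
failure when a sequence runs out rather than modelling the
per-argument stack fallback.

```go
package main

import "fmt"

// amd64IntRegs is the integer argument and result register sequence.
var amd64IntRegs = []string{"RAX", "RBX", "RCX", "RDI", "RSI", "R8", "R9", "R10", "R11"}

// amd64FloatRegs is the number of floating-point registers, X0 through X14.
const amd64FloatRegs = 15

// place names the register used by each base value in order. It
// returns ok == false if either sequence is exhausted.
func place(baseValues string) (names []string, ok bool) {
	i, fp := 0, 0
	for _, c := range baseValues {
		switch c {
		case 'i':
			if i == len(amd64IntRegs) {
				return nil, false
			}
			names = append(names, amd64IntRegs[i])
			i++
		case 'f':
			if fp == amd64FloatRegs {
				return nil, false
			}
			names = append(names, fmt.Sprintf("X%d", fp))
			fp++
		}
	}
	return names, true
}

func main() {
	// For func f(p *T, n int, x float64, s []byte) the base values are
	// p, n, x, s.ptr, s.len, s.cap.
	fmt.Println(place("iifiii"))
}
```

For this signature the sketch yields RAX, RBX, X0, RCX, RDI, RSI: the
integer and floating-point sequences advance independently of each
other.
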

Special-purpose registers are as follows:

| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| RSP | Stack pointer | Same | Same |
| RBP | Frame pointer | Same | Same |
| RDX | Closure context pointer | Scratch | Scratch |
| R12 | Scratch | Scratch | Scratch |
| R13 | Scratch | Scratch | Scratch |
| R14 | Current goroutine | Same | Same |
| R15 | GOT reference temporary if dynlink | Same | Same |
| X15 | Zero value (*) | Same | Scratch |

(*) Except on Plan 9, where X15 is a scratch register because SSE
registers cannot be used in note handlers (so the compiler avoids
using them except when absolutely necessary).

*Rationale*: These register meanings are compatible with Go’s
stack-based calling convention except for R14 and X15, which will have
to be restored on transitions from ABI0 code to ABIInternal code.
In ABI0, these are undefined, so transitions from ABIInternal to ABI0
can ignore these registers.

*Rationale*: For the current goroutine pointer, we chose a register
that requires an additional REX byte.
While this adds one byte to every function prologue, it is hardly ever
accessed outside the function prologue and we expect making more
single-byte registers available to be a net win.

*Rationale*: We could allow R14 (the current goroutine pointer) to be
a scratch register in function bodies because it can always be
restored from TLS on amd64.
However, we designate it as a fixed register for simplicity and for
consistency with other architectures that may not have a copy of the
current goroutine pointer in TLS.

*Rationale*: We designate X15 as a fixed zero register because
functions often have to bulk zero their stack frames, and this is more
efficient with a designated zero register.

*Implementation note*: Registers with fixed meaning at calls but not
in function bodies must be initialized by "injected" calls such as
signal-based panics.

#### Stack layout

The stack pointer, RSP, grows down and is always aligned to 8 bytes.

The amd64 architecture does not use a link register.

A function's stack frame is laid out as follows:

    +------------------------------+
    | return PC                    |
    | RBP on entry                 |
    | ... locals ...               |
    | ... outgoing arguments ...   |
    +------------------------------+ ↓ lower addresses

The "return PC" is pushed as part of the standard amd64 `CALL`
operation.
On entry, a function subtracts from RSP to open its stack frame and
saves the value of RBP directly below the return PC.
A leaf function that does not require any stack space may omit the
saved RBP.

The Go ABI's use of RBP as a frame pointer register is compatible with
amd64 platform conventions so that Go can inter-operate with platform
debuggers and profilers.

#### Flags

The direction flag (D) is always cleared (set to the “forward”
direction) at a call.
The arithmetic status flags are treated like scratch registers and not
preserved across calls.
All other bits in RFLAGS are system flags.

At function calls and returns, the CPU is in x87 mode (not MMX
technology mode).

*Rationale*: Go on amd64 does not use either the x87 registers or MMX
registers. Hence, we follow the SysV platform conventions in order to
simplify transitions to and from the C ABI.

At calls, the MXCSR control bits are always set as follows:

| Flag | Bit | Value | Meaning |
| --- | --- | --- | --- |
| FZ | 15 | 0 | Do not flush to zero |
| RC | 14/13 | 0 (RN) | Round to nearest |
| PM | 12 | 1 | Precision masked |
| UM | 11 | 1 | Underflow masked |
| OM | 10 | 1 | Overflow masked |
| ZM | 9 | 1 | Divide-by-zero masked |
| DM | 8 | 1 | Denormal operations masked |
| IM | 7 | 1 | Invalid operations masked |
| DAZ | 6 | 0 | Do not zero de-normals |

The MXCSR status bits are callee-save.

*Rationale*: Having a fixed MXCSR control configuration allows Go
functions to use SSE operations without modifying or saving the MXCSR.
Functions are allowed to modify it between calls (as long as they
restore it), but as of this writing Go code never does.
The above fixed configuration matches the process initialization
control bits specified by the ELF AMD64 ABI.

The x87 floating-point control word is not used by Go on amd64.

### arm64 architecture

The arm64 architecture uses R0 – R15 for integer arguments and results.

It uses F0 – F15 for floating-point arguments and results.

*Rationale*: 16 integer registers and 16 floating-point registers are
more than enough for passing arguments and results for practically all
functions (see Appendix). While there are more registers available,
using more registers provides little benefit. Additionally, it will add
overhead on code paths where the number of arguments is not statically
known (e.g. reflect call), and will consume more stack space when there
is only limited stack space available to fit in the nosplit limit.

Registers R16 and R17 are permanent scratch registers. They are also
used as scratch registers by the linker (Go linker and external
linker) in trampolines.

Register R18 is reserved and never used.
It is reserved for the OS
on some platforms (e.g. macOS).

Registers R19 – R25 are permanent scratch registers. In addition,
R27 is a permanent scratch register used by the assembler when
expanding instructions.

Floating-point registers F16 – F31 are also permanent scratch
registers.

Special-purpose registers are as follows:

| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| RSP | Stack pointer | Same | Same |
| R30 | Link register | Same | Scratch (non-leaf functions) |
| R29 | Frame pointer | Same | Same |
| R28 | Current goroutine | Same | Same |
| R27 | Scratch | Scratch | Scratch |
| R26 | Closure context pointer | Scratch | Scratch |
| R18 | Reserved (not used) | Same | Same |
| ZR | Zero value | Same | Same |

*Rationale*: These register meanings are compatible with Go’s
stack-based calling convention.

*Rationale*: The link register, R30, holds the function return
address at the function entry. For functions that have frames
(including most non-leaf functions), R30 is saved to stack in the
function prologue and restored in the epilogue. Within the function
body, R30 can be used as a scratch register.

*Implementation note*: Registers with fixed meaning at calls but not
in function bodies must be initialized by "injected" calls such as
signal-based panics.

#### Stack layout

The stack pointer, RSP, grows down and is always aligned to 16 bytes.

*Rationale*: The arm64 architecture requires the stack pointer to be
16-byte aligned.

A function's stack frame, after the frame is created, is laid out as
follows:

    +------------------------------+
    | ... locals ...               |
    | ... outgoing arguments ...   |
    | return PC                    | ← RSP points to
    | frame pointer on entry       |
    +------------------------------+ ↓ lower addresses

The "return PC" is loaded to the link register, R30, as part of the
arm64 `CALL` operation.

On entry, a function subtracts from RSP to open its stack frame, and
saves the values of R30 and R29 at the bottom of the frame.
Specifically, R30 is saved at 0(RSP) and R29 is saved at -8(RSP),
after RSP is updated.

A leaf function that does not require any stack space may omit the
saved R30 and R29.

The Go ABI's use of R29 as a frame pointer register is compatible with
the arm64 architecture requirements so that Go can inter-operate with
platform debuggers and profilers.

This stack layout is used by both register-based (ABIInternal) and
stack-based (ABI0) calling conventions.

#### Flags

The arithmetic status flags (NZCV) are treated like scratch registers
and not preserved across calls.
All other bits in PSTATE are system flags and are not modified by Go.

The floating-point status register (FPSR) is treated like a scratch
register and not preserved across calls.

At calls, the floating-point control register (FPCR) bits are always
set as follows:

| Flag | Bit | Value | Meaning |
| --- | --- | --- | --- |
| DN | 25 | 0 | Propagate NaN operands |
| FZ | 24 | 0 | Do not flush to zero |
| RC | 23/22 | 0 (RN) | Round to nearest, choose even if tied |
| IDE | 15 | 0 | Denormal operations trap disabled |
| IXE | 12 | 0 | Inexact trap disabled |
| UFE | 11 | 0 | Underflow trap disabled |
| OFE | 10 | 0 | Overflow trap disabled |
| DZE | 9 | 0 | Divide-by-zero trap disabled |
| IOE | 8 | 0 | Invalid operations trap disabled |
| NEP | 2 | 0 | Scalar operations do not affect higher elements in vector registers |
| AH | 1 | 0 | No alternate handling of de-normal inputs |
| FIZ | 0 | 0 | Do not zero de-normals |

*Rationale*: Having a fixed FPCR control configuration allows Go
functions to use floating-point and vector (SIMD) operations without
modifying or saving the FPCR.
Functions are allowed to modify it between calls (as long as they
restore it), but as of this writing Go code never does.

### loong64 architecture

The loong64 architecture uses R4 – R19 for integer arguments and integer results.

It uses F0 – F15 for floating-point arguments and results.

Registers R20 - R21, R23 – R28, R30 - R31, F16 – F31 are permanent scratch registers.

Register R2 is reserved and never used.

Registers R20 and R21 are used by `runtime.duffcopy` and `runtime.duffzero`.

Special-purpose registers used within Go generated code and Go assembly code
are as follows:

| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| R0 | Zero value | Same | Same |
| R1 | Link register | Link register | Scratch |
| R3 | Stack pointer | Same | Same |
| R20,R21 | Scratch | Scratch | Used by duffcopy, duffzero |
| R22 | Current goroutine | Same | Same |
| R29 | Closure context pointer | Same | Same |
| R30, R31 | used by the assembler | Same | Same |

*Rationale*: These register meanings are compatible with Go’s stack-based
calling convention.

#### Stack layout

The stack pointer, R3, grows down and is aligned to 8 bytes.

A function's stack frame, after the frame is created, is laid out as
follows:

    +------------------------------+
    | ... locals ...               |
    | ... outgoing arguments ...   |
    | return PC                    | ← R3 points to
    +------------------------------+ ↓ lower addresses

This stack layout is used by both register-based (ABIInternal) and
stack-based (ABI0) calling conventions.

The "return PC" is loaded to the link register, R1, as part of the
loong64 `JAL` operation.

#### Flags

All bits in CSR are system flags and are not modified by Go.

### ppc64 architecture

The ppc64 architecture uses R3 – R10 and R14 – R17 for integer arguments
and results.

It uses F1 – F12 for floating-point arguments and results.

Register R31 is a permanent scratch register in Go.

Special-purpose registers used within Go generated code and Go
assembly code are as follows:

| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| R0 | Zero value | Same | Same |
| R1 | Stack pointer | Same | Same |
| R2 | TOC register | Same | Same |
| R11 | Closure context pointer | Scratch | Scratch |
| R12 | Function address on indirect calls | Scratch | Scratch |
| R13 | TLS pointer | Same | Same |
| R20,R21 | Scratch | Scratch | Used by duffcopy, duffzero |
| R30 | Current goroutine | Same | Same |
| R31 | Scratch | Scratch | Scratch |
| LR | Link register | Link register | Scratch |

*Rationale*: These register meanings are compatible with Go’s
stack-based calling convention.

The link register, LR, holds the function return
address at the function entry and is set to the correct return
address before exiting the function. It is also used
in some cases as the function address when doing an indirect call.

The register R2 contains the address of the TOC (table of contents) which
contains data or code addresses used when generating position independent
code. Non-Go code generated when using cgo contains TOC-relative addresses
which depend on R2 holding a valid TOC. Go code compiled with -shared or
-dynlink initializes and maintains R2 and uses it in some cases for
function calls; Go code compiled without these options does not modify R2.

When making a function call, R12 contains the function address for use by the
code to generate R2 at the beginning of the function. R12 can be used for
other purposes within the body of the function, such as trampoline generation.

R20 and R21 are used in duffcopy and duffzero, which could be generated
before arguments are saved, and so should not be used for register arguments.

The Count register CTR can be used as the call target for some branch instructions.
It holds the return address when preemption has occurred.

On PPC64, a float32 becomes a float64 in the register when it is loaded. This
differs from other platforms, and the internal implementation of reflection
must account for it so that float32 arguments are passed correctly.

Registers R18 – R29 and F13 – F31 are considered scratch registers.

#### Stack layout

The stack pointer, R1, grows down and is aligned to 8 bytes in Go, but is
aligned to 16 bytes when calling cgo.

A function's stack frame, after the frame is created, is laid out as
follows:

    +------------------------------+
    |      ... locals ...          |
    | ... outgoing arguments ...   |
    | 24  TOC register R2 save     | When compiled with -shared/-dynlink
    | 16  Unused in Go             | Not used in Go
    |  8  CR save                  | nonvolatile CR fields
    |  0  return PC                | ← R1 points to
    +------------------------------+ ↓ lower addresses

The "return PC" is loaded to the link register, LR, as part of the
ppc64 `BL` operations.

On entry to a non-leaf function, the stack frame size is subtracted from R1 to
create its stack frame, and the value of LR is saved at the bottom of the frame.

A leaf function that does not require any stack space does not modify R1 and
does not save LR.

*NOTE*: We might need to save the frame pointer on the stack as
in the PPC64 ELF v2 ABI so Go can inter-operate with platform debuggers
and profilers.

This stack layout is used by both register-based (ABIInternal) and
stack-based (ABI0) calling conventions.

#### Flags

The condition register consists of 8 condition code register fields
CR0-CR7. Go generated code only sets and uses CR0, commonly set by
compare functions and used to determine the target of a conditional
branch.
The generated code does not set or use CR1-CR7.

The floating point status and control register (FPSCR) is initialized
to 0 by the kernel at startup of the Go program and not changed by
the Go generated code.

### riscv64 architecture

The riscv64 architecture uses X10 – X17, X8, X9, X18 – X23 for integer arguments
and results.

It uses F10 – F17, F8, F9, F18 – F23 for floating-point arguments and results.

Special-purpose registers used within Go generated code and Go
assembly code are as follows:

| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| X0 | Zero value | Same | Same |
| X1 | Link register | Link register | Scratch |
| X2 | Stack pointer | Same | Same |
| X3 | Global pointer | Same | Used by dynamic linker |
| X4 | TLS (thread pointer) | TLS | Scratch |
| X24,X25 | Scratch | Scratch | Used by duffcopy, duffzero |
| X26 | Closure context pointer | Scratch | Scratch |
| X27 | Current goroutine | Same | Same |
| X31 | Scratch | Scratch | Scratch |

*Rationale*: These register meanings are compatible with Go’s
stack-based calling convention. Before this register ABI is adopted, the
context register will change from X20 to X26, and the duffcopy and duffzero
registers will change to X24 and X25.
The order X10 – X17, X8, X9, X18 – X23 matches A0 – A7, S0 – S7 in the platform ABI.
The order F10 – F17, F8, F9, F18 – F23 matches FA0 – FA7, FS0 – FS7 in the platform ABI.
X8 – X23 and F8 – F15 are used for compressed instructions (RVC), which will benefit code size in the future.

#### Stack layout

The stack pointer, X2, grows down and is aligned to 8 bytes.

A function's stack frame, after the frame is created, is laid out as
follows:

    +------------------------------+
    |      ... locals ...          |
    | ... outgoing arguments ...   |
    |         return PC            | ← X2 points to
    +------------------------------+ ↓ lower addresses

The "return PC" is loaded to the link register, X1, as part of the
riscv64 `CALL` operation.

#### Flags

The riscv64 architecture has the Zicsr extension for the control and status
register (CSR), which is treated as a scratch register.
All bits in CSR are system flags and are not modified by Go.

## Future directions

### Spill path improvements

The ABI currently reserves spill space for argument registers so the
compiler can statically generate an argument spill path before calling
into `runtime.morestack` to grow the stack.
This ensures there will be sufficient spill space even when the stack
is nearly exhausted and keeps stack growth and stack scanning
essentially unchanged from ABI0.

However, this wastes stack space (the median wastage is 16 bytes per
call), resulting in larger stacks and increased cache footprint.
A better approach would be to reserve stack space only when spilling.
One way to ensure enough space is available to spill would be for
every function to ensure there is enough space for the function's own
frame *as well as* the spill space of all functions it calls.
For most functions, this would change the threshold for the prologue
stack growth check.
For `nosplit` functions, this would change the threshold used in the
linker's static stack size check.

Allocating spill space in the callee rather than the caller may also
allow for faster reflection calls in the common case where a function
takes only register arguments, since it would allow reflection to make
these calls directly without allocating any frame.

The statically-generated spill path also increases code size.
It is possible to instead have a generic spill path in the runtime, as
part of `morestack`.
However, this complicates reserving the spill space, since spilling
all possible register arguments would, in most cases, take
significantly more space than spilling only those used by a particular
function.
Some options are to spill to a temporary space and copy back only the
registers used by the function, or to grow the stack if necessary
before spilling to it (using a temporary space if necessary), or to
use a heap-allocated space if insufficient stack space is available.
These options all add enough complexity that we will have to make this
decision based on the actual code size growth caused by the static
spill paths.

### Clobber sets

As defined, the ABI does not use callee-save registers.
This significantly simplifies the garbage collector and the compiler's
register allocator, but at some performance cost.
A potentially better balance for Go code would be to use *clobber
sets*: for each function, the compiler records the set of registers it
clobbers (including those clobbered by functions it calls) and any
register not clobbered by function F can remain live across calls to
F.

This is generally a good fit for Go because Go's package DAG allows
function metadata like the clobber set to flow up the call graph, even
across package boundaries.
Clobber sets would require relatively little change to the garbage
collector, unlike general callee-save registers.
One disadvantage of clobber sets over callee-save registers is that
they don't help with indirect function calls or interface method
calls, since static information isn't available in these cases.

### Large aggregates

Go encourages passing composite values by value, and this simplifies
reasoning about mutation and races.
However, this comes at a performance cost for large composite values.
It may be possible to instead transparently pass large composite
values by reference and delay copying until it is actually necessary.

## Appendix: Register usage analysis

In order to understand the impacts of the above design on register
usage, we
[analyzed](https://github.com/aclements/go-misc/tree/master/abi) the
impact of the above ABI on a large code base: cmd/kubelet from
[Kubernetes](https://github.com/kubernetes/kubernetes) at tag v1.18.8.

The following table shows the impact of different numbers of available
integer and floating-point registers on argument assignment:

```
|      |        |       |   stack args    |     spills      |   stack total   |
| ints | floats | % fit | p50 | p95 | p99 | p50 | p95 | p99 | p50 | p95 | p99 |
|    0 |      0 |  6.3% |  32 | 152 | 256 |   0 |   0 |   0 |  32 | 152 | 256 |
|    0 |      8 |  6.4% |  32 | 152 | 256 |   0 |   0 |   0 |  32 | 152 | 256 |
|    1 |      8 | 21.3% |  24 | 144 | 248 |   8 |   8 |   8 |  32 | 152 | 256 |
|    2 |      8 | 38.9% |  16 | 128 | 224 |   8 |  16 |  16 |  24 | 136 | 240 |
|    3 |      8 | 57.0% |   0 | 120 | 224 |  16 |  24 |  24 |  24 | 136 | 240 |
|    4 |      8 | 73.0% |   0 | 120 | 216 |  16 |  32 |  32 |  24 | 136 | 232 |
|    5 |      8 | 83.3% |   0 | 112 | 216 |  16 |  40 |  40 |  24 | 136 | 232 |
|    6 |      8 | 87.5% |   0 | 112 | 208 |  16 |  48 |  48 |  24 | 136 | 232 |
|    7 |      8 | 89.8% |   0 | 112 | 208 |  16 |  48 |  56 |  24 | 136 | 232 |
|    8 |      8 | 91.3% |   0 | 112 | 200 |  16 |  56 |  64 |  24 | 136 | 232 |
|    9 |      8 | 92.1% |   0 | 112 | 192 |  16 |  56 |  72 |  24 | 136 | 232 |
|   10 |      8 | 92.6% |   0 | 104 | 192 |  16 |  56 |  72 |  24 | 136 | 232 |
|   11 |      8 | 93.1% |   0 | 104 | 184 |  16 |  56 |  80 |  24 | 128 | 232 |
|   12 |      8 | 93.4% |   0 | 104 | 176 |  16 |  56 |  88 |  24 | 128 | 232 |
|   13 |      8 | 94.0% |   0 |  88 | 176 |  16 |  56 |  96 |  24 | 128 | 232 |
|   14 |      8 | 94.4% |   0 |  80 | 152 |  16 |  64 | 104 |  24 | 128 | 232 |
|   15 |      8 | 94.6% |   0 |  80 | 152 |  16 |  64 | 112 |  24 | 128 | 232 |
|   16 |      8 | 94.9% |   0 |  16 | 152 |  16 |  64 | 112 |  24 | 128 | 232 |
|    ∞ |      8 | 99.8% |   0 |   0 |   0 |  24 | 112 | 216 |  24 | 120 | 216 |
```

The first two columns show the number of available integer and
floating-point registers.
The first row shows the results for 0 integer and 0 floating-point
registers, which is equivalent to ABI0.
We found that any reasonable number of floating-point registers has
the same effect, so we fixed it at 8 for all other rows.

The “% fit” column gives the fraction of functions where all arguments
and results are register-assigned and no arguments are passed on the
stack.
The three “stack args” columns give the median, 95th and 99th
percentile number of bytes of stack arguments.
The “spills” columns likewise summarize the number of bytes in
on-stack spill space.
And “stack total” summarizes the sum of stack arguments and on-stack
spill slots.
Note that these are three different distributions; for example,
there’s no single function that takes 0 stack argument bytes, 16 spill
bytes, and 24 total stack bytes.

From this, we can see that the fraction of functions that fit entirely
in registers grows very slowly once it reaches about 90%, though
curiously there is a small minority of functions that could benefit
from a huge number of registers.
Making 9 integer registers available on amd64 puts it in this realm.
We also see that the stack space required for most functions is fairly
small.
While the increasing space required for spills largely balances out
the decreasing space required for stack arguments as the number of
available registers increases, there is a general reduction in the
total stack space required with more available registers.
This does, however, suggest that eliminating spill slots in the future
would noticeably reduce stack requirements.