// Copyright 2019 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*
Package ppc64 implements a PPC64 assembler that assembles Go asm into
the corresponding PPC64 instructions as defined by the Power ISA 3.0B.

This document provides information on how to write code in Go assembler
for PPC64, focusing on the differences between Go and PPC64 assembly language.
It assumes some knowledge of PPC64 assembler. The original implementation of
PPC64 in Go defined many opcodes that are different from PPC64 opcodes, but
later additions to the Go assembly language, such as the VMX and VSX instructions,
use mnemonics that are mostly similar, if not identical, to the PPC64 mnemonics.
Not all detail is included here; refer to the Power ISA document for more detail.

Starting with Go 1.15 the Go objdump supports the -gnu option, which provides a
side-by-side view of the Go assembler and the PPC64 assembler output. This is
extremely helpful in determining what final PPC64 assembly is generated from the
corresponding Go assembly.

In the examples below, the Go assembly is on the left, PPC64 assembly on the right.

1. Operand ordering

In Go asm, the last operand (right) is the target operand, but with PPC64 asm,
the first operand (left) is the target. The order of the remaining operands is
not consistent: in general, opcodes with 3 operands that perform math or logical
operations have their operands in reverse order. Opcodes for vector instructions
and those with more than 3 operands usually have operands in the same order except
for the target operand, which is first in PPC64 asm and last in Go asm.

Example:

	ADD R3, R4, R5		<=>	add r5, r4, r3

2. Constant operands

In Go asm, an operand that starts with '$' indicates a constant value. If the
instruction using the constant has an immediate version of the opcode, then an
immediate value is used with the opcode if possible.

Example:

	ADD $1, R3, R4		<=>	addi r4, r3, 1

3. Opcodes setting condition codes

In PPC64 asm, some instructions other than compares have variations that can set
the condition code where meaningful. This is indicated by adding '.' to the end
of the PPC64 instruction. In Go asm, these instructions have 'CC' at the end of
the opcode. The possible settings of the condition code depend on the instruction.
CR0 is the default for fixed-point instructions; CR1 for floating point; CR6 for
vector instructions.

Example:

	ANDCC R3, R4, R5		<=>	and. r5, r4, r3 (set CR0)

4. Loads and stores from memory

In Go asm, opcodes starting with 'MOV' indicate a load or store. When the target
is a memory reference, it is a store; when the target is a register and the
source is a memory reference, it is a load.

MOV{B,H,W,D} variations identify the size as byte, halfword, word, doubleword.

Adding 'Z' to the opcode for a load indicates zero extend; if omitted it is sign extend.
Adding 'U' to a load or store indicates an update of the base register with the offset.
Adding 'BR' to an opcode indicates byte-reversed load or store, or the order opposite
of the expected endian order. If 'BR' is used then zero extend is assumed.

Memory references n(Ra) indicate the address in Ra + n. When used with an update form
of an opcode, the value in Ra is incremented by n.

Memory references (Ra+Rb) or (Ra)(Rb) indicate the address Ra + Rb, used by indexed
loads or stores. Both forms are accepted. When used with an update form, the base
register is updated by the value in the index register.

Examples:

	MOVD (R3), R4		<=>	ld r4,0(r3)
	MOVW (R3), R4		<=>	lwa r4,0(r3)
	MOVWZU 4(R3), R4		<=>	lwzu r4,4(r3)
	MOVWZ (R3+R5), R4		<=>	lwzx r4,r3,r5
	MOVHZ (R3), R4		<=>	lhz r4,0(r3)
	MOVHU 2(R3), R4		<=>	lhau r4,2(r3)
	MOVBZ (R3), R4		<=>	lbz r4,0(r3)

	MOVD R4,(R3)		<=>	std r4,0(r3)
	MOVW R4,(R3)		<=>	stw r4,0(r3)
	MOVW R4,(R3+R5)		<=>	stwx r4,r3,r5
	MOVWU R4,4(R3)		<=>	stwu r4,4(r3)
	MOVH R4,2(R3)		<=>	sth r4,2(r3)
	MOVBU R4,(R3)(R5)		<=>	stbux r4,r3,r5

5. Compares

When an instruction does a compare or other operation that might
result in a condition code, the resulting condition is set
in a field of the condition register. The condition register consists
of eight 4-bit fields named CR0 - CR7. When a compare instruction
identifies a CR, the resulting condition is set in that field
to be read by a later branch or isel instruction. Within these fields,
bits are set to indicate less than, greater than, or equal conditions.

Once an instruction sets a condition, a subsequent branch, isel or
other instruction can read the condition field and operate based on the
bit settings.

Examples:

	CMP R3, R4			<=>	cmp r3, r4	(CR0 assumed)
	CMP R3, R4, CR1		<=>	cmp cr1, r3, r4

Note that the condition register is the target operand of compare opcodes, so
the remaining operands are in the same order for Go asm and PPC64 asm.
When CR0 is used, it is implicit and does not need to be specified.
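
The field layout described above can be illustrated in Go. This is a minimal
sketch and not part of the assembler: the helper crField and the sample CR
value are hypothetical, and the sketch assumes the 32-bit condition register is
held in a uint32 with CR0 in the most significant 4 bits, following the Power
ISA bit numbering.

```go
package main

import "fmt"

// Bit values within one 4-bit CR field, most significant bit first.
const (
	LT = 8 // bit 0 of the field: less than
	GT = 4 // bit 1 of the field: greater than
	EQ = 2 // bit 2 of the field: equal
	SO = 1 // bit 3 of the field: summary overflow
)

// crField returns the 4-bit field CRn (n in 0..7) of a 32-bit CR value.
// CR0 occupies the most significant 4 bits. Hypothetical helper.
func crField(cr uint32, n uint) uint32 {
	return (cr >> (28 - 4*n)) & 0xF
}

func main() {
	// Hypothetical CR contents: CR0 has EQ set, CR1 has LT set.
	cr := uint32(0x28000000)
	fmt.Println(crField(cr, 0)&EQ != 0) // true
	fmt.Println(crField(cr, 1)&LT != 0) // true
	fmt.Println(crField(cr, 7))         // 0
}
```

A branch or isel that names CR1 and the LT bit is, in effect, testing the same
bit that crField(cr, 1)&LT isolates here.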

6. Branches

Many branches are represented as a form of the BC instruction. There are
other extended opcodes to make it easier to see what type of branch is being
used.

The following is a brief description of the BC instruction and its commonly
used operands.

	BC op1, op2, op3

	op1: type of branch
		16 -> bctr (branch on ctr)
		12 -> bcr  (branch if cr bit is set)
		8  -> bcr+bctr (branch on ctr and cr values)
		4  -> bcr != 0 (branch if specified cr bit is not set)

		There are more combinations but these are the most common.

	op2: condition register field and condition bit

		This contains an immediate value indicating which condition field
		to read and what bits to test. Each field is 4 bits long with CR0
		at bit 0, CR1 at bit 4, etc. The value is computed as 4*CR+condition
		with these condition values:

		0 -> LT
		1 -> GT
		2 -> EQ
		3 -> OVG

		Thus 0 means test CR0 for LT, 5 means CR1 for GT, 30 means CR7 for EQ.

	op3: branch target

Examples:

	BC 12, 0, target		<=>	blt cr0, target
	BC 12, 2, target		<=>	beq cr0, target
	BC 12, 5, target		<=>	bgt cr1, target
	BC 12, 30, target		<=>	beq cr7, target
	BC 4, 6, target		<=>	bne cr1, target
	BC 4, 1, target		<=>	ble cr0, target

The following extended opcodes are available for ease of use and readability:

	BNE CR2, target		<=>	bne cr2, target
	BEQ CR4, target		<=>	beq cr4, target
	BLT target			<=>	blt target (cr0 default)
	BGE CR7, target		<=>	bge cr7, target

Refer to the ISA for more information on additional values for the BC instruction,
how to handle OVG information, and much more.
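
The op2 computation (4*CR+condition) is easy to get wrong by hand. The
following Go sketch shows the arithmetic in both directions. The names bcBit
and splitBCBit are hypothetical helpers introduced only for illustration; they
are not provided by the assembler.

```go
package main

import "fmt"

// Condition bit numbers within a CR field, as listed for op2.
const (
	condLT  = 0
	condGT  = 1
	condEQ  = 2
	condOVG = 3
)

// bcBit computes the op2 operand of BC as 4*CR+condition.
func bcBit(crField, cond int) int {
	return 4*crField + cond
}

// splitBCBit is the inverse: it recovers the CR field and condition
// from an op2 value in the range 0..31.
func splitBCBit(op2 int) (crField, cond int) {
	return op2 / 4, op2 % 4
}

func main() {
	fmt.Println(bcBit(0, condLT)) // 0
	fmt.Println(bcBit(1, condGT)) // 5
	fmt.Println(bcBit(7, condEQ)) // 30

	f, c := splitBCBit(6)
	fmt.Println(f, c) // 1 2, that is CR1 and EQ
}
```

Decoding an op2 value this way is a quick check on which extended mnemonic a
given BC corresponds to.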

7. Align directive

Starting with Go 1.12, Go asm supports the PCALIGN directive, which indicates
that the next instruction should be aligned to the specified value. Before
Go 1.21, 8 and 16 were the only supported values, and a maximum of 2 NOPs was
added to align the code. That meant that where the code was aligned to 4 but
PCALIGN $16 was at that location, the code was only aligned to 8 to avoid
adding 3 NOPs.

The purpose of this directive is to improve performance for cases like loops
where better alignment (8 or 16 instead of 4) might be helpful. This directive
exists in PPC64 assembler and is frequently used by PPC64 assembler writers.

	PCALIGN $16
	PCALIGN $8

By default, functions in Go are aligned to 16 bytes, as is the case in all
other compilers for PPC64. If there is a PCALIGN directive requesting alignment
greater than 16, then the alignment of the containing function must be
promoted to that same alignment or greater.

The behavior of PCALIGN changed in Go 1.21 to be more straightforward and to
ensure the alignment required for some instructions in power10. The acceptable
values are 8, 16, 32 and 64, and the use of those values will always provide the
specified alignment.
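
The difference between the two PCALIGN behaviors comes down to a small piece
of padding arithmetic. The Go sketch below models it under stated assumptions:
the function names strictPad and legacyPad are hypothetical, pc is the byte
offset of the next instruction, and padding is counted in 4-byte NOPs. It
illustrates the rules described above and is not the assembler's own code.

```go
package main

import "fmt"

const nopSize = 4 // each PPC64 NOP occupies 4 bytes

// strictPad returns the number of padding bytes that gives exactly the
// requested alignment (align must be a power of two).
func strictPad(pc, align int64) int64 {
	return -pc & (align - 1)
}

// legacyPad models the earlier rule: never insert more than 2 NOPs,
// settling for 8-byte alignment when 16 would need 3.
func legacyPad(pc, align int64) int64 {
	pad := strictPad(pc, align)
	if pad > 2*nopSize {
		pad = strictPad(pc, 8)
	}
	return pad
}

func main() {
	// pc = 4 with PCALIGN $16: full alignment would need 3 NOPs.
	fmt.Println(legacyPad(4, 16) / nopSize) // 1
	fmt.Println(strictPad(4, 16) / nopSize) // 3
	fmt.Println(legacyPad(8, 16) / nopSize) // 2
	// A larger alignment is only meaningful under the newer rule.
	fmt.Println(strictPad(20, 64) / nopSize) // 11
}
```

With pc at 4, the earlier rule adds one NOP and stops at offset 8, while the
newer rule adds three and reaches offset 16.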

8. Shift instructions

The simple scalar shifts on PPC64 expect a shift count that fits in 5 bits for
32-bit values or 6 bits for 64-bit values. If the shift count is a constant value
greater than the max, then the assembler sets it to the max for that size (31 for
32-bit values, 63 for 64-bit values). If the shift count is in a register, then
only the low 5 or 6 bits of the register will be used as the shift count. The
Go compiler will add appropriate code to compare the shift value to achieve the
correct result, and the assembler does not add extra checking.

Examples:

	SRAD $8,R3,R4		=>	sradi r4,r3,8
	SRD $8,R3,R4		=>	rldicl r4,r3,56,8
	SLD $8,R3,R4		=>	rldicr r4,r3,8,55
	SRAW $16,R4,R5		=>	srawi r5,r4,16
	SRW $40,R4,R5		=>	rlwinm r5,r4,0,0,31
	SLW $12,R4,R5		=>	rlwinm r5,r4,12,0,19
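
The rotate-and-mask operands in the logical shift examples follow the
extended-mnemonic equivalences given in the Power ISA. The Go sketch below
reproduces that arithmetic for in-range constant counts only (1 to 63 for
doubleword shifts, 1 to 31 for word shifts). The function names and the ra/rs
placeholders are hypothetical, and the sketch makes no claim about how the
assembler treats out-of-range counts.

```go
package main

import "fmt"

// srd: shift right doubleword by n == rotate left by 64-n, keep bits n..63.
func srd(n int) string { return fmt.Sprintf("rldicl ra,rs,%d,%d", 64-n, n) }

// sld: shift left doubleword by n == rotate left by n, keep bits 0..63-n.
func sld(n int) string { return fmt.Sprintf("rldicr ra,rs,%d,%d", n, 63-n) }

// srw: shift right word by n == rotate left by 32-n, keep bits n..31.
func srw(n int) string { return fmt.Sprintf("rlwinm ra,rs,%d,%d,31", 32-n, n) }

// slw: shift left word by n == rotate left by n, keep bits 0..31-n.
func slw(n int) string { return fmt.Sprintf("rlwinm ra,rs,%d,0,%d", n, 31-n) }

func main() {
	fmt.Println(srd(8))  // rldicl ra,rs,56,8
	fmt.Println(sld(8))  // rldicr ra,rs,8,55
	fmt.Println(slw(12)) // rlwinm ra,rs,12,0,19
	fmt.Println(srw(16)) // rlwinm ra,rs,16,16,31
}
```

The first three lines of output match the SRD, SLD and SLW examples above once
the placeholder registers are substituted.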

Some non-simple shifts have operands in the Go assembly which don't map directly
onto operands in the PPC64 assembly. When an operand in a shift instruction in the
Go assembly is a bit mask, that mask is represented as a start and end bit in the
PPC64 assembly instead of a mask. See the ISA for more detail on these types of shifts.
Here are a few examples:

	RLWMI $7,R3,$65535,R6		=>	rlwimi r6,r3,7,16,31
	RLDMI $0,R4,$7,R6		=>	rldimi r6,r4,0,61
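
The translation from a mask to start and end bits can be sketched in Go using
math/bits. This is an illustration under stated assumptions, not the
assembler's implementation: maskBounds32 and maskBegin64 are hypothetical
names, only contiguous masks that do not wrap around are handled, and bits are
numbered from 0 at the most significant end as in the Power ISA.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maskBounds32 converts a contiguous, non-wrapping 32-bit mask into the
// start and end bit numbers used by rlwimi and rlwinm. ok is false when
// the mask is zero or not contiguous.
func maskBounds32(mask uint32) (mb, me int, ok bool) {
	if mask == 0 {
		return 0, 0, false
	}
	mb = bits.LeadingZeros32(mask)
	me = 31 - bits.TrailingZeros32(mask)
	// The mask is contiguous when its set bits fill the whole span.
	ok = bits.OnesCount32(mask) == me-mb+1
	return mb, me, ok
}

// maskBegin64 returns the start bit of a 64-bit mask.
func maskBegin64(mask uint64) int {
	return bits.LeadingZeros64(mask)
}

func main() {
	mb, me, ok := maskBounds32(65535)
	fmt.Println(mb, me, ok)     // 16 31 true
	fmt.Println(maskBegin64(7)) // 61
}
```

The results 16, 31 and 61 are the same values that appear in the rlwimi and
rldimi examples above.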

More recently, Go opcodes were added which map directly onto the PPC64 opcodes. It is
recommended to use the newer opcodes to avoid confusion.

	RLDICL $0,R4,$15,R6		=>	rldicl r6,r4,0,15
	RLDICR $0,R4,$15,R6		=>	rldicr r6,r4,0,15

# Register naming

1. Special register usage in Go asm

The following registers should not be modified by user Go assembler code.

	R0: Go code expects this register to contain the value 0.
	R1: Stack pointer
	R2: TOC pointer when compiled with -shared or -dynlink (a.k.a. position independent code)
	R13: TLS pointer
	R30: g (goroutine)

Register names:

	Rn is used for general purpose registers. (0-31)
	Fn is used for floating point registers. (0-31)
	Vn is used for vector registers. (0-31)
	VSn is used for vector-scalar registers. Slot 0 of VS0-VS31 overlaps with F0-F31; V0-V31 overlap with VS32-VS63. (0-63)
	CTR represents the count register.
	LR represents the link register.
	CR represents the condition register.
	CRn represents a condition register field. (0-7)
	CRnLT represents CR bit 0 of CR field n. (0-7)
	CRnGT represents CR bit 1 of CR field n. (0-7)
	CRnEQ represents CR bit 2 of CR field n. (0-7)
	CRnSO represents CR bit 3 of CR field n. (0-7)

# GOPPC64 >= power10 and its effects on Go asm

When GOPPC64=power10 is used to compile a Go program for ppc64le/linux, MOV*, FMOV*, and ADD
opcodes which would require 2 or more machine instructions to emulate a 32-bit constant or
symbolic reference are implemented using prefixed instructions.

A user who wishes granular control over the generated machine code is advised to use Go asm
opcodes which explicitly translate to one PPC64 machine instruction. Most common opcodes
are supported.

Some examples of how pseudo-op assembly changes with GOPPC64:

	Go asm                       GOPPC64 <= power9          GOPPC64 >= power10
	MOVD mypackage·foo(SB), R3   addis r3, r2, ...          pld r3, ...
	                             ld    r3, ...(r3)

	MOVD 131072(R3), R4          addis r31, r3, 2           pld r4, 131072(r3)
	                             ld    r4, 0(r31)

	ADD $131073, R3              lis  r31, 2                paddi r3, r3, 131073
	                             addi r31, r31, 1
	                             add  r3, r31, r3

	MOVD $131073, R3             lis  r3, 2                 pli r3, 131073
	                             addi r3, r3, 1

	MOVD $mypackage·foo(SB), R3  addis r3, r2, ...          pla r3, ...
	                             addi  r3, r3, ...
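
The immediates in the lis/addi pairs above come from splitting the constant
into a high and a low 16-bit half. The Go sketch below shows that split,
taking into account that addi sign-extends its 16-bit immediate. The name
splitConst is hypothetical, the sketch ignores edge cases near the top of the
int32 range, and it does not claim to match the exact instruction selection
performed by the assembler.

```go
package main

import "fmt"

// splitConst splits v into hi and lo such that (hi << 16) + lo == v,
// where lo is a sign-extended 16-bit immediate suitable for addi and
// hi is the immediate loaded by lis. Hypothetical helper.
func splitConst(v int32) (hi, lo int16) {
	lo = int16(v)
	// Subtracting lo first compensates for its sign extension.
	hi = int16((int64(v) - int64(lo)) >> 16)
	return hi, lo
}

func main() {
	hi, lo := splitConst(131073)
	fmt.Println(hi, lo) // 2 1

	// A low half with its top bit set becomes negative, so hi is bumped up.
	hi, lo = splitConst(98304) // 0x18000
	fmt.Println(hi, lo)        // 2 -32768
}
```

For 131073 the result is 2 and 1, the same immediates shown in the power9
column; with GOPPC64=power10 the whole constant fits in a single prefixed
instruction and no split is needed.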
*/
package ppc64