<!--{
	"Title": "A Quick Guide to Go's Assembler",
	"Path":  "/doc/asm"
}-->

<h2 id="introduction">A Quick Guide to Go's Assembler</h2>

<p>
This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> Go compiler.
The document is not comprehensive.
</p>

<p>
The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail
<a href="https://9p.io/sys/doc/asm.html">elsewhere</a>.
If you plan to write assembly language, you should read that document, although much of it is Plan 9-specific.
The current document summarizes the syntax and the differences from
what is explained in that document, and
describes the peculiarities that apply when writing assembly code to interact with Go.
</p>

<p>
The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
Some of the details map precisely to the machine, but some do not.
This is because the compiler suite (see
<a href="https://9p.io/sys/doc/compiler.html">this description</a>)
needs no assembler pass in the usual pipeline.
Instead, the compiler operates on a kind of semi-abstract instruction set,
and instruction selection occurs partly after code generation.
The assembler works on the semi-abstract form, so
when you see an instruction like <code>MOV</code>,
what the toolchain actually generates for that operation might
not be a move instruction at all, perhaps a clear or load.
Or it might correspond exactly to the machine instruction with that name.
In general, machine-specific operations tend to appear as themselves, while more general concepts like
memory move and subroutine call and return are more abstract.
The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
</p>

<p>
The assembler program is a way to parse a description of that
semi-abstract instruction set and turn it into instructions to be
input to the linker.
If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
are many examples in the sources of the standard library, in packages such as
<a href="/pkg/runtime/"><code>runtime</code></a> and
<a href="/pkg/math/big/"><code>math/big</code></a>.
You can also examine what the compiler emits as assembly code
(the actual output may differ from what you see here):
</p>

<pre>
$ cat x.go
package main

func main() {
	println(3)
}
$ GOOS=linux GOARCH=amd64 go tool compile -S x.go        # or: go build -gcflags -S x.go

--- prog list "main" ---
0000 (x.go:3) TEXT    main+0(SB),$8-0
0001 (x.go:3) FUNCDATA $0,gcargs·0+0(SB)
0002 (x.go:3) FUNCDATA $1,gclocals·0+0(SB)
0003 (x.go:4) MOVQ    $3,(SP)
0004 (x.go:4) PCDATA  $0,$8
0005 (x.go:4) CALL    ,runtime.printint+0(SB)
0006 (x.go:4) PCDATA  $0,$-1
0007 (x.go:4) PCDATA  $0,$0
0008 (x.go:4) CALL    ,runtime.printnl+0(SB)
0009 (x.go:4) PCDATA  $0,$-1
0010 (x.go:5) RET     ,
...
</pre>

<p>
The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information
for use by the garbage collector; they are introduced by the compiler.
</p>

<!-- Commenting out because the feature is gone but it's popular and may come back.

<p>
To see what gets put in the binary after linking, add the <code>-a</code> flag to the linker:
</p>

<pre>
$ go tool 6l -a x.6        # or: go build -ldflags -a x.go
codeblk [0x2000,0x1d059) at offset 0x1000
002000	main.main            | (3)	TEXT	main.main+0(SB),$8
002000	65488b0c25a0080000   | (3)	MOVQ	2208(GS),CX
002009	483b21               | (3)	CMPQ	SP,(CX)
00200c	7707                 | (3)	JHI	,2015
00200e	e83da20100           | (3)	CALL	,1c250+runtime.morestack00
002013	ebeb                 | (3)	JMP	,2000
002015	4883ec08             | (3)	SUBQ	$8,SP
002019	                     | (3)	FUNCDATA	$0,main.gcargs·0+0(SB)
002019	                     | (3)	FUNCDATA	$1,main.gclocals·0+0(SB)
002019	48c7042403000000     | (4)	MOVQ	$3,(SP)
002021	                     | (4)	PCDATA	$0,$8
002021	e8aad20000           | (4)	CALL	,f2d0+runtime.printint
002026	                     | (4)	PCDATA	$0,$-1
002026	                     | (4)	PCDATA	$0,$0
002026	e865d40000           | (4)	CALL	,f490+runtime.printnl
00202b	                     | (4)	PCDATA	$0,$-1
00202b	4883c408             | (5)	ADDQ	$8,SP
00202f	c3                   | (5)	RET	,
...
</pre>

-->

<h3 id="constants">Constants</h3>

<p>
Although the assembler takes its guidance from the Plan 9 assemblers,
it is a distinct program, so there are some differences.
One is in constant evaluation.
Constant expressions in the assembler are parsed using Go's operator
precedence, not the C-like precedence of the original.
Thus <code>3&amp;1&lt;&lt;2</code> is 4, not 0: it parses as <code>(3&amp;1)&lt;&lt;2</code>,
not <code>3&amp;(1&lt;&lt;2)</code>.
Also, constants are always evaluated as 64-bit unsigned integers.
Thus <code>-2</code> is not the integer value minus two,
but the unsigned 64-bit integer with the same bit pattern.
The distinction rarely matters, but
to avoid ambiguity, division or right shift where the right operand's
high bit is set is rejected.
</p>
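
<p>
A small Go sketch makes these evaluation rules concrete. Go uses the same operator precedence as the assembler, so the expressions below group the way the assembler parses them. The <code>checkedDiv</code> helper is hypothetical: it only imitates the assembler's rejection rule and is not part of any toolchain API.
</p>

```go
package main

import (
	"errors"
	"fmt"
)

// goPrecedence evaluates 3&1<<2 with Go's rules: & and << share a
// precedence level and associate left to right, giving (3&1)<<2.
func goPrecedence() uint64 {
	return 3 & 1 << 2
}

// cPrecedence groups the same operands the C way, where << binds
// tighter than &, giving 3&(1<<2).
func cPrecedence() uint64 {
	return 3 & (1 << 2)
}

// asUnsigned returns the 64-bit unsigned integer with the same bit
// pattern as v, which is how the assembler reads a constant such as -2.
func asUnsigned(v int64) uint64 {
	return uint64(v)
}

// checkedDiv (hypothetical) mimics the assembler's rule by refusing a
// divisor whose high bit is set. The zero check only keeps the sketch
// from panicking.
func checkedDiv(a, b uint64) (uint64, error) {
	if b>>63 != 0 {
		return 0, errors.New("divisor has its high bit set")
	}
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	return a / b, nil
}

func main() {
	fmt.Println(goPrecedence(), cPrecedence()) // 4 0
	fmt.Printf("%#x\n", asUnsigned(-2))        // 0xfffffffffffffffe
	_, err := checkedDiv(100, asUnsigned(-2))
	fmt.Println(err) // divisor has its high bit set
}
```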

<h3 id="symbols">Symbols</h3>

<p>
Some symbols, such as <code>R1</code> or <code>LR</code>,
are predefined and refer to registers.
The exact set depends on the architecture.
</p>

<p>
There are four predeclared symbols that refer to pseudo-registers.
These are not real registers, but rather virtual registers maintained by
the toolchain, such as a frame pointer.
The set of pseudo-registers is the same for all architectures:
</p>

<ul>

<li>
<code>FP</code>: Frame pointer: arguments and locals.
</li>

<li>
<code>PC</code>: Program counter:
jumps and branches.
</li>

<li>
<code>SB</code>: Static base pointer: global symbols.
</li>

<li>
<code>SP</code>: Stack pointer: top of stack.
</li>

</ul>

<p>
All user-defined symbols are written as offsets to the pseudo-registers
<code>FP</code> (arguments and locals) and <code>SB</code> (globals).
</p>

<p>
The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
is the name <code>foo</code> as an address in memory.
This form is used to name global functions and data.
Adding <code>&lt;&gt;</code> to the name, as in <span style="white-space: nowrap"><code>foo&lt;&gt;(SB)</code></span>, makes the name
visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
Adding an offset to the name refers to that offset from the symbol's address, so
<code>foo+4(SB)</code> is four bytes past the start of <code>foo</code>.
</p>

<p>
The <code>FP</code> pseudo-register is a virtual frame pointer
used to refer to function arguments.
The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
Thus <code>0(FP)</code> is the first argument to the function,
<code>8(FP)</code> is the second (on a 64-bit machine), and so on.
However, when referring to a function argument this way, it is necessary to place a name
at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
(The meaning of the offset, an offset from the frame pointer, is distinct
from its use with <code>SB</code>, where it is an offset from the symbol.)
The assembler enforces this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
The actual name is semantically irrelevant but should be used to document
the argument's name.
It is worth stressing that <code>FP</code> is always a
pseudo-register, not a hardware
register, even on architectures with a hardware frame pointer.
</p>
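
<p>
The naming rule is easy to check mechanically. The Go sketch below parses operands of the form <code>name+offset(REG)</code> and, like the assembler, refuses an <code>FP</code> reference that has no name. The <code>operand</code> type, the regular expression, and the restriction to decimal offsets are simplifying assumptions for illustration; this is not the assembler's actual parser.
</p>

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

// operand is a hypothetical model of a memory operand such as
// first_arg+0(FP), foo+4(SB), or foo<>(SB).
type operand struct {
	Name   string // symbol or argument name; may be empty
	Offset int64  // decimal offset; 0 when omitted
	Base   string // register or pseudo-register, e.g. FP or SB
}

// operandRE matches an optional name (allowing · and ∕ and a trailing <>),
// an optional signed decimal offset, and a parenthesized base register.
var operandRE = regexp.MustCompile(`^([A-Za-z_·∕][A-Za-z0-9_·∕]*(?:<>)?)?([+-]?[0-9]+)?\(([A-Z][A-Z0-9]*)\)$`)

// parseOperand splits s into its parts and enforces the FP naming rule.
func parseOperand(s string) (operand, error) {
	m := operandRE.FindStringSubmatch(s)
	if m == nil {
		return operand{}, fmt.Errorf("cannot parse operand %q", s)
	}
	op := operand{Name: m[1], Base: m[3]}
	if m[2] != "" {
		off, err := strconv.ParseInt(m[2], 10, 64)
		if err != nil {
			return operand{}, err
		}
		op.Offset = off
	}
	if op.Base == "FP" && op.Name == "" {
		return operand{}, errors.New("FP reference needs a name, as in arg+0(FP)")
	}
	return op, nil
}

func main() {
	for _, s := range []string{"first_arg+0(FP)", "second_arg+8(FP)", "8(FP)", "foo+4(SB)", "foo<>(SB)"} {
		op, err := parseOperand(s)
		fmt.Printf("%-16s %+v err=%v\n", s, op, err)
	}
}
```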

<p>
For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names
and offsets match.
On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding
a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>.
If a Go prototype does not name its result, the expected assembly name is <code>ret</code>.
</p>
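
<p>
To see where offsets such as <code>8(FP)</code> come from, the Go sketch below assumes the stack-based calling convention this document describes, in which the argument area is laid out like the fields of a struct: arguments in order, then results. It applies <code>unsafe.Offsetof</code> to a struct mirroring a hypothetical prototype <code>func add(x, y int64) int64</code>; any alignment padding before the results does not arise in this example. The <code>split32</code> helper is also hypothetical and assumes a little-endian 32-bit target, where the low half comes first.
</p>

```go
package main

import (
	"fmt"
	"unsafe"
)

// addFrame mirrors the argument area of a hypothetical Go prototype
//
//	func add(x, y int64) int64
//
// assuming arguments and results are laid out like struct fields.
// The result is unnamed, so its expected assembly name is "ret".
type addFrame struct {
	x   int64
	y   int64
	ret int64
}

// fpRefs spells the FP references go vet would expect for add.
func fpRefs() []string {
	var f addFrame
	return []string{
		fmt.Sprintf("x+%d(FP)", unsafe.Offsetof(f.x)),
		fmt.Sprintf("y+%d(FP)", unsafe.Offsetof(f.y)),
		fmt.Sprintf("ret+%d(FP)", unsafe.Offsetof(f.ret)),
	}
}

// split32 (hypothetical) names the two halves of a 64-bit argument at
// offset off on a little-endian 32-bit system.
func split32(name string, off uintptr) [2]string {
	return [2]string{
		fmt.Sprintf("%s_lo+%d(FP)", name, off),
		fmt.Sprintf("%s_hi+%d(FP)", name, off+4),
	}
}

func main() {
	fmt.Println(fpRefs())          // [x+0(FP) y+8(FP) ret+16(FP)]
	fmt.Println(split32("arg", 0)) // [arg_lo+0(FP) arg_hi+4(FP)]
}
```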

<p>
The <code>SP</code> pseudo-register is a virtual stack pointer
used to refer to frame-local variables and the arguments being
prepared for function calls.
It points to the top of the local stack frame, so references should use negative offsets
in the range [−framesize, 0):
<code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on.
</p>
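
<p>
The valid range for virtual-<code>SP</code> offsets reduces to a one-line check. In this Go sketch, the hypothetical <code>localRefs</code> helper hands out named slots of a fixed width downward from the top of a frame of <code>framesize</code> bytes; the helper names and the fixed slot width are illustrative assumptions.
</p>

```go
package main

import "fmt"

// inFrame reports whether off lies in [-framesize, 0), the range of
// offsets that refer to the current frame through the virtual SP.
func inFrame(off, framesize int) bool {
	return off >= -framesize && off < 0
}

// localRefs assigns each name a width-byte slot below the virtual SP,
// producing references such as x-8(SP) and y-16(SP).
func localRefs(names []string, width, framesize int) ([]string, error) {
	refs := make([]string, 0, len(names))
	for i, n := range names {
		off := -(i + 1) * width
		if !inFrame(off, framesize) {
			return nil, fmt.Errorf("%s%d(SP) falls outside a %d-byte frame", n, off, framesize)
		}
		refs = append(refs, fmt.Sprintf("%s%d(SP)", n, off))
	}
	return refs, nil
}

func main() {
	refs, err := localRefs([]string{"x", "y"}, 8, 16)
	fmt.Println(refs, err) // [x-8(SP) y-16(SP)] <nil>
}
```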

<p>
On architectures with a hardware register named <code>SP</code>,
the name prefix distinguishes
references to the virtual stack pointer from references to the architectural
<code>SP</code> register.
That is, <code>x-8(SP)</code> and <code>-8(SP)</code>
are different memory locations:
the first refers to the virtual stack pointer pseudo-register,
while the second refers to the
hardware's <code>SP</code> register.
</p>

<p>
On machines where <code>SP</code> and <code>PC</code> are
traditionally aliases for a physical, numbered register,
in the Go assembler the names <code>SP</code> and <code>PC</code>
are still treated specially;
for instance, references to <code>SP</code> require a symbol,
much like <code>FP</code>.
To access the actual hardware register, use the true <code>R</code> name.
For example, on the ARM architecture the hardware
<code>SP</code> and <code>PC</code> are accessible as
<code>R13</code> and <code>R15</code>.
</p>

<p>
Branches and direct jumps are always written as offsets to the PC, or as
jumps to labels:
</p>

<pre>
label:
	MOVW $0, R1
	JMP label
</pre>

<p>
Each label is visible only within the function in which it is defined.
It is therefore permitted for multiple functions in a file to define
and use the same label names.
Direct jumps and call instructions can target text symbols,
such as <code>name(SB)</code>, but not offsets from symbols,
such as <code>name+4(SB)</code>.
</p>

<p>
Instructions, registers, and assembler directives are always in UPPER CASE to remind you
that assembly programming is a fraught endeavor.
(Exception: the <code>g</code> register renaming on ARM.)
</p>

<p>
In Go object files and binaries, the full name of a symbol is the
package path followed by a period and the symbol name:
<code>fmt.Printf</code> or <code>math/rand.Int</code>.
Because the assembler's parser treats period and slash as punctuation,
those strings cannot be used directly as identifier names.
Instead, the assembler allows the middle dot character U+00B7
and the division slash U+2215 in identifiers and rewrites them to
plain period and slash.
Within an assembler source file, the symbols above are written as
<code>fmt·Printf</code> and <code>math∕rand·Int</code>.
The assembly listings generated by the compilers when using the <code>-S</code> flag
show the period and slash directly instead of the Unicode replacements
required by the assemblers.
</p>

<p>
Most hand-written assembly files do not include the full package path
in symbol names, because the linker inserts the package path of the current
object file at the beginning of any name starting with a period:
in an assembly source file within the <code>math/rand</code> package implementation,
the package's <code>Int</code> function can be referred to as <code>·Int</code>.
This convention avoids the need to hard-code a package's import path in its
own source code, making it easier to move the code from one location to another.
</p>
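
<p>
The two naming rules compose: the assembler rewrites U+00B7 and U+2215 to a period and a slash, and the linker then prefixes any name that begins with a period with the current package path. The Go sketch below models both steps with the standard <code>strings</code> package; <code>linkerName</code> is a hypothetical helper written for this illustration.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// assemblerRewrite maps the Unicode stand-ins to the punctuation that
// appears in object files: U+00B7 MIDDLE DOT becomes '.', and
// U+2215 DIVISION SLASH becomes '/'.
var assemblerRewrite = strings.NewReplacer("\u00b7", ".", "\u2215", "/")

// linkerName (hypothetical) returns the full symbol name for an
// identifier written in an assembly file of package pkgPath.
func linkerName(pkgPath, asmName string) string {
	name := assemblerRewrite.Replace(asmName)
	if strings.HasPrefix(name, ".") {
		// The linker inserts the package path before a leading period.
		return pkgPath + name
	}
	return name
}

func main() {
	fmt.Println(linkerName("math/rand", "·Int"))        // math/rand.Int
	fmt.Println(linkerName("runtime", "math∕rand·Int")) // math/rand.Int
	fmt.Println(linkerName("runtime", "fmt·Printf"))    // fmt.Printf
}
```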

<h3 id="directives">Directives</h3>

<p>
The assembler uses various directives to bind text and data to symbol names.
For example, here is a simple complete function definition. The <code>TEXT</code>
directive declares the symbol <code>runtime·profileloop</code>, and the instructions
that follow form the body of the function.
The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction.
(If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXTs</code>.)
After the symbol, the arguments are flags (see below)
and the frame size, a constant (but see below):
</p>

<pre>
TEXT runtime·profileloop(SB),NOSPLIT,$8
	MOVQ	$runtime·profileloop1(SB), CX
	MOVQ	CX, 0(SP)
	CALL	runtime·externalthreadhandler(SB)
	RET
</pre>

<p>
In the general case, the frame size is followed by an argument size, separated by a minus sign.
(It's not a subtraction, just idiosyncratic syntax.)
The frame size <code>$24-8</code> states that the function has a 24-byte frame
and is called with 8 bytes of argument, which live on the caller's frame.
If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
the argument size must be provided.
For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the
argument size is correct.
</p>
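
<p>
Because the minus sign in <code>$24-8</code> is a separator rather than an operator, splitting the operand takes a little care, especially when the frame size itself is negative, as in the ARM <code>$-4</code> form. The following Go sketch is a hypothetical parser written for this explanation, not code from the assembler. It also applies the rule that a <code>TEXT</code> without <code>NOSPLIT</code> must state an argument size, recognizing only the symbolic flag spelling.
</p>

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// frameSpec holds the two numbers packed into a TEXT size operand.
type frameSpec struct {
	Frame   int64 // local frame size in bytes
	Args    int64 // argument size in bytes, living in the caller's frame
	HasArgs bool  // false when only the frame size was written, as in $8
}

// parseFrame splits an operand such as "$24-8" at the separating '-'.
func parseFrame(s string) (frameSpec, error) {
	if !strings.HasPrefix(s, "$") {
		return frameSpec{}, fmt.Errorf("size operand %q must start with $", s)
	}
	body := s[1:]
	frameText, argText := body, ""
	// Search from the second byte so that a leading minus (as in $-4)
	// stays part of the frame size.
	if len(body) > 1 {
		if i := strings.IndexByte(body[1:], '-'); i >= 0 {
			frameText, argText = body[:i+1], body[i+2:]
		}
	}
	frame, err := strconv.ParseInt(frameText, 10, 64)
	if err != nil {
		return frameSpec{}, err
	}
	spec := frameSpec{Frame: frame}
	if argText != "" {
		args, err := strconv.ParseInt(argText, 10, 64)
		if err != nil {
			return frameSpec{}, err
		}
		spec.Args, spec.HasArgs = args, true
	}
	return spec, nil
}

// checkText requires an argument size unless NOSPLIT is set. flags is the
// symbolic flag operand, e.g. "NOSPLIT"; numeric masks are not handled.
func checkText(flags string, spec frameSpec) error {
	if !strings.Contains(flags, "NOSPLIT") && !spec.HasArgs {
		return errors.New("argument size required when NOSPLIT is not set")
	}
	return nil
}

func main() {
	for _, s := range []string{"$24-8", "$8", "$0-16", "$-4"} {
		spec, err := parseFrame(s)
		fmt.Printf("%-6s %+v %v\n", s, spec, err)
	}
}
```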

<p>
Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
static base pseudo-register <code>SB</code>.
This function would be called from Go source for package <code>runtime</code> using the
simple name <code>profileloop</code>.
</p>

<p>
Global data symbols are defined by a sequence of initializing
<code>DATA</code> directives followed by a <code>GLOBL</code> directive.
Each <code>DATA</code> directive initializes a section of the
corresponding memory.
Memory not explicitly initialized is zeroed.
The general form of the <code>DATA</code> directive is
</p>

<pre>
DATA	symbol+offset(SB)/width, value
</pre>

<p>
which initializes the symbol memory at the given offset and width with the given value.
The <code>DATA</code> directives for a given symbol must be written with increasing offsets.
</p>

<p>
The <code>GLOBL</code> directive declares a symbol to be global.
The arguments are optional flags and the size of the data being declared as a global,
which will have initial value all zeros unless a <code>DATA</code> directive
has initialized it.
The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
</p>

<p>
For example,
</p>

<pre>
DATA divtab&lt;&gt;+0x00(SB)/4, $0xf4f8fcff
DATA divtab&lt;&gt;+0x04(SB)/4, $0xe6eaedf0
...
DATA divtab&lt;&gt;+0x3c(SB)/4, $0x81828384
GLOBL divtab&lt;&gt;(SB), RODATA, $64

GLOBL runtime·tlsoffset(SB), NOPTR, $4
</pre>

<p>
declares and initializes <code>divtab&lt;&gt;</code>, a read-only 64-byte table of 4-byte integer values,
and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
contains no pointers.
</p>
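
<p>
The combined effect of <code>DATA</code> and <code>GLOBL</code> can be modeled as filling a zeroed byte slice. The Go sketch below rebuilds <code>divtab&lt;&gt;</code> from the three <code>DATA</code> lines shown in the example; the entries elided as <code>...</code> are not reproduced, so those bytes simply stay zero here. It assumes a little-endian target such as amd64, and the <code>dataDirective</code> type and <code>buildSymbol</code> function are hypothetical.
</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dataDirective models one "DATA symbol+offset(SB)/width, $value" line.
type dataDirective struct {
	Offset int
	Width  int // 1, 2, 4, or 8 bytes
	Value  uint64
}

// buildSymbol lays out a size-byte GLOBL symbol from its DATA directives.
// Uncovered bytes stay zero. Offsets must increase; this sketch also
// rejects overlapping directives.
func buildSymbol(size int, data []dataDirective) ([]byte, error) {
	mem := make([]byte, size) // make returns zeroed memory
	next := 0
	for _, d := range data {
		switch d.Width {
		case 1, 2, 4, 8:
		default:
			return nil, fmt.Errorf("unsupported width %d", d.Width)
		}
		if d.Offset < next {
			return nil, fmt.Errorf("offset %#x is not increasing", d.Offset)
		}
		if d.Offset+d.Width > size {
			return nil, fmt.Errorf("DATA at %#x/%d overflows a %d-byte symbol", d.Offset, d.Width, size)
		}
		var buf [8]byte
		binary.LittleEndian.PutUint64(buf[:], d.Value)
		copy(mem[d.Offset:], buf[:d.Width]) // low-order bytes first
		next = d.Offset + d.Width
	}
	return mem, nil
}

func main() {
	divtab, err := buildSymbol(64, []dataDirective{
		{0x00, 4, 0xf4f8fcff},
		{0x04, 4, 0xe6eaedf0},
		{0x3c, 4, 0x81828384},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", divtab[:8]) // ff fc f8 f4 f0 ed ea e6
}
```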

<p>
There may be one or two arguments to the directives.
If there are two, the first is a bit mask of flags,
which can be written as numeric expressions, added or or-ed together,
or can be set symbolically for easier absorption by a human.
Their values, defined in the standard <code>#include</code> file <code>textflag.h</code>, are:
</p>

<ul>
<li>
<code>NOPROF</code> = 1
<br>
(For <code>TEXT</code> items.)
Don't profile the marked function. This flag is deprecated.
</li>
<li>
<code>DUPOK</code> = 2
<br>
It is legal to have multiple instances of this symbol in a single binary.
The linker will choose one of the duplicates to use.
</li>
<li>
<code>NOSPLIT</code> = 4
<br>
(For <code>TEXT</code> items.)
Don't insert the preamble to check if the stack must be split.
The frame for the routine, plus anything it calls, must fit in the
spare space at the top of the stack segment.
Used to protect routines such as the stack splitting code itself.
</li>
<li>
<code>RODATA</code> = 8
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
Put this data in a read-only section.
</li>
<li>
<code>NOPTR</code> = 16
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
This data contains no pointers and therefore does not need to be
scanned by the garbage collector.
</li>
<li>
<code>WRAPPER</code> = 32
<br>
(For <code>TEXT</code> items.)
This is a wrapper function and should not count as disabling <code>recover</code>.
</li>
<li>
<code>NEEDCTXT</code> = 64
<br>
(For <code>TEXT</code> items.)
This function is a closure, so it uses its incoming context register.
</li>
</ul>
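
<p>
Since each flag is a distinct bit, adding and or-ing them produce the same mask, and a mask can be decoded back into its symbolic spelling. The Go constants below copy the values listed above (the authoritative definitions are in <code>textflag.h</code>), and <code>describe</code> is a hypothetical helper.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// Flag values as listed in this document; the names keep textflag.h's
// upper-case spelling rather than Go's usual style.
const (
	NOPROF   = 1
	DUPOK    = 2
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
)

var flagNames = []struct {
	bit  int
	name string
}{
	{NOPROF, "NOPROF"}, {DUPOK, "DUPOK"}, {NOSPLIT, "NOSPLIT"},
	{RODATA, "RODATA"}, {NOPTR, "NOPTR"}, {WRAPPER, "WRAPPER"},
	{NEEDCTXT, "NEEDCTXT"},
}

// describe turns a numeric mask into symbolic form, e.g. 20 -> NOSPLIT|NOPTR.
// Bits with no known name are printed in hex.
func describe(mask int) string {
	var parts []string
	for _, f := range flagNames {
		if mask&f.bit != 0 {
			parts = append(parts, f.name)
			mask &^= f.bit
		}
	}
	if mask != 0 {
		parts = append(parts, fmt.Sprintf("%#x", mask))
	}
	if len(parts) == 0 {
		return "0"
	}
	return strings.Join(parts, "|")
}

func main() {
	fmt.Println(NOSPLIT+NOPTR == NOSPLIT|NOPTR) // true: the bits do not overlap
	fmt.Println(describe(20))                  // NOSPLIT|NOPTR
	fmt.Println(describe(RODATA | DUPOK))      // DUPOK|RODATA
}
```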

<h3 id="runtime">Runtime Coordination</h3>

<p>
For garbage collection to run correctly, the runtime must know the
location of pointers in all global data and in most stack frames.
The Go compiler emits this information when compiling Go source files,
but assembly programs must define it explicitly.
</p>

<p>
A data symbol marked with the <code>NOPTR</code> flag (see above)
is treated as containing no pointers to runtime-allocated data.
A data symbol with the <code>RODATA</code> flag
is allocated in read-only memory and is therefore treated
as implicitly marked <code>NOPTR</code>.
A data symbol with a total size smaller than a pointer
is also treated as implicitly marked <code>NOPTR</code>.
It is not possible to define a symbol containing pointers in an assembly source file;
such a symbol must be defined in a Go source file instead.
Assembly source can still refer to the symbol by name
even without <code>DATA</code> and <code>GLOBL</code> directives.
A good general rule of thumb is to define all non-<code>RODATA</code>
symbols in Go instead of in assembly.
</p>
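
<p>
The three ways a data symbol ends up treated as pointer-free reduce to a single predicate. This Go sketch encodes them using the flag values listed earlier; <code>treatedAsNoPtr</code> and its <code>ptrSize</code> parameter (8 on 64-bit targets, 4 on 32-bit ones) are assumptions made for illustration.
</p>

```go
package main

import "fmt"

// Flag values from textflag.h, as listed earlier.
const (
	RODATA = 8
	NOPTR  = 16
)

// treatedAsNoPtr reports whether an assembly data symbol is treated as
// holding no pointers: it is marked NOPTR, it is RODATA (and therefore
// implicitly NOPTR), or it is smaller than a pointer.
func treatedAsNoPtr(flags, size, ptrSize int) bool {
	return flags&NOPTR != 0 || flags&RODATA != 0 || size < ptrSize
}

func main() {
	fmt.Println(treatedAsNoPtr(RODATA, 64, 8)) // true: like divtab<> above
	fmt.Println(treatedAsNoPtr(NOPTR, 4, 8))   // true: like runtime·tlsoffset above
	fmt.Println(treatedAsNoPtr(0, 16, 8))      // false: define such a symbol in Go
}
```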

<p>
Each function also needs annotations giving the location of
live pointers in its arguments, results, and local stack frame.
For an assembly function with no pointer results and
either no local stack frame or no function calls,
the only requirement is to define a Go prototype for the function
in a Go source file in the same package. The name of the assembly
function must not contain the package name component (for example,
function <code>Syscall</code> in package <code>syscall</code> should
use the name <code>·Syscall</code> instead of the equivalent name
<code>syscall·Syscall</code> in its <code>TEXT</code> directive).
For more complex situations, explicit annotation is needed.
These annotations use pseudo-instructions defined in the standard
<code>#include</code> file <code>funcdata.h</code>.
</p>

<p>
If a function has no arguments and no results,
the pointer information can be omitted.
This is indicated by an argument size annotation of <code>$<i>n</i>-0</code>
on the <code>TEXT</code> instruction.
Otherwise, pointer information must be provided by
a Go prototype for the function in a Go source file,
even for assembly functions not called directly from Go.
(The prototype will also let <code>go</code> <code>vet</code> check the argument references.)
At the start of the function, the arguments are assumed
to be initialized but the results are assumed uninitialized.
If the results will hold live pointers during a call instruction,
the function should start by zeroing the results and then
executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>.
This instruction records that the results are now initialized
and should be scanned during stack movement and garbage collection.
It is typically easier to arrange that assembly functions do not
return pointers or do not contain call instructions;
no assembly functions in the standard library use
<code>GO_RESULTS_INITIALIZED</code>.
</p>

<p>
If a function has no local stack frame,
the pointer information can be omitted.
This is indicated by a local frame size annotation of <code>$0-<i>n</i></code>
on the <code>TEXT</code> instruction.
The pointer information can also be omitted if the
function contains no call instructions.
Otherwise, the local stack frame must not contain pointers,
and the assembly must confirm this fact by executing the
pseudo-instruction <code>NO_LOCAL_POINTERS</code>.
Because stack resizing is implemented by moving the stack,
the stack pointer may change during any function call:
even pointers to stack data must not be kept in local variables.
</p>
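
<p>
The rules in the last three paragraphs amount to a short checklist. The Go sketch below encodes that checklist for a hypothetical <code>asmFunc</code> description. It lists only what the text says is required; a Go prototype remains worth writing even when the checklist does not demand one.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// asmFunc summarizes the properties of an assembly function that the
// pointer-annotation rules depend on (hypothetical type).
type asmFunc struct {
	ArgSize          int  // n in $frame-n: bytes of arguments and results
	FrameSize        int  // frame in $frame-n: bytes of local stack frame
	HasCalls         bool // the body contains call instructions
	PtrResultsInCall bool // results hold live pointers during a call
}

// requiredAnnotations lists what the rules above demand for f.
func requiredAnnotations(f asmFunc) []string {
	var need []string
	if f.ArgSize != 0 { // $n-0 lets argument pointer information be omitted
		need = append(need, "Go prototype")
		if f.PtrResultsInCall {
			need = append(need, "zero results, then GO_RESULTS_INITIALIZED")
		}
	}
	if f.FrameSize > 0 && f.HasCalls { // $0-n or no calls: nothing needed
		need = append(need, "NO_LOCAL_POINTERS")
	}
	return need
}

func main() {
	leaf := asmFunc{ArgSize: 16, FrameSize: 0}
	caller := asmFunc{ArgSize: 16, FrameSize: 24, HasCalls: true}
	fmt.Println(strings.Join(requiredAnnotations(leaf), "; "))   // Go prototype
	fmt.Println(strings.Join(requiredAnnotations(caller), "; ")) // Go prototype; NO_LOCAL_POINTERS
}
```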

<p>
Assembly functions should always be given Go prototypes,
both to provide pointer information for the arguments and results
and to let <code>go</code> <code>vet</code> check that
the offsets being used to access them are correct.
</p>

<h2 id="architectures">Architecture-specific details</h2>

<p>
It is impractical to list all the instructions and other details for each machine.
To see what instructions are defined for a given machine, say ARM,
look in the source for the <code>obj</code> support library for
that architecture, located in the directory <code>src/cmd/internal/obj/arm</code>.
In that directory is a file <code>a.out.go</code>; it contains
a long list of constants starting with <code>A</code>, like this:
</p>

<pre>
const (
	AAND = obj.ABaseARM + obj.A_ARCHSPECIFIC + iota
	AEOR
	ASUB
	ARSB
	AADD
	...
</pre>

<p>
This is the list of instructions and their spellings as known to the assembler and linker for that architecture.
Each instruction begins with an initial capital <code>A</code> in this list, so <code>AAND</code>
represents the bitwise and instruction,
<code>AND</code> (without the leading <code>A</code>),
and is written in assembly source as <code>AND</code>.
The enumeration is mostly in alphabetical order.
(The architecture-independent <code>AXXX</code>, defined in the
<code>cmd/internal/obj</code> package,
represents an invalid instruction.)
The sequence of the <code>A</code> names has nothing to do with the actual
encoding of the machine instructions.
The <code>cmd/internal/obj</code> package takes care of that detail.
</p>
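
<p>
The naming scheme can be imitated in a few lines of Go. The constants below are illustrative stand-ins that copy the <code>A</code>-prefix convention and the role of <code>AXXX</code>; their numeric values are not those of <code>cmd/internal/obj/arm</code>, which starts its enumeration at <code>obj.ABaseARM + obj.A_ARCHSPECIFIC</code>.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// opcode imitates an a.out.go enumeration; the values are illustrative
// and carry no meaning about machine encodings.
type opcode int

const (
	AXXX opcode = iota // invalid instruction
	AAND
	AEOR
	ASUB
	ARSB
	AADD
)

var opnames = [...]string{"AXXX", "AAND", "AEOR", "ASUB", "ARSB", "AADD"}

// Mnemonic returns the spelling used in assembly source: the constant's
// name without its leading A.
func (op opcode) Mnemonic() string {
	if op < 0 || int(op) >= len(opnames) {
		return fmt.Sprintf("opcode(%d)", int(op))
	}
	return strings.TrimPrefix(opnames[op], "A")
}

func main() {
	var names []string
	for op := AAND; op <= AADD; op++ {
		names = append(names, op.Mnemonic())
	}
	fmt.Println(strings.Join(names, " ")) // AND EOR SUB RSB ADD
}
```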

<p>
The instructions for both the 386 and AMD64 architectures are listed in
<code>cmd/internal/obj/x86/a.out.go</code>.
</p>

<p>
The architectures share syntax for common addressing modes such as
<code>(R1)</code> (register indirect),
<code>4(R1)</code> (register indirect with offset), and
<code>$foo(SB)</code> (absolute address).
The assembler also supports some (not necessarily all) addressing modes
specific to each architecture.
The sections below list these.
</p>

<p>
One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
<code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>.
This rule applies even on architectures where the conventional notation uses the opposite direction.
</p>

<p>
Here follow some descriptions of key Go-specific details for the supported architectures.
</p>

<h3 id="x86">32-bit Intel 386</h3>

<p>
The runtime pointer to the <code>g</code> structure is maintained
through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
An OS-dependent macro <code>get_tls</code> is defined for the assembler if the source includes
a special header, <code>go_asm.h</code>:
</p>

<pre>
#include "go_asm.h"
</pre>

<p>
Within the runtime, the <code>get_tls</code> macro loads its argument register
with a pointer to the <code>g</code> pointer, and the <code>g</code> struct
contains the <code>m</code> pointer.
The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this:
</p>

<pre>
get_tls(CX)
MOVL	g(CX), AX     // Move g into AX.
MOVL	g_m(AX), BX   // Move g.m into BX.
</pre>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code>.
</li>

<li>
<code>64(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code> plus 64.
These modes accept only 1, 2, 4, and 8 as scale factors.
</li>

</ul>
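
<p>
The address named by these indexed modes is ordinary arithmetic. This Go sketch evaluates <code>disp(base)(index*scale)</code> for given register contents and rejects scales other than 1, 2, 4, and 8; the function name and its use of plain 64-bit integers for register values are assumptions of the sketch.
</p>

```go
package main

import "fmt"

// effectiveAddr computes the address named by disp(base)(index*scale);
// for example, 64(DI)(BX*2) is DI + BX*2 + 64. Arithmetic wraps modulo
// 2^64 in this sketch.
func effectiveAddr(base, index uint64, scale int, disp int64) (uint64, error) {
	switch scale {
	case 1, 2, 4, 8:
	default:
		return 0, fmt.Errorf("invalid scale %d: only 1, 2, 4, and 8 are allowed", scale)
	}
	return base + index*uint64(scale) + uint64(disp), nil
}

func main() {
	di, bx := uint64(0x1000), uint64(3)
	addr, err := effectiveAddr(di, bx, 2, 64) // 64(DI)(BX*2)
	fmt.Printf("%#x %v\n", addr, err)         // 0x1046 <nil>
}
```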
   623  
   624  <p>
   625  When using the compiler and assembler's
   626  <code>-dynlink</code> or <code>-shared</code> modes,
   627  any load or store of a fixed memory location such as a global variable
   628  must be assumed to overwrite <code>CX</code>.
   629  Therefore, to be safe for use with these modes,
   630  assembly sources should typically avoid CX except between memory references.
   631  </p>
   632  
   633  <h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>
   634  
   635  <p>
   636  The two architectures behave largely the same at the assembler level.
   637  Assembly code to access the <code>m</code> and <code>g</code>
   638  pointers on the 64-bit version is the same as on the 32-bit 386,
   639  except it uses <code>MOVQ</code> rather than <code>MOVL</code>:
   640  </p>
   641  
   642  <pre>
   643  get_tls(CX)
   644  MOVQ	g(CX), AX     // Move g into AX.
   645  MOVQ	g_m(AX), BX   // Move g.m into BX.
   646  </pre>
   647  
   648  <h3 id="arm">ARM</h3>
   649  
   650  <p>
   651  The registers <code>R10</code> and <code>R11</code>
   652  are reserved by the compiler and linker.
   653  </p>
   654  
   655  <p>
   656  <code>R10</code> points to the <code>g</code> (goroutine) structure.
   657  Within assembler source code, this pointer must be referred to as <code>g</code>;
   658  the name <code>R10</code> is not recognized.
   659  </p>
   660  
   661  <p>
   662  To make it easier for people and compilers to write assembly, the ARM linker
   663  allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code>
   664  that may not be expressible using a single hardware instruction.
   665  It implements these forms as multiple instructions, often using the <code>R11</code> register
   666  to hold temporary values.
   667  Hand-written assembly can use <code>R11</code>, but doing so requires
   668  being sure that the linker is not also using it to implement any of the other
   669  instructions in the function.
   670  </p>
   671  
   672  <p>
   673  When defining a <code>TEXT</code>, specifying frame size <code>$-4</code>
   674  tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry.
   675  </p>
   676  
   677  <p>
   678  The name <code>SP</code> always refers to the virtual stack pointer described earlier.
   679  For the hardware register, use <code>R13</code>.
   680  </p>
   681  
   682  <p>
   683  Condition code syntax is to append a period and the one- or two-letter code to the instruction,
   684  as in <code>MOVW.EQ</code>.
   685  Multiple codes may be appended: <code>MOVM.IA.W</code>.
   686  The order of the code modifiers is irrelevant.
   687  </p>
   688  
   689  <p>
   690  Addressing modes:
   691  </p>
   692  
   693  <ul>
   694  
   695  <li>
   696  <code>R0-&gt;16</code>
   697  <br>
   698  <code>R0&gt;&gt;16</code>
   699  <br>
   700  <code>R0&lt;&lt;16</code>
   701  <br>
   702  <code>R0@&gt;16</code>:
   703  For <code>&lt;&lt;</code>, left shift <code>R0</code> by 16 bits.
   704  The other codes are <code>-&gt;</code> (arithmetic right shift),
   705  <code>&gt;&gt;</code> (logical right shift), and
   706  <code>@&gt;</code> (rotate right).
   707  </li>
   708  
   709  <li>
   710  <code>R0-&gt;R1</code>
   711  <br>
   712  <code>R0&gt;&gt;R1</code>
   713  <br>
   714  <code>R0&lt;&lt;R1</code>
   715  <br>
   716  <code>R0@&gt;R1</code>:
   717  For <code>&lt;&lt;</code>, left shift <code>R0</code> by the count in <code>R1</code>.
   718  The other codes are <code>-&gt;</code> (arithmetic right shift),
   719  <code>&gt;&gt;</code> (logical right shift), and
   720  <code>@&gt;</code> (rotate right).
   721  
   722  </li>
   723  
   724  <li>
   725  <code>[R0,g,R12-R15]</code>: For multi-register instructions, the set comprising
   726  <code>R0</code>, <code>g</code>, and <code>R12</code> through <code>R15</code> inclusive.
   727  </li>
   728  
   729  <li>
   730  <code>(R5, R6)</code>: Destination register pair.
   731  </li>
   732  
   733  </ul>
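<p>
The arithmetic behind the shifted-register operands can be sketched in Go.
The hypothetical <code>shiftOperand</code> function below computes the value of <code>R0&lt;&lt;n</code>, <code>R0&gt;&gt;n</code>, <code>R0-&gt;n</code>, and <code>R0@&gt;n</code> for a 32-bit register.
The count <code>n</code> may be an immediate or the count held in a register such as <code>R1</code>.
The sketch models only the result. It does not check which shift counts the instruction encoding accepts.
</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// shiftOperand returns the value of the ARM shifted-register operand
// r<op>n, where op is written as in Go assembly:
//   "<<" logical shift left
//   ">>" logical shift right (zeros enter at the top)
//   "->" arithmetic shift right (copies of the sign bit enter at the top)
//   "@>" rotate right (bits leaving the bottom re-enter at the top)
func shiftOperand(r uint32, op string, n uint) (uint32, error) {
	switch op {
	case "<<":
		return r << n, nil
	case ">>":
		return r >> n, nil
	case "->":
		return uint32(int32(r) >> n), nil
	case "@>":
		return bits.RotateLeft32(r, -int(n)), nil
	}
	return 0, fmt.Errorf("unknown shift operator %q", op)
}

func main() {
	const r0 = 0x80000013
	for _, op := range []string{"<<", ">>", "->", "@>"} {
		v, err := shiftOperand(r0, op, 4)
		if err != nil {
			panic(err)
		}
		fmt.Printf("R0%s4 = 0x%08x\n", op, v)
	}
}
```

<p>
Take <code>R0 = 0x80000013</code> and a count of 4. The four forms yield <code>0x00000130</code>, <code>0x08000001</code>, <code>0xf8000001</code>, and <code>0x38000001</code>.
Only the arithmetic shift copies the sign bit into the vacated positions, and only the rotate moves the low bits to the top.
</p>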
   734  
   735  <h3 id="arm64">ARM64</h3>
   736  
   737  <p>
   738  The ARM64 port is in an experimental state.
   739  </p>
   740  
   741  <p>
Instruction modifiers are appended to the instruction after a period.
The only modifiers are <code>P</code> (postincrement) and <code>W</code>
(preincrement), as in
<code>MOVW.P</code> and <code>MOVW.W</code>.
   746  </p>
   747  
   748  <p>
   749  Addressing modes:
   750  </p>
   751  
   752  <ul>
   753  
   754  <li>
   755  <code>(R5, R6)</code>: Register pair for <code>LDP</code>/<code>STP</code>.
   756  </li>
   757  
   758  </ul>
   759  
   760  <h3 id="ppc64">64-bit PowerPC, a.k.a. ppc64</h3>
   761  
   762  <p>
   763  The 64-bit PowerPC port is in an experimental state.
   764  </p>
   765  
   766  <p>
   767  Addressing modes:
   768  </p>
   769  
   770  <ul>
   771  
   772  <li>
   773  <code>(R5)(R6*1)</code>: The location at <code>R5</code> plus <code>R6</code>. It is a scaled
   774  mode as on the x86, but the only scale allowed is <code>1</code>.
   775  </li>
   776  
   777  <li>
<code>(R5+R6)</code>: Alias for <code>(R5)(R6*1)</code>.
   779  </li>
   780  
   781  </ul>
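<p>
The effective address of the indexed form is the sum of the two registers.
This minimal Go sketch uses a hypothetical <code>effectiveAddr</code> helper to compute that address.
It rejects any scale other than 1. That restriction is the main difference from the x86 form, which accepts scales of 1, 2, 4, and 8.
</p>

```go
package main

import "fmt"

// effectiveAddr models the ppc64 indexed operand (base)(index*scale):
// the address is base + index*scale, and any scale other than 1 is
// rejected. (base+index) denotes the same operand with scale 1.
func effectiveAddr(base, index, scale uint64) (uint64, error) {
	if scale != 1 {
		return 0, fmt.Errorf("scale %d not allowed: ppc64 accepts only 1", scale)
	}
	return base + index*scale, nil
}

func main() {
	// (R5)(R6*1), or equivalently (R5+R6), with R5 = 0x1000 and R6 = 0x20.
	ea, err := effectiveAddr(0x1000, 0x20, 1)
	fmt.Printf("0x%x %v\n", ea, err)
	// A scale of 8 is valid on amd64 but rejected by this model.
	_, err = effectiveAddr(0x1000, 0x20, 8)
	fmt.Println(err)
}
```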
   782  
   783  <h3 id="s390x">IBM z/Architecture, a.k.a. s390x</h3>
   784  
   785  <p>
   786  The registers <code>R10</code> and <code>R11</code> are reserved.
   787  The assembler uses them to hold temporary values when assembling some instructions.
   788  </p>
   789  
   790  <p>
   791  <code>R13</code> points to the <code>g</code> (goroutine) structure.
   792  This register must be referred to as <code>g</code>; the name <code>R13</code> is not recognized.
   793  </p>
   794  
   795  <p>
   796  <code>R15</code> points to the stack frame and should typically only be accessed using the
   797  virtual registers <code>SP</code> and <code>FP</code>.
   798  </p>
   799  
   800  <p>
   801  Load- and store-multiple instructions operate on a range of registers.
   802  The range of registers is specified by a start register and an end register.
   803  For example, <code>LMG</code> <code>(R9),</code> <code>R5,</code> <code>R7</code> would load
   804  <code>R5</code>, <code>R6</code> and <code>R7</code> with the 64-bit values at
   805  <code>0(R9)</code>, <code>8(R9)</code> and <code>16(R9)</code> respectively.
   806  </p>
   807  
   808  <p>
   809  Storage-and-storage instructions such as <code>MVC</code> and <code>XC</code> are written
   810  with the length as the first argument.
   811  For example, <code>XC</code> <code>$8,</code> <code>(R9),</code> <code>(R9)</code> would clear
   812  eight bytes at the address specified in <code>R9</code>.
   813  </p>
   814  
   815  <p>
   816  If a vector instruction takes a length or an index as an argument then it will be the
   817  first argument.
   818  For example, <code>VLEIF</code> <code>$1,</code> <code>$16,</code> <code>V2</code> will load
   819  the value sixteen into index one of <code>V2</code>.
   820  Care should be taken when using vector instructions to ensure that they are available at
   821  runtime.
   822  To use vector instructions a machine must have both the vector facility (bit 129 in the
   823  facility list) and kernel support.
   824  Without kernel support a vector instruction will have no effect (it will be equivalent
   825  to a <code>NOP</code> instruction).
   826  </p>
   827  
   828  <p>
   829  Addressing modes:
   830  </p>
   831  
   832  <ul>
   833  
   834  <li>
   835  <code>(R5)(R6*1)</code>: The location at <code>R5</code> plus <code>R6</code>.
   836  It is a scaled mode as on the x86, but the only scale allowed is <code>1</code>.
   837  </li>
   838  
   839  </ul>
   840  
   841  <h3 id="mips">MIPS, MIPS64</h3>
   842  
   843  <p>
General-purpose registers are named <code>R0</code> through <code>R31</code>;
floating-point registers are <code>F0</code> through <code>F31</code>.
   846  </p>
   847  
   848  <p>
   849  <code>R30</code> is reserved to point to <code>g</code>.
   850  <code>R23</code> is used as a temporary register.
   851  </p>
   852  
   853  <p>
   854  In a <code>TEXT</code> directive, the frame size <code>$-4</code> for MIPS or
   855  <code>$-8</code> for MIPS64 instructs the linker not to save <code>LR</code>.
   856  </p>
   857  
   858  <p>
   859  <code>SP</code> refers to the virtual stack pointer.
   860  For the hardware register, use <code>R29</code>.
   861  </p>
   862  
   863  <p>
   864  Addressing modes:
   865  </p>
   866  
   867  <ul>
   868  
   869  <li>
   870  <code>16(R1)</code>: The location at <code>R1</code> plus 16.
   871  </li>
   872  
   873  <li>
   874  <code>(R1)</code>: Alias for <code>0(R1)</code>.
   875  </li>
   876  
   877  </ul>
   878  
   879  <p>
The value of the <code>GOMIPS</code> environment variable (<code>hardfloat</code> or
   881  <code>softfloat</code>) is made available to assembly code by predefining either
   882  <code>GOMIPS_hardfloat</code> or <code>GOMIPS_softfloat</code>.
   883  </p>
   884  
   885  <h3 id="unsupported_opcodes">Unsupported opcodes</h3>
   886  
   887  <p>
The assemblers are designed to support the compiler, so not all hardware instructions
are defined for all architectures: if the compiler doesn't generate an instruction, it might not be there.
If you need to use a missing instruction, there are two ways to proceed.
One is to update the assembler to support that instruction, which is straightforward
but only worthwhile if it's likely the instruction will be used again.
The other, for simple one-off cases, is to use the <code>BYTE</code>
and <code>WORD</code> directives
to lay down explicit data into the instruction stream within a <code>TEXT</code>.
Here's how the 386 runtime defines the 64-bit atomic load function.
   897  </p>
   898  
   899  <pre>
// uint64 atomicload64(uint64 volatile* addr);
// The pointer argument is at ptr+0(FP) and the 64-bit result
// is written to ret_lo+4(FP), hence the 12-byte argument frame.
TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
	MOVL	ptr+0(FP), AX
	TESTL	$7, AX	// is the address 8-byte aligned?
	JZ	2(PC)	// yes: skip the deliberate crash
	MOVL	0, AX // crash with nil ptr deref
	LEAL	ret_lo+4(FP), BX
	// MOVQ (%EAX), %MM0
	BYTE $0x0f; BYTE $0x6f; BYTE $0x00
	// MOVQ %MM0, 0(%EBX)
	BYTE $0x0f; BYTE $0x7f; BYTE $0x03
	// EMMS
	BYTE $0x0f; BYTE $0x77
	RET
   916  </pre>