github.com/huandu/go@v0.0.0-20151114150818-04e615e41150/doc/asm.html

     1  <!--{
     2  	"Title": "A Quick Guide to Go's Assembler",
     3  	"Path":  "/doc/asm"
     4  }-->
     5  
     6  <h2 id="introduction">A Quick Guide to Go's Assembler</h2>
     7  
     8  <p>
     9  This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> Go compiler.
    10  The document is not comprehensive.
    11  </p>
    12  
    13  <p>
    14  The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail
    15  <a href="http://plan9.bell-labs.com/sys/doc/asm.html">elsewhere</a>.
    16  If you plan to write assembly language, you should read that document although much of it is Plan 9-specific.
The current document provides a summary of the syntax and the differences from
what is explained in that document, and
describes the peculiarities that apply when writing assembly code to interact with Go.
    20  </p>
    21  
    22  <p>
    23  The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
    24  Some of the details map precisely to the machine, but some do not.
    25  This is because the compiler suite (see
    26  <a href="http://plan9.bell-labs.com/sys/doc/compiler.html">this description</a>)
    27  needs no assembler pass in the usual pipeline.
    28  Instead, the compiler operates on a kind of semi-abstract instruction set,
    29  and instruction selection occurs partly after code generation.
    30  The assembler works on the semi-abstract form, so
    31  when you see an instruction like <code>MOV</code>
    32  what the tool chain actually generates for that operation might
    33  not be a move instruction at all, perhaps a clear or load.
    34  Or it might correspond exactly to the machine instruction with that name.
    35  In general, machine-specific operations tend to appear as themselves, while more general concepts like
    36  memory move and subroutine call and return are more abstract.
    37  The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
    38  </p>
    39  
    40  <p>
    41  The assembler program is a way to parse a description of that
    42  semi-abstract instruction set and turn it into instructions to be
    43  input to the linker.
    44  If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
    45  are many examples in the sources of the standard library, in packages such as
    46  <a href="/pkg/runtime/"><code>runtime</code></a> and
    47  <a href="/pkg/math/big/"><code>math/big</code></a>.
    48  You can also examine what the compiler emits as assembly code
    49  (the actual output may differ from what you see here):
    50  </p>
    51  
    52  <pre>
    53  $ cat x.go
    54  package main
    55  
    56  func main() {
    57  	println(3)
    58  }
    59  $ GOOS=linux GOARCH=amd64 go tool compile -S x.go        # or: go build -gcflags -S x.go
    60  
    61  --- prog list "main" ---
    62  0000 (x.go:3) TEXT    main+0(SB),$8-0
    63  0001 (x.go:3) FUNCDATA $0,gcargs·0+0(SB)
    64  0002 (x.go:3) FUNCDATA $1,gclocals·0+0(SB)
    65  0003 (x.go:4) MOVQ    $3,(SP)
    66  0004 (x.go:4) PCDATA  $0,$8
    67  0005 (x.go:4) CALL    ,runtime.printint+0(SB)
    68  0006 (x.go:4) PCDATA  $0,$-1
    69  0007 (x.go:4) PCDATA  $0,$0
    70  0008 (x.go:4) CALL    ,runtime.printnl+0(SB)
    71  0009 (x.go:4) PCDATA  $0,$-1
    72  0010 (x.go:5) RET     ,
    73  ...
    74  </pre>
    75  
    76  <p>
    77  The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information
    78  for use by the garbage collector; they are introduced by the compiler.
    79  </p> 
    80  
    81  <!-- Commenting out because the feature is gone but it's popular and may come back.
    82  
    83  <p>
    84  To see what gets put in the binary after linking, add the <code>-a</code> flag to the linker:
    85  </p>
    86  
    87  <pre>
    88  $ go tool 6l -a x.6        # or: go build -ldflags -a x.go
    89  codeblk [0x2000,0x1d059) at offset 0x1000
    90  002000	main.main            | (3)	TEXT	main.main+0(SB),$8
    91  002000	65488b0c25a0080000   | (3)	MOVQ	2208(GS),CX
    92  002009	483b21               | (3)	CMPQ	SP,(CX)
    93  00200c	7707                 | (3)	JHI	,2015
    94  00200e	e83da20100           | (3)	CALL	,1c250+runtime.morestack00
    95  002013	ebeb                 | (3)	JMP	,2000
    96  002015	4883ec08             | (3)	SUBQ	$8,SP
    97  002019	                     | (3)	FUNCDATA	$0,main.gcargs·0+0(SB)
    98  002019	                     | (3)	FUNCDATA	$1,main.gclocals·0+0(SB)
    99  002019	48c7042403000000     | (4)	MOVQ	$3,(SP)
   100  002021	                     | (4)	PCDATA	$0,$8
   101  002021	e8aad20000           | (4)	CALL	,f2d0+runtime.printint
   102  002026	                     | (4)	PCDATA	$0,$-1
   103  002026	                     | (4)	PCDATA	$0,$0
   104  002026	e865d40000           | (4)	CALL	,f490+runtime.printnl
   105  00202b	                     | (4)	PCDATA	$0,$-1
   106  00202b	4883c408             | (5)	ADDQ	$8,SP
   107  00202f	c3                   | (5)	RET	,
   108  ...
   109  </pre>
   110  
   111  -->
   112  
   113  <h3 id="constants">Constants</h3>
   114  
   115  <p>
   116  Although the assembler takes its guidance from the Plan 9 assemblers,
   117  it is a distinct program, so there are some differences.
   118  One is in constant evaluation.
   119  Constant expressions in the assembler are parsed using Go's operator
   120  precedence, not the C-like precedence of the original.
Thus <code>3&amp;1&lt;&lt;2</code> is 4, not 0: it parses as <code>(3&amp;1)&lt;&lt;2</code>,
not <code>3&amp;(1&lt;&lt;2)</code>.
   123  Also, constants are always evaluated as 64-bit unsigned integers.
   124  Thus <code>-2</code> is not the integer value minus two,
   125  but the unsigned 64-bit integer with the same bit pattern.
   126  The distinction rarely matters but
   127  to avoid ambiguity, division or right shift where the right operand's
   128  high bit is set is rejected.
   129  </p>
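
<p>
As a small illustration of the precedence rule, the following fragment is a sketch
for amd64. The register choice is arbitrary, and the parenthesized
<code>$(...)</code> form is used here only to make the expression boundaries explicit:
</p>

<pre>
	MOVQ	$(3&amp;1&lt;&lt;2), AX     // parsed as (3&amp;1)&lt;&lt;2, so AX receives 4
	MOVQ	$(3&amp;(1&lt;&lt;2)), BX   // explicit grouping gives the C-style result, 0
</pre>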
   130  
   131  <h3 id="symbols">Symbols</h3>
   132  
   133  <p>
   134  Some symbols, such as <code>R1</code> or <code>LR</code>,
   135  are predefined and refer to registers.
   136  The exact set depends on the architecture.
   137  </p>
   138  
   139  <p>
   140  There are four predeclared symbols that refer to pseudo-registers.
   141  These are not real registers, but rather virtual registers maintained by
   142  the tool chain, such as a frame pointer.
   143  The set of pseudo-registers is the same for all architectures:
   144  </p>
   145  
   146  <ul>
   147  
   148  <li>
   149  <code>FP</code>: Frame pointer: arguments and locals.
   150  </li>
   151  
   152  <li>
   153  <code>PC</code>: Program counter:
   154  jumps and branches.
   155  </li>
   156  
   157  <li>
   158  <code>SB</code>: Static base pointer: global symbols.
   159  </li>
   160  
   161  <li>
   162  <code>SP</code>: Stack pointer: top of stack.
   163  </li>
   164  
   165  </ul>
   166  
   167  <p>
   168  All user-defined symbols are written as offsets to the pseudo-registers
   169  <code>FP</code> (arguments and locals) and <code>SB</code> (globals).
   170  </p>
   171  
   172  <p>
   173  The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
   174  is the name <code>foo</code> as an address in memory.
   175  This form is used to name global functions and data.
   176  Adding <code>&lt;&gt;</code> to the name, as in <span style="white-space: nowrap"><code>foo&lt;&gt;(SB)</code></span>, makes the name
   177  visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
   178  Adding an offset to the name refers to that offset from the symbol's address, so
<code>foo+4(SB)</code> is four bytes past the start of <code>foo</code>.
   180  </p>
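
<p>
The following sketch for amd64 shows these forms together.
The names <code>table</code> and <code>second</code> are hypothetical, and the
function is assumed to have the Go prototype <code>func second() uint32</code>
declared in a Go file in the same package:
</p>

<pre>
#include "textflag.h"

// A file-local, read-only table of two 4-byte values.
DATA	table&lt;&gt;+0(SB)/4, $1
DATA	table&lt;&gt;+4(SB)/4, $2
GLOBL	table&lt;&gt;(SB), RODATA, $8

// func second() uint32   (hypothetical prototype)
TEXT ·second(SB), NOSPLIT, $0-4
	MOVL	table&lt;&gt;+4(SB), AX   // the word four bytes past the start of table&lt;&gt;
	MOVL	AX, ret+0(FP)
	RET
</pre>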
   181  
   182  <p>
   183  The <code>FP</code> pseudo-register is a virtual frame pointer
   184  used to refer to function arguments.
   185  The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
   186  Thus <code>0(FP)</code> is the first argument to the function,
   187  <code>8(FP)</code> is the second (on a 64-bit machine), and so on.
   188  However, when referring to a function argument this way, it is necessary to place a name
   189  at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
(The meaning of the offset, an offset from the frame pointer, is distinct
from its use with <code>SB</code>, where it is an offset from the symbol.)
   192  The assembler enforces this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
   193  The actual name is semantically irrelevant but should be used to document
   194  the argument's name.
   195  It is worth stressing that <code>FP</code> is always a
   196  pseudo-register, not a hardware
   197  register, even on architectures with a hardware frame pointer.
   198  </p>
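
<p>
As an illustration, here is a minimal sketch of an amd64 function that reads two
named arguments and stores a result.
The function <code>add</code> is hypothetical and is assumed to have the Go prototype
<code>func add(a, b int64) int64</code> in the same package:
</p>

<pre>
#include "textflag.h"

// func add(a, b int64) int64   (hypothetical prototype)
TEXT ·add(SB), NOSPLIT, $0-24
	MOVQ	a+0(FP), AX      // first argument
	MOVQ	b+8(FP), BX      // second argument
	ADDQ	BX, AX           // AX = a + b
	MOVQ	AX, ret+16(FP)   // result follows the arguments
	RET
</pre>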
   199  
   200  <p>
   201  For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names
   202  and offsets match.
   203  On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding
   204  a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>.
   205  If a Go prototype does not name its result, the expected assembly name is <code>ret</code>.
   206  </p>
   207  
   208  <p>
   209  The <code>SP</code> pseudo-register is a virtual stack pointer
   210  used to refer to frame-local variables and the arguments being
   211  prepared for function calls.
   212  It points to the top of the local stack frame, so references should use negative offsets
   213  in the range [−framesize, 0):
   214  <code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on.
   215  </p>
   216  
   217  <p>
   218  On architectures with a hardware register named <code>SP</code>,
   219  the name prefix distinguishes
   220  references to the virtual stack pointer from references to the architectural
   221  <code>SP</code> register.
   222  That is, <code>x-8(SP)</code> and <code>-8(SP)</code>
   223  are different memory locations:
   224  the first refers to the virtual stack pointer pseudo-register,
   225  while the second refers to the
   226  hardware's <code>SP</code> register.
   227  </p>
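
<p>
The difference between the two forms can be sketched as follows for amd64.
The function name <code>frame</code> and the local name <code>x</code> are
hypothetical; the prototype is assumed to be <code>func frame()</code>:
</p>

<pre>
#include "textflag.h"

// func frame()   (hypothetical prototype)
TEXT ·frame(SB), NOSPLIT, $16-0
	MOVQ	$1, x-8(SP)   // named: virtual SP, the slot just below the top of the local frame
	MOVQ	$2, 0(SP)     // unnamed: hardware SP, the lowest address in the frame
	RET
</pre>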
   228  
   229  <p>
   230  On machines where <code>SP</code> and <code>PC</code> are
   231  traditionally aliases for a physical, numbered register,
   232  in the Go assembler the names <code>SP</code> and <code>PC</code>
   233  are still treated specially;
   234  for instance, references to <code>SP</code> require a symbol,
   235  much like <code>FP</code>.
   236  To access the actual hardware register use the true <code>R</code> name.
   237  For example, on the ARM architecture the hardware
   238  <code>SP</code> and <code>PC</code> are accessible as
   239  <code>R13</code> and <code>R15</code>.
   240  </p>
   241  
   242  <p>
   243  Branches and direct jumps are always written as offsets to the PC, or as
   244  jumps to labels:
   245  </p>
   246  
   247  <pre>
   248  label:
   249  	MOVW $0, R1
   250  	JMP label
   251  </pre>
   252  
   253  <p>
   254  Each label is visible only within the function in which it is defined.
   255  It is therefore permitted for multiple functions in a file to define
   256  and use the same label names.
   257  Direct jumps and call instructions can target text symbols,
   258  such as <code>name(SB)</code>, but not offsets from symbols,
   259  such as <code>name+4(SB)</code>.
   260  </p>
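
<p>
A PC-relative branch counts instructions rather than bytes.
In this amd64 sketch, <code>2(PC)</code> names the instruction two past the branch itself,
so the branch skips exactly one instruction.
The function <code>nonzero</code> is hypothetical, with the assumed prototype
<code>func nonzero(x int64) int64</code>:
</p>

<pre>
#include "textflag.h"

// func nonzero(x int64) int64   (hypothetical prototype)
TEXT ·nonzero(SB), NOSPLIT, $0-16
	MOVQ	x+0(FP), AX
	MOVQ	$1, BX
	TESTQ	AX, AX
	JNE	2(PC)            // x != 0: skip the next instruction
	MOVQ	$0, BX
	MOVQ	BX, ret+8(FP)    // 1 if x was nonzero, otherwise 0
	RET
</pre>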
   261  
   262  <p>
   263  Instructions, registers, and assembler directives are always in UPPER CASE to remind you
   264  that assembly programming is a fraught endeavor.
   265  (Exception: the <code>g</code> register renaming on ARM.)
   266  </p>
   267  
   268  <p>
   269  In Go object files and binaries, the full name of a symbol is the 
   270  package path followed by a period and the symbol name:
   271  <code>fmt.Printf</code> or <code>math/rand.Int</code>.
   272  Because the assembler's parser treats period and slash as punctuation,
   273  those strings cannot be used directly as identifier names.
   274  Instead, the assembler allows the middle dot character U+00B7
   275  and the division slash U+2215 in identifiers and rewrites them to
   276  plain period and slash.
   277  Within an assembler source file, the symbols above are written as
   278  <code>fmt·Printf</code> and <code>math∕rand·Int</code>.
   279  The assembly listings generated by the compilers when using the <code>-S</code> flag
   280  show the period and slash directly instead of the Unicode replacements
   281  required by the assemblers.
   282  </p>
   283  
   284  <p>
   285  Most hand-written assembly files do not include the full package path
   286  in symbol names, because the linker inserts the package path of the current
   287  object file at the beginning of any name starting with a period:
   288  in an assembly source file within the math/rand package implementation,
   289  the package's Int function can be referred to as <code>·Int</code>.
   290  This convention avoids the need to hard-code a package's import path in its
   291  own source code, making it easier to move the code from one location to another.
   292  </p>
   293  
   294  <h3 id="directives">Directives</h3>
   295  
   296  <p>
   297  The assembler uses various directives to bind text and data to symbol names.
   298  For example, here is a simple complete function definition. The <code>TEXT</code>
   299  directive declares the symbol <code>runtime·profileloop</code> and the instructions
   300  that follow form the body of the function.
   301  The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction.
   302  (If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXTs</code>.)
   303  After the symbol, the arguments are flags (see below)
   304  and the frame size, a constant (but see below):
   305  </p>
   306  
   307  <pre>
   308  TEXT runtime·profileloop(SB),NOSPLIT,$8
   309  	MOVQ	$runtime·profileloop1(SB), CX
   310  	MOVQ	CX, 0(SP)
   311  	CALL	runtime·externalthreadhandler(SB)
   312  	RET
   313  </pre>
   314  
   315  <p>
   316  In the general case, the frame size is followed by an argument size, separated by a minus sign.
   317  (It's not a subtraction, just idiosyncratic syntax.)
The frame size <code>$24-8</code> states that the function has a 24-byte frame
and is called with 8 bytes of arguments, which live on the caller's frame.
   320  If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
   321  the argument size must be provided.
   322  For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the
   323  argument size is correct.
   324  </p>
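
<p>
For instance, a function declared with <code>$24-8</code> might look like the
following amd64 sketch.
The name <code>spill</code>, its assumed prototype <code>func spill(x int64)</code>,
and the local name <code>tmp</code> are all hypothetical:
</p>

<pre>
#include "textflag.h"

// func spill(x int64)   (hypothetical prototype)
TEXT ·spill(SB), NOSPLIT, $24-8
	MOVQ	x+0(FP), AX     // the 8 bytes of argument, in the caller's frame
	MOVQ	AX, tmp-8(SP)   // one 8-byte slot of the 24-byte local frame
	RET
</pre>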
   325  
   326  <p>
   327  Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
   328  static base pseudo-register <code>SB</code>.
   329  This function would be called from Go source for package <code>runtime</code> using the
   330  simple name <code>profileloop</code>.
   331  </p>
   332  
   333  <p>
   334  Global data symbols are defined by a sequence of initializing
   335  <code>DATA</code> directives followed by a <code>GLOBL</code> directive.
   336  Each <code>DATA</code> directive initializes a section of the
   337  corresponding memory.
   338  The memory not explicitly initialized is zeroed.
The general form of the <code>DATA</code> directive is
</p>
   340  
   341  <pre>
   342  DATA	symbol+offset(SB)/width, value
   343  </pre>
   344  
   345  <p>
   346  which initializes the symbol memory at the given offset and width with the given value.
   347  The <code>DATA</code> directives for a given symbol must be written with increasing offsets.
   348  </p>
   349  
   350  <p>
   351  The <code>GLOBL</code> directive declares a symbol to be global.
   352  The arguments are optional flags and the size of the data being declared as a global,
   353  which will have initial value all zeros unless a <code>DATA</code> directive
   354  has initialized it.
   355  The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
   356  </p>
   357  
   358  <p>
   359  For example,
   360  </p>
   361  
   362  <pre>
   363  DATA divtab&lt;&gt;+0x00(SB)/4, $0xf4f8fcff
   364  DATA divtab&lt;&gt;+0x04(SB)/4, $0xe6eaedf0
   365  ...
   366  DATA divtab&lt;&gt;+0x3c(SB)/4, $0x81828384
   367  GLOBL divtab&lt;&gt;(SB), RODATA, $64
   368  
   369  GLOBL runtime·tlsoffset(SB), NOPTR, $4
   370  </pre>
   371  
   372  <p>
   373  declares and initializes <code>divtab&lt;&gt;</code>, a read-only 64-byte table of 4-byte integer values,
   374  and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
   375  contains no pointers.
   376  </p>
   377  
   378  <p>
   379  There may be one or two arguments to the directives.
   380  If there are two, the first is a bit mask of flags,
   381  which can be written as numeric expressions, added or or-ed together,
   382  or can be set symbolically for easier absorption by a human.
   383  Their values, defined in the standard <code>#include</code>  file <code>textflag.h</code>, are:
   384  </p>
   385  
   386  <ul>
   387  <li>
   388  <code>NOPROF</code> = 1
   389  <br>
   390  (For <code>TEXT</code> items.)
   391  Don't profile the marked function.  This flag is deprecated.
   392  </li>
   393  <li>
   394  <code>DUPOK</code> = 2
   395  <br>
   396  It is legal to have multiple instances of this symbol in a single binary.
   397  The linker will choose one of the duplicates to use.
   398  </li>
   399  <li>
   400  <code>NOSPLIT</code> = 4
   401  <br>
   402  (For <code>TEXT</code> items.)
   403  Don't insert the preamble to check if the stack must be split.
   404  The frame for the routine, plus anything it calls, must fit in the
   405  spare space at the top of the stack segment.
   406  Used to protect routines such as the stack splitting code itself.
   407  </li>
   408  <li>
   409  <code>RODATA</code> = 8
   410  <br>
   411  (For <code>DATA</code> and <code>GLOBL</code> items.)
   412  Put this data in a read-only section.
   413  </li>
   414  <li>
   415  <code>NOPTR</code> = 16
   416  <br>
   417  (For <code>DATA</code> and <code>GLOBL</code> items.)
   418  This data contains no pointers and therefore does not need to be
   419  scanned by the garbage collector.
   420  </li>
   421  <li>
   422  <code>WRAPPER</code> = 32
   423  <br>
   424  (For <code>TEXT</code> items.)
   425  This is a wrapper function and should not count as disabling <code>recover</code>.
   426  </li>
   427  <li>
   428  <code>NEEDCTXT</code> = 64
   429  <br>
   430  (For <code>TEXT</code> items.)
   431  This function is a closure so it uses its incoming context register.
   432  </li>
   433  </ul>
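
<p>
The following sketch shows the flags written symbolically, combined, and numerically.
The symbol names <code>mask</code>, <code>leaf</code>, and <code>leaf2</code> are hypothetical,
and the two functions are assumed to have the Go prototypes
<code>func leaf()</code> and <code>func leaf2()</code>:
</p>

<pre>
#include "textflag.h"

DATA	mask&lt;&gt;+0(SB)/8, $0x00ff00ff00ff00ff
GLOBL	mask&lt;&gt;(SB), (NOPTR+RODATA), $8   // 16+8: no pointers, read-only

TEXT ·leaf(SB), NOSPLIT, $0-0    // symbolic flag
	RET

TEXT ·leaf2(SB), 4, $0-0         // the same flag written as a number
	RET
</pre>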
   434  
   435  <h3 id="runtime">Runtime Coordination</h3>
   436  
   437  <p>
   438  For garbage collection to run correctly, the runtime must know the
   439  location of pointers in all global data and in most stack frames.
   440  The Go compiler emits this information when compiling Go source files,
   441  but assembly programs must define it explicitly.
   442  </p>
   443  
   444  <p>
   445  A data symbol marked with the <code>NOPTR</code> flag (see above)
   446  is treated as containing no pointers to runtime-allocated data.
   447  A data symbol with the <code>RODATA</code> flag
   448  is allocated in read-only memory and is therefore treated
   449  as implicitly marked <code>NOPTR</code>.
   450  A data symbol with a total size smaller than a pointer
   451  is also treated as implicitly marked <code>NOPTR</code>.
   452  It is not possible to define a symbol containing pointers in an assembly source file;
   453  such a symbol must be defined in a Go source file instead.
   454  Assembly source can still refer to the symbol by name
   455  even without <code>DATA</code> and <code>GLOBL</code> directives.
   456  A good general rule of thumb is to define all non-<code>RODATA</code>
   457  symbols in Go instead of in assembly.
   458  </p>
   459  
   460  <p>
   461  Each function also needs annotations giving the location of
   462  live pointers in its arguments, results, and local stack frame.
   463  For an assembly function with no pointer results and
   464  either no local stack frame or no function calls,
   465  the only requirement is to define a Go prototype for the function
   466  in a Go source file in the same package. The name of the assembly
   467  function must not contain the package name component (for example,
   468  function <code>Syscall</code> in package <code>syscall</code> should
   469  use the name <code>·Syscall</code> instead of the equivalent name
   470  <code>syscall·Syscall</code> in its <code>TEXT</code> directive).
   471  For more complex situations, explicit annotation is needed.
   472  These annotations use pseudo-instructions defined in the standard
   473  <code>#include</code> file <code>funcdata.h</code>.
   474  </p>
   475  
   476  <p>
   477  If a function has no arguments and no results,
   478  the pointer information can be omitted.
   479  This is indicated by an argument size annotation of <code>$<i>n</i>-0</code>
   480  on the <code>TEXT</code> instruction.
   481  Otherwise, pointer information must be provided by
   482  a Go prototype for the function in a Go source file,
   483  even for assembly functions not called directly from Go.
   484  (The prototype will also let <code>go</code> <code>vet</code> check the argument references.)
   485  At the start of the function, the arguments are assumed
   486  to be initialized but the results are assumed uninitialized.
   487  If the results will hold live pointers during a call instruction,
   488  the function should start by zeroing the results and then 
   489  executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>.
   490  This instruction records that the results are now initialized
   491  and should be scanned during stack movement and garbage collection.
   492  It is typically easier to arrange that assembly functions do not
   493  return pointers or do not contain call instructions;
   494  no assembly functions in the standard library use
   495  <code>GO_RESULTS_INITIALIZED</code>.
   496  </p>
   497  
   498  <p>
   499  If a function has no local stack frame,
   500  the pointer information can be omitted.
   501  This is indicated by a local frame size annotation of <code>$0-<i>n</i></code>
   502  on the <code>TEXT</code> instruction.
   503  The pointer information can also be omitted if the
   504  function contains no call instructions.
   505  Otherwise, the local stack frame must not contain pointers,
   506  and the assembly must confirm this fact by executing the 
   507  pseudo-instruction <code>NO_LOCAL_POINTERS</code>.
   508  Because stack resizing is implemented by moving the stack,
   509  the stack pointer may change during any function call:
   510  even pointers to stack data must not be kept in local variables.
   511  </p>
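
<p>
As a sketch of the last case, here is an amd64 function that has a local frame and
makes a call, so it declares that the frame holds no pointers.
Both <code>callHelper</code> and <code>helper</code> are hypothetical;
they are assumed to have the Go prototypes <code>func callHelper(x int64) int64</code>
and <code>func helper(x int64) int64</code>, with <code>helper</code> written in Go
in the same package:
</p>

<pre>
#include "funcdata.h"

// func callHelper(x int64) int64   (hypothetical prototype)
TEXT ·callHelper(SB), $16-16
	NO_LOCAL_POINTERS        // the 16-byte frame holds only integers
	MOVQ	x+0(FP), AX
	MOVQ	AX, 0(SP)        // outgoing argument for helper
	CALL	·helper(SB)
	MOVQ	8(SP), AX        // helper's result, stored after its argument
	MOVQ	AX, ret+8(FP)
	RET
</pre>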
   512  
   513  <h2 id="architectures">Architecture-specific details</h2>
   514  
   515  <p>
   516  It is impractical to list all the instructions and other details for each machine.
   517  To see what instructions are defined for a given machine, say ARM,
   518  look in the source for the <code>obj</code> support library for
   519  that architecture, located in the directory <code>src/cmd/internal/obj/arm</code>.
   520  In that directory is a file <code>a.out.go</code>; it contains
   521  a long list of constants starting with <code>A</code>, like this:
   522  </p>
   523  
   524  <pre>
   525  const (
   526  	AAND = obj.ABaseARM + obj.A_ARCHSPECIFIC + iota
   527  	AEOR
   528  	ASUB
   529  	ARSB
   530  	AADD
   531  	...
   532  </pre>
   533  
   534  <p>
   535  This is the list of instructions and their spellings as known to the assembler and linker for that architecture.
   536  Each instruction begins with an initial capital <code>A</code> in this list, so <code>AAND</code>
   537  represents the bitwise and instruction,
   538  <code>AND</code> (without the leading <code>A</code>),
   539  and is written in assembly source as <code>AND</code>.
   540  The enumeration is mostly in alphabetical order.
(The architecture-independent <code>AXXX</code>, defined in the
<code>cmd/internal/obj</code> package,
represents an invalid instruction.)
   544  The sequence of the <code>A</code> names has nothing to do with the actual
   545  encoding of the machine instructions.
   546  The <code>cmd/internal/obj</code> package takes care of that detail.
   547  </p>
   548  
   549  <p>
   550  The instructions for both the 386 and AMD64 architectures are listed in
   551  <code>cmd/internal/obj/x86/a.out.go</code>.
   552  </p>
   553  
   554  <p>
   555  The architectures share syntax for common addressing modes such as
   556  <code>(R1)</code> (register indirect),
   557  <code>4(R1)</code> (register indirect with offset), and
   558  <code>$foo(SB)</code> (absolute address).
   559  The assembler also supports some (not necessarily all) addressing modes
   560  specific to each architecture.
   561  The sections below list these.
   562  </p>
   563  
   564  <p>
   565  One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
   566  <code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>.
   567  This rule applies even on architectures where the conventional notation uses the opposite direction.
   568  </p>
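
<p>
A few more amd64 instructions, shown only to illustrate the operand order
(the registers are chosen arbitrarily):
</p>

<pre>
	MOVQ	$0, CX     // CX = 0
	MOVQ	CX, DX     // DX = CX
	ADDQ	BX, AX     // AX = AX + BX
	SUBQ	BX, AX     // AX = AX - BX
</pre>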
   569  
   570  <p>
   571  Here follow some descriptions of key Go-specific details for the supported architectures.
   572  </p>
   573  
   574  <h3 id="x86">32-bit Intel 386</h3>
   575  
   576  <p>
   577  The runtime pointer to the <code>g</code> structure is maintained
   578  through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
An OS-dependent macro <code>get_tls</code> is defined for the assembler if the source includes
   580  a special header, <code>go_asm.h</code>:
   581  </p>
   582  
   583  <pre>
   584  #include "go_asm.h"
   585  </pre>
   586  
   587  <p>
   588  Within the runtime, the <code>get_tls</code> macro loads its argument register
   589  with a pointer to the <code>g</code> pointer, and the <code>g</code> struct
   590  contains the <code>m</code> pointer.
   591  The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this:
   592  </p>
   593  
   594  <pre>
   595  get_tls(CX)
   596  MOVL	g(CX), AX     // Move g into AX.
   597  MOVL	g_m(AX), BX   // Move g.m into BX.
   598  </pre>
   599  
   600  <p>
   601  Addressing modes:
   602  </p>
   603  
   604  <ul>
   605  
   606  <li>
   607  <code>(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code>.
   608  </li>
   609  
   610  <li>
   611  <code>64(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code> plus 64.
   612  These modes accept only 1, 2, 4, and 8 as scale factors.
   613  </li>
   614  
   615  </ul>
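
<p>
For example, these modes might be used as follows; the registers and scale factors
are illustrative only:
</p>

<pre>
	MOVL	(DI)(BX*4), AX      // load the 32-bit value at address DI + BX*4
	LEAL	64(DI)(BX*2), CX    // CX = DI + BX*2 + 64; no memory access
</pre>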
   616  
   617  <h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>
   618  
   619  <p>
   620  The two architectures behave largely the same at the assembler level.
   621  Assembly code to access the <code>m</code> and <code>g</code>
   622  pointers on the 64-bit version is the same as on the 32-bit 386,
   623  except it uses <code>MOVQ</code> rather than <code>MOVL</code>:
   624  </p>
   625  
   626  <pre>
   627  get_tls(CX)
   628  MOVQ	g(CX), AX     // Move g into AX.
   629  MOVQ	g_m(AX), BX   // Move g.m into BX.
   630  </pre>
   631  
   632  <h3 id="arm">ARM</h3>
   633  
   634  <p>
   635  The registers <code>R10</code> and <code>R11</code>
   636  are reserved by the compiler and linker.
   637  </p>
   638  
   639  <p>
   640  <code>R10</code> points to the <code>g</code> (goroutine) structure.
   641  Within assembler source code, this pointer must be referred to as <code>g</code>;
   642  the name <code>R10</code> is not recognized.
   643  </p>
   644  
   645  <p>
   646  To make it easier for people and compilers to write assembly, the ARM linker
   647  allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code>
   648  that may not be expressible using a single hardware instruction.
   649  It implements these forms as multiple instructions, often using the <code>R11</code> register
   650  to hold temporary values.
   651  Hand-written assembly can use <code>R11</code>, but doing so requires
   652  being sure that the linker is not also using it to implement any of the other
   653  instructions in the function.
   654  </p>
   655  
   656  <p>
   657  When defining a <code>TEXT</code>, specifying frame size <code>$-4</code>
   658  tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry.
   659  </p>
   660  
   661  <p>
   662  The name <code>SP</code> always refers to the virtual stack pointer described earlier.
   663  For the hardware register, use <code>R13</code>.
   664  </p>
   665  
   666  <p>
   667  Condition code syntax is to append a period and the one- or two-letter code to the instruction,
   668  as in <code>MOVW.EQ</code>.
   669  Multiple codes may be appended: <code>MOVM.IA.W</code>.
   670  The order of the code modifiers is irrelevant.
   671  </p>
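
<p>
A short sketch of conditional execution using these suffixes, with arbitrarily
chosen registers:
</p>

<pre>
	CMP	$0, R0
	MOVW.EQ	$1, R1     // executed only when R0 == 0
	MOVW.NE	$0, R1     // executed only when R0 != 0
</pre>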
   672  
   673  <p>
   674  Addressing modes:
   675  </p>
   676  
   677  <ul>
   678  
   679  <li>
   680  <code>R0-&gt;16</code>
   681  <br>
   682  <code>R0&gt;&gt;16</code>
   683  <br>
   684  <code>R0&lt;&lt;16</code>
   685  <br>
   686  <code>R0@&gt;16</code>:
   687  For <code>&lt;&lt;</code>, left shift <code>R0</code> by 16 bits.
   688  The other codes are <code>-&gt;</code> (arithmetic right shift),
   689  <code>&gt;&gt;</code> (logical right shift), and
   690  <code>@&gt;</code> (rotate right).
   691  </li>
   692  
   693  <li>
   694  <code>R0-&gt;R1</code>
   695  <br>
   696  <code>R0&gt;&gt;R1</code>
   697  <br>
   698  <code>R0&lt;&lt;R1</code>
   699  <br>
   700  <code>R0@&gt;R1</code>:
   701  For <code>&lt;&lt;</code>, left shift <code>R0</code> by the count in <code>R1</code>.
   702  The other codes are <code>-&gt;</code> (arithmetic right shift),
   703  <code>&gt;&gt;</code> (logical right shift), and
   704  <code>@&gt;</code> (rotate right).
   705  
   706  </li>
   707  
   708  <li>
   709  <code>[R0,g,R12-R15]</code>: For multi-register instructions, the set comprising
   710  <code>R0</code>, <code>g</code>, and <code>R12</code> through <code>R15</code> inclusive.
   711  </li>
   712  
   713  <li>
   714  <code>(R5, R6)</code>: Destination register pair.
   715  </li>
   716  
   717  </ul>
   718  
   719  <h3 id="arm64">ARM64</h3>
   720  
   721  <p>
   722  The ARM64 port is in an experimental state.
   723  </p>
   724  
   725  <p>
   726  Instruction modifiers are appended to the instruction following a period.
   727  The only modifiers are <code>P</code> (postincrement) and <code>W</code>
   728  (preincrement):
<code>MOVW.P</code>, <code>MOVW.W</code>.
   730  </p>
   731  
   732  <p>
   733  Addressing modes:
   734  </p>
   735  
   736  <ul>
   737  
   738  <li>
   739  <code>(R5, R6)</code>: Register pair for <code>LDP</code>/<code>STP</code>.
   740  </li>
   741  
   742  </ul>
   743  
   744  <h3 id="ppc64">64-bit PowerPC, a.k.a. ppc64</h3>
   745  
   746  <p>
   747  The 64-bit PowerPC port is in an experimental state.
   748  </p>
   749  
   750  <p>
   751  Addressing modes:
   752  </p>
   753  
   754  <ul>
   755  
   756  <li>
   757  <code>(R5)(R6*1)</code>: The location at <code>R5</code> plus <code>R6</code>. It is a scaled
   758  mode as on the x86, but the only scale allowed is <code>1</code>.
   759  </li>
   760  
   761  <li>
<code>(R5+R6)</code>: Alias for <code>(R5)(R6*1)</code>.
   763  </li>
   764  
   765  </ul>
   766  
   767  <h3 id="unsupported_opcodes">Unsupported opcodes</h3>
   768  
   769  <p>
The assemblers are designed to support the compiler, so not all hardware instructions
are defined for all architectures: if the compiler doesn't generate an instruction, it might not be there.
   772  If you need to use a missing instruction, there are two ways to proceed.
   773  One is to update the assembler to support that instruction, which is straightforward
   774  but only worthwhile if it's likely the instruction will be used again.
   775  Instead, for simple one-off cases, it's possible to use the <code>BYTE</code>
   776  and <code>WORD</code> directives
   777  to lay down explicit data into the instruction stream within a <code>TEXT</code>.
   778  Here's how the 386 runtime defines the 64-bit atomic load function.
   779  </p>
   780  
   781  <pre>
   782  // uint64 atomicload64(uint64 volatile* addr);
   783  // so actually
   784  // void atomicload64(uint64 *res, uint64 volatile *addr);
   785  TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
   786  	MOVL	ptr+0(FP), AX
   787  	TESTL	$7, AX
   788  	JZ	2(PC)
   789  	MOVL	0, AX // crash with nil ptr deref
   790  	LEAL	ret_lo+4(FP), BX
   791  	// MOVQ (%EAX), %MM0
   792  	BYTE $0x0f; BYTE $0x6f; BYTE $0x00
   793  	// MOVQ %MM0, 0(%EBX)
   794  	BYTE $0x0f; BYTE $0x7f; BYTE $0x03
   795  	// EMMS
   796  	BYTE $0x0F; BYTE $0x77
   797  	RET
   798  </pre>