<!--{
	"Title": "A Quick Guide to Go's Assembler",
	"Path":  "/doc/asm"
}-->

<h2 id="introduction">A Quick Guide to Go's Assembler</h2>

<p>
This document is a quick outline of the unusual form of assembly language used by the <code>gc</code>
suite of Go compilers (<code>6g</code>, <code>8g</code>, etc.).
The document is not comprehensive.
</p>

<p>
The assembler is based on the input to the Plan 9 assemblers, which is documented in detail
<a href="http://plan9.bell-labs.com/sys/doc/asm.html">on the Plan 9 site</a>.
If you plan to write assembly language, you should read that document although much of it is Plan 9-specific.
This document provides a summary of the syntax and
describes the peculiarities that apply when writing assembly code to interact with Go.
</p>

<p>
The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
Some of the details map precisely to the machine, but some do not.
This is because the compiler suite (see
<a href="http://plan9.bell-labs.com/sys/doc/compiler.html">this description</a>)
needs no assembler pass in the usual pipeline.
Instead, the compiler emits a kind of incompletely defined instruction set, in binary form, which the linker
then completes.
In particular, the linker does instruction selection, so when you see an instruction like <code>MOV</code>,
what the linker actually generates for that operation might not be a move instruction at all; it might be a clear or a load.
Or it might correspond exactly to the machine instruction with that name.
In general, machine-specific operations tend to appear as themselves, while more general concepts like
memory move and subroutine call and return are more abstract.
The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
</p>

<p>
The assembler program is a way to generate that intermediate, incompletely defined instruction sequence
as input for the linker.
If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
are many examples in the sources of the standard library, in packages such as
<a href="/pkg/runtime/"><code>runtime</code></a> and
<a href="/pkg/math/big/"><code>math/big</code></a>.
You can also examine what the compiler emits as assembly code:
</p>

<pre>
$ cat x.go
package main

func main() {
	println(3)
}
$ go tool 6g -S x.go        # or: go build -gcflags -S x.go

--- prog list "main" ---
0000 (x.go:3) TEXT    main+0(SB),$8-0
0001 (x.go:3) FUNCDATA $0,gcargs·0+0(SB)
0002 (x.go:3) FUNCDATA $1,gclocals·0+0(SB)
0003 (x.go:4) MOVQ    $3,(SP)
0004 (x.go:4) PCDATA  $0,$8
0005 (x.go:4) CALL    ,runtime.printint+0(SB)
0006 (x.go:4) PCDATA  $0,$-1
0007 (x.go:4) PCDATA  $0,$0
0008 (x.go:4) CALL    ,runtime.printnl+0(SB)
0009 (x.go:4) PCDATA  $0,$-1
0010 (x.go:5) RET     ,
...
</pre>

<p>
The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information
for use by the garbage collector; they are introduced by the compiler.
</p>

<!-- Commenting out because the feature is gone but it's popular and may come back.

<p>
To see what gets put in the binary after linking, add the <code>-a</code> flag to the linker:
</p>

<pre>
$ go tool 6l -a x.6        # or: go build -ldflags -a x.go
codeblk [0x2000,0x1d059) at offset 0x1000
002000	main.main            | (3)	TEXT	main.main+0(SB),$8
002000	65488b0c25a0080000   | (3)	MOVQ	2208(GS),CX
002009	483b21               | (3)	CMPQ	SP,(CX)
00200c	7707                 | (3)	JHI	,2015
00200e	e83da20100           | (3)	CALL	,1c250+runtime.morestack00
002013	ebeb                 | (3)	JMP	,2000
002015	4883ec08             | (3)	SUBQ	$8,SP
002019	                     | (3)	FUNCDATA	$0,main.gcargs·0+0(SB)
002019	                     | (3)	FUNCDATA	$1,main.gclocals·0+0(SB)
002019	48c7042403000000     | (4)	MOVQ	$3,(SP)
002021	                     | (4)	PCDATA	$0,$8
002021	e8aad20000           | (4)	CALL	,f2d0+runtime.printint
002026	                     | (4)	PCDATA	$0,$-1
002026	                     | (4)	PCDATA	$0,$0
002026	e865d40000           | (4)	CALL	,f490+runtime.printnl
00202b	                     | (4)	PCDATA	$0,$-1
00202b	4883c408             | (5)	ADDQ	$8,SP
00202f	c3                   | (5)	RET	,
...
</pre>

-->

<h3 id="symbols">Symbols</h3>

<p>
Some symbols, such as <code>PC</code>, <code>R0</code> and <code>SP</code>, are predeclared and refer to registers.
There are two other predeclared symbols, <code>SB</code> (static base) and <code>FP</code> (frame pointer).
All user-defined symbols other than jump labels are written as offsets to these pseudo-registers.
</p>

<p>
The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
is the name <code>foo</code> as an address in memory.
</p>

<p>
The <code>FP</code> pseudo-register is a virtual frame pointer
used to refer to function arguments.
The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
Thus <code>0(FP)</code> is the first argument to the function,
<code>8(FP)</code> is the second (on a 64-bit machine), and so on.
When referring to a function argument this way, it is conventional to place the name
at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
Some of the assemblers enforce this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
For assembly functions with Go prototypes, <code>go vet</code> will check that the argument names
and offsets match.
</p>
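
<p>
As a rough illustration of the naming convention, the Go sketch below prints the operand for each argument
of a function. The helper <code>fpOperands</code> is invented for this example, and it assumes that every
argument occupies exactly one machine word; real offsets depend on each argument's size and alignment.
</p>

<pre>
package main

import "fmt"

// fpOperands returns the conventional name+offset(FP) operand for each
// argument, assuming (for illustration only) that every argument occupies
// exactly one machine word of wordSize bytes.
func fpOperands(names []string, wordSize int) []string {
	ops := make([]string, 0, len(names))
	for i, name := range names {
		ops = append(ops, fmt.Sprintf("%s+%d(FP)", name, i*wordSize))
	}
	return ops
}

func main() {
	// On a 64-bit machine the word size is 8 bytes.
	for _, op := range fpOperands([]string{"first_arg", "second_arg"}, 8) {
		fmt.Println(op)
	}
}
</pre>

<p>
With a word size of 8 this prints <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>,
the two spellings used in the paragraph above.
</p>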

<p>
The <code>SP</code> pseudo-register is a virtual stack pointer
used to refer to frame-local variables and the arguments being
prepared for function calls.
It points to the top of the local stack frame, so references should use negative offsets
in the range [−framesize, 0):
<code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on.
On architectures with a real register named <code>SP</code>, the name prefix distinguishes
references to the virtual stack pointer from references to the architectural <code>SP</code> register.
That is, <code>x-8(SP)</code> and <code>-8(SP)</code> are different memory locations:
the first refers to the virtual stack pointer pseudo-register, while the second refers to the
hardware's <code>SP</code> register.
</p>

<p>
Instructions, registers, and assembler directives are always in UPPER CASE to remind you
that assembly programming is a fraught endeavor.
(Exception: the <code>g</code> register renaming on ARM.)
</p>

<p>
In Go object files and binaries, the full name of a symbol is the
package path followed by a period and the symbol name:
<code>fmt.Printf</code> or <code>math/rand.Int</code>.
Because the assembler's parser treats period and slash as punctuation,
those strings cannot be used directly as identifier names.
Instead, the assembler allows the middle dot character U+00B7
and the division slash U+2215 in identifiers and rewrites them to
plain period and slash.
Within an assembler source file, the symbols above are written as
<code>fmt·Printf</code> and <code>math∕rand·Int</code>.
The assembly listings generated by the compilers when using the <code>-S</code> flag
show the period and slash directly instead of the Unicode replacements
required by the assemblers.
</p>

<p>
Most hand-written assembly files do not include the full package path
in symbol names, because the linker inserts the package path of the current
object file at the beginning of any name starting with a period:
in an assembly source file within the math/rand package implementation,
the package's Int function can be referred to as <code>·Int</code>.
This convention avoids the need to hard-code a package's import path in its
own source code, making it easier to move the code from one location to another.
</p>
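
<p>
The two rewriting steps can be modeled in a few lines of Go. This is only a sketch of the rules as
described above, not the toolchain's code: the function <code>linkName</code> is hypothetical, and it folds
the assembler's character rewriting and the linker's package-path insertion into a single step.
</p>

<pre>
package main

import (
	"fmt"
	"strings"
)

// linkName models the rewriting described above: the middle dot (U+00B7)
// becomes a period, the division slash (U+2215) becomes a slash, and a
// name that then starts with a period gets the package path prepended.
// It is an illustrative model, not the toolchain's actual code.
func linkName(asmName, pkgPath string) string {
	name := strings.NewReplacer("\u00b7", ".", "\u2215", "/").Replace(asmName)
	if strings.HasPrefix(name, ".") {
		name = pkgPath + name
	}
	return name
}

func main() {
	fmt.Println(linkName("fmt·Printf", "main"))
	fmt.Println(linkName("math∕rand·Int", "main"))
	// Inside package math/rand the short form names the same symbol.
	fmt.Println(linkName("·Int", "math/rand"))
}
</pre>

<p>
The last two calls yield the same full name, <code>math/rand.Int</code>, which is why the short form
keeps working when a package moves to a different import path.
</p>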

<h3 id="directives">Directives</h3>

<p>
The assembler uses various directives to bind text and data to symbol names.
For example, here is a simple complete function definition. The <code>TEXT</code>
directive declares the symbol <code>runtime·profileloop</code> and the instructions
that follow form the body of the function.
The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction.
(If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXTs</code>.)
After the symbol, the arguments are flags (described below)
and the frame size, which is a constant optionally followed by an argument size, as explained after the example:
</p>

<pre>
TEXT runtime·profileloop(SB),NOSPLIT,$8
	MOVQ	$runtime·profileloop1(SB), CX
	MOVQ	CX, 0(SP)
	CALL	runtime·externalthreadhandler(SB)
	RET
</pre>

<p>
In the general case, the frame size is followed by an argument size, separated by a minus sign.
(It's not a subtraction, just idiosyncratic syntax.)
The frame size <code>$24-8</code> states that the function has a 24-byte frame
and is called with 8 bytes of argument, which live on the caller's frame.
If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
the argument size must be provided.
</p>
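
<p>
To make the point that the minus sign is a separator rather than an operator, here is a small Go sketch
that splits such an operand into its two parts. The function <code>parseFrameSpec</code> is hypothetical
and is not how the assembler itself parses the operand.
</p>

<pre>
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFrameSpec splits a TEXT size operand such as "$24-8" into the frame
// size and the argument size. hasArgs is false when no argument size was
// given, as in "$8". This is a sketch, not the assembler's parser.
func parseFrameSpec(spec string) (frame, args int, hasArgs bool, err error) {
	s := strings.TrimPrefix(spec, "$")
	frameText, argsText := s, ""
	// The separator is the last minus sign, unless it is the first
	// character, in which case it is the sign of the frame size.
	switch i := strings.LastIndex(s, "-"); i {
	case -1, 0:
		// No argument size present.
	default:
		frameText, argsText, hasArgs = s[:i], s[i+1:], true
	}
	if frame, err = strconv.Atoi(frameText); err != nil {
		return 0, 0, false, err
	}
	if hasArgs {
		if args, err = strconv.Atoi(argsText); err != nil {
			return 0, 0, false, err
		}
	}
	return frame, args, hasArgs, nil
}

func main() {
	for _, spec := range []string{"$24-8", "$8"} {
		frame, args, hasArgs, err := parseFrameSpec(spec)
		fmt.Println(spec, frame, args, hasArgs, err)
	}
}
</pre>

<p>
For <code>$24-8</code> the sketch reports a frame of 24 bytes and 8 bytes of arguments; for <code>$8</code>
it reports a frame of 8 bytes and no argument size.
</p>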

<p>
Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
static base pseudo-register <code>SB</code>.
This function would be called from Go source for package <code>runtime</code> using the
simple name <code>profileloop</code>.
</p>

<p>
For <code>DATA</code> directives, the symbol is followed by a slash and the number
of bytes the memory associated with the symbol occupies.
The arguments are optional flags and the data itself.
For instance,
</p>

<pre>
DATA  runtime·isplan9(SB)/4, $1
</pre>

<p>
declares the local symbol <code>runtime·isplan9</code> of size 4 and value 1.
Again the symbol has the middle dot and is offset from <code>SB</code>.
</p>

<p>
The <code>GLOBL</code> directive declares a symbol to be global.
The arguments are optional flags and the size of the data being declared as a global,
which will have initial value all zeros unless a <code>DATA</code> directive
has initialized it.
The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
This example
</p>

<pre>
GLOBL runtime·tlsoffset(SB),$4
</pre>

<p>
declares <code>runtime·tlsoffset</code> to have size 4.
</p>

<p>
There may be one or two arguments to the directives.
If there are two, the first is a bit mask of flags,
which can be written as numeric expressions, added or or-ed together,
or can be set symbolically for easier absorption by a human.
Their values, defined in the standard <code>#include</code> file <code>textflag.h</code>, are:
</p>

<ul>
<li>
<code>NOPROF</code> = 1
<br>
(For <code>TEXT</code> items.)
Don't profile the marked function. This flag is deprecated.
</li>
<li>
<code>DUPOK</code> = 2
<br>
It is legal to have multiple instances of this symbol in a single binary.
The linker will choose one of the duplicates to use.
</li>
<li>
<code>NOSPLIT</code> = 4
<br>
(For <code>TEXT</code> items.)
Don't insert the preamble to check if the stack must be split.
The frame for the routine, plus anything it calls, must fit in the
spare space at the top of the stack segment.
Used to protect routines such as the stack splitting code itself.
</li>
<li>
<code>RODATA</code> = 8
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
Put this data in a read-only section.
</li>
<li>
<code>NOPTR</code> = 16
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
This data contains no pointers and therefore does not need to be
scanned by the garbage collector.
</li>
<li>
<code>WRAPPER</code> = 32
<br>
(For <code>TEXT</code> items.)
This is a wrapper function and should not count as disabling <code>recover</code>.
</li>
</ul>
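
<p>
Because each flag is a distinct power of two, adding flags and or-ing them produce the same mask.
The Go sketch below mirrors the values listed above and turns a numeric mask back into its symbolic
spelling. The names <code>flagNames</code> and <code>describe</code> are invented for this example and are
not part of <code>textflag.h</code>.
</p>

<pre>
package main

import (
	"fmt"
	"strings"
)

// Flag values as listed above.
const (
	NOPROF  = 1
	DUPOK   = 2
	NOSPLIT = 4
	RODATA  = 8
	NOPTR   = 16
	WRAPPER = 32
)

var flagNames = []struct {
	bit  int
	name string
}{
	{NOPROF, "NOPROF"}, {DUPOK, "DUPOK"}, {NOSPLIT, "NOSPLIT"},
	{RODATA, "RODATA"}, {NOPTR, "NOPTR"}, {WRAPPER, "WRAPPER"},
}

// describe returns the symbolic spelling of a numeric flag mask.
func describe(mask int) string {
	var set []string
	for _, f := range flagNames {
		// OR-ing in a bit that is already set leaves the mask unchanged.
		if mask|f.bit == mask {
			set = append(set, f.name)
		}
	}
	return strings.Join(set, "|")
}

func main() {
	fmt.Println(NOSPLIT|DUPOK, NOSPLIT+DUPOK) // two spellings of one mask
	fmt.Println(describe(6))
	fmt.Println(describe(RODATA | NOPTR))
}
</pre>

<p>
The symbolic form is usually the better choice in assembly source, since a bare number such as
<code>6</code> gives the reader no hint about which flags are meant.
</p>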

<h2 id="architectures">Architecture-specific details</h2>

<p>
It is impractical to list all the instructions and other details for each machine.
To see what instructions are defined for a given machine, say 32-bit Intel x86,
look in the top-level header file for the corresponding linker, in this case <code>8l</code>.
That is, the file <code>$GOROOT/src/cmd/8l/8.out.h</code> contains a C enumeration, called <code>as</code>,
of the instructions and their spellings as known to the assembler and linker for that architecture.
In that file you'll find a declaration that begins
</p>

<pre>
enum	as
{
	AXXX,
	AAAA,
	AAAD,
	AAAM,
	AAAS,
	AADCB,
	...
</pre>

<p>
Each instruction begins with an initial capital <code>A</code> in this list, so <code>AADCB</code>
represents the <code>ADCB</code> (add carry byte) instruction.
The enumeration is in alphabetical order, plus some late additions (<code>AXXX</code> occupies
the zero slot as an invalid instruction).
The sequence has nothing to do with the actual encoding of the machine instructions.
Again, the linker takes care of that detail.
</p>
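
<p>
The mapping from enumeration names to assembler spellings is mechanical. As a minimal Go sketch
(the function <code>mnemonic</code> is hypothetical and exists only to illustrate the naming rule):
</p>

<pre>
package main

import (
	"fmt"
	"strings"
)

// mnemonic converts an enumeration name from the header file into the
// spelling used in assembly source by dropping the leading capital A.
// The second result is false for AXXX, the invalid zero-slot entry, and
// for names that lack the prefix.
func mnemonic(enumName string) (string, bool) {
	if enumName == "AXXX" || !strings.HasPrefix(enumName, "A") {
		return "", false
	}
	return strings.TrimPrefix(enumName, "A"), true
}

func main() {
	for _, name := range []string{"AXXX", "AAAA", "AADCB"} {
		m, ok := mnemonic(name)
		fmt.Println(name, m, ok)
	}
}
</pre>

<p>
Note that only the first <code>A</code> is removed, so <code>AAAA</code> becomes <code>AAA</code>
and <code>AADCB</code> becomes <code>ADCB</code>.
</p>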

<p>
One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
<code>MOVQ $0, CX</code> clears <code>CX</code>.
This convention applies even on architectures where the usual mode is the opposite direction.
</p>

<p>
Here follow some descriptions of key Go-specific details for the supported architectures.
</p>

<h3 id="x86">32-bit Intel 386</h3>

<p>
The runtime pointer to the <code>g</code> structure is maintained
through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
An OS-dependent macro <code>get_tls</code> is defined for the assembler if the source includes
an architecture-dependent header file, like this:
</p>

<pre>
#include "zasm_GOOS_GOARCH.h"
</pre>

<p>
Within the runtime, the <code>get_tls</code> macro loads its argument register
with a pointer to the <code>g</code> pointer, and the <code>g</code> struct
contains the <code>m</code> pointer.
The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this:
</p>

<pre>
get_tls(CX)
MOVL	g(CX), AX     // Move g into AX.
MOVL	g_m(AX), BX   // Move g->m into BX.
</pre>

<h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>

<p>
The assembly code to access the <code>m</code> and <code>g</code>
pointers is the same as on the 386, except it uses <code>MOVQ</code> rather than
<code>MOVL</code>:
</p>

<pre>
get_tls(CX)
MOVQ	g(CX), AX     // Move g into AX.
MOVQ	g_m(AX), BX   // Move g->m into BX.
</pre>

<h3 id="arm">ARM</h3>

<p>
The registers <code>R10</code> and <code>R11</code>
are reserved by the compiler and linker.
</p>

<p>
<code>R10</code> points to the <code>g</code> (goroutine) structure.
Within assembler source code, this pointer must be referred to as <code>g</code>;
the name <code>R10</code> is not recognized.
</p>

<p>
To make it easier for people and compilers to write assembly, the ARM linker
allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code>
that may not be expressible using a single hardware instruction.
It implements these forms as multiple instructions, often using the <code>R11</code> register
to hold temporary values.
Hand-written assembly can use <code>R11</code>, but doing so requires
being sure that the linker is not also using it to implement any of the other
instructions in the function.
</p>

<p>
When defining a <code>TEXT</code>, specifying frame size <code>$-4</code>
tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry.
</p>

<p>
The name <code>SP</code> always refers to the virtual stack pointer described earlier.
For the hardware register, use <code>R13</code>.
</p>

<h3 id="unsupported_opcodes">Unsupported opcodes</h3>

<p>
The assemblers are designed to support the compiler, so not all hardware instructions
are defined for all architectures: if the compiler doesn't generate it, it might not be there.
If you need to use a missing instruction, there are two ways to proceed.
One is to update the assembler to support that instruction, which is straightforward
but only worthwhile if it's likely the instruction will be used again.
Instead, for simple one-off cases, it's possible to use the <code>BYTE</code>
and <code>WORD</code> directives
to lay down explicit data into the instruction stream within a <code>TEXT</code>.
Here's how the 386 runtime defines the 64-bit atomic load function.
</p>

<pre>
// uint64 atomicload64(uint64 volatile* addr);
// so actually
// void atomicload64(uint64 *res, uint64 volatile *addr);
TEXT runtime·atomicload64(SB), NOSPLIT, $0-8
	MOVL	4(SP), BX
	MOVL	8(SP), AX
	// MOVQ (%EAX), %MM0
	BYTE $0x0f; BYTE $0x6f; BYTE $0x00
	// MOVQ %MM0, 0(%EBX)
	BYTE $0x0f; BYTE $0x7f; BYTE $0x03
	// EMMS
	BYTE $0x0F; BYTE $0x77
	RET
</pre>
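
<p>
Writing such lines by hand is error-prone, so it can help to generate them from the encoded bytes.
The following Go sketch formats a byte sequence in the same style as the example above. The helper
<code>byteDirectives</code> is hypothetical, and the instruction bytes still have to come from the
processor manual or from another assembler.
</p>

<pre>
package main

import (
	"fmt"
	"strings"
)

// byteDirectives renders raw machine-code bytes as a line of BYTE
// directives suitable for pasting into a TEXT block.
func byteDirectives(code []byte) string {
	parts := make([]string, len(code))
	for i, b := range code {
		parts[i] = fmt.Sprintf("BYTE $0x%02x", b)
	}
	return strings.Join(parts, "; ")
}

func main() {
	// The three bytes of the first MMX load in the example above.
	fmt.Println(byteDirectives([]byte{0x0f, 0x6f, 0x00}))
}
</pre>

<p>
Keeping the intended instruction in a comment next to each generated line, as the runtime example does,
makes the byte sequence much easier to review later.
</p>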