<!---
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-->

## Introduction to the Go compiler

`cmd/compile` contains the main packages that form the Go compiler. The compiler
may be logically split into four phases, which we will briefly describe alongside
the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to
the compiler. Roughly speaking, these translate to the first two and last two
phases we are going to list here. A third term, "middle-end", often refers to
much of the work that happens in the second phase.

Note that the `go/*` family of packages, such as `go/parser` and `go/types`,
have no relation to the compiler. Since the compiler was initially written in C,
the `go/*` packages were developed to enable writing tools working with Go code,
such as `gofmt` and `vet`.

It should be clarified that the name "gc" stands for "Go compiler", and has
little to do with uppercase "GC", which stands for garbage collection.

### 1. Parsing

* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis),
parsed (syntax analysis), and a syntax tree is constructed for each source
file.

Each syntax tree is an exact representation of the respective source file, with
nodes corresponding to the various elements of the source such as expressions,
declarations, and statements. The syntax tree also includes position information
which is used for error reporting and the creation of debugging information.
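The compiler's own parser lives in an internal package and cannot be imported
from ordinary programs. As a rough illustration of the same idea, the sketch
below uses the unrelated standard library packages `go/parser`, `go/ast`, and
`go/token` to build a syntax tree and report a few nodes with their positions.
The sample source, the file name `demo.go`, and the `describe` helper are
assumptions made for this example; none of them come from the compiler.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a hypothetical input file used only for this illustration.
const src = `package demo

func add(a, b int) int {
	return a + b
}
`

// describe parses src with go/parser (not the compiler's parser) and returns
// one "position: description" entry for a few kinds of syntax tree nodes.
func describe(src string) ([]string, error) {
	fset := token.NewFileSet() // records position information for the file
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	// Walk the tree depth-first and pick out a declaration, a statement,
	// and an expression.
	ast.Inspect(file, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.FuncDecl:
			out = append(out, fmt.Sprintf("%s: function declaration %s", fset.Position(n.Pos()), n.Name.Name))
		case *ast.ReturnStmt:
			out = append(out, fmt.Sprintf("%s: return statement", fset.Position(n.Pos())))
		case *ast.BinaryExpr:
			out = append(out, fmt.Sprintf("%s: binary expression %s", fset.Position(n.Pos()), n.Op))
		}
		return true
	})
	return out, nil
}

func main() {
	lines, err := describe(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Each reported entry carries a `file:line:column` prefix, which is the kind of
position information the text above describes being used for error messages.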

### 2. Type-checking and AST transformations

* `cmd/compile/internal/gc` (create compiler AST, type checking, AST transformations)

The gc package includes an AST definition carried over from when it was written
in C. All of its code is written in terms of it, so the first thing that the gc
package must do is convert the syntax package's syntax tree to the compiler's
AST representation. This extra step may be refactored away in the future.

The AST is then type-checked. The first steps are name resolution and type
inference, which determine which object belongs to which identifier, and what
type each expression has. Type-checking includes certain extra checks, such as
"declared and not used" as well as checking that each function with results
ends in a terminating statement.
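To make name resolution, type inference, and the unused-variable check more
concrete, here is a minimal sketch that uses the standard library's `go/types`
package, which is a separate type checker and not the one inside
`cmd/compile/internal/gc`. The sample source and the helpers `check` and
`mentionsUnused` are hypothetical. The exact wording of the error message
differs between Go releases, so the sketch only looks for the phrase "not used".

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// src is a hypothetical input: "unused" is declared but never read.
const src = `package demo

func f() float64 {
	unused := 1
	ratio := 2.5
	return ratio
}
`

// check type-checks src with go/types and returns the inferred type of each
// declared variable along with every reported type error.
func check(src string) (map[string]string, []string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, nil, err
	}

	var problems []string
	conf := types.Config{
		// Collect every type error instead of stopping at the first one.
		Error: func(err error) { problems = append(problems, err.Error()) },
	}
	// Defs maps each defining identifier to the object it declares.
	info := &types.Info{Defs: make(map[*ast.Ident]types.Object)}
	// The returned error repeats the first collected problem, so it is ignored.
	_, _ = conf.Check("demo", fset, []*ast.File{file}, info)

	inferred := make(map[string]string)
	for ident, obj := range info.Defs {
		if v, ok := obj.(*types.Var); ok {
			inferred[ident.Name] = v.Type().String()
		}
	}
	return inferred, problems, nil
}

// mentionsUnused reports whether any problem looks like an unused-variable error.
func mentionsUnused(problems []string) bool {
	for _, p := range problems {
		if strings.Contains(p, "not used") {
			return true
		}
	}
	return false
}

func main() {
	inferred, problems, err := check(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("unused:", inferred["unused"])
	fmt.Println("ratio:", inferred["ratio"])
	for _, p := range problems {
		fmt.Println("error:", p)
	}
}
```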

Certain transformations are also done on the AST. Some nodes are refined based
on type information, such as string additions being split from the arithmetic
addition node type. Some other examples are dead code elimination, function call
inlining, and escape analysis.
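As a small illustration of inlining and escape analysis, the hypothetical
program below contains one function that is a likely inlining candidate and one
whose local variable must outlive the call. Building it with
`go build -gcflags=-m` asks the compiler to print its optimization decisions.
The exact messages vary between Go releases, so none are reproduced here, and
the comments describe typical rather than guaranteed behavior.

```go
package main

import "fmt"

// point is a hypothetical type used only for this illustration.
type point struct{ x, y int }

// sum is small and has no loops, so it is a likely candidate for inlining.
func sum(p point) int {
	return p.x + p.y
}

// newPoint returns the address of a local variable. Because the pointer
// outlives the call, escape analysis will typically decide that p cannot
// stay on newPoint's stack frame.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

func main() {
	p := newPoint(1, 2)
	fmt.Println(sum(*p)) // prints 3
}
```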

### 3. Generic SSA

* `cmd/compile/internal/gc` (converting to SSA)
* `cmd/compile/internal/ssa` (SSA passes and rules)

In this phase, the AST is converted into Static Single Assignment (SSA) form, a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it.
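The defining property of SSA is that every value is assigned exactly once. The
sketch below shows an ordinary Go function next to a hand-written, SSA-style
rendering of it in a comment. The notation (`v1`, `b1`, `phi`) is illustrative
only and is not actual compiler output. To see the real thing, setting the
`GOSSAFUNC` environment variable to a function name while building makes the
compiler write an `ssa.html` dump for that function.

```go
package main

import "fmt"

// abs is written in ordinary Go: the variable r is assigned on two
// different control-flow paths.
func abs(x int) int {
	r := x
	if x < 0 {
		r = -x
	}
	return r
}

// Hand-written SSA-style pseudocode for abs (illustrative notation only):
//
//	b1:
//	    v1 = x
//	    v2 = v1 < 0
//	    if v2 goto b2 else b3
//	b2:
//	    v3 = -v1
//	    goto b3
//	b3:
//	    v4 = phi(v1, v3)   // v1 when coming from b1, v3 when coming from b2
//	    return v4
//
// Each value is defined once, so r's two assignments become v1 and v3,
// and the phi value selects between them where the paths merge.

func main() {
	fmt.Println(abs(-3), abs(4)) // prints 3 4
}
```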

During this conversion, function intrinsics are applied. These are special
functions that the compiler has been taught to replace with heavily optimized
code on a case-by-case basis.
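Several functions in the standard library's `math/bits` package are commonly
cited examples of intrinsics: on architectures that have matching instructions,
a call can be replaced by one or a few machine instructions instead of a regular
function call. Which functions are treated this way depends on the Go release
and `GOARCH`, so the choice of functions below is an assumption. The wrappers
`lowestSetBit` and `populationCount` are hypothetical names for this sketch.

```go
package main

import (
	"fmt"
	"math/bits"
)

// lowestSetBit returns the index of the least significant set bit of x,
// or 64 if x is zero.
func lowestSetBit(x uint64) int {
	return bits.TrailingZeros64(x)
}

// populationCount returns the number of set bits in x.
func populationCount(x uint64) int {
	return bits.OnesCount64(x)
}

func main() {
	fmt.Println(lowestSetBit(8), populationCount(0xFF)) // prints 3 8
}
```

The source code stays portable either way: on targets without a suitable
instruction, the pure Go implementation in `math/bits` is used.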

Certain nodes are also lowered into simpler components during the AST to SSA
conversion, so that the rest of the compiler can work with them. For instance,
the copy builtin is replaced by memory moves, and range loops are rewritten into
for loops. Some of these currently happen before the conversion to SSA for
historical reasons, but the long-term plan is to move all of them here.
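The range-loop rewrite can be pictured at the source level. The sketch below
pairs a range loop over a slice with a hand-written plain for loop that has the
same meaning. The second function is an approximation written for this example,
not the code the compiler actually produces, and both function names are
hypothetical.

```go
package main

import "fmt"

// sumRange uses a range loop, as a programmer would write it.
func sumRange(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sumLowered spells out roughly what the range loop means: the range
// expression is evaluated once, its length is read once, and an explicit
// index walks the elements.
func sumLowered(xs []int) int {
	total := 0
	s := xs
	n := len(s)
	for i := 0; i < n; i++ {
		x := s[i]
		total += x
	}
	return total
}

func main() {
	xs := []int{1, 2, 3, 4}
	fmt.Println(sumRange(xs), sumLowered(xs)) // prints 10 10
}
```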

Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all `GOARCH` variants.

Some examples of these generic passes include dead code elimination, removal of
unneeded nil checks, and removal of unused branches. The generic rewrite rules
mainly concern expressions, such as replacing some expressions with constant
values, and optimizing multiplications and float operations.
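Two of these rewrites can be sketched at the source level. In the hypothetical
functions below, `folded` computes a product of values that are known at compile
time, which a rewrite rule can replace with a constant, and `scaled` multiplies
by a power of two, which can be turned into a shift. Whether a given compiler
version applies exactly these rewrites is an assumption. The code only
demonstrates that the rewritten forms compute the same results.

```go
package main

import "fmt"

// folded multiplies two values that are known at compile time. After
// constant propagation, the multiplication can be replaced by 42.
func folded() int {
	a := 6
	b := 7
	return a * b
}

// scaled multiplies by a power of two.
func scaled(x int) int {
	return x * 8
}

// scaledShift is the shift form that a multiplication by 8 may be
// rewritten into.
func scaledShift(x int) int {
	return x << 3
}

func main() {
	fmt.Println(folded(), scaled(5), scaledShift(5)) // prints 42 40 40
}
```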

### 4. Generating machine code

* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
* `cmd/internal/obj` (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which
rewrites generic values into their machine-specific variants. For example, on
amd64, memory operands are possible, so many load-store operations may be combined.

Note that the lower pass runs all machine-specific rewrite rules, and thus it
currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture,
the final code optimization passes are run. These include yet another dead code
elimination pass, moving values closer to their uses, the removal of local
variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame
layout, which assigns stack offsets to local variables, and pointer liveness
analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into
a series of `obj.Prog` instructions. These are passed to the assembler
(`cmd/internal/obj`), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.

### Further reading

To dig deeper into how the SSA package works, including its passes and rules,
head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).