<!---
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-->

## Introduction to the Go compiler

`cmd/compile` contains the main packages that form the Go compiler. The compiler
may be logically split into seven phases, which we will briefly describe alongside
the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to
the compiler. Roughly speaking, these translate to the first two and last two
phases we are going to list here. A third term, "middle-end", often refers to
much of the work that happens in the phases in between.

Note that the `go/*` family of packages, such as `go/parser` and
`go/types`, is mostly unused by the compiler. Since the compiler was
initially written in C, the `go/*` packages were developed to enable
writing tools that work with Go code, such as `gofmt` and `vet`.
However, over time the compiler's internal APIs have slowly evolved to
be more familiar to users of the `go/*` packages.

The name "gc" stands for "Go compiler" and has little to do with
uppercase "GC", which stands for garbage collection.

### 1. Parsing

* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis),
parsed (syntax analysis), and a syntax tree is constructed for each source
file.

Each syntax tree is an exact representation of the respective source file, with
nodes corresponding to the various elements of the source such as expressions,
declarations, and statements. The syntax tree also includes position information
which is used for error reporting and the creation of debugging information.
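
The compiler's `syntax` package is internal and cannot be imported from outside the Go tree, so the sketch below uses the public `go/parser`, `go/ast`, and `go/token` packages as a stand-in to show the same idea: one syntax tree per file, with position information attached to each node. The file name `demo.go`, the sample source, and the helper `funcPositions` are made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a tiny, made-up source file used only for illustration.
const src = `package demo

func add(a, b int) int {
	return a + b
}
`

// funcPositions (a hypothetical helper) parses src and returns
// "name@line:column" for every function declaration in the syntax tree.
func funcPositions(src string) ([]string, error) {
	fset := token.NewFileSet() // records position information for all nodes
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	ast.Inspect(file, func(n ast.Node) bool {
		if fn, ok := n.(*ast.FuncDecl); ok {
			pos := fset.Position(fn.Pos())
			out = append(out, fmt.Sprintf("%s@%d:%d", fn.Name.Name, pos.Line, pos.Column))
		}
		return true // keep walking into child nodes
	})
	return out, nil
}

func main() {
	decls, err := funcPositions(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(decls) // prints [add@3:1]
}
```

The compiler's own parser differs in its node types and API, but the shape of the result is comparable: declarations, statements, and expressions as nodes, each carrying a source position.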

### 2. Type checking

* `cmd/compile/internal/types2` (type checking)

The types2 package is a port of `go/types` that uses the syntax package's
AST instead of `go/ast`.
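
Because types2 is a port of `go/types`, the public package gives a reasonable feel for what this phase does. The following is a minimal sketch using `go/types` rather than types2 itself (which is internal); the sample source and the helper `typeOf` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src is a made-up package whose only declaration needs type inference.
const src = `package demo

var x = 1 + 2.5
`

// typeOf (a hypothetical helper) type-checks src and reports the type
// that the checker assigned to the package-level name.
func typeOf(src, name string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return "", err
	}
	conf := types.Config{} // no Importer needed: src imports nothing
	pkg, err := conf.Check("demo", fset, []*ast.File{file}, nil)
	if err != nil {
		return "", err
	}
	obj := pkg.Scope().Lookup(name)
	if obj == nil {
		return "", fmt.Errorf("%s is not declared", name)
	}
	return obj.Type().String(), nil
}

func main() {
	t, err := typeOf(src, "x")
	if err != nil {
		panic(err)
	}
	fmt.Println(t) // prints float64
}
```

The untyped constant expression `1 + 2.5` takes its default type here, which is the kind of decision the type checker records for every expression before IR construction begins.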

### 3. IR construction ("noding")

* `cmd/compile/internal/types` (compiler types)
* `cmd/compile/internal/ir` (compiler AST)
* `cmd/compile/internal/typecheck` (AST transformations)
* `cmd/compile/internal/noder` (create compiler AST)

The compiler middle end uses its own AST definition and representation of Go
types carried over from when it was written in C. All of its code is written in
terms of these, so the next step after type checking is to convert the syntax
and types2 representations to ir and types. This process is referred to as
"noding."

There are currently two noding implementations:

1. irgen (aka "-G=3" or sometimes "noder2") is the implementation used starting
   with Go 1.18, and

2. Unified IR is another, in-development implementation (enabled with
   `GOEXPERIMENT=unified`), which also implements import/export and inlining.

Up through Go 1.18, there was a third noding implementation (just
"noder" or "-G=0"), which directly converted the pre-type-checked
syntax representation into IR and then invoked package typecheck's
type checker. This implementation was removed after Go 1.18, so now
package typecheck is only used for IR transformations.

### 4. Middle end

* `cmd/compile/internal/deadcode` (dead code elimination)
* `cmd/compile/internal/inline` (function call inlining)
* `cmd/compile/internal/devirtualize` (devirtualization of known interface method calls)
* `cmd/compile/internal/escape` (escape analysis)

Several optimization passes are performed on the IR representation:
dead code elimination, (early) devirtualization, function call
inlining, and escape analysis.
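
To watch some of these passes at work, a small program like the one below can be built with `go build -gcflags=-m`, which asks the compiler to print its optimization decisions. The types and functions here are invented for illustration, and the exact diagnostics and decisions vary between Go versions.

```go
package main

import "fmt"

type point struct{ x, y int }

// sum is small and has no loops, so the inliner will typically report
// that it can be inlined into its callers.
func sum(p point) int { return p.x + p.y }

// newPoint returns the address of a local variable. Escape analysis has
// to notice that p outlives the call and allocate it on the heap.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

func main() {
	fmt.Println(sum(point{1, 2}), newPoint(3, 4).x) // prints 3 3
}
```

With `-m`, expect messages along the lines of "can inline sum" and "moved to heap: p". Treat the wording as indicative rather than guaranteed.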

### 5. Walk

* `cmd/compile/internal/walk` (order of evaluation, desugaring)

The final pass over the IR representation is "walk," which serves two purposes:

1. It decomposes complex statements into individual, simpler statements,
   introducing temporary variables and respecting order of evaluation. This step
   is also referred to as "order."

2. It desugars higher-level Go constructs into more primitive ones. For example,
   `switch` statements are turned into binary search or jump tables, and
   operations on maps and channels are replaced with runtime calls.
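
The desugaring of `switch` can be pictured with a hand-written before/after pair. This is only an approximation in Go source form: walk operates on IR nodes, not on source text, and the function names and the particular split point are assumptions chosen for the example.

```go
package main

import "fmt"

// classify is the source-level form: a switch over sorted integer constants.
func classify(n int) string {
	switch n {
	case 1:
		return "one"
	case 3:
		return "three"
	case 5:
		return "five"
	case 7:
		return "seven"
	}
	return "other"
}

// classifyLowered is a hand-written approximation of a binary-search
// lowering: first split the sorted cases in half, then test each half.
func classifyLowered(n int) string {
	if n <= 3 {
		if n == 1 {
			return "one"
		}
		if n == 3 {
			return "three"
		}
		return "other"
	}
	if n == 5 {
		return "five"
	}
	if n == 7 {
		return "seven"
	}
	return "other"
}

func main() {
	for _, n := range []int{1, 4, 7} {
		fmt.Println(n, classify(n), classifyLowered(n))
	}
}
```

Both functions return the same result for every input. The lowered form simply makes explicit the comparisons that the high-level `switch` leaves implicit.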

### 6. Generic SSA

* `cmd/compile/internal/ssa` (SSA passes and rules)
* `cmd/compile/internal/ssagen` (converting IR to SSA)

In this phase, IR is converted into Static Single Assignment (SSA) form, a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it.
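
The defining property of SSA is that every value is assigned exactly once, with phi values merging alternatives where control flow joins. The comments in the sketch below show a conceptual SSA rendering of a small function. The block and value names (`b1`, `x0`, and so on) are invented, and the compiler's actual output will look different.

```go
package main

import "fmt"

// abs reassigns x on one branch, so in SSA form each assignment
// becomes a distinct value.
//
// Conceptual SSA (not real compiler output):
//
//	b1: x0 = arg x
//	    if x0 < 0 goto b2 else b3
//	b2: x1 = -x0
//	    goto b3
//	b3: x2 = phi(x0 from b1, x1 from b2)
//	    return x2
//
// To inspect what the compiler really produces, build with the
// GOSSAFUNC environment variable set to the function name, for example
// GOSSAFUNC=abs; the compiler then writes an ssa.html report.
func abs(x int) int {
	if x < 0 {
		x = -x
	}
	return x
}

func main() {
	fmt.Println(abs(-3), abs(4)) // prints 3 4
}
```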

During this conversion, function intrinsics are applied. These are special
functions that the compiler has been taught to replace with heavily optimized
code on a case-by-case basis.

Certain nodes are also lowered into simpler components during the AST-to-SSA
conversion, so that the rest of the compiler can work with them. For instance,
the `copy` builtin is replaced by memory moves, and `range` loops are rewritten
into `for` loops. Some of these currently happen before the conversion to SSA
for historical reasons, but the long-term plan is to move all of them here.

Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all `GOARCH` variants.
These passes include dead code elimination, removal of
unneeded nil checks, and removal of unused branches. The generic rewrite rules
mainly concern expressions, such as replacing some expressions with constant
values, and optimizing multiplications and float operations.
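
The flavor of such expression rewrites can be shown with a toy rewriter. Everything below (the `Expr` type, the helper constructors, and the three rules) is invented for illustration. The compiler's real rules operate on SSA values and are far more numerous.

```go
package main

import "fmt"

// Expr is a toy expression node: a constant, a variable, or a binary op.
type Expr struct {
	Op   string // "const", "var", "add", or "mul"
	Val  int64  // value, for "const"
	Name string // name, for "var"
	L, R *Expr  // operands, for "add" and "mul"
}

func konst(v int64) *Expr             { return &Expr{Op: "const", Val: v} }
func variable(n string) *Expr         { return &Expr{Op: "var", Name: n} }
func bin(op string, l, r *Expr) *Expr { return &Expr{Op: op, L: l, R: r} }

// rewrite applies the rules bottom-up: constant folding first, then two
// multiplication identities (x*1 => x and x*0 => 0).
func rewrite(e *Expr) *Expr {
	if e.Op != "add" && e.Op != "mul" {
		return e
	}
	l, r := rewrite(e.L), rewrite(e.R)
	switch {
	case e.Op == "add" && l.Op == "const" && r.Op == "const":
		return konst(l.Val + r.Val)
	case e.Op == "mul" && l.Op == "const" && r.Op == "const":
		return konst(l.Val * r.Val)
	case e.Op == "mul" && r.Op == "const" && r.Val == 1:
		return l
	case e.Op == "mul" && r.Op == "const" && r.Val == 0:
		return konst(0)
	}
	return bin(e.Op, l, r)
}

// String renders the expression in a prefix notation.
func (e *Expr) String() string {
	switch e.Op {
	case "const":
		return fmt.Sprint(e.Val)
	case "var":
		return e.Name
	}
	return "(" + e.Op + " " + e.L.String() + " " + e.R.String() + ")"
}

func main() {
	// x * (3 + -2) folds to x * 1, which then simplifies to x.
	e := bin("mul", variable("x"), bin("add", konst(3), konst(-2)))
	fmt.Println(e, "=>", rewrite(e)) // prints (mul x (add 3 -2)) => x
}
```

Applying rules bottom-up matters here: the inner addition has to fold to a constant before the multiplication rule can match.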

### 7. Generating machine code

* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
* `cmd/internal/obj` (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which
rewrites generic values into their machine-specific variants. For example, on
amd64, memory operands are possible, so many load-store operations may be combined.
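
As a rough picture of what combining a load with an arithmetic operation means, the toy below rewrites a generic add whose operand is a single-use load into one operation that reads memory directly. The `Value` type and `lowerAdd` are invented for this sketch, and the op names are used only as labels. The real lower pass works through many separate rewrite rules and checks more conditions than a use count.

```go
package main

import "fmt"

// Value is a toy SSA value: an opcode, its arguments, and a use count.
type Value struct {
	Op   string
	Args []*Value
	Uses int
}

// lowerAdd (hypothetical) rewrites a generic Add64. If the second operand
// is a Load with exactly one use, the load is folded into the add so the
// result takes its operand straight from memory; otherwise the add becomes
// a plain register-register op.
func lowerAdd(v *Value) *Value {
	if v.Op != "Add64" || len(v.Args) != 2 {
		return v // not something this toy rule handles
	}
	x, l := v.Args[0], v.Args[1]
	if l.Op == "Load" && l.Uses == 1 && len(l.Args) == 1 {
		// Arguments: the register operand and the address to read from.
		return &Value{Op: "ADDQload", Args: []*Value{x, l.Args[0]}, Uses: v.Uses}
	}
	return &Value{Op: "ADDQ", Args: v.Args, Uses: v.Uses}
}

func main() {
	ptr := &Value{Op: "Arg"}
	x := &Value{Op: "Arg"}
	load := &Value{Op: "Load", Args: []*Value{ptr}, Uses: 1}
	add := &Value{Op: "Add64", Args: []*Value{x, load}}
	fmt.Println(lowerAdd(add).Op) // prints ADDQload
}
```

The single-use check is the important part of the sketch: if another value also needed the loaded result, folding the load into the add would force the memory read to happen twice.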

Note that the lower pass runs all machine-specific rewrite rules, and thus it
currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture,
the final code optimization passes are run. This includes yet another dead code
elimination pass, moving values closer to their uses, the removal of local
variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame
layout, which assigns stack offsets to local variables, and pointer liveness
analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into
a series of `obj.Prog` instructions. These are passed to the assembler
(`cmd/internal/obj`), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.

### Further reading

To dig deeper into how the SSA package works, including its passes and rules,
head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).