<!---
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-->

## Introduction to the Go compiler

`cmd/compile` contains the main packages that form the Go compiler. The compiler
may be logically split into seven phases, which we will briefly describe alongside
the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to
the compiler. Roughly speaking, these translate to the first two and last two
phases we are going to list here. A third term, "middle-end", often refers to
much of the work that happens in the phases in between.

Note that the `go/*` family of packages, such as `go/parser` and
`go/types`, are mostly unused by the compiler. Since the compiler was
initially written in C, the `go/*` packages were developed to enable
writing tools working with Go code, such as `gofmt` and `vet`.
However, over time the compiler's internal APIs have slowly evolved to
be more familiar to users of the `go/*` packages.

It should be clarified that the name "gc" stands for "Go compiler", and has
little to do with uppercase "GC", which stands for garbage collection.

### 1. Parsing

* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis),
parsed (syntax analysis), and a syntax tree is constructed for each source
file.

Each syntax tree is an exact representation of the respective source file, with
nodes corresponding to the various elements of the source such as expressions,
declarations, and statements. The syntax tree also includes position information
which is used for error reporting and the creation of debugging information.

### 2. Type checking

* `cmd/compile/internal/types2` (type checking)

The types2 package is a port of `go/types` to use the syntax package's
AST instead of `go/ast`.

### 3. IR construction ("noding")

* `cmd/compile/internal/types` (compiler types)
* `cmd/compile/internal/ir` (compiler AST)
* `cmd/compile/internal/typecheck` (AST transformations)
* `cmd/compile/internal/noder` (create compiler AST)

The compiler middle end uses its own AST definition and representation of Go
types carried over from when it was written in C. All of its code is written in
terms of these, so the next step after type checking is to convert the syntax
and types2 representations to ir and types. This process is referred to as
"noding."

There are currently two noding implementations:

1. irgen (aka "-G=3" or sometimes "noder2") is the implementation used starting
   with Go 1.18, and

2. Unified IR is another, in-development implementation (enabled with
   `GOEXPERIMENT=unified`), which also implements import/export and inlining.

Up through Go 1.18, there was a third noding implementation (just
"noder" or "-G=0"), which directly converted the pre-type-checked
syntax representation into IR and then invoked package typecheck's
type checker. This implementation was removed after Go 1.18, so now
package typecheck is only used for IR transformations.

### 4. Middle end

* `cmd/compile/internal/deadcode` (dead code elimination)
* `cmd/compile/internal/inline` (function call inlining)
* `cmd/compile/internal/devirtualize` (devirtualization of known interface method calls)
* `cmd/compile/internal/escape` (escape analysis)

Several optimization passes are performed on the IR representation:
dead code elimination, (early) devirtualization, function call
inlining, and escape analysis.

### 5. Walk

* `cmd/compile/internal/walk` (order of evaluation, desugaring)

The final pass over the IR representation is "walk," which serves two purposes:

1. It decomposes complex statements into individual, simpler statements,
   introducing temporary variables and respecting order of evaluation. This step
   is also referred to as "order."

2. It desugars higher-level Go constructs into more primitive ones. For example,
   `switch` statements are turned into binary search or jump tables, and
   operations on maps and channels are replaced with runtime calls.

### 6. Generic SSA

* `cmd/compile/internal/ssa` (SSA passes and rules)
* `cmd/compile/internal/ssagen` (converting IR to SSA)

In this phase, IR is converted into Static Single Assignment (SSA) form, a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it.

During this conversion, function intrinsics are applied. These are special
functions that the compiler has been taught to replace with heavily optimized
code on a case-by-case basis.

Certain nodes are also lowered into simpler components during the AST to SSA
conversion, so that the rest of the compiler can work with them. For instance,
the copy builtin is replaced by memory moves, and range loops are rewritten into
for loops. Some of these currently happen before the conversion to SSA due to
historical reasons, but the long-term plan is to move all of them here.

Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all `GOARCH` variants.
These passes include dead code elimination, removal of
unneeded nil checks, and removal of unused branches. The generic rewrite rules
mainly concern expressions, such as replacing some expressions with constant
values, and optimizing multiplications and float operations.

### 7. Generating machine code

* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
* `cmd/internal/obj` (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which
rewrites generic values into their machine-specific variants. For example, on
amd64 memory operands are possible, so many load-store operations may be combined.

Note that the lower pass runs all machine-specific rewrite rules, and thus it
currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture,
the final code optimization passes are run. This includes yet another dead code
elimination pass, moving values closer to their uses, the removal of local
variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame
layout, which assigns stack offsets to local variables, and pointer liveness
analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into
a series of obj.Prog instructions. These are passed to the assembler
(`cmd/internal/obj`), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.

### Further reading

To dig deeper into how the SSA package works, including its passes and rules,
head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).
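
To get a feel for the parsing and type-checking phases without reading the compiler's internals, the sketch below uses the public `go/parser` and `go/types` packages. It is only an analogue: as noted in the introduction, the compiler itself uses `cmd/compile/internal/syntax` and `types2`, which cannot be imported from outside the toolchain. The input `src` and the helper names `declPositions` and `typeOf` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src is a hypothetical input file used only for this illustration.
const src = `package demo

func add(a, b int) int { return a + b }

var total = add(1, 2)
`

// declPositions parses src and returns "line:column name" for each
// top-level function and variable, showing that the syntax tree
// carries position information for every node.
func declPositions(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	record := func(id *ast.Ident) {
		pos := fset.Position(id.Pos())
		out = append(out, fmt.Sprintf("%d:%d %s", pos.Line, pos.Column, id.Name))
	}
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			record(d.Name)
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				if vs, ok := spec.(*ast.ValueSpec); ok {
					for _, name := range vs.Names {
						record(name)
					}
				}
			}
		}
	}
	return out, nil
}

// typeOf type-checks src with go/types and returns the type of the
// package-level object called name.
func typeOf(src, name string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return "", err
	}
	// src has no imports, so no Importer needs to be configured.
	conf := types.Config{}
	pkg, err := conf.Check("demo", fset, []*ast.File{file}, nil)
	if err != nil {
		return "", err
	}
	obj := pkg.Scope().Lookup(name)
	if obj == nil {
		return "", fmt.Errorf("no package-level object %q", name)
	}
	return obj.Type().String(), nil
}

func main() {
	positions, err := declPositions(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(positions)

	typ, err := typeOf(src, "total")
	if err != nil {
		panic(err)
	}
	fmt.Println(typ)
}
```

The two helpers mirror phases 1 and 2 in order: first a syntax tree with positions, then type information computed over that tree. The later phases (noding, walk, SSA) have no public equivalent and are not modeled here.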