<!---
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-->

## Introduction to the Go compiler

`cmd/compile` contains the main packages that form the Go compiler. The compiler
may be logically split into several phases, which we will briefly describe alongside
the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to
the compiler. Roughly speaking, these translate to the first two and last two
phases we are going to list here. A third term, "middle-end", often refers to
much of the work that happens in the phases in between.

Note that the `go/*` family of packages, such as `go/parser` and
`go/types`, are mostly unused by the compiler. Since the compiler was
initially written in C, the `go/*` packages were developed to enable
writing tools working with Go code, such as `gofmt` and `vet`.
However, over time the compiler's internal APIs have slowly evolved to
be more familiar to users of the `go/*` packages.

It should be clarified that the name "gc" stands for "Go compiler", and has
little to do with uppercase "GC", which stands for garbage collection.

### 1. Parsing

* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis),
parsed (syntax analysis), and a syntax tree is constructed for each source
file.

Each syntax tree is an exact representation of the respective source file, with
nodes corresponding to the various elements of the source such as expressions,
declarations, and statements. The syntax tree also includes position information
which is used for error reporting and the creation of debugging information.

### 2. Type checking

* `cmd/compile/internal/types2` (type checking)

The types2 package is a port of `go/types` to use the syntax package's
AST instead of `go/ast`.

### 3. IR construction ("noding")

* `cmd/compile/internal/types` (compiler types)
* `cmd/compile/internal/ir` (compiler AST)
* `cmd/compile/internal/noder` (create compiler AST)

The compiler middle end uses its own AST definition and representation of Go
types carried over from when it was written in C. All of its code is written in
terms of these, so the next step after type checking is to convert the syntax
and types2 representations to ir and types. This process is referred to as
"noding."

Noding uses a process called Unified IR, which builds a node representation
using a serialized version of the typechecked code from step 2.
Unified IR is also involved in import/export of packages and inlining.

### 4. Middle end

* `cmd/compile/internal/deadcode` (dead code elimination)
* `cmd/compile/internal/inline` (function call inlining)
* `cmd/compile/internal/devirtualize` (devirtualization of known interface method calls)
* `cmd/compile/internal/escape` (escape analysis)

Several optimization passes are performed on the IR representation:
dead code elimination, (early) devirtualization, function call
inlining, and escape analysis.

### 5. Walk

* `cmd/compile/internal/walk` (order of evaluation, desugaring)

The final pass over the IR representation is "walk," which serves two purposes:

1. It decomposes complex statements into individual, simpler statements,
   introducing temporary variables and respecting order of evaluation. This step
   is also referred to as "order."

2. It desugars higher-level Go constructs into more primitive ones. For example,
   `switch` statements are turned into binary search or jump tables, and
   operations on maps and channels are replaced with runtime calls.
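
To make the middle-end and walk phases more concrete, the following is a minimal sketch of a program containing the kinds of constructs those phases act on. The function names (`add`, `newCounter`, `classify`) are hypothetical and chosen only for illustration. Whether a given function is actually inlined, or a given variable actually escapes, depends on the compiler version and its heuristics.

```go
package main

import "fmt"

// add is small, so the inliner would typically consider it a candidate
// for inlining at its call sites (an assumption, not a guarantee).
func add(a, b int) int {
	return a + b
}

// newCounter returns the address of a local variable, so escape analysis
// is expected to place n on the heap rather than the stack.
func newCounter() *int {
	n := 0
	return &n
}

// classify uses a switch over integer constants, a construct that walk
// may turn into a binary search or a jump table.
func classify(n int) string {
	switch n {
	case 0:
		return "zero"
	case 1:
		return "one"
	case 2:
		return "two"
	default:
		return "many"
	}
}

func main() {
	c := newCounter()
	*c = add(*c, 2)

	// Map operations such as this increment are replaced with runtime
	// calls during walk.
	seen := map[string]int{}
	seen[classify(*c)]++

	fmt.Println(*c, classify(*c), seen["two"]) // prints "2 two 1"
}
```

Building this file with `go build -gcflags=-m=2` should print the compiler's inlining and escape analysis decisions for these functions. The exact wording of that output varies between Go versions.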

### 6. Generic SSA

* `cmd/compile/internal/ssa` (SSA passes and rules)
* `cmd/compile/internal/ssagen` (converting IR to SSA)

In this phase, IR is converted into Static Single Assignment (SSA) form, a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it.

During this conversion, function intrinsics are applied. These are special
functions that the compiler has been taught to replace with heavily optimized
code on a case-by-case basis.

Certain nodes are also lowered into simpler components during the AST to SSA
conversion, so that the rest of the compiler can work with them. For instance,
the copy builtin is replaced by memory moves, and range loops are rewritten into
for loops. Some of these currently happen before the conversion to SSA due to
historical reasons, but the long-term plan is to move all of them here.

Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all `GOARCH` variants.
These passes include dead code elimination, removal of
unneeded nil checks, and removal of unused branches. The generic rewrite rules
mainly concern expressions, such as replacing some expressions with constant
values, and optimizing multiplications and float operations.

### 7. Generating machine code

* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
* `cmd/internal/obj` (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which
rewrites generic values into their machine-specific variants. For example, on
amd64 memory operands are possible, so many load-store operations may be combined.
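
The SSA phases described above can be explored with a small program like the sketch below. The function names (`popcount`, `scaled`, `sum`) are hypothetical. Treat the comments about which optimization applies to which function as assumptions rather than guarantees, since the result depends on the target architecture and Go version.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcount calls math/bits.OnesCount64, which the compiler may treat as
// an intrinsic on architectures that have a suitable instruction.
func popcount(x uint64) int {
	return bits.OnesCount64(x)
}

// scaled multiplies by a constant, the kind of expression that the
// rewrite rules for multiplications may simplify.
func scaled(x int) int {
	return x * 8
}

// sum indexes a slice inside a range loop. The range loop is rewritten
// into a plain for loop, and the bounds check on s[i] is a candidate
// for removal.
func sum(s []int) int {
	total := 0
	for i := range s {
		total += s[i]
	}
	return total
}

func main() {
	fmt.Println(popcount(0xFF), scaled(3), sum([]int{1, 2, 3})) // prints "8 24 6"
}
```

Running `GOSSAFUNC=popcount go build` in the directory containing this file generates an `ssa.html` file showing the function after each SSA pass, which makes it possible to compare the generic form with the lowered form.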

Note that the lower pass runs all machine-specific rewrite rules, and thus it
currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture,
the final code optimization passes are run. This includes yet another dead code
elimination pass, moving values closer to their uses, the removal of local
variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame
layout, which assigns stack offsets to local variables, and pointer liveness
analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into
a series of `obj.Prog` instructions. These are passed to the assembler
(`cmd/internal/obj`), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.

### 8. Tips

#### Getting Started

* If you have never contributed to the compiler before, a simple way to begin
  can be adding a log statement or `panic("here")` to get some
  initial insight into whatever you are investigating.

* The compiler itself provides logging, debugging and visualization capabilities,
  such as:
  ```
  $ go build -gcflags=-m=2                   # print optimization info, including inlining, escape analysis
  $ go build -gcflags=-d=ssa/check_bce/debug # print bounds check info
  $ go build -gcflags=-W                     # print internal parse tree after type checking
  $ GOSSAFUNC=Foo go build                   # generate ssa.html file for func Foo
  $ go build -gcflags=-S                     # print assembly
  $ go tool compile -bench=out.txt x.go      # print timing of compiler phases
  ```

  Some flags alter the compiler behavior, such as:
  ```
  $ go tool compile -h file.go               # panic on first compile error encountered
  $ go build -gcflags=-d=checkptr=2          # enable additional unsafe pointer checking
  ```

  There are many additional flags. Some descriptions are available via:
  ```
  $ go tool compile -h              # compiler flags, e.g., go build -gcflags='-m=1 -l'
  $ go tool compile -d help         # debug flags, e.g., go build -gcflags=-d=checkptr=2
  $ go tool compile -d ssa/help     # ssa flags, e.g., go build -gcflags=-d=ssa/prove/debug=2
  ```

  There are some additional details about `-gcflags` and the differences between `go build`
  vs. `go tool compile` in a [section below](#-gcflags-and-go-build-vs-go-tool-compile).

* In general, when investigating a problem in the compiler you usually want to
  start with the simplest possible reproduction and understand exactly what is
  happening with it.

#### Testing your changes

* Be sure to read the [Quickly testing your changes](https://go.dev/doc/contribute#quick_test)
  section of the Go Contribution Guide.

* Some tests live within the cmd/compile packages and can be run by `go test ./...` or similar,
  but many cmd/compile tests are in the top-level
  [test](https://github.com/golang/go/tree/master/test) directory:

  ```
  $ go test cmd/internal/testdir                           # all tests in 'test' dir
  $ go test cmd/internal/testdir -run='Test/escape.*.go'   # test specific files in 'test' dir
  ```
  For details, see the [testdir README](https://github.com/golang/go/tree/master/test#readme).
  The `errorCheck` method in [testdir_test.go](https://github.com/golang/go/blob/master/src/cmd/internal/testdir/testdir_test.go)
  is helpful for a description of the `ERROR` comments used in many of those tests.

  In addition, the `go/types` package from the standard library and `cmd/compile/internal/types2`
  have shared tests in `src/internal/types/testdata`, and both type checkers
  should be checked if anything changes there.

* The new [application-based coverage profiling](https://go.dev/testing/coverage/) can be used
  with the compiler, such as:

  ```
  $ go install -cover -coverpkg=cmd/compile/... cmd/compile  # build compiler with coverage instrumentation
  $ mkdir /tmp/coverdir                                      # pick location for coverage data
  $ GOCOVERDIR=/tmp/coverdir go test [...]                   # use compiler, saving coverage data
  $ go tool covdata textfmt -i=/tmp/coverdir -o coverage.out # convert to traditional coverage format
  $ go tool cover -html coverage.out                         # view coverage via traditional tools
  ```

#### Juggling compiler versions

* Many of the compiler tests use the version of the `go` command found in your PATH and
  its corresponding `compile` binary.

* If you are in a branch and your PATH includes `<go-repo>/bin`,
  doing `go install cmd/compile` will build the compiler using the code from your
  branch and install it to the proper location so that subsequent `go` commands
  like `go build` or `go test ./...` will exercise your freshly built compiler.

* [toolstash](https://pkg.go.dev/golang.org/x/tools/cmd/toolstash) provides a way
  to save, run, and restore a known good copy of the Go toolchain. For example, it can be
  a good practice to initially build your branch, save that version of
  the toolchain, then restore the known good version of the tools to compile
  your work-in-progress version of the compiler.

  Sample setup steps:
  ```
  $ go install golang.org/x/tools/cmd/toolstash@latest
  $ git clone https://go.googlesource.com/go
  $ cd go
  $ git checkout -b mybranch
  $ ./src/all.bash               # build and confirm good starting point
  $ export PATH=$PWD/bin:$PATH
  $ toolstash save               # save current tools
  ```
  After that, your edit/compile/test cycle can be similar to:
  ```
  <... make edits to cmd/compile source ...>
  $ toolstash restore && go install cmd/compile   # restore known good tools to build compiler
  <... 'go build', 'go test', etc. ...>           # use freshly built compiler
  ```

* toolstash also allows comparing the installed vs. stashed copy of
  the compiler, such as if you expect equivalent behavior after a refactor.
  For example, to check that your changed compiler produces identical object files to
  the stashed compiler while building the standard library:
  ```
  $ toolstash restore && go install cmd/compile   # build latest compiler
  $ go build -toolexec "toolstash -cmp" -a -v std # compare latest vs. saved compiler
  ```

* If versions appear to get out of sync (for example, with errors like
  `linked object header mismatch` with version strings like
  `devel go1.21-db3f952b1f`), you might need to do
  `toolstash restore && go install cmd/...` to update all the tools under cmd.

#### Additional helpful tools

* [compilebench](https://pkg.go.dev/golang.org/x/tools/cmd/compilebench) benchmarks
  the speed of the compiler.

* [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat) is the standard tool
  for reporting performance changes resulting from compiler modifications,
  including whether any improvements are statistically significant:
  ```
  $ go test -bench=SomeBenchmarks -count=20 > new.txt   # use new compiler
  $ toolstash restore                                   # restore old compiler
  $ go test -bench=SomeBenchmarks -count=20 > old.txt   # use old compiler
  $ benchstat old.txt new.txt                           # compare old vs. new
  ```

* [bent](https://pkg.go.dev/golang.org/x/benchmarks/cmd/bent) facilitates running a
  large set of benchmarks from various community Go projects inside a Docker container.

* [perflock](https://github.com/aclements/perflock) helps obtain more consistent
  benchmark results, including by manipulating CPU frequency scaling settings on Linux.

* [view-annotated-file](https://github.com/loov/view-annotated-file) (from the community)
  overlays inlining, bounds check, and escape info back onto the source code.

* [godbolt.org](https://go.godbolt.org) is widely used to examine
  and share assembly output from many compilers, including the Go compiler. It can also
  [compare](https://go.godbolt.org/z/5Gs1G4bKG) assembly for different versions of
  a function or across Go compiler versions, which can be helpful for investigations and
  bug reports.

#### -gcflags and 'go build' vs. 'go tool compile'

* `-gcflags` is a go command [build flag](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).
  `go build -gcflags=<args>` passes the supplied `<args>` to the underlying
  `compile` invocation(s) while still doing everything that the `go build` command
  normally does (e.g., handling the build cache, modules, and so on). In contrast,
  `go tool compile <args>` asks the `go` command to invoke `compile <args>` a single time
  without involving the standard `go build` machinery. In some cases, it can be helpful to have
  fewer moving parts by doing `go tool compile <args>`, such as if you have a
  small standalone source file that can be compiled without any assistance from `go build`.
  In other cases, it is more convenient to pass `-gcflags` to a build command like
  `go build`, `go test`, or `go install`.

* `-gcflags` by default applies to the packages named on the command line, but can
  use package patterns such as `-gcflags='all=-m=1 -l'`, or multiple package patterns such as
  `-gcflags='all=-m=1' -gcflags='fmt=-m=2'`. For details, see the
  [cmd/go documentation](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).

### Further reading

To dig deeper into how the SSA package works, including its passes and rules,
head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).

Finally, if something in this README or the SSA README is unclear
or if you have an idea for an improvement, feel free to leave a comment in
[issue 30074](https://go.dev/issue/30074).