<!---
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-->

## Introduction to the Go compiler

`cmd/compile` contains the main packages that form the Go compiler. The compiler
may be logically split into several phases, which we will briefly describe
alongside the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to
the compiler. Roughly speaking, these translate to the first two and last two
phases we are going to list here. A third term, "middle-end", often refers to
much of the work that happens in between, on the compiler's own IR.

Note that the `go/*` family of packages, such as `go/parser` and
`go/types`, are mostly unused by the compiler. Since the compiler was
initially written in C, the `go/*` packages were developed to enable
writing tools working with Go code, such as `gofmt` and `vet`.
However, over time the compiler's internal APIs have slowly evolved to
be more familiar to users of the `go/*` packages.

It should be clarified that the name "gc" stands for "Go compiler", and has
little to do with uppercase "GC", which stands for garbage collection.

### 1. Parsing

* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis),
parsed (syntax analysis), and a syntax tree is constructed for each source
file.

Each syntax tree is an exact representation of the respective source file, with
nodes corresponding to the various elements of the source such as expressions,
declarations, and statements. The syntax tree also includes position information
which is used for error reporting and the creation of debugging information.

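The `syntax` package is internal to the compiler and cannot be imported by
ordinary programs. As a rough illustration of the same steps, the sketch below
uses the standard library's `go/scanner` and `go/parser` instead. That
substitution is an assumption made for this example, not what the compiler
itself runs, and `src`, `tokenCount`, and `funcPositions` are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/scanner"
	"go/token"
)

// src is a hypothetical input file used only for this illustration.
const src = `package demo

func add(a, b int) int { return a + b }
`

// tokenCount runs the lexer over src and returns the number of tokens
// produced before EOF (lexical analysis).
func tokenCount(src string) int {
	fset := token.NewFileSet()
	file := fset.AddFile("demo.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)
	n := 0
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			return n
		}
		n++
	}
}

// funcPositions parses src (syntax analysis) and maps each function name
// to the "line:column" position recorded in the syntax tree.
func funcPositions(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	ast.Inspect(f, func(n ast.Node) bool {
		if fn, ok := n.(*ast.FuncDecl); ok {
			p := fset.Position(fn.Pos())
			out[fn.Name.Name] = fmt.Sprintf("%d:%d", p.Line, p.Column)
		}
		return true
	})
	return out, nil
}

func main() {
	fmt.Println("tokens:", tokenCount(src))
	pos, err := funcPositions(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("add declared at", pos["add"])
}
```

The position attached to each node is the kind of information the compiler
later relies on for error messages and debugging data.
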
### 2. Type checking

* `cmd/compile/internal/types2` (type checking)

The types2 package is a port of `go/types` to use the syntax package's
AST instead of `go/ast`.

### 3. IR construction ("noding")

* `cmd/compile/internal/types` (compiler types)
* `cmd/compile/internal/ir` (compiler AST)
* `cmd/compile/internal/noder` (create compiler AST)

The compiler middle end uses its own AST definition and representation of Go
types carried over from when it was written in C. All of its code is written in
terms of these, so the next step after type checking is to convert the syntax
and types2 representations to ir and types. This process is referred to as
"noding."

Noding uses a process called Unified IR, which builds a node representation
using a serialized version of the typechecked code from step 2.
Unified IR is also involved in import/export of packages and inlining.

### 4. Middle end

* `cmd/compile/internal/deadcode` (dead code elimination)
* `cmd/compile/internal/inline` (function call inlining)
* `cmd/compile/internal/devirtualize` (devirtualization of known interface method calls)
* `cmd/compile/internal/escape` (escape analysis)

Several optimization passes are performed on the IR representation:
dead code elimination, (early) devirtualization, function call
inlining, and escape analysis.

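To watch some of these passes make decisions, it can help to have a tiny
program whose behavior is easy to predict. The sketch below is one such
hypothetical subject (all names are made up for illustration). Building it with
`-gcflags=-m=2` should print inlining and escape analysis decisions for it,
though the exact wording and the decisions themselves vary between compiler
versions.

```go
package main

import "fmt"

// square is small enough that the inliner would normally consider it.
func square(x int) int { return x * x }

// newCounter returns a pointer to a local variable, so escape analysis
// is expected to report that n must live on the heap.
func newCounter() *int {
	n := 0
	return &n
}

// sumLocal only uses buf locally, so nothing forces it off the stack.
func sumLocal() int {
	var buf [4]int
	for i := range buf {
		buf[i] = square(i)
	}
	total := 0
	for _, v := range buf {
		total += v
	}
	return total
}

func main() {
	c := newCounter()
	*c += sumLocal()
	fmt.Println(*c)
}
```
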
### 5. Walk

* `cmd/compile/internal/walk` (order of evaluation, desugaring)

The final pass over the IR representation is "walk," which serves two purposes:

1. It decomposes complex statements into individual, simpler statements,
   introducing temporary variables and respecting order of evaluation. This step
   is also referred to as "order."

2. It desugars higher-level Go constructs into more primitive ones. For example,
   `switch` statements are turned into binary search or jump tables, and
   operations on maps and channels are replaced with runtime calls.

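The desugaring in the second point can be pictured at the source level. The
sketch below pairs an ordinary `switch` with a hand-written approximation of a
binary search over its sorted case values. It is only an illustration: the
compiler performs this rewrite on its IR, chooses the strategy itself, and
`classify` and `classifyLowered` are hypothetical names.

```go
package main

import "fmt"

// classify is ordinary source code using a switch over integer constants.
func classify(x int) string {
	switch x {
	case 1:
		return "one"
	case 3:
		return "three"
	case 5:
		return "five"
	case 7:
		return "seven"
	}
	return "other"
}

// classifyLowered approximates the same switch as a binary search over
// the sorted case values {1, 3, 5, 7}.
func classifyLowered(x int) string {
	if x <= 3 { // lower half: cases 1 and 3
		if x == 1 {
			return "one"
		}
		if x == 3 {
			return "three"
		}
		return "other"
	}
	// upper half: cases 5 and 7
	if x == 5 {
		return "five"
	}
	if x == 7 {
		return "seven"
	}
	return "other"
}

func main() {
	for x := 0; x <= 8; x++ {
		fmt.Println(x, classify(x), classifyLowered(x))
	}
}
```
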
### 6. Generic SSA

* `cmd/compile/internal/ssa` (SSA passes and rules)
* `cmd/compile/internal/ssagen` (converting IR to SSA)

In this phase, IR is converted into Static Single Assignment (SSA) form, a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it.

During this conversion, function intrinsics are applied. These are special
functions that the compiler has been taught to replace with heavily optimized
code on a case-by-case basis.

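As an example of where intrinsics matter, the sketch below compares a portable
loop with a call to `math/bits.TrailingZeros64`, a function that the compiler
is generally able to replace with dedicated instructions on common
architectures. Which functions are treated as intrinsics depends on the target
and compiler version, so treat this as an assumption; the wrapper names are
hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// trailingZerosLoop is a portable, loop-based implementation.
func trailingZerosLoop(x uint64) int {
	if x == 0 {
		return 64
	}
	n := 0
	for x&1 == 0 {
		x >>= 1
		n++
	}
	return n
}

// trailingZerosIntrinsic calls into math/bits; on targets where the call
// is an intrinsic, no function call remains in the generated code.
func trailingZerosIntrinsic(x uint64) int {
	return bits.TrailingZeros64(x)
}

func main() {
	for _, x := range []uint64{0, 1, 8, 96, 1 << 63} {
		fmt.Println(x, trailingZerosLoop(x), trailingZerosIntrinsic(x))
	}
}
```

Both functions return the same results; comparing their assembly with
`-gcflags=-S` is one way to see whether the intrinsic was applied.
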
Certain nodes are also lowered into simpler components during the AST to SSA
conversion, so that the rest of the compiler can work with them. For instance,
the copy builtin is replaced by memory moves, and range loops are rewritten into
for loops. Some of these currently happen before the conversion to SSA due to
historical reasons, but the long-term plan is to move all of them here.

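The range loop rewrite can be sketched at the source level as follows. This is
a hand-written approximation with hypothetical names (`sumRange`, `sumFor`);
the compiler performs the equivalent transformation on its own representation
rather than on Go source.

```go
package main

import "fmt"

// sumRange is what a programmer writes.
func sumRange(s []int) int {
	total := 0
	for _, v := range s {
		total += v
	}
	return total
}

// sumFor approximates the lowered form: the range expression and its
// length are evaluated once, then a plain for loop indexes the slice.
func sumFor(s []int) int {
	total := 0
	tmp := s
	n := len(tmp)
	for i := 0; i < n; i++ {
		v := tmp[i]
		total += v
	}
	return total
}

func main() {
	s := []int{1, 2, 3, 4}
	fmt.Println(sumRange(s), sumFor(s))
}
```
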
Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all `GOARCH` variants.
These passes include dead code elimination, removal of
unneeded nil checks, and removal of unused branches. The generic rewrite rules
mainly concern expressions, such as replacing some expressions with constant
values, and optimizing multiplications and float operations.

### 7. Generating machine code

* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
* `cmd/internal/obj` (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which
rewrites generic values into their machine-specific variants. For example, on
amd64 memory operands are possible, so many load-store operations may be combined.

Note that the lower pass runs all machine-specific rewrite rules, and thus it
currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture,
the final code optimization passes are run. This includes yet another dead code
elimination pass, moving values closer to their uses, the removal of local
variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame
layout, which assigns stack offsets to local variables, and pointer liveness
analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into
a series of obj.Prog instructions. These are passed to the assembler
(`cmd/internal/obj`), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.

### 8. Tips

#### Getting Started

* If you have never contributed to the compiler before, a simple way to begin
  can be adding a log statement or `panic("here")` to get some
  initial insight into whatever you are investigating.

* The compiler itself provides logging, debugging and visualization capabilities,
  such as:
   ```
   $ go build -gcflags=-m=2                   # print optimization info, including inlining, escape analysis
   $ go build -gcflags=-d=ssa/check_bce/debug # print bounds check info
   $ go build -gcflags=-W                     # print internal parse tree after type checking
   $ GOSSAFUNC=Foo go build                   # generate ssa.html file for func Foo
   $ go build -gcflags=-S                     # print assembly
   $ go tool compile -bench=out.txt x.go      # print timing of compiler phases
   ```

  Some flags alter the compiler behavior, such as:
   ```
   $ go tool compile -h file.go               # panic on first compile error encountered
   $ go build -gcflags=-d=checkptr=2          # enable additional unsafe pointer checking
   ```

  There are many additional flags. Some descriptions are available via:
   ```
   $ go tool compile -h              # compiler flags, e.g., go build -gcflags='-m=1 -l'
   $ go tool compile -d help         # debug flags, e.g., go build -gcflags=-d=checkptr=2
   $ go tool compile -d ssa/help     # ssa flags, e.g., go build -gcflags=-d=ssa/prove/debug=2
   ```

  There are some additional details about `-gcflags` and the differences between `go build`
  vs. `go tool compile` in a [section below](#-gcflags-and-go-build-vs-go-tool-compile).

* In general, when investigating a problem in the compiler you usually want to
  start with the simplest possible reproduction and understand exactly what is
  happening with it.

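In the spirit of the simplest possible reproduction, here is a small,
hypothetical file that makes a convenient subject for the bounds check flag
listed earlier in this section. The expectation that the hinted version needs
fewer checks is an assumption about typical compiler behavior and is worth
confirming against the compiler's actual output.

```go
package main

import "fmt"

// sum4 indexes four elements; each access may need its own bounds check,
// because a valid s[0] says nothing about s[1], s[2], or s[3].
func sum4(s []int) int {
	return s[0] + s[1] + s[2] + s[3]
}

// sum4Hinted touches s[3] first, giving the compiler a chance to prove
// that the accesses that follow are in range.
func sum4Hinted(s []int) int {
	_ = s[3]
	return s[0] + s[1] + s[2] + s[3]
}

func main() {
	s := []int{1, 2, 3, 4}
	fmt.Println(sum4(s), sum4Hinted(s))
}
```
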
#### Testing your changes

* Be sure to read the [Quickly testing your changes](https://go.dev/doc/contribute#quick_test)
  section of the Go Contribution Guide.

* Some tests live within the cmd/compile packages and can be run by `go test ./...` or similar,
  but many cmd/compile tests are in the top-level
  [test](https://github.com/golang/go/tree/master/test) directory:

  ```
  $ go test cmd/internal/testdir                           # all tests in 'test' dir
  $ go test cmd/internal/testdir -run='Test/escape.*.go'   # test specific files in 'test' dir
  ```
  For details, see the [testdir README](https://github.com/golang/go/tree/master/test#readme).
  The `errorCheck` method in [testdir_test.go](https://github.com/golang/go/blob/master/src/cmd/internal/testdir/testdir_test.go)
  is helpful for a description of the `ERROR` comments used in many of those tests.

  In addition, the `go/types` package from the standard library and `cmd/compile/internal/types2`
  have shared tests in `src/internal/types/testdata`, and both type checkers
  should be checked if anything changes there.

* The new [application-based coverage profiling](https://go.dev/testing/coverage/) can be used
  with the compiler, such as:

  ```
  $ go install -cover -coverpkg=cmd/compile/... cmd/compile  # build compiler with coverage instrumentation
  $ mkdir /tmp/coverdir                                      # pick location for coverage data
  $ GOCOVERDIR=/tmp/coverdir go test [...]                   # use compiler, saving coverage data
  $ go tool covdata textfmt -i=/tmp/coverdir -o coverage.out # convert to traditional coverage format
  $ go tool cover -html coverage.out                         # view coverage via traditional tools
  ```

#### Juggling compiler versions

* Many of the compiler tests use the version of the `go` command found in your PATH and
  its corresponding `compile` binary.

* If you are in a branch and your PATH includes `<go-repo>/bin`,
  doing `go install cmd/compile` will build the compiler using the code from your
  branch and install it to the proper location so that subsequent `go` commands
  like `go build` or `go test ./...` will exercise your freshly built compiler.

* [toolstash](https://pkg.go.dev/golang.org/x/tools/cmd/toolstash) provides a way
  to save, run, and restore a known good copy of the Go toolchain. For example, it can be
  a good practice to initially build your branch, save that version of
  the toolchain, then restore the known good version of the tools to compile
  your work-in-progress version of the compiler.

  Sample set up steps:
  ```
  $ go install golang.org/x/tools/cmd/toolstash@latest
  $ git clone https://go.googlesource.com/go
  $ cd go
  $ git checkout -b mybranch
  $ ./src/all.bash               # build and confirm good starting point
  $ export PATH=$PWD/bin:$PATH
  $ toolstash save               # save current tools
  ```
  After that, your edit/compile/test cycle can be similar to:
  ```
  <... make edits to cmd/compile source ...>
  $ toolstash restore && go install cmd/compile   # restore known good tools to build compiler
  <... 'go build', 'go test', etc. ...>           # use freshly built compiler
  ```

* toolstash also allows comparing the installed vs. stashed copy of
  the compiler, such as if you expect equivalent behavior after a refactor.
  For example, to check that your changed compiler produces identical object files to
  the stashed compiler while building the standard library:
  ```
  $ toolstash restore && go install cmd/compile   # build latest compiler
  $ go build -toolexec "toolstash -cmp" -a -v std # compare latest vs. saved compiler
  ```

* If versions appear to get out of sync (for example, with errors like
  `linked object header mismatch` with version strings like
  `devel go1.21-db3f952b1f`), you might need to do
  `toolstash restore && go install cmd/...` to update all the tools under cmd.

#### Additional helpful tools

* [compilebench](https://pkg.go.dev/golang.org/x/tools/cmd/compilebench) benchmarks
  the speed of the compiler.

* [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat) is the standard tool
  for reporting performance changes resulting from compiler modifications,
  including whether any improvements are statistically significant:
  ```
  $ go test -bench=SomeBenchmarks -count=20 > new.txt   # use new compiler
  $ toolstash restore                                   # restore old compiler
  $ go test -bench=SomeBenchmarks -count=20 > old.txt   # use old compiler
  $ benchstat old.txt new.txt                           # compare old vs. new
  ```

* [bent](https://pkg.go.dev/golang.org/x/benchmarks/cmd/bent) facilitates running a
  large set of benchmarks from various community Go projects inside a Docker container.

* [perflock](https://github.com/aclements/perflock) helps obtain more consistent
  benchmark results, including by manipulating CPU frequency scaling settings on Linux.

* [view-annotated-file](https://github.com/loov/view-annotated-file) (from the community)
  overlays inlining, bounds check, and escape info back onto the source code.

* [godbolt.org](https://go.godbolt.org) is widely used to examine
  and share assembly output from many compilers, including the Go compiler. It can also
  [compare](https://go.godbolt.org/z/5Gs1G4bKG) assembly for different versions of
  a function or across Go compiler versions, which can be helpful for investigations and
  bug reports.

#### -gcflags and 'go build' vs. 'go tool compile'

* `-gcflags` is a go command [build flag](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).
  `go build -gcflags=<args>` passes the supplied `<args>` to the underlying
  `compile` invocation(s) while still doing everything that the `go build` command
  normally does (e.g., handling the build cache, modules, and so on). In contrast,
  `go tool compile <args>` asks the `go` command to invoke `compile <args>` a single time
  without involving the standard `go build` machinery. In some cases, it can be helpful to have
  fewer moving parts by doing `go tool compile <args>`, such as if you have a
  small standalone source file that can be compiled without any assistance from `go build`.
  In other cases, it is more convenient to pass `-gcflags` to a build command like
  `go build`, `go test`, or `go install`.

* `-gcflags` by default applies to the packages named on the command line, but can
  use package patterns such as `-gcflags='all=-m=1 -l'`, or multiple package patterns such as
  `-gcflags='all=-m=1' -gcflags='fmt=-m=2'`. For details, see the
  [cmd/go documentation](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).

### Further reading

To dig deeper into how the SSA package works, including its passes and rules,
head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).

Finally, if something in this README or the SSA README is unclear
or if you have an idea for an improvement, feel free to leave a comment in
[issue 30074](https://go.dev/issue/30074).
   315  or if you have an idea for an improvement, feel free to leave a comment in
   316  [issue 30074](https://go.dev/issue/30074).