+++
title = "How the Optimizing Compiler Works: Front-End"
layout = "single"
+++

In this section we will discuss the phases in the front-end of the optimizing compiler:

- [Translation to SSA](#translation-to-ssa)
- [Optimization](#optimization)
- [Block Layout](#block-layout)

Every section includes an explanation of the phase; the subsection **Code**
includes high-level pointers to functions and packages; the subsection **Debug Flags**
indicates the flags that can be used to enable advanced logging of the phase.

## Translation to SSA

We mentioned earlier that wazero uses an internal representation called an "SSA"
form or "Static Single-Assignment" form, but we never explained what that is.

In short, every program, or, in our case, every Wasm function, can be
translated into a control-flow graph. The control-flow graph is a directed graph where
each node is a sequence of statements that do not contain a control-flow instruction,
called a **basic block**. Instead, control-flow instructions are translated into edges.
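
To make the idea of a control-flow graph more concrete, here is a minimal Go sketch.
The `block` type and the `reachable` and `diamond` helpers are hypothetical names
introduced only for illustration; they are not wazero's own types. The sketch models
basic blocks as nodes that hold straight-line statements, and control flow as edges
between them.

```go
package main

import "fmt"

// block is a hypothetical basic block: straight-line statements plus
// outgoing edges. It is an illustration, not wazero's own block type.
type block struct {
	name  string
	stmts []string // statements; none of them is a control-flow instruction
	succs []*block // control flow is expressed as edges to successors
}

// reachable returns the names of the blocks reachable from entry,
// in depth-first preorder.
func reachable(entry *block) []string {
	var order []string
	seen := map[*block]bool{}
	var visit func(b *block)
	visit = func(b *block) {
		if seen[b] {
			return
		}
		seen[b] = true
		order = append(order, b.name)
		for _, s := range b.succs {
			visit(s)
		}
	}
	visit(entry)
	return order
}

// diamond builds a four-block "if/else" shaped graph:
// entry -> then, entry -> else, then -> exit, else -> exit.
func diamond() *block {
	exit := &block{name: "exit"}
	thenB := &block{name: "then", succs: []*block{exit}}
	elseB := &block{name: "else", succs: []*block{exit}}
	return &block{name: "entry", succs: []*block{thenB, elseB}}
}

func main() {
	fmt.Println(reachable(diamond())) // [entry then exit else]
}
```

A block that never shows up in such a traversal has no path from the entry block,
which is the property that block-level dead-code elimination relies on.
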

For instance, take the following implementation of the `abs` function:

```wasm
(module
  (func (;0;) (param i32) (result i32)
    (if (result i32) (i32.lt_s (local.get 0) (i32.const 0))
      (then
        (i32.sub (i32.const 0) (local.get 0)))
      (else
        (local.get 0))
    )
  )
  (export "f" (func 0))
)
```

This is translated to the following block diagram:

```goat {width="100%" height="500"}
       +---------------------------------------------+
       |blk0: (exec_ctx:i64, module_ctx:i64, v2:i32) |
       |    v3:i32 = Iconst_32 0x0                   |
       |    v4:i32 = Icmp lt_s, v2, v3               |
       |    Brz v4, blk2                             |
       |    Jump blk1                                |
       +---------------------------------------------+
                              |
                              |
              +---`(v4 != 0)`-+-`(v4 == 0)`---+
              |                               |
              v                               v
+---------------------------+   +---------------------------+
|blk1: () <-- (blk0)        |   |blk2: () <-- (blk0)        |
|    v6:i32 = Iconst_32 0x0 |   |    Jump blk3, v2          |
|    v7:i32 = Isub v6, v2   |   |                           |
|    Jump blk3, v7          |   |                           |
+---------------------------+   +---------------------------+
              |                               |
              |                               |
              +-`{v5 := v7}`--+--`{v5 := v2}`-+
                              |
                              v
               +------------------------------+
               |blk3: (v5:i32) <-- (blk1,blk2)|
               |    Jump blk_ret, v5          |
               +------------------------------+
                              |
                         {return v5}
                              |
                              v
```

We use the ["block argument" variant of SSA][ssa-blocks], which is also the
representation [used in LLVM's MLIR][llvm-mlir]. In this variant, each block
takes a list of arguments. Each block ends with a branching instruction (Branch, Return,
Jump, etc.) with an optional list of arguments; these arguments are assigned
to the target block's arguments, much like the arguments of a function call.

Consider the first block `blk0`.

```
blk0: (exec_ctx:i64, module_ctx:i64, v2:i32)
    v3:i32 = Iconst_32 0x0
    v4:i32 = Icmp lt_s, v2, v3
    Brz v4, blk2
    Jump blk1
```

You will notice that, compared to the original function, it takes two extra
parameters (`exec_ctx` and `module_ctx`):

1. `exec_ctx` is a pointer to `wazevo.executionContext`. This is used to exit the execution
   in the face of traps or for host function calls.
2. `module_ctx` is a pointer to `wazevo.moduleContextOpaque`. This is used, among other things,
   to access memory.

It then takes one parameter `v2`, corresponding to the function parameter, and
it defines two variables `v3`, `v4`. `v3` is the constant 0, `v4` is the result of
comparing `v2` to `v3` using the `i32.lt_s` instruction. Then, it branches to
`blk2` if `v4` is zero, otherwise it jumps to `blk1`.

You might also have noticed that the instructions do not correspond strictly to
the original Wasm opcodes. This is because, similarly to the wazero IR used by
the old compiler, this is a custom IR.

You will also notice that no name occurs _twice_ _on the left-hand side of an
assignment_: every value is defined exactly once. This is why this form is
called **single-assignment**.

Finally, notice how `blk1` and `blk2` end with a jump to the last block `blk3`.

```
blk1: ()
    ...
    Jump blk3, v7

blk2: ()
    Jump blk3, v2

blk3: (v5:i32)
    ...
```

`blk3` takes an argument `v5`: `blk1` jumps to `blk3` with `v7` and `blk2` jumps
to `blk3` with `v2`, meaning `v5` is effectively a rename of `v7` or `v2`,
depending on the originating block. If you are familiar with the traditional
representation of an SSA form, you will recognize that the role of block
arguments is equivalent to the role of the *Phi (Φ) function*, a special
function that returns a different value depending on the incoming edge; e.g., in
this case: `v5 := Φ(v7, v2)`.

### Code

The relevant APIs can be found under the sub-packages `ssa` and `frontend`.
In the code, the terms *lower* or *lowering* are often used to indicate a mapping or a translation,
because such transformations usually correspond to targeting a lower abstraction level.

- Basic Blocks are represented by the type `ssa.Block`.
- The SSA form is constructed using an `ssa.Builder`. The `ssa.Builder` is instantiated
  in the context of `wasm.Engine.CompileModule()`, more specifically in the method
  `frontend.Compiler.LowerToSSA()`.
- The mapping between Wasm opcodes and the IR happens in `frontend/lower.go`,
  more specifically in the method `frontend.Compiler.lowerCurrentOpcode()`.
- Because they are semantically equivalent, in the code, basic block parameters
  are sometimes referred to as "Phi values".

#### Instructions and Values

An `ssa.Instruction` is a single instruction in the SSA form. Each instruction might
consume zero or more `ssa.Value`s, and it usually produces a single `ssa.Value`; some
instructions may not produce any value (for instance, a `Jump` instruction).
An `ssa.Value` is an abstraction that represents a typed name binding, and it is used
to represent the result of an instruction, or the input to an instruction.

For instance:

```
blk1: () <-- (blk0)
    v6:i32 = Iconst_32 0x0
    v7:i32 = Isub v6, v2
    Jump blk3, v7
```

`Iconst_32` takes no input value and produces value `v6`; `Isub` takes two input values (`v6`, `v2`)
and produces value `v7`; `Jump` takes one input value (`v7`) and produces no value. All
such values have the `i32` type. The wazero SSA's type system (`ssa.Type`) allows the following types:

- `i32`: 32-bit integer
- `i64`: 64-bit integer
- `f32`: 32-bit floating point
- `f64`: 64-bit floating point
- `v128`: 128-bit SIMD vector

For simplicity, and unlike traditional compilers such as LLVM, we don't have a
dedicated type for pointers. Instead, we use the `i64` type to represent pointer
values, since we only support 64-bit architectures.

Values and instructions are both allocated from pools to minimize memory allocations.
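
The following Go sketch illustrates the two ideas above: instructions that consume
and produce typed values, and pool-based allocation. Every name in it (`typ`, `value`,
`instr`, `pool`, `pageSize`) is hypothetical and chosen for illustration; this is not
wazero's actual implementation, only one plausible way to hand out instructions from
reusable pages.

```go
package main

import "fmt"

// typ mirrors the five value types listed above. Hypothetical; not ssa.Type.
type typ byte

const (
	i32 typ = iota
	i64
	f32
	f64
	v128
)

// value is a typed name binding such as "v7:i32".
type value struct {
	id int
	t  typ
}

// instr consumes zero or more values and produces at most one.
type instr struct {
	op     string
	args   []value
	result *value // nil for instructions such as Jump
}

// pool hands out items from fixed-size pages, so that most allocations
// do not touch the Go heap, and reset makes every page reusable.
type pool[T any] struct {
	pages [][]T
	used  int // number of items handed out since the last reset
}

const pageSize = 4

func (p *pool[T]) allocate() *T {
	page, idx := p.used/pageSize, p.used%pageSize
	if page == len(p.pages) {
		p.pages = append(p.pages, make([]T, pageSize))
	}
	p.used++
	item := &p.pages[page][idx]
	var zero T
	*item = zero // clear any state left over from before a reset
	return item
}

func (p *pool[T]) reset() { p.used = 0 }

func main() {
	var instrs pool[instr]
	v2, v6, v7 := value{2, i32}, value{6, i32}, value{7, i32}

	// The three instructions of blk1 from the example above.
	c := instrs.allocate()
	*c = instr{op: "Iconst_32", result: &v6}
	s := instrs.allocate()
	*s = instr{op: "Isub", args: []value{v6, v2}, result: &v7}
	j := instrs.allocate()
	*j = instr{op: "Jump", args: []value{v7}}

	fmt.Println(len(instrs.pages), instrs.used) // 1 3
	instrs.reset()
	fmt.Println(len(instrs.pages), instrs.used) // 1 0
}
```

Because pages are never moved once created, pointers returned by `allocate` stay
valid while the pool grows; after `reset`, the same memory is reused for the next
function being compiled.
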

### Debug Flags

- `wazevoapi.PrintSSA` dumps the SSA form to the console.
- `wazevoapi.FrontEndLoggingEnabled` dumps progress of the translation between Wasm
  opcodes and SSA instructions to the console.

## Optimization

The SSA form makes it easier to perform a number of optimizations. For instance,
we can perform constant propagation, dead code elimination, and common
subexpression elimination. These optimizations either act upon the instructions
within a basic block, or they act upon the control-flow graph as a whole.

At a high level, consider the following basic block, derived from the previous
example:

```
blk0: (exec_ctx:i64, module_ctx:i64)
    v2:i32 = Iconst_32 -5
    v3:i32 = Iconst_32 0
    v4:i32 = Icmp lt_s, v2, v3
    Brz v4, blk2
    Jump blk1
```

It is pretty easy to see that the comparison in `v4` can be replaced by a
constant `1`, because the comparison is between two constant values (-5, 0).
Therefore, the block can be rewritten as such:

```
blk0: (exec_ctx:i64, module_ctx:i64)
    v4:i32 = Iconst_32 1
    Brz v4, blk2
    Jump blk1
```

However, we can now also see that `v4` is never zero, so the `Brz` branch is
never taken and the block `blk2` is never executed; even the branch instruction
and the constant definition `v4` can be removed:

```
blk0: (exec_ctx:i64, module_ctx:i64)
    Jump blk1
```

This is a simple example of constant propagation and dead code elimination
occurring within a basic block. However, now `blk2` is unreachable, because
there is no other edge in the graph that points to it; thus it can be removed
from the control-flow graph. This is an example of dead-code elimination that
occurs at the control-flow graph level.

In practice, because WebAssembly is a compilation target, these simple
optimizations are often unnecessary.
The optimization passes implemented in
wazero are also a work in progress and, at the time of writing, further work is
expected to implement more advanced optimizations.

### Code

Optimization passes are implemented by `ssa.Builder.RunPasses()`. An optimization
pass is just a function that takes an SSA builder as a parameter.

Passes iterate over the basic blocks, and, for each basic block, they iterate
over the instructions. Each pass may mutate the basic block by modifying the instructions
it contains, or it might change the entire shape of the control-flow graph (e.g. by removing
blocks).

Currently, there are two dead-code elimination passes:

- `passDeadBlockEliminationOpt` acting at the block level.
- `passDeadCodeEliminationOpt` acting at the instruction level.

Notably, `passDeadCodeEliminationOpt` also assigns an `InstructionGroupID` to each
instruction. This is used to determine whether a sequence of instructions can be
replaced by a single machine instruction during the back-end phase. For more details,
see also the relevant documentation in `ssa/instructions.go`.

There are also simple constant folding passes such as `passNopInstElimination`, which
folds and deletes instructions that are essentially no-ops (e.g. shifting by a 0 amount).

### Debug Flags

`wazevoapi.PrintOptimizedSSA` dumps the SSA form to the console after optimization.


## Block Layout

As we have seen earlier, the SSA form instructions are contained within basic
blocks, and the basic blocks are connected by edges of the control-flow graph.
However, machine code is not laid out as a graph: it is just a linear
sequence of instructions.

Thus, the last step of the front-end is to lay out the basic blocks in a linear
sequence.
Because each basic block, by design, ends with a control-flow
instruction, one of the goals of the block layout phase is to maximize the number of
**fall-through opportunities**. A fall-through opportunity occurs when a block ends
with a jump instruction whose target is exactly the next block in the
sequence. In order to maximize the number of fall-through opportunities, the
block layout phase might reorder the basic blocks in the control-flow graph,
and transform the control-flow instructions. For instance, it might _invert_
some branching conditions.

The end goal is to effectively minimize the number of jumps and branches in
the machine code that will be generated later.


### Critical Edges

Special care must be taken when a basic block has multiple predecessors,
i.e., when it has multiple incoming edges. In particular, an edge between two
basic blocks is called a **critical edge** when, at the same time:

- the predecessor has multiple successors **and**
- the successor has multiple predecessors.

For instance, in the example below the edge between `BB0` and `BB3`
is a critical edge.

```goat { width="300" }
┌───────┐       ┌───────┐
│  BB0  │━┓     │  BB1  │
└───────┘ ┃     └───────┘
    │     ┃         │
    ▼     ┃         ▼
┌───────┐ ┃     ┌───────┐
│  BB2  │ ┗━━━━▶│  BB3  │
└───────┘       └───────┘
```

In these cases the critical edge is split by introducing a new basic block,
called a **trampoline**, where the critical edge was.

```goat { width="300" }
┌───────┐            ┌───────┐
│  BB0  │──────┐     │  BB1  │
└───────┘      ▼     └───────┘
    │    ┌──────────┐    │
    │    │trampoline│    │
    ▼    └──────────┘    ▼
┌───────┐      │     ┌───────┐
│  BB2  │      └────▶│  BB3  │
└───────┘            └───────┘
```

For more details on critical edges, see:

- https://en.wikipedia.org/wiki/Control-flow_graph
- https://nickdesaulniers.github.io/blog/2023/01/27/critical-edge-splitting/

### Example

At the end of the block layout phase, the laid out SSA for the `abs` function
looks as follows:

```
blk0: (exec_ctx:i64, module_ctx:i64, v2:i32)
    v3:i32 = Iconst_32 0x0
    v4:i32 = Icmp lt_s, v2, v3
    Brz v4, blk2
    Jump fallthrough

blk1: () <-- (blk0)
    v6:i32 = Iconst_32 0x0
    v7:i32 = Isub v6, v2
    Jump blk3, v7

blk2: () <-- (blk0)
    Jump fallthrough, v2

blk3: (v5:i32) <-- (blk1,blk2)
    Jump blk_ret, v5
```

### Code

`passLayoutBlocks` implements the block layout phase.

### Debug Flags

- `wazevoapi.PrintBlockLaidOutSSA` dumps the SSA form to the console after block layout.
- `wazevoapi.SSALoggingEnabled` logs the transformations that are applied during this phase,
  such as inverting branching conditions or splitting critical edges.

<hr>

* Previous Section: [How the Optimizing Compiler Works](../)
* Next Section: [Back-End](../backend/)

[ssa-blocks]: https://en.wikipedia.org/wiki/Static_single-assignment_form#Block_arguments
[llvm-mlir]: https://mlir.llvm.org/docs/Rationale/Rationale/#block-arguments-vs-phi-nodes