github.com/tetratelabs/wazero@v1.7.3-0.20240513003603-48f702e154b5/site/content/docs/how_the_optimizing_compiler_works/frontend.md

     1  +++
     2  title = "How the Optimizing Compiler Works: Front-End"
     3  layout = "single"
     4  +++
     5  
     6  In this section we will discuss the phases in the front-end of the optimizing compiler:
     7  
     8  - [Translation to SSA](#translation-to-ssa)
     9  - [Optimization](#optimization)
    10  - [Block Layout](#block-layout)
    11  
    12  Every section includes an explanation of the phase; the subsection **Code**
    13  will include high-level pointers to functions and packages; the subsection **Debug Flags**
    14  indicates the flags that can be used to enable advanced logging of the phase.
    15  
    16  ## Translation to SSA
    17  
    18  We mentioned earlier that wazero uses an internal representation called an "SSA"
    19  form or "Static Single-Assignment" form, but we never explained what that is.
    20  
    21  In short, every program, or, in our case, every Wasm function, can be
    22  translated into a control-flow graph. The control-flow graph is a directed graph where
    23  each node is a sequence of statements that do not contain a control flow instruction,
    24  called a **basic block**. Instead, control-flow instructions are translated into edges.
    25  
    26  For instance, take the following implementation of the `abs` function:
    27  
    28  ```wasm
    29  (module
    30    (func (;0;) (param i32) (result i32)
    31       (if (result i32) (i32.lt_s (local.get 0) (i32.const 0))
    32          (then
    33              (i32.sub (i32.const 0) (local.get 0)))
    34          (else
    35              (local.get 0))
    36       )
    37    )
    38    (export "f" (func 0))
    39  )
    40  ```
    41  
    42  This is translated to the following block diagram:
    43  
    44  ```goat {width="100%" height="500"}
    45                 +---------------------------------------------+
    46                 |blk0: (exec_ctx:i64, module_ctx:i64, v2:i32) |
    47                 |    v3:i32 = Iconst_32 0x0                   |
    48                 |    v4:i32 = Icmp lt_s, v2, v3               |
    49                 |    Brz v4, blk2                             |
    50                 |    Jump blk1                                |
    51                 +---------------------------------------------+
    52                                        |
    53                                        |
    54                        +---`(v4 != 0)`-+-`(v4 == 0)`---+
    55                        |                               |
    56                        v                               v
    57          +---------------------------+   +---------------------------+
    58          |blk1: () <-- (blk0)        |   |blk2: () <-- (blk0)        |
    59          |    v6:i32 = Iconst_32 0x0 |   |    Jump blk3, v2          |
    60          |    v7:i32 = Isub v6, v2   |   |                           |
    61          |    Jump blk3, v7          |   |                           |
    62          +---------------------------+   +---------------------------+
    63                        |                               |
    64                        |                               |
    65                        +-`{v5 := v7}`--+--`{v5 := v2}`-+
    66                                        |
    67                                        v
    68                        +------------------------------+
    69                        |blk3: (v5:i32) <-- (blk1,blk2)|
    70                        |    Jump blk_ret, v5          |
    71                        +------------------------------+
    72                                        |
    73                                   {return v5}
    74                                        |
    75                                        v
    76  ```
    77  
    78  We use the ["block argument" variant of SSA][ssa-blocks], which is also the same
    79  representation [used in LLVM's MLIR][llvm-mlir]. In this variant, each block
    80  takes a list of arguments. Each block ends with a branching instruction (Branch, Return,
    81  Jump, etc.) with an optional list of arguments; these arguments are bound
    82  to the target block's arguments, much like the arguments of a function call.
    83  
    84  Consider the first block `blk0`.
    85  
    86  ```
    87  blk0: (exec_ctx:i64, module_ctx:i64, v2:i32)
    88      v3:i32 = Iconst_32 0x0
    89      v4:i32 = Icmp lt_s, v2, v3
    90      Brz v4, blk2
    91      Jump blk1
    92  ```
    93  
    94  You will notice that, compared to the original function, it takes two extra
    95  parameters (`exec_ctx` and `module_ctx`):
    96  
    97  1. `exec_ctx` is a pointer to `wazevo.executionContext`. This is used to exit the execution
    98     in the face of traps or for host function calls.
    99  2. `module_ctx` is a pointer to `wazevo.moduleContextOpaque`. This is used, among other things,
   100     to access memory.
   101  
   102  It then takes one parameter `v2`, corresponding to the function parameter, and
   103  it defines two variables `v3`, `v4`. `v3` is the constant 0, `v4` is the result of
   104  comparing `v2` to `v3` using the `i32.lt_s` instruction. Then, it branches to
   105  `blk2` if `v4` is zero, otherwise it jumps to `blk1`.
   106  
   107  You might also have noticed that the instructions do not correspond strictly to
   108  the original Wasm opcodes. This is because, similarly to the wazero IR used by
   109  the old compiler, this is a custom IR.
   110  
   111  You will also notice that, _on the left-hand side of the assignments_,
   112  no name occurs _twice_: this is why this form is called **single-assignment**.
   113  
   114  Finally, notice how `blk1` and `blk2` end with a jump to the last block `blk3`.
   115  
   116  ```
   117  blk1: ()
   118      ...
   119      Jump blk3, v7
   120  
   121  blk2: ()
   122      Jump blk3, v2
   123  
   124  blk3: (v5:i32)
   125      ...
   126  ```
   127  
   128  `blk3` takes an argument `v5`: `blk1` jumps to `blk3` with `v7` and `blk2` jumps
   129  to `blk3` with `v2`, meaning `v5` is effectively a rename of `v2` or `v7`,
   130  depending on the originating block. If you are familiar with the traditional
   131  representation of an SSA form, you will recognize that the role of block
   132  arguments is equivalent to the role of the *Phi (Φ) function*, a special
   133  function that returns a different value depending on the incoming edge; e.g., in
   134  this case: `v5 := Φ(v7, v2)`.
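
The way jump arguments are bound to block arguments can be sketched with a tiny interpreter for the `abs` control-flow graph. This is an illustration only: the `env`, `block`, `b2i` and `run` names are hypothetical and do not exist in wazero, and `exec_ctx` and `module_ctx` are left out.

```go
package main

import "fmt"

// env maps SSA value names (v2, v3, ...) to their runtime values.
type env map[string]int32

// block is a hypothetical basic block for this sketch only: params are the
// block arguments, and body runs the instructions and returns the jump taken.
type block struct {
	params []string
	body   func(e env) (target string, args []int32)
}

// b2i converts a comparison result to the 0/1 integer that Icmp produces.
func b2i(b bool) int32 {
	if b {
		return 1
	}
	return 0
}

// absCFG mirrors the diagram above; exec_ctx and module_ctx are omitted.
var absCFG = map[string]block{
	"blk0": {params: []string{"v2"}, body: func(e env) (string, []int32) {
		e["v3"] = 0
		e["v4"] = b2i(e["v2"] < e["v3"])
		if e["v4"] == 0 { // Brz v4, blk2
			return "blk2", nil
		}
		return "blk1", nil // Jump blk1
	}},
	"blk1": {body: func(e env) (string, []int32) {
		e["v6"] = 0
		e["v7"] = e["v6"] - e["v2"]
		return "blk3", []int32{e["v7"]} // Jump blk3, v7
	}},
	"blk2": {body: func(e env) (string, []int32) {
		return "blk3", []int32{e["v2"]} // Jump blk3, v2
	}},
	"blk3": {params: []string{"v5"}, body: func(e env) (string, []int32) {
		return "blk_ret", []int32{e["v5"]} // Jump blk_ret, v5
	}},
}

// run executes blocks from entry until control reaches blk_ret.
func run(blocks map[string]block, entry string, args ...int32) int32 {
	e := env{}
	for name := entry; name != "blk_ret"; {
		b := blocks[name]
		for i, p := range b.params {
			e[p] = args[i] // bind jump arguments to block arguments
		}
		name, args = b.body(e)
	}
	return args[0]
}

func main() {
	fmt.Println(run(absCFG, "blk0", -5), run(absCFG, "blk0", 7)) // prints: 5 7
}
```

Note that `v5` is written in exactly one place, the binding step in `run`, regardless of which predecessor jumped to `blk3`; this is the job a Φ function does in the traditional representation.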
   135  
   136  ### Code
   137  
   138  The relevant APIs can be found under the sub-packages `ssa` and `frontend`.
   139  In the code, the terms *lower* or *lowering* are often used to indicate a mapping or a translation,
   140  because such transformations usually correspond to targeting a lower abstraction level.
   141  
   142  - Basic Blocks are represented by the type `ssa.Block`.
   143  - The SSA form is constructed using an `ssa.Builder`. The `ssa.Builder` is instantiated
   144    in the context of `wasm.Engine.CompileModule()`, more specifically in the method
   145    `frontend.Compiler.LowerToSSA()`.
   146  - The mapping between Wasm opcodes and the IR happens in `frontend/lower.go`,
   147    more specifically in the method `frontend.Compiler.lowerCurrentOpcode()`.
   148  - Because they are semantically equivalent, in the code, basic block parameters
   149    are sometimes referred to as "Phi values".
   150  
   151  #### Instructions and Values
   152  
   153  An `ssa.Instruction` is a single instruction in the SSA form. Each instruction might
   154  consume zero or more `ssa.Value`s, and it usually produces a single `ssa.Value`; some
   155  instructions may not produce any value (for instance, a `Jump` instruction).
   156  An `ssa.Value` is an abstraction that represents a typed name binding, and it is used
   157  to represent the result of an instruction, or the input to an instruction.
   158  
   159  For instance:
   160  
   161  ```
   162  blk1: () <-- (blk0)
   163      v6:i32 = Iconst_32 0x0
   164      v7:i32 = Isub v6, v2
   165      Jump blk3, v7
   166  ```
   167  
   168  `Iconst_32` takes no input value and produces value `v6`; `Isub` takes two input values (`v6`, `v2`)
   169  and produces value `v7`; `Jump` takes one input value (`v7`) and produces no value. All
   170  such values have the `i32` type. The wazero SSA's type system (`ssa.Type`) allows the following types:
   171  
   172  - `i32`: 32-bit integer
   173  - `i64`: 64-bit integer
   174  - `f32`: 32-bit floating point
   175  - `f64`: 64-bit floating point
   176  - `v128`: 128-bit SIMD vector
   177  
   178  For simplicity, we don't have a dedicated type for pointers. Instead, we use the `i64`
   179  type to represent pointer values since we only support 64-bit architectures,
   180  unlike traditional compilers such as LLVM.
   181  
   182  Values and instructions are both allocated from pools to minimize memory allocations.
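
To illustrate why pooling reduces allocations, here is a minimal sketch of a page-based pool. It is not wazero's pool implementation: the `pool` type, the `pageSize` constant and the `value` struct are assumptions made up for this example.

```go
package main

import "fmt"

const pageSize = 128

// pool is a hypothetical page-based allocator, written for this sketch only.
type pool[T any] struct {
	pages     [][]T // fixed-size pages; never reallocated, so pointers stay valid
	allocated int   // number of items handed out since the last reset
}

// allocate returns a pointer to a zeroed item, growing by one page when needed.
func (p *pool[T]) allocate() *T {
	page, idx := p.allocated/pageSize, p.allocated%pageSize
	if page == len(p.pages) {
		p.pages = append(p.pages, make([]T, pageSize))
	}
	p.allocated++
	item := &p.pages[page][idx]
	var zero T
	*item = zero // clear state left over from a previous use
	return item
}

// reset makes every item available again without releasing the pages.
func (p *pool[T]) reset() { p.allocated = 0 }

// value is a stand-in for an SSA value: an ID plus a type name.
type value struct {
	id  int
	typ string
}

func main() {
	var values pool[value]
	for i := 0; i < 300; i++ {
		v := values.allocate()
		v.id, v.typ = i, "i32"
	}
	fmt.Println(len(values.pages)) // prints: 3

	values.reset() // e.g. before compiling the next function
	v := values.allocate()
	fmt.Println(len(values.pages), v.id) // prints: 3 0
}
```

In this sketch, 300 values cost three slice allocations instead of 300 individual ones, and after `reset` the same pages are reused.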
   183  
   184  ### Debug Flags
   185  
   186  - `wazevoapi.PrintSSA` dumps the SSA form to the console.
   187  - `wazevoapi.FrontEndLoggingEnabled` dumps progress of the translation between Wasm
   188    opcodes and SSA instructions to the console.
   189  
   190  ## Optimization
   191  
   192  The SSA form makes it easier to perform a number of optimizations. For instance,
   193  we can perform constant propagation, dead code elimination, and common
   194  subexpression elimination. These optimizations either act upon the instructions
   195  within a basic block, or they act upon the control-flow graph as a whole.
   196  
   197  At a high level, consider the following basic block, derived from the previous
   198  example:
   199  
   200  ```
   201  blk0: (exec_ctx:i64, module_ctx:i64)
   202      v2:i32 = Iconst_32 -5
   203      v3:i32 = Iconst_32  0
   204      v4:i32 = Icmp lt_s, v2, v3
   205      Brz v4, blk2
   206      Jump blk1
   207  ```
   208  
   209  It is pretty easy to see that the comparison in `v4` can be replaced by a
   210  constant `1`, because the comparison is between two constant values (-5, 0).
   211  Therefore, the block can be rewritten as such:
   212  
   213  ```
   214  blk0: (exec_ctx:i64, module_ctx:i64)
   215      v4:i32 = Iconst_32 1
   216      Brz v4, blk2
   217      Jump blk1
   218  ```
   219  
   220  However, we can now also see that the `Brz` branch is never taken, and that the block
   221  `blk2` is never executed, so even the branch instruction and the constant
   222  definition `v4` can be removed:
   223  
   224  ```
   225  blk0: (exec_ctx:i64, module_ctx:i64)
   226      Jump blk1
   227  ```
   228  
   229  This is a simple example of constant propagation and dead code elimination
   230  occurring within a basic block. However, now `blk2` is unreachable, because
   231  there is no other edge in the graph that points to it; thus it can be removed
   232  from the control-flow graph. This is an example of dead-code elimination that
   233  occurs at the control-flow graph level.
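
The block-local part of this rewrite can be sketched as follows. This is a simplified illustration under stated assumptions, not wazero's implementation: the `inst` struct and the `simplify` and `removeUnused` functions are hypothetical, only the four opcodes of the example are handled, and instructions that define a value are assumed to have no side effects.

```go
package main

import "fmt"

// inst is a hypothetical, simplified instruction; it is not ssa.Instruction.
type inst struct {
	op     string   // "Iconst_32", "Icmp lt_s", "Brz" or "Jump"
	dst    string   // name of the value defined, if any
	args   []string // names of the values consumed
	imm    int32    // immediate, for Iconst_32
	target string   // target block, for Brz and Jump
}

// simplify folds constant comparisons and resolves branches on constants.
func simplify(block []inst) []inst {
	consts := map[string]int32{}
	var out []inst
	for _, in := range block {
		switch in.op {
		case "Iconst_32":
			consts[in.dst] = in.imm
		case "Icmp lt_s":
			x, okX := consts[in.args[0]]
			y, okY := consts[in.args[1]]
			if okX && okY { // both operands known: replace with a constant
				in = inst{op: "Iconst_32", dst: in.dst}
				if x < y {
					in.imm = 1
				}
				consts[in.dst] = in.imm
			}
		case "Brz":
			if c, ok := consts[in.args[0]]; ok {
				if c == 0 { // always taken: the rest of the block is dead
					return removeUnused(append(out, inst{op: "Jump", target: in.target}))
				}
				continue // never taken: drop the branch
			}
		}
		out = append(out, in)
	}
	return removeUnused(out)
}

// removeUnused walks backwards and drops definitions nobody consumes.
// It assumes that instructions defining a value have no side effects.
func removeUnused(block []inst) []inst {
	used := map[string]bool{}
	var kept []inst
	for i := len(block) - 1; i >= 0; i-- {
		in := block[i]
		if in.dst != "" && !used[in.dst] {
			continue
		}
		for _, a := range in.args {
			used[a] = true
		}
		kept = append([]inst{in}, kept...)
	}
	return kept
}

func main() {
	blk0 := []inst{
		{op: "Iconst_32", dst: "v2", imm: -5},
		{op: "Iconst_32", dst: "v3", imm: 0},
		{op: "Icmp lt_s", dst: "v4", args: []string{"v2", "v3"}},
		{op: "Brz", args: []string{"v4"}, target: "blk2"},
		{op: "Jump", target: "blk1"},
	}
	for _, in := range simplify(blk0) {
		fmt.Println(in.op, in.target) // prints: Jump blk1
	}
}
```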
   234  
   235  In practice, because WebAssembly is a compilation target, the producing toolchain has usually
   236  applied these simple optimizations already, so they are often unnecessary. The optimization passes implemented in
   237  wazero are also work-in-progress and, at the time of writing, further work is
   238  expected to implement more advanced optimizations.
   239  
   240  ### Code
   241  
   242  Optimization passes are implemented by `ssa.Builder.RunPasses()`. An optimization
   243  pass is just a function that takes an SSA builder as a parameter.
   244  
   245  Passes iterate over the basic blocks, and, for each basic block, they iterate
   246  over the instructions. Each pass may mutate the basic block by modifying the instructions
   247  it contains, or it might change the entire shape of the control-flow graph (e.g. by removing
   248  blocks).
   249  
   250  Currently, there are two dead-code elimination passes:
   251  
   252  - `passDeadBlockEliminationOpt`, acting at the block level.
   253  - `passDeadCodeEliminationOpt`, acting at the instruction level.
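
The idea behind block-level elimination is a reachability walk from the entry block. The sketch below shows that idea only; it is not the code of `passDeadBlockEliminationOpt`, and the `eliminateDeadBlocks` function and the map-based graph are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// eliminateDeadBlocks removes every block that cannot be reached from entry
// and returns the names of the removed blocks in sorted order.
func eliminateDeadBlocks(succs map[string][]string, entry string) []string {
	reachable := map[string]bool{entry: true}
	work := []string{entry}
	for len(work) > 0 {
		b := work[len(work)-1]
		work = work[:len(work)-1]
		for _, s := range succs[b] {
			if !reachable[s] {
				reachable[s] = true
				work = append(work, s)
			}
		}
	}
	var removed []string
	for b := range succs {
		if !reachable[b] {
			removed = append(removed, b)
			delete(succs, b)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	// The abs graph after blk0 has been reduced to "Jump blk1".
	cfg := map[string][]string{
		"blk0": {"blk1"},
		"blk1": {"blk3"},
		"blk2": {"blk3"},
		"blk3": {},
	}
	fmt.Println(eliminateDeadBlocks(cfg, "blk0"), len(cfg)) // prints: [blk2] 3
}
```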
   254  
   255  Notably, `passDeadCodeEliminationOpt` also assigns an `InstructionGroupID` to each
   256  instruction. This is used to determine whether a sequence of instructions can be
   257  replaced by a single machine instruction during the back-end phase. For more details,
   258  see also the relevant documentation in `ssa/instructions.go`.
   259  
   260  There are also simple constant folding passes such as `passNopInstElimination`, which
   261  folds and deletes instructions that are essentially no-ops (e.g. shifting by a 0 amount).
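
As a rough sketch of the shift-by-zero case, the code below deletes the shift and makes later instructions consume its input directly. It is not the code of `passNopInstElimination`: the `inst` struct, the `eliminateNopShifts` function and the opcode spellings are illustrative assumptions.

```go
package main

import "fmt"

// inst is a hypothetical, simplified instruction; it is not ssa.Instruction.
type inst struct {
	op   string   // opcode name, illustrative only
	dst  string   // name of the value defined
	args []string // names of the values consumed
	imm  int32    // immediate, for Iconst_32
}

// eliminateNopShifts removes shifts whose amount is the constant 0 and
// rewrites later uses of their result to the shifted value itself.
func eliminateNopShifts(block []inst) []inst {
	consts := map[string]int32{}
	alias := map[string]string{} // removed value -> value that replaces it
	var out []inst
	for _, in := range block {
		// Rewrite the arguments first, so that aliases never chain.
		args := make([]string, len(in.args))
		for i, a := range in.args {
			if orig, ok := alias[a]; ok {
				a = orig
			}
			args[i] = a
		}
		in.args = args
		switch in.op {
		case "Iconst_32":
			consts[in.dst] = in.imm
		case "Ishl":
			if amount, ok := consts[in.args[1]]; ok && amount == 0 {
				alias[in.dst] = in.args[0] // x << 0 is just x
				continue
			}
		}
		out = append(out, in)
	}
	return out
}

func main() {
	block := []inst{
		{op: "Iconst_32", dst: "v3", imm: 0},
		{op: "Ishl", dst: "v4", args: []string{"v2", "v3"}},
		{op: "Iadd", dst: "v5", args: []string{"v4", "v4"}},
	}
	for _, in := range eliminateNopShifts(block) {
		fmt.Println(in.dst, "=", in.op, in.args)
	}
}
```

After the rewrite, `v5` adds `v2` to itself and `v4` no longer exists. The constant `v3` is left in place here; removing it once it has no users is the job of instruction-level dead code elimination.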
   262  
   263  ### Debug Flags
   264  
   265  `wazevoapi.PrintOptimizedSSA` dumps the SSA form to the console after optimization.
   266  
   267  
   268  ## Block Layout
   269  
   270  As we have seen earlier, the SSA form instructions are contained within basic
   271  blocks, and the basic blocks are connected by edges of the control-flow graph.
   272  However, machine code is not laid out in a graph, but it is just a linear
   273  sequence of instructions.
   274  
   275  Thus, the last step of the front-end is to lay out the basic blocks in a linear
   276  sequence. Because each basic block, by design, ends with a control-flow
   277  instruction, one of the goals of the block layout phase is to maximize the number of
   278  **fall-through opportunities**. A fall-through opportunity occurs when a block ends
   279  with a jump instruction whose target is exactly the next block in the
   280  sequence. In order to maximize the number of fall-through opportunities, the
   281  block layout phase might reorder the basic blocks in the control-flow graph,
   282  and transform the control-flow instructions. For instance, it might _invert_
   283  some branching conditions.
   284  
   285  The end goal is to effectively minimize the number of jumps and branches in
   286  the machine code that will be generated later.
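
To make the notion of a fall-through opportunity concrete, the following sketch counts them for two candidate orderings of the `abs` blocks. The `countFallThroughs` function and the `finalJump` map are hypothetical helpers for this example; the actual layout algorithm is more involved.

```go
package main

import "fmt"

// countFallThroughs counts the blocks whose final jump targets the block
// placed immediately after them in the layout.
func countFallThroughs(layout []string, finalJump map[string]string) int {
	n := 0
	for i := 0; i+1 < len(layout); i++ {
		if finalJump[layout[i]] == layout[i+1] {
			n++
		}
	}
	return n
}

func main() {
	// Target of the last instruction of each block in the abs example.
	finalJump := map[string]string{
		"blk0": "blk1",
		"blk1": "blk3",
		"blk2": "blk3",
		"blk3": "blk_ret",
	}
	fmt.Println(countFallThroughs([]string{"blk0", "blk1", "blk2", "blk3"}, finalJump)) // prints: 2
	fmt.Println(countFallThroughs([]string{"blk0", "blk2", "blk1", "blk3"}, finalJump)) // prints: 1
}
```

With the first ordering, both `blk0` and `blk2` can drop their final jump. With the second ordering only `blk1` can, unless the condition at the end of `blk0` is inverted so that its final jump targets `blk2` instead.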
   287  
   288  
   289  ### Critical Edges
   290  
   291  Special care must be taken when a basic block has multiple predecessors,
   292  i.e., when it has multiple incoming edges. In particular, an edge between two
   293  basic blocks is called a **critical edge** when, at the same time:
   294  - the predecessor has multiple successors **and**
   295  - the successor has multiple predecessors.
   296  
   297  For instance, in the example below the edge between `BB0` and `BB3`
   298  is a critical edge.
   299  
   300  ```goat { width="300" }
   301  ┌───────┐    ┌───────┐
   302  │  BB0  │━┓  │  BB1  │
   303  └───────┘ ┃  └───────┘
   304      │     ┃      │
   305      ▼     ┃      ▼
   306  ┌───────┐ ┃  ┌───────┐
   307  │  BB2  │ ┗━▶│  BB3  │
   308  └───────┘    └───────┘
   309  ```
   310  
   311  In these cases the critical edge is split by introducing a new basic block,
   312  called a **trampoline**, where the critical edge was.
   313  
   314  ```goat  { width="300" }
   315  ┌───────┐            ┌───────┐
   316  │  BB0  │──────┐     │  BB1  │
   317  └───────┘      ▼     └───────┘
   318      │    ┌──────────┐    │
   319      │    │trampoline│    │
   320      ▼    └──────────┘    ▼
   321  ┌───────┐      │     ┌───────┐
   322  │  BB2  │      └────▶│  BB3  │
   323  └───────┘            └───────┘
   324  ```
   325  
   326  For more details on critical edges, see:
   327  
   328  - https://en.wikipedia.org/wiki/Control-flow_graph
   329  - https://nickdesaulniers.github.io/blog/2023/01/27/critical-edge-splitting/
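
The definition and the split can be sketched on a plain successor map. This is an illustration under assumptions, not wazero's code: the `splitCriticalEdges` function and the `trampoline_<from>_<to>` naming scheme are made up for the example.

```go
package main

import "fmt"

// splitCriticalEdges returns a new successor map in which every critical
// edge goes through a freshly inserted trampoline block.
func splitCriticalEdges(succs map[string][]string) map[string][]string {
	// Count the incoming edges of every block.
	preds := map[string]int{}
	for _, targets := range succs {
		for _, t := range targets {
			preds[t]++
		}
	}
	out := map[string][]string{}
	for from, targets := range succs {
		for _, to := range targets {
			// Critical: the predecessor has multiple successors and
			// the successor has multiple predecessors.
			if len(targets) > 1 && preds[to] > 1 {
				tramp := "trampoline_" + from + "_" + to
				out[from] = append(out[from], tramp)
				out[tramp] = []string{to}
			} else {
				out[from] = append(out[from], to)
			}
		}
	}
	return out
}

func main() {
	// The graph from the first diagram above.
	cfg := map[string][]string{
		"BB0": {"BB2", "BB3"},
		"BB1": {"BB3"},
	}
	split := splitCriticalEdges(cfg)
	fmt.Println(split["BB0"])                // prints: [BB2 trampoline_BB0_BB3]
	fmt.Println(split["trampoline_BB0_BB3"]) // prints: [BB3]
	fmt.Println(split["BB1"])                // prints: [BB3]
}
```

The edge from `BB1` to `BB3` is left alone because `BB1` has a single successor.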
   330  
   331  ### Example
   332  
   333  At the end of the block layout phase, the laid out SSA for the `abs` function
   334  looks as follows:
   335  
   336  ```
   337  blk0: (exec_ctx:i64, module_ctx:i64, v2:i32)
   338  	v3:i32 = Iconst_32 0x0
   339  	v4:i32 = Icmp lt_s, v2, v3
   340  	Brz v4, blk2
   341  	Jump fallthrough
   342  
   343  blk1: () <-- (blk0)
   344  	v6:i32 = Iconst_32 0x0
   345  	v7:i32 = Isub v6, v2
   346  	Jump blk3, v7
   347  
   348  blk2: () <-- (blk0)
   349  	Jump fallthrough, v2
   350  
   351  blk3: (v5:i32) <-- (blk1,blk2)
   352  	Jump blk_ret, v5
   353  ```
   354  
   355  ### Code
   356  
   357  `passLayoutBlocks` implements the block layout phase.
   358  
   359  ### Debug Flags
   360  
   361  - `wazevoapi.PrintBlockLaidOutSSA` dumps the SSA form to the console after block layout.
   362  - `wazevoapi.SSALoggingEnabled` logs the transformations that are applied during this phase,
   363    such as inverting branching conditions or splitting critical edges.
   364  
   365  <hr>
   366  
   367  * Previous Section: [How the Optimizing Compiler Works](../)
   368  * Next Section: [Back-End](../backend/)
   369  
   370  [ssa-blocks]: https://en.wikipedia.org/wiki/Static_single-assignment_form#Block_arguments
   371  [llvm-mlir]: https://mlir.llvm.org/docs/Rationale/Rationale/#block-arguments-vs-phi-nodes