# How do compiler functions work?

WebAssembly runtimes let you call functions defined in wasm. How this works in
wazero depends on your `RuntimeConfig`.

* `RuntimeConfigCompiler` compiles machine code from your wasm, and jumps to
  that when invoking a function.
* `RuntimeConfigInterpreter` does not generate code. It interprets wasm and
  executes Go statements that correspond to WebAssembly instructions.
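
To make the contrast between the two configurations concrete, here is a minimal sketch in Go. It is not wazero code: the `op` type, its opcodes, and the `interpret` and `compile` functions are hypothetical, and Go closures only stand in for the machine code a real compiler would emit.

```go
package main

import "fmt"

// op is a hypothetical instruction for a tiny stack machine. It is not a
// wazero type and only stands in for a WebAssembly instruction.
type op struct {
	code string // "const", "add" or "mul"
	arg  int64  // operand, used only by "const"
}

// interpret decodes every instruction on each invocation and runs the Go
// statements that correspond to it.
func interpret(prog []op) int64 {
	var stack []int64
	for _, o := range prog {
		n := len(stack)
		switch o.code {
		case "const":
			stack = append(stack, o.arg)
		case "add":
			stack = append(stack[:n-2], stack[n-2]+stack[n-1])
		case "mul":
			stack = append(stack[:n-2], stack[n-2]*stack[n-1])
		default:
			panic("unknown opcode " + o.code)
		}
	}
	return stack[len(stack)-1]
}

// compile decodes the program once and returns a function to jump to later.
// The closures are a stand-in for generated machine code.
func compile(prog []op) func() int64 {
	steps := make([]func([]int64) []int64, 0, len(prog))
	for _, o := range prog {
		o := o // capture a copy per step (needed before Go 1.22)
		switch o.code {
		case "const":
			steps = append(steps, func(s []int64) []int64 { return append(s, o.arg) })
		case "add":
			steps = append(steps, func(s []int64) []int64 {
				n := len(s)
				return append(s[:n-2], s[n-2]+s[n-1])
			})
		case "mul":
			steps = append(steps, func(s []int64) []int64 {
				n := len(s)
				return append(s[:n-2], s[n-2]*s[n-1])
			})
		default:
			panic("unknown opcode " + o.code)
		}
	}
	return func() int64 {
		var stack []int64
		for _, step := range steps {
			stack = step(stack)
		}
		return stack[len(stack)-1]
	}
}

func main() {
	// (2 + 3) * 4
	prog := []op{{"const", 2}, {"const", 3}, {"add", 0}, {"const", 4}, {"mul", 0}}
	fn := compile(prog)               // translate once
	fmt.Println(interpret(prog), fn()) // prints "20 20"
}
```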

How the compiler works precisely is a large topic, and discussed at length on
this page. For more general information on architecture, etc., please refer to
[Docs](..).

## Engines

Our [Docs](..) introduce the "engine" concept of wazero. More precisely, there
are three types of engines, `Engine`, `ModuleEngine` and `callEngine`. Each has
a different scope and role:

- `Engine` has the same lifetime as `Runtime`. This compiles a `CompiledModule`
  into machine code, which is both cached and memory-mapped as an executable.
- `ModuleEngine` is a virtual machine with the same lifetime as its [Module][api-module].
  Notably, this binds each [function instance][spec-function-instance] to
  corresponding machine code owned by its `Engine`.
- `callEngine` is the implementation of [api.Function][api-function] in a
  [Module][api-module]. This implements `Function.Call(...)` by invoking
  machine code corresponding to a function instance in `ModuleEngine` and
  managing the [call stack][call-stack] representing the invocation.

Here is a diagram showing the relationships of these engines:

```goat
      .-----------> Instantiated module                                 Exported Function
     /1:N                   |                                                  |
    /                       |                                                  v
   |     +----------+       v        +----------------+                  +------------+
   |     |  Engine  |--------------->|  ModuleEngine  |----------------->| callEngine |
   |     +----------+                +----------------+                  +------------+
   |          |                               |                            |      |
   .          |                               |                            |      |
 main.wasm -->|        .--------------------->|          '-----------------+      |
              |       /                       |          |                        |
              v      .                        v          v                        v
      +--------------+      +-----------------------------------+            +----------+
      | Machine Code |      |[(func_instance, machine_code),...]|            |Call Stack|
      +--------------+      +-----------------------------------+            +----------+
                                               ^                                  ^
                                               |                                  |
                                               |                                  |
                                               +----------------------------------+
                                                               |
                                                               |
                                                               |
                                                        Function.Call()
```
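
The ownership described in this section can be sketched with plain Go types. This is an illustration under assumptions, not wazero's implementation: the lowercase `engine`, `moduleEngine` and `machineCode` types and all method names are hypothetical, and a Go function stands in for executable machine code.

```go
package main

import "fmt"

// machineCode stands in for executable bytes; here it is just a Go function.
type machineCode func(params []uint64) []uint64

// engine lives as long as the runtime and caches compiled code per module.
type engine struct {
	cache map[string]map[string]machineCode // module name -> function name -> code
}

// compileModule "compiles" a module once and caches the result.
func (e *engine) compileModule(name string, funcs map[string]machineCode) {
	e.cache[name] = funcs
}

// moduleEngine lives as long as one instantiated module. It binds each
// function instance to code owned by the engine.
type moduleEngine struct {
	functions map[string]machineCode
}

func (e *engine) newModuleEngine(name string) (*moduleEngine, error) {
	code, ok := e.cache[name]
	if !ok {
		return nil, fmt.Errorf("module %q is not compiled", name)
	}
	return &moduleEngine{functions: code}, nil
}

// callEngine backs one exported function and owns the stack for its calls.
type callEngine struct {
	code  machineCode
	stack []uint64
}

func (m *moduleEngine) newCallEngine(fn string) (*callEngine, error) {
	code, ok := m.functions[fn]
	if !ok {
		return nil, fmt.Errorf("function %q is not defined", fn)
	}
	return &callEngine{code: code}, nil
}

// call places the parameters on the stack and invokes the bound code.
func (c *callEngine) call(params ...uint64) []uint64 {
	c.stack = append(c.stack[:0], params...)
	return c.code(c.stack)
}

func main() {
	e := &engine{cache: map[string]map[string]machineCode{}}
	e.compileModule("main.wasm", map[string]machineCode{
		"add": func(p []uint64) []uint64 { return []uint64{p[0] + p[1]} },
	})
	m, err := e.newModuleEngine("main.wasm")
	if err != nil {
		panic(err)
	}
	ce, err := m.newCallEngine("add")
	if err != nil {
		panic(err)
	}
	fmt.Println(ce.call(1, 2)) // prints "[3]"
}
```

Because the code is cached on the engine, several module engines created from the same compiled module share it, which is the 1:N edge in the diagram.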

## Callbacks from machine code to Go

Go source can be compiled to invoke native library functions using CGO.
However, [CGO is not GO][cgo-not-go]. To call native functions in pure Go, we
need a different approach with unique constraints.

The most notable constraints are:
* machine code must not manipulate the Goroutine or system stack
* we cannot modify the signal handler of Go at runtime

### Handling the call stack

One constraint is that the generated machine code must not manipulate the
Goroutine (or system) stack. Otherwise, the Go runtime gets corrupted, which
results in fatal execution errors. This means we cannot[^1] call Go functions
(host functions) directly from machine code (compiled from wasm). Such calls are
routinely needed in WebAssembly, as system calls such as WASI are defined in Go,
but invoked from Wasm. To handle this, we employ a "trampoline strategy".

Let's explain the "trampoline strategy" with an example. `random_get` is a host
function defined in Go, called from machine code compiled from the guest `main`
function. Let's say the wasm function corresponding to that is called `_start`.
The `_start` function is called by wazero by default on `Instantiate`.

Here is a TinyGo source file describing this.
```go
package main

import "unsafe"

// random_get is a function defined on the host, specifically, the wazero
// program written in Go.
//
//go:wasmimport wasi_snapshot_preview1 random_get
func random_get(ptr uintptr, size uint32) (errno uint32)

// main is compiled to wasm, so this is the guest. Conventionally, this ends up
// named `_start`.
func main() {
	// Define a buffer to hold random data
	size := uint32(8)
	buf := make([]byte, size)

	// Fill the buffer with random data using an imported host function.
	// The host needs to know where in guest memory to place the random data.
	// To communicate this, we have to convert buf to a uintptr.
	errno := random_get(uintptr(unsafe.Pointer(&buf[0])), size)
	if errno != 0 {
		panic(errno)
	}
}
```

When `_start` calls `random_get`, the machine code exits execution first. wazero
then calls the Go function mapped to `random_get` like a usual Go program.
Finally, wazero transfers control back to machine code, resuming `_start` after
the call instruction to `random_get`.

Here's what the "trampoline strategy" looks like in a diagram. For simplicity,
we'll say the wasm memory offset of `buf` is zero, but it will be different
in real execution. The diagram abbreviates `random_get` as `rand_get`.
```goat
   |                                     Go              |           Machine Code
   |                                                           (compiled from main.wasm)
   |                                                     |
   v
   |                        `Instantiate(ctx, mainWasm)` |
   |                                     |
   v                                     v               |
   |                            +----------------+                  +------------+
   |                            |func exec_native|-------|--------> |func _start |
   v                            +----------------+                  +------------+
   |                                                     |         /
   |            Go func call    +----------------+                / ptr=0,size=8
   v           .----------------|func exec_native|<------|-------. status=call_host_fn(name=rand_get)
   |          /  ptr=0,size=8   +----------------+     exit
   |         v                                           |
   v   +-------------+          +----------------+
   |   |func rand_get|--------->|func exec_native|-------|-------.
   |   +-------------+ errno=0  +----------------+    continue    \ errno=0
   v                                                     |         \
   |                                                     |          +------------+
   |                                                     |          |func _start |
   v                                                     |          +------------+
```
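
The exit-and-resume flow in the diagram can be simulated in pure Go. The sketch below is an assumption-laden model rather than wazero's code: `start` is a hand-written state machine standing in for machine code, and `status`, `callEngine`, `hostFn`, `randomGet` and `execNative` are hypothetical names chosen to mirror the diagram.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// status tells Go why the simulated machine code stopped running.
type status int

const (
	statusReturned   status = iota // the function finished
	statusCallHostFn               // the code exited to request a host call
)

// callEngine holds the state shared between Go and the simulated machine code.
type callEngine struct {
	memory     []byte
	resumeAt   int      // where the code continues when re-entered
	hostFnName string   // host function requested on exit
	params     []uint64 // parameters for the host function
	results    []uint64 // results handed back to the code
}

// hostFn is the shape of a host function in this sketch.
type hostFn func(ce *callEngine, params []uint64) []uint64

// randomGet fills guest memory with random bytes and returns an errno.
func randomGet(ce *callEngine, params []uint64) []uint64 {
	ptr, size := params[0], params[1]
	if ptr+size > uint64(len(ce.memory)) {
		return []uint64{1} // hypothetical non-zero errno: out of bounds
	}
	if _, err := rand.Read(ce.memory[ptr : ptr+size]); err != nil {
		return []uint64{1}
	}
	return []uint64{0}
}

// start stands in for machine code compiled from `_start`. It never calls Go
// directly: it records what it wants, then exits with a status.
func start(ce *callEngine) status {
	switch ce.resumeAt {
	case 0:
		ce.hostFnName, ce.params = "random_get", []uint64{0, 8} // ptr=0, size=8
		ce.resumeAt = 1
		return statusCallHostFn
	default:
		// Resumed after the call instruction; ce.results[0] holds errno.
		return statusReturned
	}
}

// execNative is the trampoline: it enters the code, serves host calls in Go
// whenever the code exits to request one, and re-enters until it returns.
func execNative(ce *callEngine, code func(*callEngine) status, host map[string]hostFn) error {
	for {
		switch code(ce) {
		case statusReturned:
			return nil
		case statusCallHostFn:
			fn, ok := host[ce.hostFnName]
			if !ok {
				return fmt.Errorf("unknown host function %q", ce.hostFnName)
			}
			ce.results = fn(ce, ce.params)
		}
	}
}

func main() {
	ce := &callEngine{memory: make([]byte, 8)}
	host := map[string]hostFn{"random_get": randomGet}
	if err := execNative(ce, start, host); err != nil {
		panic(err)
	}
	fmt.Println("errno:", ce.results[0], "buf:", ce.memory)
}
```

The point of the loop is that the host function always runs on an ordinary Go stack frame below `execNative`, so the simulated machine code never has to touch the Goroutine stack.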

### Signal handling

Code compiled to wasm uses [runtime traps][spec-trap] to abort execution. For
example, a `panic` compiled with TinyGo becomes a wasm function named
`runtime._panic`, which issues an [unreachable][spec-unreachable] instruction
after printing the message to STDERR.

```go
package main

func main() {
	panic("help")
}
```

Native JIT compilers set custom signal handlers for [Wasm runtime traps][spec-trap],
such as the [unreachable][spec-unreachable] instruction. However, we cannot
safely [modify the signal handler of Go at runtime][signal-handler-discussion].
As described in the previous section, wazero always exits the execution of
machine code. Machine code sets a status when it encounters an `unreachable`
instruction. This is read by wazero, which propagates it back with
`ErrRuntimeUnreachable`.

Here's a diagram showing this:
```goat
   |                               Go                 |                             Machine Code
   |                                                                          (compiled from main.wasm)
   |                                                  |
   v
   |                   `Instantiate(ctx, mainWasm)`   |
   |                                |
   v                                v                 |
   |                       +----------------+                                     +------------+
   |                       |func exec_native|---------|-------------------------> |func _start |
   v                       +----------------+                                     +------------+
   |                                                  |                                 |
   |                       +----------------+                  exit           +--------------------+
   v                       |func exec_native|<--------|---------------------- |func runtime._panic |
   |                       +----------------+            status=unreachable   +--------------------+
   |                              |                   |
   v                              |
   |                panic(WasmRuntimeErrUnreachable)  |
```

One thing you will notice above is that calls between wasm functions, such
as from `_start` to `runtime._panic`, do not use a trampoline. The trampoline
strategy is only used between wasm and the host.

## Summary

When an exported wasm function is called using a wazero API, such as
`Function.Call()`, wazero allocates a `callEngine` and starts invocation. This
begins with jumping to machine code compiled from the Wasm binary. When that
code makes a callback to the host, it exits execution, passing control back to
`exec_native`, which then calls a Go function and resumes the machine code
afterwards. In the face of Wasm runtime errors, we exit the machine code
execution with the proper status and return control back to the `exec_native`
function, just like host function calls. The only difference is that instead of
calling a Go function, we call `panic` with a corresponding error. This jumping
is why the strategy is called a trampoline, and it is only used between the
guest wasm and the host running it.

For more details, see [RATIONALE.md][compiler-rationale].

[call-stack]: https://en.wikipedia.org/wiki/Call_stack
[api-function]: https://pkg.go.dev/github.com/tetratelabs/wazero@v1.0.0-rc.1/api#Function
[api-module]: https://pkg.go.dev/github.com/tetratelabs/wazero@v1.0.0-rc.1/api#Module
[spec-function-instance]: https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#function-instances%E2%91%A0
[spec-trap]: https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#trap
[spec-unreachable]: https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#syntax-instr-control
[compiler-rationale]: https://github.com/tetratelabs/wazero/blob/v1.0.0-rc.1/internal/engine/compiler/RATIONALE.md
[signal-handler-discussion]: https://gophers.slack.com/archives/C1C1YSQBT/p1675992411241409
[cgo-not-go]: https://www.youtube.com/watch?v=PAAkCSZUG1c&t=757s

[^1]: It's technically possible to call it directly, but that would require performing "stack switching" in the native code.
  It's almost the same as what wazero does: exiting the execution of machine code, then calling the target Go function (using the caller of machine code as a "trampoline").