# How do compiler functions work?

WebAssembly runtimes let you call functions defined in wasm. How this works in
wazero is different depending on your `RuntimeConfig`.

* `RuntimeConfigCompiler` compiles machine code from your wasm, and jumps to
  that when invoking a function.
* `RuntimeConfigInterpreter` does not generate code. It interprets wasm and
  executes Go statements that correspond to WebAssembly instructions.

How the compiler works precisely is a large topic, and is discussed at length
on this page. For more general information on architecture, etc., please refer
to [Docs](..).

## Engines

Our [Docs](..) introduce the "engine" concept of wazero. More precisely, there
are three types of engines, `Engine`, `ModuleEngine` and `callEngine`. Each has
a different scope and role:

- `Engine` has the same lifetime as `Runtime`. This compiles a `CompiledModule`
  into machine code, which is both cached and memory-mapped as an executable.
- `ModuleEngine` is a virtual machine with the same lifetime as its [Module][api-module].
  Notably, this binds each [function instance][spec-function-instance] to
  corresponding machine code owned by its `Engine`.
- `callEngine` is the implementation of [api.Function][api-function] in a
  [Module][api-module]. This implements `Function.Call(...)` by invoking
  machine code corresponding to a function instance in `ModuleEngine` and
  managing the [call stack][call-stack] representing the invocation.

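
The three scopes above can be sketched in plain Go. This is only an illustration of the ownership described in the list, not wazero's implementation: every type, field and method name below (`engine`, `compiledModule`, `newModuleEngine`, and so on) is hypothetical, and the "machine code" is a placeholder byte slice.

```go
package main

import "fmt"

// machineCode stands in for the executable bytes of one compiled function.
type machineCode []byte

// compiledModule is what the (hypothetical) engine caches per wasm binary.
type compiledModule struct {
	funcNames []string
	code      []machineCode // code[i] belongs to funcNames[i]
}

// engine lives as long as the runtime and owns all machine code.
type engine struct {
	cache map[string]*compiledModule
}

// compileModule fills the cache once per module. The bytes are placeholders;
// a real compiler would emit native instructions here.
func (e *engine) compileModule(name string, funcNames []string) {
	cm := &compiledModule{funcNames: funcNames}
	for i := range funcNames {
		cm.code = append(cm.code, machineCode{byte(i)})
	}
	e.cache[name] = cm
}

// functionInstance pairs a function with machine code owned by the engine.
type functionInstance struct {
	name string
	code machineCode
}

// moduleEngine lives as long as one instantiated module.
type moduleEngine struct {
	functions []functionInstance
}

// newModuleEngine binds function instances to cached code without copying it.
// It assumes compileModule was already called for this name.
func (e *engine) newModuleEngine(name string) *moduleEngine {
	cm := e.cache[name]
	me := &moduleEngine{}
	for i, fn := range cm.funcNames {
		me.functions = append(me.functions, functionInstance{name: fn, code: cm.code[i]})
	}
	return me
}

// callEngine backs one exported function and manages its call stack.
type callEngine struct {
	fn    *functionInstance
	stack []uint64
}

// newCallEngine returns nil when the module has no function with that name.
func (me *moduleEngine) newCallEngine(fnName string) *callEngine {
	for i := range me.functions {
		if me.functions[i].name == fnName {
			return &callEngine{fn: &me.functions[i]}
		}
	}
	return nil
}

func main() {
	e := &engine{cache: map[string]*compiledModule{}}
	e.compileModule("main.wasm", []string{"_start"})

	// Two instantiations of the same module share one copy of machine code.
	a, b := e.newModuleEngine("main.wasm"), e.newModuleEngine("main.wasm")
	fmt.Println(&a.functions[0].code[0] == &b.functions[0].code[0])

	ce := a.newCallEngine("_start")
	fmt.Println(ce.fn.name)
}
```

The point of the sketch is the 1:N relationship: code is compiled once per module and each instantiation only references it, while the call stack belongs to the narrowest scope.
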

Here is a diagram showing the relationships of these engines:

```goat
      .-----------> Instantiated module                                Exported Function
     /1:N                   |                                                  |
    /                       |                                                  v
   |     +----------+       v        +----------------+                  +------------+
   |     |  Engine  |--------------->|  ModuleEngine  |----------------->| callEngine |
   |     +----------+                +----------------+                  +------------+
   |          |                               |                            |        |
   .          |                               |                            |        |
 main.wasm -->|        .--------------------->|          '-----------------+        |
              |       /                       |                            |        |
              v      .                        v                            v        v
      +--------------+      +-----------------------------------+         +----------+
      | Machine Code |      |[(func_instance, machine_code),...]|         |Call Stack|
      +--------------+      +-----------------------------------+         +----------+
                                              ^                                  ^
                                              |                                  |
                                              |                                  |
                                              +----------------------------------+
                                                               |
                                                               |
                                                               |
                                                        Function.Call()
```

## Callbacks from machine code to Go

Go source can be compiled to invoke native library functions using CGO.
However, [CGO is not Go][cgo-not-go]. To call native functions in pure Go, we
need a different approach with unique constraints.

The most notable constraints are:
* machine code must not manipulate the Goroutine or system stack
* we cannot modify the signal handler of Go at runtime

### Handling the call stack

One constraint is that the generated machine code must not manipulate the
Goroutine (or system) stack. Otherwise, the Go runtime gets corrupted, which
results in fatal execution errors. This means we cannot[^1] call Go functions
(host functions) directly from machine code (compiled from wasm). Such calls
are routinely needed in WebAssembly, as system calls such as WASI are defined
in Go, but invoked from Wasm. To handle this, we employ a "trampoline strategy".

Let's explain the "trampoline strategy" with an example. `random_get` is a host
function defined in Go, called from machine code compiled from the guest `main`
function. Let's say the wasm function corresponding to that is called `_start`.
The `_start` function is called by wazero by default on `Instantiate`.

Here is a TinyGo source file describing this.
```go
package main

import "unsafe"

// random_get is a function defined on the host, specifically, the wazero
// program written in Go.
//
//go:wasmimport wasi_snapshot_preview1 random_get
func random_get(ptr uintptr, size uint32) (errno uint32)

// main is compiled to wasm, so this is the guest. Conventionally, this ends up
// named `_start`.
func main() {
	// Define a buffer to hold random data
	size := uint32(8)
	buf := make([]byte, size)

	// Fill the buffer with random data using an imported host function.
	// The host needs to know where in guest memory to place the random data.
	// To communicate this, we have to convert buf to a uintptr.
	errno := random_get(uintptr(unsafe.Pointer(&buf[0])), size)
	if errno != 0 {
		panic(errno)
	}
}
```

When `_start` calls `random_get`, the machine code first exits execution.
wazero then calls the Go function mapped to `random_get` like a usual Go
program. Finally, wazero transfers control back to machine code again, resuming
`_start` after the call instruction to `random_get`.

Here's what the "trampoline strategy" looks like in a diagram. For simplicity,
we'll say the wasm memory offset of the `buf` is zero, but it will be different
in real execution.
```goat
 |                         Go                            |           Machine Code
 |                                                       |     (compiled from main.wasm)
 |                                                       |
 v
 |                        `Instantiate(ctx, mainWasm)`   |
 |                                      |
 v                                      v                |
 |                              +----------------+                  +------------+
 |                              |func exec_native|-------|--------> |func _start |
 v                              +----------------+                  +------------+
 |                                                       |         /
 |        Go func call          +----------------+                / ptr=0,size=8
 v             .----------------|func exec_native|<------|-------. status=call_host_fn(name=random_get)
 |            / ptr=0,size=8    +----------------+         exit
 |           v                                           |
 v   +---------------+          +----------------+
 |   |func random_get|--------->|func exec_native|-------|-------.
 |   +---------------+ errno=0  +----------------+ continue       \ errno=0
 v                                                       |         \
 |                                                       |          +------------+
 |                                                       |          |func _start |
 v                                                       |          +------------+
```

### Signal handling

Code compiled to wasm uses [runtime traps][spec-trap] to abort execution. For
example, a `panic` compiled with TinyGo becomes a wasm function named
`runtime._panic`, which issues an [unreachable][spec-unreachable] instruction
after printing the message to STDERR.

```go
package main

func main() {
	panic("help")
}
```

Native JIT compilers set custom signal handlers for [Wasm runtime traps][spec-trap],
such as the [unreachable][spec-unreachable] instruction. However, we cannot
safely [modify the signal handler of Go at runtime][signal-handler-discussion].
Instead, as described in the previous section, wazero always exits the
execution of machine code. Machine code sets a status when it encounters an
`unreachable` instruction. This is read by wazero, which propagates it back
with `ErrRuntimeUnreachable`.


Here's a diagram showing how an `unreachable` trap propagates:
```goat
 |               Go                  |                            Machine Code
 |                                   |                     (compiled from main.wasm)
 |                                   |
 v
 |   `Instantiate(ctx, mainWasm)`    |
 |                |
 v                v                  |
 |        +----------------+                                     +------------+
 |        |func exec_native|---------|-------------------------> |func _start |
 v        +----------------+                                     +------------+
 |                                   |                                 |
 |        +----------------+                  exit           +--------------------+
 v        |func exec_native|<--------|---------------------- |func runtime._panic |
 |        +----------------+           status=unreachable    +--------------------+
 |                |                  |
 v                |
 |   panic(ErrRuntimeUnreachable)    |
```

One thing you will notice above is that the calls between wasm functions, such
as from `_start` to `runtime._panic`, do not use a trampoline. The trampoline
strategy is only used between wasm and the host.

## Summary

When an exported wasm function is called, using a wazero API, such as
`Function.Call()`, wazero allocates a `callEngine` and starts invocation. This
begins with jumping to machine code compiled from the Wasm binary. When that
code makes a callback to the host, it exits execution, passing control back to
`exec_native`, which then calls a Go function and resumes the machine code
afterwards. In the face of Wasm runtime errors, we exit the machine code
execution with the proper status, and return control back to the `exec_native`
function, just like host function calls. The difference is that instead of
calling a Go function, we call `panic` with a corresponding error. This jumping
is why the strategy is called a trampoline, and it is only used between the
guest wasm and the host running it.

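
The control flow summarized above can be sketched as a small, self-contained Go program. This is a simulation under loud assumptions, not wazero's code: the "machine code" is modelled as a resumable state machine, and all names (`status`, `guest`, `resume`, `hostFns`, `call`, `errUnreachable`) are hypothetical. Recovering the panic in `call` and returning it as an error is also an assumption of this sketch. Only `exec_native`, the `call_host_fn` and `unreachable` statuses, and the `random_get` example come from this page.

```go
package main

import (
	"errors"
	"fmt"
)

// status is why the simulated machine code exited (hypothetical names).
type status int

const (
	statusReturned    status = iota // the wasm function finished normally
	statusCallHostFn                // wasm wants a host (Go) function called
	statusUnreachable               // wasm hit an `unreachable` instruction
)

// errUnreachable stands in for the error reported on an unreachable trap.
var errUnreachable = errors.New("unreachable")

// hostFns maps import names to ordinary Go functions.
type hostFns map[string]func(params []uint32) (results []uint32)

// guest simulates compiled wasm as a resumable state machine. A real runtime
// would jump into memory-mapped machine code instead.
type guest struct {
	pc      int      // where to resume after the last exit
	hostFn  string   // host function requested on statusCallHostFn
	params  []uint32 // parameters for that host function
	results []uint32 // results written by the host before resuming
}

// resume runs until the next exit and reports why execution stopped.
func (g *guest) resume() status {
	switch g.pc {
	case 0: // _start reaches the call: record the request, then exit.
		g.pc = 1
		g.hostFn, g.params = "random_get", []uint32{0, 8} // ptr=0, size=8
		return statusCallHostFn
	case 1: // Resumed after the call instruction: check errno.
		g.pc = 2
		if g.results[0] != 0 {
			return statusUnreachable // panic(errno) ends in `unreachable`
		}
	}
	return statusReturned
}

// execNative is the trampoline: it re-enters the guest until it returns or
// traps, and calls host functions on the Go stack in between.
func execNative(g *guest, host hostFns) {
	for {
		switch g.resume() {
		case statusReturned:
			return
		case statusCallHostFn:
			fn, ok := host[g.hostFn]
			if !ok {
				panic(fmt.Errorf("host function %q not found", g.hostFn))
			}
			g.results = fn(g.params) // a usual Go function call
		case statusUnreachable:
			panic(errUnreachable)
		}
	}
}

// call plays the role of Function.Call: it converts the panic into an error.
func call(g *guest, host hostFns) (err error) {
	defer func() {
		if r := recover(); r != nil {
			e, ok := r.(error)
			if !ok {
				panic(r) // not ours: re-raise
			}
			err = e
		}
	}()
	execNative(g, host)
	return nil
}

func main() {
	ok := hostFns{"random_get": func([]uint32) []uint32 { return []uint32{0} }}
	fmt.Println(call(&guest{}, ok))

	failing := hostFns{"random_get": func([]uint32) []uint32 { return []uint32{1} }}
	fmt.Println(call(&guest{}, failing))
}
```

Note that the guest never calls Go directly: every host call and every trap goes through an exit to `execNative`, which is what keeps the Goroutine stack untouched by machine code.
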

[call-stack]: https://en.wikipedia.org/wiki/Call_stack
[api-function]: https://pkg.go.dev/github.com/tetratelabs/wazero@v1.0.0-rc.1/api#Function
[api-module]: https://pkg.go.dev/github.com/tetratelabs/wazero@v1.0.0-rc.1/api#Module
[spec-function-instance]: https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#function-instances%E2%91%A0
[spec-trap]: https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#trap
[spec-unreachable]: https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#syntax-instr-control
[signal-handler-discussion]: https://gophers.slack.com/archives/C1C1YSQBT/p1675992411241409
[cgo-not-go]: https://www.youtube.com/watch?v=PAAkCSZUG1c&t=757s

[^1]: It is technically possible to call it directly, but that would require performing "stack switching" in the native code.
    That is almost the same as what wazero does: exiting the execution of machine code, then calling the target Go function (using the caller of machine code as a "trampoline").