github.com/tetratelabs/wazero@v1.2.1/RATIONALE.md

# Notable rationale of wazero

## Zero dependencies

Wazero has zero dependencies to differentiate itself from other runtimes which
have heavy impact usually due to CGO. By avoiding CGO, wazero avoids
prerequisites such as shared libraries or libc, and lets users keep features
like cross compilation.

Avoiding go.mod dependencies reduces interference on Go version support, and
the size of a statically compiled binary. However, doing so brings some
responsibility into the project.

Go's native platform support is good: We don't need platform-specific code to
get monotonic time, nor do we need much work to implement certain features
needed by our compiler such as `mmap`. That said, Go does not support all
common operating systems to the same degree. For example, Go 1.18 includes
`Mprotect` on Linux and Darwin, but not FreeBSD.

The general tradeoff the project takes from a zero dependency policy is more
explicit support of platforms (in the compiler runtime), as well as a larger and
more technically difficult codebase.

At some point, we may allow extensions to supply their own platform-specific
hooks. Until then, one end user impact/tradeoff is some glitches when trying
untested platforms (with the Compiler runtime).

### Why do we use CGO to implement system calls on darwin?

wazero is dependency and CGO free by design. In some cases, we have code that
can optionally use CGO, but retain a fallback for when that's disabled. The only
operating system (`GOOS`) where we use CGO by default is `darwin`.

Unlike other operating systems, regardless of `CGO_ENABLED`, Go always uses
"CGO" mechanisms in the runtime layer of `darwin`. This is explained in
[Statically linked binaries on Mac OS X](https://developer.apple.com/library/archive/qa/qa1118/_index.html#//apple_ref/doc/uid/DTS10001666):

> Apple does not support statically linked binaries on Mac OS X. A statically
> linked binary assumes binary compatibility at the kernel system call
> interface, and we do not make any guarantees on that front. Rather, we strive
> to ensure binary compatibility in each dynamically linked system library and
> framework.

This plays to our advantage for system calls that aren't yet exposed in the Go
standard library, notably `futimens` for nanosecond-precision timestamp
manipulation.

### Why not x/sys

Going beyond Go's SDK limitations can be accomplished with the [x/sys library](https://pkg.go.dev/golang.org/x/sys/unix).
For example, this includes `zsyscall_freebsd_amd64.go` missing from the Go SDK.

However, like all dependencies, x/sys is a source of conflict. For example,
x/sys had to be upgraded in order to upgrade to Go 1.18.

If we depended on x/sys, we could get more precise functionality needed for
features such as clocks or more platform support for the compiler runtime.

That said, formally supporting an operating system may still require testing as
even use of x/sys can require platform-specifics. For example, [mmap-go](https://github.com/edsrzf/mmap-go)
uses x/sys, but also mentions limitations, some not surmountable with x/sys
alone.

Regardless, we may at some point introduce a separate go.mod for users to use
x/sys as a platform plugin without forcing all users to maintain that
dependency.

## Project structure

wazero uses internal packages extensively to balance API compatibility desires for end users with the need to safely
share internals between compilers.

End-user packages include `wazero`, with `Config` structs, `api`, with shared types, and the built-in `wasi` library.
Everything else is internal.

We put the main program for wazero into a directory of the same name to match conventions used in `go install`,
notably the name of the folder becomes the binary name. We chose to use `cmd/wazero` as it is common practice
and less surprising than `wazero/wazero`.

### Internal packages

Most code in wazero is internal, and it is acknowledged that this prevents external implementation of facets such as
compilers or decoding. It also prevents splitting this code into separate repositories, resulting in a larger monorepo.
This also adds work as more code needs to be centrally reviewed.

However, the alternative is neither secure nor viable. To allow external implementation would require exporting
symbols such as the `CodeSection`, which can easily create bugs. Moreover, there's a high drift risk for any attempt at
external implementations, compounded not just by wazero's code organization, but also the fast moving Wasm and WASI
specifications.

For example, implementing a compiler correctly requires expertise in Wasm, Go and assembly. This requires deep
insight into how internals are meant to be structured and the various tiers of testing required for `wazero` to result
in a high quality experience. Even if someone had these skills, supporting external code would introduce variables which
are constants in the central one. Supporting an external codebase is harder on the project team, and could starve time
from the already large burden on the central codebase.

The tradeoffs of internal packages are a larger codebase and responsibility to implement all standard features. It also
implies thinking more about extension, as forking is not viable for the reasons above. The primary mitigations of these
realities are friendly OSS licensing, high rigor and a collaborative spirit which aim to make contribution in the shared
codebase productive.

### Avoiding cyclic dependencies

wazero shares constants and interfaces with internal code by a sharing pattern described below:
* shared interfaces and constants go in one package under root: `api`.
* user APIs and structs depend on `api` and go into the root package `wazero`.
  * e.g. `InstantiateModule` -> `/wasm.go` depends on the type `api.Module`.
* implementation code can also depend on `api` in a corresponding package under `/internal`.
  * e.g. package `wasm` -> `/internal/wasm/*.go` and can depend on the type `api.Module`.

The above guarantees no cyclic dependencies at the cost of having to re-define symbols that exist in both packages.
For example, if `wasm.Store` is a type the user needs access to, it is narrowed by a cover type in the `wazero` package:

```go
type runtime struct {
	s *wasm.Store
}
```

This is not as bad as it sounds as mutations are only available via configuration. This means exported functions are
limited to only a few functions.

### Avoiding security bugs

In order to avoid security flaws such as code insertion, nothing in the public API is permitted to write directly to any
mutable symbol in the internal package. For example, the package `api` is shared with internal code. To ensure
immutability, the `api` package cannot contain any mutable public symbol, such as a slice or a struct with an exported
field.

In practice, this means shared functionality like memory mutation needs to be implemented by interfaces.

Here are some examples:
* `api.Memory` protects access by exposing functions like `WriteFloat64Le` instead of exporting a buffer (`[]byte`).
* There is no exported symbol for the `[]byte` representing the `CodeSection`.

Besides security, this practice prevents other bugs and allows centralization of validation logic such as decoding Wasm.

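The idea of guarding a buffer behind an interface can be sketched as follows. This is a minimal illustration, not
wazero's implementation: the `Memory` interface, the `memory` struct and `NewMemory` are hypothetical names, and the
methods are simplified stand-ins for accessors like those on `api.Memory`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Memory is a hypothetical, simplified interface: callers get bounds-checked
// accessors and never see the underlying buffer.
type Memory interface {
	WriteUint32Le(offset, v uint32) bool
	ReadUint32Le(offset uint32) (uint32, bool)
}

// memory is unexported, so other packages cannot reach its buffer.
type memory struct {
	buf []byte
}

// NewMemory is a hypothetical constructor used only by this sketch.
func NewMemory(size uint32) Memory {
	return &memory{buf: make([]byte, size)}
}

// hasSize reports whether the range [offset, offset+n) is inside the buffer.
func (m *memory) hasSize(offset, n uint32) bool {
	return uint64(offset)+uint64(n) <= uint64(len(m.buf))
}

func (m *memory) WriteUint32Le(offset, v uint32) bool {
	if !m.hasSize(offset, 4) {
		return false // out of range: nothing is written
	}
	binary.LittleEndian.PutUint32(m.buf[offset:], v)
	return true
}

func (m *memory) ReadUint32Le(offset uint32) (uint32, bool) {
	if !m.hasSize(offset, 4) {
		return 0, false
	}
	return binary.LittleEndian.Uint32(m.buf[offset:]), true
}

func main() {
	mem := NewMemory(8)
	fmt.Println(mem.WriteUint32Le(4, 42)) // in range
	fmt.Println(mem.WriteUint32Le(6, 42)) // would run past the 8-byte buffer
	v, ok := mem.ReadUint32Le(4)
	fmt.Println(v, ok)
}
```

Because validation lives in one place (`hasSize` here), callers cannot bypass it by slicing the buffer themselves.
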
## API Design

### Why is `context.Context` inconsistent?

It may seem strange that only certain APIs have an initial `context.Context`
parameter. We originally had a `context.Context` for anything that might be
traced, but it turned out to be only useful for lifecycle and host functions.

For instruction-scoped aspects like memory updates, a context parameter is too
fine-grained and also invisible in practice. For example, most users will use
the compiler engine, and its memory, global or table access will never use Go's
context.

### Why does `api.ValueType` map to uint64?

WebAssembly allows functions to be defined either by the guest or the host,
with signatures expressed as WebAssembly types. For example, `i32` is a 32-bit
type which might be interpreted as signed. Function signatures can have zero or
more parameters or results even if WebAssembly 1.0 allows up to one result.

The guest can export functions, so that the host can call them. In the case of
wazero, the host is Go and an exported function can be called via
`api.Function`. `api.Function` allows users to supply parameters and read
results as a slice of uint64. For example, if there are no results, an empty
slice is returned. The user can learn the signature via `FunctionDescription`,
which returns the `api.ValueType` corresponding to each parameter or result.
`api.ValueType` defines the mapping of WebAssembly types to `uint64` values for
reasons described in this section. The special case of `v128` is also mentioned
below.

wazero maps each value type to a uint64 value because it holds the largest
type in WebAssembly 1.0 (i64). A slice allows you to express empty (e.g. a
nullary signature), for example a start function.

Here's an example of calling a function, noting this syntax works for both a
signature `(param i32 i32) (result i32)` and `(param i64 i64) (result i64)`:
```go
x, y := uint64(1), uint64(2)
results, err := mod.ExportedFunction("add").Call(ctx, x, y)
if err != nil {
	log.Panicln(err)
}
fmt.Printf("%d + %d = %d\n", x, y, results[0])
```

WebAssembly does not define an encoding strategy for host defined parameters or
results. This means the encoding rules above are defined by wazero instead. To
address this, we both clarified the mapping in `api.ValueType` and added helper
functions like `api.EncodeF64`. This gives users the conversions typical in Go
programming, plus utilities to avoid ambiguity and edge cases around casting.

Alternatively, we could have defined a byte buffer based approach and a binary
encoding of value types in and out. For example, an empty byte slice would mean
no values, while a non-empty could use a binary encoding for supported values.
This could work, but it is more difficult for the normal case of i32 and i64.
It also shares a struggle with the current approach, which is that value types
were added after WebAssembly 1.0 and not all of them have an encoding. More on
this below.

In summary, wazero chose an approach for signature mapping because there was
none, and the one we chose biases towards simplicity with integers and handles
the rest with documentation and utilities.

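To make the casting edge cases concrete, here is a small standard-library sketch of the kind of conversion that
helpers such as `api.EncodeF64` exist to standardize. The functions `encodeI32`, `decodeI32`, `encodeF64` and
`decodeF64` are hypothetical names for this illustration and are not claimed to match wazero's implementations.

```go
package main

import (
	"fmt"
	"math"
)

// encodeI32 keeps only the 32-bit pattern, so negative values are not
// sign-extended into the upper half of the uint64.
func encodeI32(v int32) uint64 { return uint64(uint32(v)) }

// decodeI32 reinterprets the low 32 bits as a signed integer.
func decodeI32(v uint64) int32 { return int32(uint32(v)) }

// encodeF64 stores the IEEE 754 bit pattern, not the truncated integer value.
func encodeF64(v float64) uint64 { return math.Float64bits(v) }

// decodeF64 restores the float from its bit pattern.
func decodeF64(v uint64) float64 { return math.Float64frombits(v) }

func main() {
	n := int32(-1)
	fmt.Printf("%#x\n", uint64(n))    // naive cast sign-extends: 0xffffffffffffffff
	fmt.Printf("%#x\n", encodeI32(n)) // 0xffffffff
	fmt.Println(decodeI32(encodeI32(n)))

	f := 1.5
	fmt.Println(uint64(f))            // naive cast truncates: 1
	fmt.Printf("%#x\n", encodeF64(f)) // 0x3ff8000000000000
	fmt.Println(decodeF64(encodeF64(f)))
}
```
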
#### Post 1.0 value types

Value types added after WebAssembly 1.0 stressed the current model, as some
have no encoding or are larger than 64 bits. While problematic, these value
types are not commonly used in exported (extern) functions. However, some
decisions were made and detailed below.

For example, `externref` has no guest representation. wazero chose to map
references to uint64 as that's the largest value needed to encode a pointer on
supported platforms. While there are two reference types, `externref` and
`funcref`, the latter is an internal detail of function tables, and the former
is rarely if ever used in function signatures as of the end of 2022.

The only value larger than 64 bits is used for SIMD (`v128`). Vectorizing via
host functions is not used as of the end of 2022. Even if it were, it would be
inefficient vs guest vectorization due to host function overhead. In other
words, the `v128` value type is unlikely to be in an exported function
signature. That it requires two uint64 values to encode is an internal detail
and not worth changing the exported function interface `api.Function`, as doing
so would break all users.

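As a rough illustration of why `v128` needs two uint64 values, the sketch below splits 16 bytes into a low and a high
half and joins them again. The helper names and the little-endian, low-half-first layout are assumptions made for this
example; the text above does not specify how wazero orders the two halves internally.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitV128 is a hypothetical helper: bytes 0-7 become lo and bytes 8-15
// become hi, both read as little-endian (an assumed layout).
func splitV128(v [16]byte) (lo, hi uint64) {
	return binary.LittleEndian.Uint64(v[:8]), binary.LittleEndian.Uint64(v[8:])
}

// joinV128 is the inverse of splitV128.
func joinV128(lo, hi uint64) (v [16]byte) {
	binary.LittleEndian.PutUint64(v[:8], lo)
	binary.LittleEndian.PutUint64(v[8:], hi)
	return v
}

func main() {
	var v [16]byte
	for i := range v {
		v[i] = byte(i)
	}
	lo, hi := splitV128(v)
	fmt.Printf("lo=%#x hi=%#x\n", lo, hi)
	fmt.Println(joinV128(lo, hi) == v)
}
```
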
### Interfaces, not structs

All exported types in public packages, regardless of configuration vs runtime, are interfaces. The primary benefits are
internal flexibility and avoiding people accidentally mis-initializing by instantiating the types on their own vs using
the `NewXxx` constructor functions. In other words, there's less support load when things can't be done incorrectly.

Here's an example:
```go
rt := &RuntimeConfig{} // not initialized properly (fields are nil which shouldn't be)
rt := RuntimeConfig{} // not initialized properly (should be a pointer)
rt := wazero.NewRuntimeConfig() // initialized properly
```

There are a few drawbacks to this, notably some work for maintainers.
* Interfaces are decoupled from the structs implementing them, which means the signature has to be repeated twice.
* Interfaces have to be documented, and guarded at time of use, to make clear that 3rd party implementations aren't
  supported.
* As of Go 1.18, interfaces are still [not well supported](https://github.com/golang/go/issues/5860) in godoc.

## Config

wazero configures scopes such as Runtime and Module using `XxxConfig` types. For example, `RuntimeConfig` configures
`Runtime` and `ModuleConfig` configures `Module` (instantiation). In all cases, config types begin with defaults and can
be customized by a user, e.g., selecting features or a module name override.

### Why don't we make each configuration setting return an error?
No config types create resources that would need to be closed, nor do they return errors on use. This helps reduce
resource leaks, and makes chaining easier. It makes it possible to parse configuration (e.g. by parsing YAML)
independent of validating it.

Instead of:
```go
cfg, err = cfg.WithFS(fs)
if err != nil {
  return err
}
cfg, err = cfg.WithName(name)
if err != nil {
  return err
}
mod, err = rt.InstantiateModuleWithConfig(ctx, code, cfg)
if err != nil {
  return err
}
```

There's only one call site to handle errors:
```go
cfg = cfg.WithFS(fs).WithName(name)
mod, err = rt.InstantiateModuleWithConfig(ctx, code, cfg)
if err != nil {
  return err
}
```

This allows users one place to look for errors, and also the benefit that if anything internally opens a resource, but
errs, there's nothing they need to close. In other words, users don't need to track which resources need closing on
partial error, as that is handled internally by the only code that can read configuration fields.

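The "record now, validate later" approach can be sketched with a small self-contained program. Everything here is
hypothetical and simplified: the `config` type, its `WithName` and `WithEnv` methods and the `instantiate` function only
illustrate the shape of the design, and the validation rules are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// config is a hypothetical, simplified configuration type.
type config struct {
	name string
	env  []string // alternating key, value
}

// WithName only records the value: it neither validates nor returns an error.
func (c config) WithName(name string) config {
	c.name = name // c is a copy because of the value receiver
	return c
}

// WithEnv copies the slice so derived configs never share a backing array.
func (c config) WithEnv(key, value string) config {
	env := make([]string, 0, len(c.env)+2)
	env = append(env, c.env...)
	c.env = append(env, key, value)
	return c
}

// instantiate is the single place where configuration is validated.
func instantiate(c config) error {
	if c.name == "" {
		return errors.New("name is empty")
	}
	for i := 0; i < len(c.env); i += 2 {
		if key := c.env[i]; key == "" || strings.ContainsRune(key, '=') {
			return fmt.Errorf("invalid environment key %q", key)
		}
	}
	return nil
}

func main() {
	base := config{}
	fmt.Println(instantiate(base.WithName("ok").WithEnv("A", "1")))
	fmt.Println(instantiate(base.WithName("bad").WithEnv("A=B", "1")))
}
```

Chaining stays free of error handling, and a caller only checks the error returned by `instantiate`.
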
### Why is configuration immutable?
While it seems certain scopes like `Runtime` won't repeat within a process, they do, possibly in different goroutines.
For example, some users create a new runtime for each module, and some re-use the same base module configuration with
only small updates (e.g. the name) for each instantiation. Making configuration immutable allows it to be safely used in
any goroutine.

Since config is immutable, changes apply via the return value, similar to `append` on a slice.

For example, both of these are the same sort of error:
```go
append(slice, element) // bug as only the return value has the updated slice.
cfg.WithName(next) // bug as only the return value has the updated name.
```

Here's an example of correct use: re-assigning explicitly or via chaining.
```go
cfg = cfg.WithName(name) // explicit

mod, err = rt.InstantiateModuleWithConfig(ctx, code, cfg.WithName(name)) // implicit
if err != nil {
  return err
}
```

### Why isn't configuration assigned with option types?
The option pattern is a familiar one in Go. For example, someone defines a type `func(x X) error` and uses it to update
the target. For example, you could imagine wazero could choose to make `ModuleConfig` from options vs chaining fields.

e.g. instead of:
```go
type ModuleConfig interface {
	WithName(string) ModuleConfig
	WithFS(fs.FS) ModuleConfig
}

type moduleConfig struct {
	name string
	fs   fs.FS
}

func (c *moduleConfig) WithName(name string) ModuleConfig {
	ret := *c // copy
	ret.name = name
	return &ret
}

func (c *moduleConfig) WithFS(fs fs.FS) ModuleConfig {
	ret := *c // copy
	ret.fs = fs
	return &ret
}

config := r.NewModuleConfig().WithFS(fs)
configDerived := config.WithName("name")
```

An option function type could be defined, and each config method refactored into a name-prefixed option function:
```go
type ModuleConfig interface {
	WithOptions(opts ...ModuleConfigOption) ModuleConfig
}

type moduleConfig struct {
	name string
	fs   fs.FS
}

type ModuleConfigOption func(c *moduleConfig)

func ModuleConfigName(name string) ModuleConfigOption {
	return func(c *moduleConfig) {
		c.name = name
	}
}

func ModuleConfigFS(fs fs.FS) ModuleConfigOption {
	return func(c *moduleConfig) {
		c.fs = fs
	}
}

func (r *runtime) NewModuleConfig(opts ...ModuleConfigOption) ModuleConfig {
	ret := newModuleConfig() // defaults, assumed to return *moduleConfig
	for _, opt := range opts {
		opt(ret)
	}
	return ret
}

func (c *moduleConfig) WithOptions(opts ...ModuleConfigOption) ModuleConfig {
	ret := *c // copy base config
	for _, opt := range opts {
		opt(&ret)
	}
	return &ret
}

config := r.NewModuleConfig(ModuleConfigFS(fs))
configDerived := config.WithOptions(ModuleConfigName("name"))
```

wazero took the path of the former design primarily due to:
* interfaces provide natural namespaces for their methods, which is more direct than functions with name prefixes.
* parsing config into function callbacks is more direct vs parsing config into a slice of functions to do the same.
* in either case derived config is needed and the options pattern is more awkward to achieve that.

There are other reasons such as test and debug being simpler without options: the above list is constrained to conserve
space. It is accepted that the options pattern is common in Go, which is the main reason for documenting this decision.

### Why aren't config types deeply structured?
wazero's configuration types cover the two main scopes of WebAssembly use:
* `RuntimeConfig`: This is the broadest scope, so applies also to compilation
  and instantiation. e.g. This controls the WebAssembly Specification Version.
* `ModuleConfig`: This affects modules instantiated after compilation and what
  resources are allowed. e.g. This defines how or if STDOUT is captured. This
  also allows sub-configuration of `FSConfig`.

These default to a flat definition each, with lazy sub-configuration only after
proven to be necessary. A flat structure is easier to work with and is also
easy to discover. Unlike the option pattern described earlier, more
configuration in the interface doesn't taint the package namespace, only
`ModuleConfig`.

We default to a flat structure to encourage simplicity. If we eagerly broke out
all possible configurations into sub-types (e.g. ClockConfig), it would be hard
to notice configuration sprawl. By keeping the config flat, it is easy to see
the cognitive load we may be adding to our users.

In other words, discomfort adding more configuration is a feature, not a bug.
We should only add new configuration rarely, and before doing so, ensure it
will be used. In fact, this is why we support using context fields for
experimental configuration. By letting users practice, we can find out if a
configuration was a good idea or not before committing to it, and potentially
sprawling our types.

In reflection, this approach worked well for the nearly 1.5 year period leading
to version 1.0. We've only had to create a single sub-configuration, `FSConfig`,
and it was well understood why when it occurred.

## Why does InstantiateModule call "_start" by default?
We formerly had functions like `StartWASICommand` that would verify preconditions and start WASI's "_start" command.
However, this caused confusion because many languages compiled a WASI dependency, and many did so inconsistently.

That said, if "_start" isn't called, it causes issues in TinyGo, as it needs this in order to implement panic. To deal
with this a different way, we have a configuration to call any start functions that exist, which defaults to "_start".

## Runtime == Engine+Store
wazero defines a single user-type which combines the specification concept of `Store` with the unspecified `Engine`
which manages them.

### Why not multi-store?
Multi-store isn't supported as the extra tier complicates lifecycle and locking. Moreover, in practice it is unusual for
there to be an engine that has multiple stores which have multiple modules. More often, it is the case that there is
either 1 engine with 1 store and multiple modules, or 1 engine with many stores, each having 1 non-host module. In the
worst case, a user can use multiple runtimes until "multi-store" is better understood.

If later, we have demand for multiple stores, that can be accomplished by overload. e.g. `Runtime.InstantiateInStore` or
`Runtime.Store(name) Store`.

## wazeroir
wazero's intermediate representation (IR) is called `wazeroir`. Lowering into an IR provides us a faster interpreter
and a closer-to-assembly representation for use by our compiler.

### Intermediate Representation (IR) design
`wazeroir`'s initial design borrowed heavily from the defunct `microwasm` format (a.k.a. LightbeamIR). Notably,
`wazeroir` doesn't have block operations: this simplifies the implementation.

Note: `microwasm` was never specified formally, and only exists in a historical codebase of wasmtime:
https://github.com/bytecodealliance/wasmtime/blob/v0.29.0/crates/lightbeam/src/microwasm.rs

## Exit

### Why do we return a `sys.ExitError` on exit code zero?

It may be surprising to find an error returned on success (exit code zero).
This can be explained more easily when you think of function returns: When
results aren't empty, then you must return an error. This is trickier to explain
when results are empty, such as the case in the "_start" function in WASI.

The main rationale for returning an exit error even if the code is success is
that the module is no longer functional. For example, function exports would
error later. In cases like these, it is better to handle errors where they
occur.

Luckily, it is not common to exit a module during the "_start" function. For
example, the only known compilation target that does this is Emscripten. Most,
such as Rust, TinyGo, or normal wasi-libc, don't. If they did, it would
invalidate their function exports. This means it is unlikely most compilers
will change this behavior.

In summary, we return a `sys.ExitError` to the caller whenever we get it, as it
properly reflects the state of the module, which would be closed on this error.

### Why panic with `sys.ExitError` after a host function exits?

Currently, the only portable way to stop processing code is via panic. For
example, WebAssembly "trap" instructions, such as divide by zero, are
implemented via panic. This ensures code isn't executed after it.

When code reaches the WASI `proc_exit` function, we need to stop processing.
Regardless of the exit code, any code invoked after exit would be in an
inconsistent state. This is likely why unreachable instructions are sometimes
inserted after exit: https://github.com/emscripten-core/emscripten/issues/12322

## WASI

Unfortunately, [WASI Snapshot Preview 1](https://github.com/WebAssembly/WASI/blob/snapshot-01/phases/snapshot/docs.md) is not formally defined enough, and has APIs with ambiguous semantics.
This section describes how wazero interprets and implements the semantics of several WASI APIs that may be interpreted differently by different wasm runtimes.
Those APIs may affect the portability of a WASI application.

### Why don't we attempt to pass wasi-testsuite on user-defined `fs.FS`?

While most cases work fine on an `os.File` based implementation, we won't
promise wasi-testsuite compatibility on user defined wrappers of `os.DirFS`.
The only option for real systems is to use our `sysfs.FS`.

There are a lot of areas where Windows behaves differently, despite the
`os.File` abstraction. This goes well beyond file locking concerns (e.g.
`EBUSY` errors on open files). For example, errors like `ACCESS_DENIED` aren't
properly mapped to `EPERM`. There are trickier parts too. `FileInfo.Sys()`
doesn't return enough information to build inodes needed for WASI. To rebuild
them requires the full path to the underlying file, not just its directory
name, and there's no way for us to get that information. At one point we tried,
but in practice things became tangled and functionality such as read-only
wrappers became untenable. Finally, there are version-specific behaviors which
are difficult to maintain even in our own code. For example, Go 1.20 opens
files in a different way than versions before it.

### Why aren't WASI rules enforced?

The [snapshot-01](https://github.com/WebAssembly/WASI/blob/snapshot-01/phases/snapshot/docs.md) version of WASI has a
number of rules for a "command module", but only the memory export rule is enforced. If a "_start" function exists, it
is enforced to be the correct signature and succeed, but the export itself isn't enforced. It follows that this means
exports are not required to be contained to a "_start" function invocation. Finally, the "__indirect_function_table"
export is also not enforced.

The reason for the exceptions is that implementations aren't following the rules. For example, TinyGo doesn't export
"__indirect_function_table", so crashing on this would make wazero unable to run TinyGo modules. Similarly, modules
loaded by wapc-go don't always define a "_start" function. Since "snapshot-01" is not a proper version, and certainly
not a W3C recommendation, there's no sense in breaking users over matters like this.

### Why is I/O configuration not coupled to WASI?

WebAssembly System Interfaces (WASI) is a formalization of a practice that can be done anyway: Define a host function to
access a system interface, such as writing to STDOUT. WASI stalled at snapshot-01 and as of early 2023, is being
rewritten entirely.

This instability implies a need to transition between WASI specs, which places wazero in a position that requires
decoupling. For example, if code uses two different functions to call `fd_write`, the underlying configuration must be
centralized and decoupled. Otherwise, calls using the same file descriptor number will end up writing to different
places.

In short, wazero defined system configuration in `ModuleConfig`, not a WASI type. This allows end-users to switch from
one spec to another with minimal impact. This has other helpful benefits, as centralized resources are simpler to close
coherently (e.g. via `Module.Close`).

In reflection, this worked well as more ABIs became usable in wazero. For example, `GOARCH=wasm GOOS=js` code uses the
same `ModuleConfig` (and `FSConfig`) WASI uses, and in compatible ways.

   537  ### Background on `ModuleConfig` design
   538  
   539  WebAssembly 1.0 (20191205) specifies some aspects to control isolation between modules ([sandboxing](https://en.wikipedia.org/wiki/Sandbox_(computer_security))).
   540  For example, `wasm.Memory` has size constraints and each instance of it is isolated from each other. While `wasm.Memory`
   541  can be shared, by exporting it, it is not exported by default. In fact a WebAssembly Module (Wasm) has no memory by
   542  default.
   543  
   544  While memory is defined in WebAssembly 1.0 (20191205), many aspects are not. Let's use an example of `exec.Cmd` as for
   545  example, a WebAssembly System Interfaces (WASI) command is implemented as a module with a `_start` function, and in many
   546  ways acts similar to a process with a `main` function.
   547  
   548  To capture "hello world" written to the console (stdout a.k.a. file descriptor 1) in `exec.Cmd`, you would set the
   549  `Stdout` field accordingly, perhaps to a buffer. In WebAssembly 1.0 (20191205), the only way to perform something like
   550  this is via a host function (ex `HostModuleFunctionBuilder`) and internally copy memory corresponding to that string
   551  to a buffer.
   552  
   553  WASI implements system interfaces with host functions. Concretely, to write to console, a WASI command `Module` imports
   554  "fd_write" from "wasi_snapshot_preview1" and calls it with the `fd` parameter set to 1 (STDOUT).
   555  
   556  The [snapshot-01](https://github.com/WebAssembly/WASI/blob/snapshot-01/phases/snapshot/docs.md) version of WASI has no
   557  means to declare configuration, although its function definitions imply configuration for example if fd 1 should exist,
   558  and if so where should it write. Moreover, snapshot-01 was last updated in late 2020 and the specification is being
   559  completely rewritten as of early 2022. This means WASI as defined by "snapshot-01" will not clarify aspects like which
   560  file descriptors are required. While it is possible a subsequent version may, it is too early to tell as no version of
   561  WASI has reached a stage near W3C recommendation. Even if it did, module authors are not required to only use WASI to
   562  write to console, as they can define their own host functions, such as they did before WASI existed.
   563  
   564  wazero aims to serve Go developers as a primary function, and help them transition between WASI specifications. In
   565  order to do this, we have to allow top-level configuration. To ensure isolation by default, `ModuleConfig` has WithXXX
   566  that override defaults to no-op or empty. One `ModuleConfig` instance is used regardless of how many times the same WASI
   567  functions are imported. The nil defaults allow safe concurrency in these situations, as well as lower the cost when they
   568  are never used. Finally, a one-to-one mapping with `Module` allows the module to close the `ModuleConfig` instead of
   569  confusing users with another API to close.
   570  
   571  Naming, defaults and validation rules of aspects like `STDIN` and `Environ` are intentionally similar to other Go
   572  libraries such as `exec.Cmd` or `syscall.Setenv`, and differences are called out where helpful. For example, there's no goal
   573  to emulate any operating system primitive specific to Windows (such as a 'c:\' drive). Moreover, certain defaults
   574  working with real system calls are neither relevant nor safe to inherit: For example, `exec.Cmd` defaults to read STDIN
   575  from a real file descriptor ("/dev/null"). Defaulting to this, vs reading `io.EOF`, would be unsafe as it can exhaust
   576  file descriptors if resources aren't managed properly. In other words, blind copying of defaults isn't wise as it can
   577  violate isolation or endanger the embedding process. In summary, we try to be similar to normal Go code, but often need
   578  to act differently and document that `ModuleConfig` is more about emulating, not necessarily performing real system calls.
   579  
   580  ## File systems
   581  
   582  ### Why doesn't wazero implement the working directory?
   583  
   584  An early design of wazero's API included a `WithWorkDirFS` which allowed
   585  control over which file a relative path such as "./config.yml" resolved to,
   586  independent of the root file system. This was intended to help separate concerns
   587  like mutability of files, but it didn't work and was removed.
   588  
   589  Compilers that target wasm act differently with regard to the working
   590  directory. For example, while `GOOS=js` uses host functions to track the
   591  working directory, WASI host functions do not. wasi-libc, used by TinyGo,
   592  tracks working directory changes in compiled wasm instead: initially "/" until
   593  code calls `chdir`. Zig assumes the first pre-opened file descriptor is the
   594  working directory.
   595  
   596  The only place wazero can standardize a layered concern is via a host function.
   597  Since WASI doesn't use host functions to track the working directory, we can't
   598  standardize the storage and initial value of it.
   599  
   600  Meanwhile, code may be able to affect the working directory by compiling
   601  `chdir` into their main function, using an argument or ENV for the initial
   602  value (possibly `PWD`). Those unable to control the compiled code should only
   603  use absolute paths in configuration.
   604  
   605  See
   606  * https://github.com/golang/go/blob/go1.20/src/syscall/fs_js.go#L324
   607  * https://github.com/WebAssembly/wasi-libc/pull/214#issue-673090117
   608  * https://github.com/ziglang/zig/blob/53a9ee699a35a3d245ab6d1dac1f0687a4dcb42c/src/main.zig#L32
   609  
   610  ### Why ignore the error returned by io.Reader when n > 0?
   611  
   612  Per https://pkg.go.dev/io#Reader, if we receive an error, any bytes read should
   613  be processed first. At the syscall abstraction (`fd_read`), the caller is the
   614  processor, so we can't process the bytes inline and also return the error (as
   615  `EIO`).
   616  
   617  Let's assume we want to return the bytes read on error to the caller. This
   618  implies we at least temporarily ignore the error alongside them. The choice
   619  remaining is whether to persist the error returned with the read until a
   620  possible next call, or ignore the error.
   621  
   622  If we persist an error returned, it would be coupled to a file descriptor, but
   623  effectively it is boolean as this case coerces to `EIO`. If we track a "last
   624  error" on a file descriptor, it could be complicated for a couple reasons
   625  including whether the error is transient or permanent, or if the error would
   626  apply to any FD operation, or just read. Finally, there may never be a
   627  subsequent read as perhaps the bytes leading up to the error are enough to
   628  satisfy the processor.
   629  
   630  This decision boils down to whether or not to track an error bit per file
   631  descriptor or not. If not, the assumption is that a subsequent operation would
   632  also error, this time without reading any bytes.
   633  
   634  The current opinion is to go with the simplest path, which is to return the
   635  bytes read and ignore the error if there were any. Assume a subsequent
   636  operation will err if it needs to. This helps reduce the complexity of the code
   637  in wazero and also accommodates the scenario where the bytes read are enough to
   638  satisfy its processor.
   639  
   640  ### File descriptor allocation strategy
   641  
   642  File descriptor allocation currently uses a strategy similar to the one implemented
   643  by unix systems: when opening a file, the lowest unused number is picked.
   644  
   645  The WASI standard documents that programs cannot expect that file descriptor
   646  numbers will be allocated with a lowest-first strategy, and they should instead
   647  assume the values will be random. Since _random_ is a very imprecise concept in
   648  computers, we technically satisfy this requirement with the descriptor
   649  allocation strategy we use in wazero. We could imagine adding more _randomness_
   650  to the descriptor selection process, however this should never be used as a
   651  security measure to prevent applications from guessing the next file number so
   652  there are no strong incentives to complicate the logic.
   653  
   654  ### Why does `FSConfig.WithDirMount` not match behaviour with `os.DirFS`?
   655  
   656  It may seem that we should require any feature that seems like a standard
   657  library in Go, to behave the same way as the standard library. Doing so would
   658  present least surprise to Go developers. In the case of how we handle
   659  filesystems, we break from that as it is incompatible with the expectations of
   660  WASI, the most commonly implemented filesystem ABI.
   661  
   662  The main reason is that `os.DirFS` is a virtual filesystem abstraction while
   663  WASI is an abstraction over syscalls. For example, the signature of `fs.Open`
   664  does not permit use of flags. This creates conflict on what default behaviors
   665  to take when Go implemented `os.DirFS`. On the other hand, `path_open` can pass
   666  flags, and in fact tests require them to be honored in specific ways. This
   667  extends beyond WASI as even `GOARCH=wasm GOOS=js` compiled code requires
   668  certain flags passed to `os.OpenFile` which are impossible to pass due to the
   669  signature of `fs.FS`.
   670  
   671  This conflict requires us to choose what to be more compatible with, and which
   672  type of user to surprise the least. We assume there will be more developers
   673  compiling code to wasm than developers of custom filesystem plugins, and those
   674  compiling code to wasm will be better served if we are compatible with WASI.
   675  Hence on conflict, we prefer WASI behavior vs the behavior of `os.DirFS`.
   676  
   677  Meanwhile, it is possible that Go will one day compile to `GOOS=wasi` in
   678  addition to `GOOS=js`. When there is shared stake in WASI, we expect gaps like
   679  these to be easier to close.
   680  
   681  See https://github.com/WebAssembly/wasi-testsuite
   682  See https://github.com/golang/go/issues/58141
   683  
   684  ### fd_pread: io.Seeker fallback when io.ReaderAt is not supported
   685  
   686  `ReadAt` is the Go equivalent to `pread`: it does not affect, and is not
   687  affected by, the underlying file offset. Unfortunately, `io.ReaderAt` is not
   688  implemented by all `fs.File`. For example, as of Go 1.19, `embed.openFile` does
   689  not.
   690  
   691  The initial implementation of `fd_pread` instead used `Seek`. To avoid a
   692  regression, we fall back to `io.Seeker` when `io.ReaderAt` is not supported.
   693  
   694  This requires obtaining the initial file offset, seeking to the intended read
   695  offset, and resetting the file offset to the initial state. If this final seek
   696  fails, the file offset is left in an undefined state. This is not thread-safe.
   697  
   698  While seeking per read seems expensive, the common case of `embed.openFile` is
   699  only accessing a single int64 field, which is cheap.
   700  
   701  ### Pre-opened files
   702  
   703  WASI includes `fd_prestat_get` and `fd_prestat_dir_name` functions used to
   704  learn any directory paths for file descriptors open at initialization time.
   705  
   706  For example, `__wasilibc_register_preopened_fd` scans any file descriptors past
   707  STDERR (2) and invokes `fd_prestat_dir_name` to learn any path prefixes they
   708  correspond to. Zig's `preopensAlloc` does similar. These pre-open functions are
   709  not used again after initialization.
   710  
   711  wazero supports stdio pre-opens followed by any mounts, e.g. `.:/`. The guest
   712  path is a directory and its name, e.g. "/", is returned by `fd_prestat_dir_name`
   713  for file descriptor 3 (STDERR+1). The first longest match wins on multiple
   714  pre-opens, which allows a path like "/tmp" to match regardless of order vs "/".
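
To illustrate the "first longest match wins" rule, here is a hedged sketch of how a path could be matched against pre-opened guest paths. The helper names `matchPreopen` and `hasPathPrefix` are hypothetical, and the prefix rules are an assumption made for this example rather than a copy of any libc or wazero logic.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPathPrefix reports whether path is at or below the directory prefix.
func hasPathPrefix(path, prefix string) bool {
	if prefix == "/" {
		return strings.HasPrefix(path, "/")
	}
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

// matchPreopen returns the index of the first longest matching pre-open,
// or -1 if none match.
func matchPreopen(preopens []string, path string) int {
	best := -1
	for i, p := range preopens {
		if !hasPathPrefix(path, p) {
			continue
		}
		// Strictly longer wins, so the first of equal-length matches is kept.
		if best == -1 || len(p) > len(preopens[best]) {
			best = i
		}
	}
	return best
}

func main() {
	// "/tmp" matches regardless of its order relative to "/".
	fmt.Println(matchPreopen([]string{"/", "/tmp"}, "/tmp/a"))
	fmt.Println(matchPreopen([]string{"/tmp", "/"}, "/tmp/a"))
	fmt.Println(matchPreopen([]string{"/", "/tmp"}, "/etc/hosts"))
}
```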
   715  
   716  See
   717   * https://github.com/WebAssembly/wasi-libc/blob/a02298043ff551ce1157bc2ee7ab74c3bffe7144/libc-bottom-half/sources/preopens.c
   718   * https://github.com/ziglang/zig/blob/9cb06f3b8bf9ea6b5e5307711bc97328762d6a1d/lib/std/fs/wasi.zig#L50-L53
   719  
   720  ### fd_prestat_dir_name
   721  
   722  `fd_prestat_dir_name` is a WASI function to return the path of the pre-opened
   723  directory of a file descriptor. It has the following three parameters, and the
   724  third `path_len` has ambiguous semantics.
   725  
   726  * `fd`: a file descriptor
   727  * `path`: the offset for the result path
   728  * `path_len`: In wazero, `FdPrestatDirName` writes the result path string to
   729    `path` offset for the exact length of `path_len`.
   730  
   731  Wasmer considers `path_len` to be the maximum length instead of the exact
   732  length that should be written.
   733  See https://github.com/wasmerio/wasmer/blob/3463c51268ed551933392a4063bd4f8e7498b0f6/lib/wasi/src/syscalls/mod.rs#L764
   734  
   735  The semantics in wazero follows that of wasmtime.
   736  See https://github.com/bytecodealliance/wasmtime/blob/2ca01ae9478f199337cf743a6ab543e8c3f3b238/crates/wasi-common/src/snapshots/preview_1.rs#L578-L582
   737  
   738  Their semantics match when `path_len` == the length of `path`, so in practice
   739  this difference won't matter much.
   740  
   741  ## Why does fd_readdir not include dot (".") and dot-dot ("..") entries?
   742  
   743  When reading a directory, wazero code does not return dot (".") and dot-dot
   744  ("..") entries. The main reason is that Go does not return them from
   745  `os.ReadDir`, and materializing them is complicated (at least dot-dot is).
   746  
   747  A directory entry has stat information in it. The stat information includes
   748  inode which is used for comparing file equivalence. In the simple case of dot,
   749  we could materialize a special entry to expose the same info as stat on the fd
   750  would return. However, doing this and not doing dot-dot would cause confusion,
   751  and dot-dot is far more tricky. To back-fill inode information about a parent
   752  directory would be costly and subtle. For example, the pre-open (mount) of the
   753  directory may be different than its logical parent. This is easy to understand
   754  when considering the common case of mounting "/" and "/tmp" as pre-opens. To
   755  implement ".." from "/tmp" requires information from a separate pre-open, this
   756  includes state to even know the difference. There are easier edge cases as
   757  well, such as the decision to not return ".." from a root path. In any case,
   758  this should start to explain that faking entries when underlying stdlib doesn't
   759  return them is tricky and requires quite a lot of state.
   760  
   761  Even if we did that, it would cause expense to all users of wazero, so we'd
   762  then look to see if that would be justified or not. However, the most common
   763  compilers involved in end user questions, as of early 2023 are TinyGo, Rust and
   764  Zig. All of these compile code which ignores dot and dot-dot entries. In other
   765  words, faking these entries would not only cost our codebase with complexity,
   766  but it would also add unnecessary overhead as the values aren't commonly used.
   767  
   768  The final reason why we might do this, is an end user or a specification
   769  requiring us to. As of early 2023, no end user has raised concern over Go and
   770  by extension wazero not returning dot and dot-dot. The snapshot-01 spec of WASI
   771  does not mention anything on this point. Also, POSIX has the following to say,
   772  which summarizes to "these are optional":
   773  
   774  > The readdir() function shall not return directory entries containing empty names. If entries for dot or dot-dot exist, one entry shall be returned for dot and one entry shall be returned for dot-dot; otherwise, they shall not be returned.
   775  
   776  In summary, wazero doesn't return dot and dot-dot entries because Go doesn't,
   777  and emulating them in spite of that would result in no difference except
   778  higher overhead to the majority of our users.
   779  
   780  See https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
   781  See https://github.com/golang/go/blob/go1.20/src/os/dir_unix.go#L108-L110
   782  
   783  ## sys.Walltime and Nanotime
   784  
   785  The `sys` package has two function types, `Walltime` and `Nanotime` for real
   786  and monotonic clock exports. The naming matches conventions used in Go.
   787  
   788  ```go
   789  func time_now() (sec int64, nsec int32, mono int64) {
   790  	sec, nsec = walltime()
   791  	return sec, nsec, nanotime()
   792  }
   793  ```
   794  
   795  Splitting functions for wall and monotonic time allows implementations to choose
   796  whether to implement the clock once (as in Go), or split them out.
   797  
   798  Each can be configured with a `ClockResolution`, although it is usually
   799  incorrect as detailed in a sub-heading below. The only reason for exposing this
   800  is to satisfy WASI:
   801  
   802  See https://github.com/WebAssembly/wasi-clocks
   803  
   804  ### Why default to fake time?
   805  
   806  WebAssembly has an implicit design pattern of capabilities based security. By
   807  defaulting to a fake time, we reduce the chance of timing attacks, at the cost
   808  of requiring configuration to opt-into real clocks.
   809  
   810  See https://gruss.cc/files/fantastictimers.pdf for example attacks.
   811  
   812  ### Why does fake time increase on reading?
   813  
   814  Both the fake nanotime and walltime increase by 1ms on reading. Particularly in
   815  the case of nanotime, this prevents spinning. For example, when Go compiles
   816  `time.Sleep` using `GOOS=js GOARCH=wasm`, nanotime is used in a loop. If that
   817  never increases, the goroutine is mistaken for being busy. This would be worse
   818  if a compiler implements sleep using nanotime, yet doesn't check for spinning!
   819  
   820  ### Why not `time.Clock`?
   821  
   822  wazero can't use `time.Clock` as a plugin for clock implementation as it is
   823  only substitutable with build flags (`faketime`) and conflates wall and
   824  monotonic time in the same call.
   825  
   826  Monotonic time was added to Go's `time.Clock` after the fact. For portability with
   827  prior APIs, a decision was made to combine readings into the same API call.
   828  
   829  See https://go.googlesource.com/proposal/+/master/design/12914-monotonic.md
   830  
   831  WebAssembly time imports do not have the same concern. In fact even Go's
   832  imports for clocks split walltime from nanotime readings.
   833  
   834  See https://github.com/golang/go/blob/go1.20/misc/wasm/wasm_exec.js#L243-L255
   835  
   836  Finally, Go's clock is not an interface. WebAssembly users who want determinism
   837  or security need to be able to substitute an alternative clock implementation
   838  for the host process one.
   839  
   840  ### `ClockResolution`
   841  
   842  A clock's resolution is hardware and OS dependent so requires a system call to retrieve an accurate value.
   843  Go does not provide a function for getting resolution, so without CGO we don't have an easy way to get an actual
   844  value. For now, we return fixed values of 1us for realtime and 1ns for monotonic, assuming that realtime clocks are
   845  often lower precision than monotonic clocks. In the future, this could be improved by having OS+arch specific assembly
   846  to make syscalls.
   847  
   848  For example, Go implements time.Now for linux-amd64 with this [assembly](https://github.com/golang/go/blob/go1.20/src/runtime/time_linux_amd64.s).
   849  Because retrieving resolution is not generally called often, unlike getting time, it could be appropriate to only
   850  implement the fallback logic that does not use VDSO (executing syscalls in user mode). The syscall number for `clock_getres`
   851  is 229 on linux-amd64 and should be usable. See https://pkg.go.dev/syscall#pkg-constants.
   852  
   853  If implementing similar for Windows, [mingw](https://github.com/mirror/mingw-w64/blob/6a0e9165008f731bccadfc41a59719cf7c8efc02/mingw-w64-libraries/winpthreads/src/clock.c#L77)
   854  is often a good source to find the Windows API calls that correspond
   855  to a POSIX method.
   856  
   857  Writing assembly would allow making syscalls without CGO, but comes with the cost that it will require implementations
   858  across many combinations of OS and architecture.
   859  
   860  ## sys.Nanosleep
   861  
   862  All major programming languages have a `sleep` mechanism to block for a
   863  duration. Sleep is typically implemented by a WASI `poll_oneoff` relative clock
   864  subscription.
   865  
   866  For example, the below ends up calling `wasi_snapshot_preview1.poll_oneoff`:
   867  
   868  ```zig
   869  const std = @import("std");
   870  pub fn main() !void {
   871      std.time.sleep(std.time.ns_per_s * 5);
   872  }
   873  ```
   874  
   875  Besides Zig, this is also the case with TinyGo (`-target=wasi`) and Rust
   876  (`--target wasm32-wasi`). This isn't the case with Go (`GOOS=js GOARCH=wasm`),
   877  though. In the latter case, wasm loops on `sys.Nanotime`.
   878  
   879  We decided to expose `sys.Nanosleep` to allow overriding the implementation
   880  used in the common case, even if it isn't used by Go, because this gives an
   881  easy and efficient closure over a common program function. We also documented
   882  `sys.Nanotime` to warn users that some compilers don't optimize sleep.
   883  
   884  ## sys.Osyield
   885  
   886  We expose `sys.Osyield`, to allow users to control the behavior of WASI's
   887  `sched_yield` without a new build of wazero. This is mainly for parity with
   888  all other related features which we allow users to implement, including
   889  `sys.Nanosleep`. Unlike others, we don't provide an out-of-box implementation
   890  primarily because it will cause performance problems when accessed.
   891  
   892  For example, the below implementation uses CGO, which might result in a 1us
   893  delay per invocation depending on the platform.
   894  
   895  See https://github.com/golang/go/issues/19409#issuecomment-284788196
   896  ```go
   897  //go:noescape
   898  //go:linkname osyield runtime.osyield
   899  func osyield()
   900  ```
   901  
   902  In practice, a request to customize this is unlikely to happen until other
   903  thread based functions are implemented. That said, as of early 2023, there are
   904  a few signs of implementation interest and cross-referencing:
   905  
   906  See https://github.com/WebAssembly/stack-switching/discussions/38
   907  See https://github.com/WebAssembly/wasi-threads#what-can-be-skipped
   908  See https://slinkydeveloper.com/Kubernetes-controllers-A-New-Hope/
   909  
   910  ## poll_oneoff
   911  
   912  `poll_oneoff` is a WASI API for waiting for I/O events on multiple handles.
   913  It is conceptually similar to the POSIX `poll(2)` syscall.
   914  The name is not `poll`, because it references [“the fact that this function is not efficient
   915  when used repeatedly with the same large set of handles”][poll_oneoff].
   916  
   917  We chose to support this API in a handful of cases that work for regular files
   918  and standard input. We currently do not support other types of file descriptors such
   919  as socket handles.
   920  
   921  ### Clock Subscriptions
   922  
   923  As detailed above in [sys.Nanosleep](#sysnanosleep), `poll_oneoff` handles
   924  relative clock subscriptions. In our implementation we use `sys.Nanosleep()`
   925  for this purpose in most cases, except when polling for interactive input
   926  from `os.Stdin` (see more details below).
   927  
   928  ### FdRead and FdWrite Subscriptions
   929  
   930  When subscribing a file descriptor (except `Stdin`) for reads or writes,
   931  the implementation will generally return immediately with success, unless
   932  the file descriptor is unknown. The file descriptor is not checked further
   933  for new incoming data. Any timeout is cancelled, and the API call is able
   934  to return, unless there are subscriptions to `Stdin`: these are handled
   935  separately.
   936  
   937  ### FdRead and FdWrite Subscription to Stdin
   938  
   939  Subscribing `Stdin` for reads (writes make no sense and cause an error),
   940  requires extra care: wazero allows configuring a custom reader for `Stdin`.
   941  
   942  In general, if a custom reader is found, the behavior will be the same
   943  as for regular file descriptors: data is assumed to be present and
   944  a success is written back to the result buffer.
   945  
   946  However, if the reader is detected to read from `os.Stdin`,
   947  a special code path is followed, invoking `platform.Select()`.
   948  
   949  `platform.Select()` is a wrapper for `select(2)` on POSIX systems,
   950  and it is mocked for a handful of cases also on Windows.
   951  
   952  ### Select on POSIX
   953  
   954  On POSIX systems, `select(2)` allows waiting for incoming data on a file
   955  descriptor, blocking until either data becomes available or the timeout
   956  expires. It is not surprising that `select(2)` and `poll(2)` have a lot in common:
   957  the main difference is how the file descriptor parameters are passed.
   958  
   959  Usage of `platform.Select()` is only reserved for the standard input case, because
   960  
   961  1. it is really only necessary to handle interactive input: otherwise,
   962     there is no way in Go to peek from Standard Input without actually
   963     reading (and thus consuming) from it;
   964  
   965  2. if `Stdin` is connected to a pipe, it is ok in most cases to return
   966     with success immediately;
   967  
   968  3. `platform.Select()` is currently a blocking call, irrespective of goroutines,
   969     because the underlying syscall is; thus, it is better to limit its usage.
   970  
   971  So, if the subscription is for `os.Stdin` and the handle is detected
   972  to correspond to an interactive session, then `platform.Select()` will be
   973  invoked with the `Stdin` handle *and* the timeout.
   974  
   975  This also means that in this specific case, the timeout is uninterruptible,
   976  unless data becomes available on `Stdin` itself.
   977  
   978  ### Select on Windows
   979  
   980  On Windows the `platform.Select()` is much more straightforward,
   981  and it really just replicates the behavior found in the general cases
   982  for `FdRead` subscriptions: in other words, the subscription to `Stdin`
   983  is immediately acknowledged.
   984  
   985  The implementation also supports a timeout, but in this case
   986  it relies on `time.Sleep()`, which notably, as compared to the POSIX
   987  case, is interruptible and compatible with goroutines.
   988  
   989  However, because `Stdin` subscriptions are always acknowledged
   990  without wait and because this code path is always followed only
   991  when at least one `Stdin` subscription is present, then the
   992  timeout is effectively always handled externally.
   993  
   994  In any case, the behavior of `platform.Select` on Windows
   995  is noticeably different from the behavior on POSIX platforms;
   996  we plan to refine and further align it in semantics in the future.
   997  
   998  [poll_oneoff]: https://github.com/WebAssembly/wasi-poll#why-is-the-function-called-poll_oneoff
   999  
  1000  ## Signed encoding of integer global constant initializers
  1001  
  1002  wazero treats integer global constant initializers as signed, as their interpretation is not known at declaration time. For
  1003  example, there is no signed integer [value type](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#value-types%E2%91%A0).
  1004  
  1005  To get at the problem, let's use an example.
  1006  ```
  1007  (global (export "start_epoch") i64 (i64.const 1620216263544))
  1008  ```
  1009  
  1010  In both signed and unsigned LEB128 encoding, this value is the same bit pattern. The problem is that some numbers are
  1011  not. For example, 16256 is `807f` encoded as unsigned, but `80ff00` encoded as signed.
  1012  
  1013  While the specification mentions uninterpreted integers are in abstract [unsigned values](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#integers%E2%91%A0),
  1014  the binary encoding is clear that they are encoded [signed](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#integers%E2%91%A4).
  1015  
  1016  For consistency, we go with signed encoding in the special case of global constant initializers.
  1017  
  1018  ## Implementation limitations
  1019  
  1020  WebAssembly 1.0 (20191205) specification allows runtimes to [limit certain aspects of Wasm module or execution](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#a2-implementation-limitations).
  1021  
  1022  wazero limitations are imposed pragmatically and described below.
  1023  
  1024  ### Number of functions in a module
  1025  
  1026  The possible number of function instances in [a module](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#module-instances%E2%91%A0) is not specified in the WebAssembly specifications since [`funcaddr`](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#syntax-funcaddr) corresponding to a function instance in a store can be arbitrary number.
  1027  wazero limits the maximum function instances to 2^27 as even that number would occupy 1GB in function pointers.
  1028  
  1029  That is not only because we _believe_ that all use cases are fine with the limitation, but also because we have no way to test wazero runtimes under these unusual circumstances.
  1030  
  1031  ### Number of function types in a store
  1032  
  1033  There's no limitation on the number of function types in [a store](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#store%E2%91%A0) according to the spec. In wazero's implementation, we assign each function type a unique ID, and choose to use `uint32` to represent the IDs.
  1034  Therefore the maximum number of function types a store can have is limited to 2^27 as even that number would occupy 512MB just to reference the function types.
  1035  
  1036  This is due to the same reason for the limitation on the number of functions above.
  1037  
  1038  ### Number of values on the stack in a function
  1039  
  1040  While the spec does not clarify a limitation of function stack values, wazero limits this to 2^27 = 134,217,728.
  1041  The reason is that we internally represent all the values as 64-bit integers regardless of their types (including f32, f64), and 2^27 values means
  1042  1 GiB (2^30 bytes). 1 GiB is reasonable for most applications [as we see a Goroutine has 250 MB as a limit on the stack for 32-bit arch](https://github.com/golang/go/blob/go1.20/src/runtime/proc.go#L152-L159), considering that WebAssembly is (currently) a 32-bit environment.
  1043  
  1044  All the functions are statically analyzed at module instantiation phase, and if a function can potentially reach this limit, an error is returned.
  1045  
  1046  ### Number of globals in a module
  1047  
  1048  Theoretically, a module can declare globals (including imports) up to 2^32 times. However, wazero limits this to 2^27 (134,217,728) per module.
  1049  That is because internally we store globals in a slice with pointer types (meaning 8 bytes on 64-bit platforms), and therefore 2^27 globals
  1050  means that we have a 1 GiB slice, which seems large enough for most applications.
  1051  
  1052  ### Number of tables in a module
  1053  
  1054  While the spec says that a module can have up to 2^32 tables, wazero limits this to 2^27 = 134,217,728.
  1055  One of the reasons is that even that number would occupy 1GB in the table pointers alone. Not only that, we access the tables slice by
  1056  table index by using a 32-bit signed offset in the compiler implementation, which means that the table index of 2^27 can reach 2^27 * 8 (pointer size on 64-bit machines) = 2^30 offsets in bytes.
  1057  
  1058  We _believe_ that all use cases are fine with the limitation, but also note that we have no way to test wazero runtimes under these unusual circumstances.
  1059  
  1060  If a module reaches this limit, an error is returned at the compilation phase.
  1061  
  1062  ## Compiler engine implementation
  1063  
  1064  See [compiler/RATIONALE.md](internal/engine/compiler/RATIONALE.md).
  1065  
  1066  ## Golang patterns
  1067  
  1068  ### Hammer tests
  1069  Code that uses concurrency primitives, such as locks or atomics, should include "hammer tests", which run large loops
  1070  inside a bounded number of goroutines, run by half that many `GOMAXPROCS`. These are named consistently "hammer", so
  1071  they are easy to find. The name inherits from some existing tests in [golang/go](https://github.com/golang/go/search?q=hammer&type=code).
  1072  
  1073  Here is an annotated description of the key pieces of a hammer test:
  1074  1. `P` declares the count of goroutines to use, defaulting to 8 or 4 if `testing.Short`.
  1075     * Half this amount are the cores used, and 4 is less than a modern laptop's CPU. This allows multiple "hammer" tests to run in parallel.
  1076  2. `N` declares the scale of work (loop) per goroutine, defaulting to value that finishes in ~0.1s on a modern laptop.
  1077     * When in doubt, try 1000 or 100 if `testing.Short`
  1078     * Remember, there are multiple hammer tests and CI nodes are slow. Slower tests hurt feedback loops.
  1079  3. `defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P/2))` makes goroutines switch cores, testing visibility of shared data.
  1080  4. To ensure goroutines execute at the same time, block them with `sync.WaitGroup`, initialized to `Add(P)`.
  1081     * `sync.WaitGroup` internally uses `runtime_Semacquire` not available in any other library.
  1082     * `sync.WaitGroup.Add` with a negative value can unblock many goroutines at the same time, e.g. without a for loop.
  1083  5. Track goroutines progress via `finished := make(chan int)` where each goroutine in `P` defers `finished <- 1`.
  1084     1. Tests use `require.XXX`, so `recover()` into `t.Fail` in a `defer` function before `finished <- 1`.
  1085        * This makes it easier to spot larger concurrency problems as you see each failure, not just the first.
  1086     2. After the `defer` function, await unblocked, then run the stateful function `N` times in a normal loop.
  1087        * This loop should trigger shared state problems as locks or atomics are contended by `P` goroutines.
  1088  6. After all `P` goroutines launch, atomically release all of them with `WaitGroup.Add(-P)`.
  1089  7. Block the runner on goroutine completion, by (`<-finished`) for each `P`.
  1090  8. When all goroutines complete, `return` if `t.Failed()`, otherwise perform follow-up state checks.
  1091  
  1092  This is implemented in wazero in [hammer.go](internal/testing/hammer/hammer.go)
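
The steps above can be sketched roughly as follows. This is a simplified, standalone illustration and not a copy of wazero's hammer.go: the names `hammer` and `hammerCounter` are hypothetical, and failures are counted in an atomic integer where a real test would call `t.Fail`. Step 8 (follow-up state checks) is left to the caller.

```go
package main

import (
	"runtime"
	"sync"
	"sync/atomic"
)

// hammer runs fn N times on each of P goroutines and returns how many
// goroutines panicked. It is a hypothetical helper, not wazero's API.
func hammer(P, N int, fn func(p, n int)) int {
	// Step 3: use half as many cores as goroutines. The inner call sets the
	// new value and returns the old one, which the deferred call restores.
	defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P / 2))

	var unblocked sync.WaitGroup
	unblocked.Add(P) // Step 4: goroutines block until this counter reaches zero.

	finished := make(chan int) // Step 5: each goroutine reports completion here.
	var failed int32           // Stand-in for t.Fail in a real test.

	for p := 0; p < P; p++ {
		p := p // Pin p so each goroutine sees its own value.
		go func() {
			defer func() {
				// Step 5.1: record a panic (e.g. a failed assertion) before
				// signaling completion.
				if r := recover(); r != nil {
					atomic.AddInt32(&failed, 1)
				}
				finished <- 1
			}()
			unblocked.Wait() // Step 5.2: wait to be released, then do the work.
			for n := 0; n < N; n++ {
				fn(p, n)
			}
		}()
	}

	unblocked.Add(-P) // Step 6: release all goroutines at once.
	for i := 0; i < P; i++ {
		<-finished // Step 7: wait for every goroutine to complete.
	}
	return int(atomic.LoadInt32(&failed))
}

// hammerCounter hammers an atomic counter and returns its final value
// together with the number of goroutines that panicked.
func hammerCounter(P, N int) (uint64, int) {
	var counter uint64
	failures := hammer(P, N, func(p, n int) {
		atomic.AddUint64(&counter, 1)
	})
	return atomic.LoadUint64(&counter), failures
}
```

For example, `hammerCounter(8, 1000)` should return a count of 8000 with zero failures, since every increment is atomic. Replacing the atomic add with a plain `counter++` is the kind of bug this style of test is meant to surface.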

### Lock-free, cross-goroutine observations of updates

How to achieve cross-goroutine reads of a variable is not explicitly defined in https://go.dev/ref/mem. wazero uses
atomics to implement this, following unofficial practice. For example, a `Close` operation can be guarded to happen only
once via compare-and-swap (CAS) against a zero value. When we use this pattern, we consistently use atomics to both
read and update the same numeric field.
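
A minimal sketch of this close-once pattern is shown below. The `resource` type, its `closed` field, and the `closeConcurrently` helper are hypothetical names used for illustration; they are not wazero types. The only point is that the same numeric field is both read and updated through `sync/atomic`.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// resource is a hypothetical type whose Close must take effect only once.
type resource struct {
	// closed is 0 while open and 1 once closed. It is only accessed via
	// sync/atomic, for both reads and updates.
	closed uint32
}

// Close returns true only for the single caller that wins the CAS from 0 to 1.
func (r *resource) Close() bool {
	if !atomic.CompareAndSwapUint32(&r.closed, 0, 1) {
		return false // Another caller already closed the resource.
	}
	// Cleanup that must run exactly once would go here.
	return true
}

// IsClosed reads the same field atomically, so it observes a Close performed
// by another goroutine.
func (r *resource) IsClosed() bool {
	return atomic.LoadUint32(&r.closed) == 1
}

// closeConcurrently calls Close from P goroutines and returns how many calls
// reported that they performed the close.
func closeConcurrently(r *resource, P int) int {
	var wins int32
	var wg sync.WaitGroup
	wg.Add(P)
	for i := 0; i < P; i++ {
		go func() {
			defer wg.Done()
			if r.Close() {
				atomic.AddInt32(&wins, 1)
			}
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt32(&wins))
}
```

However many goroutines race in `closeConcurrently`, exactly one `Close` call wins the CAS, so the function returns 1.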

In lieu of formal documentation, we infer this pattern works from other sources (besides tests):
 * `sync.WaitGroup` by definition must support calling `Add` from other goroutines. Internally, it uses atomics.
 * rsc in golang/go#5045 writes "atomics guarantee sequential consistency among the atomic variables".

See https://github.com/golang/go/blob/go1.20/src/sync/waitgroup.go#L64
See https://github.com/golang/go/issues/5045#issuecomment-252730563
See https://www.youtube.com/watch?v=VmrEG-3bWyM