github.com/wasilibs/wazerox@v0.0.0-20240124024944-4923be63ab5f/RATIONALE.md (about)

     1  # Notable rationale of wazero
     2  
     3  ## Zero dependencies
     4  
     5  wazero has zero dependencies to differentiate itself from other runtimes, which
     6  have a heavy impact, usually due to CGO. By avoiding CGO, wazero avoids
     7  prerequisites such as shared libraries or libc, and lets users keep features
     8  like cross compilation.
     9  
    10  Avoiding go.mod dependencies reduces interference on Go version support, and
    11  size of a statically compiled binary. However, doing so brings some
    12  responsibility into the project.
    13  
    14  Go's native platform support is good: We don't need platform-specific code to
    15  get monotonic time, nor do we need much work to implement certain features
    16  needed by our compiler such as `mmap`. That said, Go does not support all
    17  common operating systems to the same degree. For example, Go 1.18 includes
    18  `Mprotect` on Linux and Darwin, but not FreeBSD.
    19  
    20  The general tradeoff the project takes from a zero dependency policy is more
    21  explicit support of platforms (in the compiler runtime), as well as a larger and
    22  more technically difficult codebase.
    23  
    24  At some point, we may allow extensions to supply their own platform-specific
    25  hooks. Until then, one tradeoff for end users is some glitches when trying
    26  untested platforms (with the compiler runtime).
    27  
    28  ### Why do we use CGO to implement system calls on darwin?
    29  
    30  wazero is dependency and CGO free by design. In some cases, we have code that
    31  can optionally use CGO, but retain a fallback for when that's disabled. The only
    32  operating system (`GOOS`) we use CGO by default in is `darwin`.
    33  
    34  Unlike other operating systems, regardless of `CGO_ENABLED`, Go always uses
    35  "CGO" mechanisms in the runtime layer of `darwin`. This is explained in
    36  [Statically linked binaries on Mac OS X](https://developer.apple.com/library/archive/qa/qa1118/_index.html#//apple_ref/doc/uid/DTS10001666):
    37  
    38  > Apple does not support statically linked binaries on Mac OS X. A statically
    39  > linked binary assumes binary compatibility at the kernel system call
    40  > interface, and we do not make any guarantees on that front. Rather, we strive
    41  > to ensure binary compatibility in each dynamically linked system library and
    42  > framework.
    43  
    44  This plays to our advantage for system calls that aren't yet exposed in the Go
    45  standard library, notably `futimens` for nanosecond-precision timestamp
    46  manipulation.
    47  
    48  ### Why not x/sys
    49  
    50  Going beyond Go's SDK limitations can be accomplished with their [x/sys library](https://pkg.go.dev/golang.org/x/sys/unix).
    51  For example, this includes `zsyscall_freebsd_amd64.go` missing from the Go SDK.
    52  
    53  However, like all dependencies, x/sys is a source of conflict. For example,
    54  x/sys had to be upgraded in order to upgrade to Go 1.18.
    55  
    56  If we depended on x/sys, we could get more precise functionality needed for
    57  features such as clocks or more platform support for the compiler runtime.
    58  
    59  That said, formally supporting an operating system may still require testing as
    60  even use of x/sys can require platform-specifics. For example, [mmap-go](https://github.com/edsrzf/mmap-go)
    61  uses x/sys, but also mentions limitations, some not surmountable with x/sys
    62  alone.
    63  
    64  Regardless, we may at some point introduce a separate go.mod for users to use
    65  x/sys as a platform plugin without forcing all users to maintain that
    66  dependency.
    67  
    68  ## Project structure
    69  
    70  wazero uses internal packages extensively to balance API compatibility desires for end users with the need to safely
    71  share internals between compilers.
    72  
    73  End-user packages include `wazero`, with `Config` structs, `api`, with shared types, and the built-in `wasi` library.
    74  Everything else is internal.
    75  
    76  We put the main program for wazero into a directory of the same name to match conventions used in `go install`,
    77  notably the name of the folder becomes the binary name. We chose to use `cmd/wazero` as it is common practice
    78  and less surprising than `wazero/wazero`.
    79  
    80  ### Internal packages
    81  
    82  Most code in wazero is internal, and it is acknowledged that this prevents external implementation of facets such as
    83  compilers or decoding. It also prevents splitting this code into separate repositories, resulting in a larger monorepo.
    84  This also adds work as more code needs to be centrally reviewed.
    85  
    86  However, the alternative is neither secure nor viable. Allowing external implementations would require making symbols
    87  public, such as the `CodeSection`, which can easily create bugs. Moreover, there's a high drift risk for any attempt at
    88  external implementations, compounded not just by wazero's code organization, but also the fast moving Wasm and WASI
    89  specifications.
    90  
    91  For example, implementing a compiler correctly requires expertise in Wasm, Go and assembly. This requires deep
    92  insight into how internals are meant to be structured and the various tiers of testing required for `wazero` to result
    93  in a high quality experience. Even if someone had these skills, supporting external code would introduce variables which
    94  are constants in the central one. Supporting an external codebase is harder on the project team, and could starve time
    95  from the already large burden on the central codebase.
    96  
    97  The tradeoffs of internal packages are a larger codebase and the responsibility to implement all standard features. They also
    98  imply thinking more carefully about extension, as forking is not viable for the reasons above. The primary mitigations of these
    99  realities are friendly OSS licensing, high rigor and a collaborative spirit, which aim to make contribution in the shared
   100  codebase productive.
   101  
   102  ### Avoiding cyclic dependencies
   103  
   104  wazero shares constants and interfaces with internal code by a sharing pattern described below:
   105  * shared interfaces and constants go in one package under root: `api`.
   106  * user APIs and structs depend on `api` and go into the root package `wazero`.
   107    * e.g. `InstantiateModule` -> `/wasm.go` depends on the type `api.Module`.
   108  * implementation code can also depend on `api` in a corresponding package under `/internal`.
   109    * e.g. package `wasm` -> `/internal/wasm/*.go` can depend on the type `api.Module`.
   110  
   111  The above guarantees no cyclic dependencies at the cost of having to re-define symbols that exist in both packages.
   112  For example, if `wasm.Store` is a type the user needs access to, it is narrowed by a cover type in the `wazero` package:
   113  
   114  ```go
   115  type runtime struct {
   116  	s *wasm.Store
   117  }
   118  ```
   119  
   120  This is not as bad as it sounds as mutations are only available via configuration. This means exported functions are
   121  limited to only a few functions.
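The narrowing idea can be sketched without any wazero code. The following is a minimal, self-contained model and not wazero's implementation: `store`, its methods and the `Runtime` methods shown here are hypothetical stand-ins chosen only to show how a cover type keeps an internal type private while exposing a few functions.

```go
package main

import "fmt"

// store stands in for an internal type such as wasm.Store. Its fields and
// methods are hypothetical and exist only for this sketch.
type store struct {
	modules map[string]bool
}

func (s *store) instantiate(name string) { s.modules[name] = true }
func (s *store) remove(name string)      { delete(s.modules, name) }
func (s *store) has(name string) bool    { return s.modules[name] }

// Runtime is the narrow, user-facing view: it exposes only what users need.
type Runtime interface {
	InstantiateModule(name string) error
	HasModule(name string) bool
}

// runtime is the cover type. It holds the internal store privately, so
// methods such as remove are unreachable from user code.
type runtime struct {
	s *store
}

// NewRuntime is the only way for users to obtain a Runtime.
func NewRuntime() Runtime {
	return &runtime{s: &store{modules: map[string]bool{}}}
}

func (r *runtime) InstantiateModule(name string) error {
	if r.s.has(name) {
		return fmt.Errorf("module %q already instantiated", name)
	}
	r.s.instantiate(name)
	return nil
}

func (r *runtime) HasModule(name string) bool { return r.s.has(name) }

func main() {
	r := NewRuntime()
	fmt.Println(r.InstantiateModule("env")) // first call succeeds
	fmt.Println(r.InstantiateModule("env")) // second call reports a duplicate
	fmt.Println(r.HasModule("env"))
}
```

The cost mentioned above is visible here: `InstantiateModule` and `HasModule` are declared once on the interface and again on the struct.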
   122  
   123  ### Avoiding security bugs
   124  
   125  In order to avoid security flaws such as code insertion, nothing in the public API is permitted to write directly to any
   126  mutable symbol in the internal package. For example, the package `api` is shared with internal code. To ensure
   127  immutability, the `api` package cannot contain any mutable public symbol, such as a slice or a struct with an exported
   128  field.
   129  
   130  In practice, this means shared functionality like memory mutation needs to be implemented by interfaces.
   131  
   132  Here are some examples:
   133  * `api.Memory` protects access by exposing functions like `WriteFloat64Le` instead of exporting a buffer (`[]byte`).
   134  * There is no exported symbol for the `[]byte` representing the `CodeSection`.
   135  
   136  Besides security, this practice prevents other bugs and allows centralization of validation logic such as decoding Wasm.
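As a rough illustration of the first bullet, here is a reduced memory interface that hides its buffer. This is a sketch under assumptions: the `Memory` interface below has only two methods, and their exact signatures are chosen for the example rather than copied from `api.Memory`.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// Memory is a hypothetical, reduced version of an interface like api.Memory.
// Callers never see the backing []byte, so every access is bounds checked.
type Memory interface {
	WriteFloat64Le(offset uint32, v float64) bool
	ReadFloat64Le(offset uint32) (float64, bool)
}

type memory struct {
	buf []byte // unexported: users cannot mutate it directly
}

// NewMemory allocates size bytes of zeroed memory.
func NewMemory(size uint32) Memory { return &memory{buf: make([]byte, size)} }

// inRange reports whether byteCount bytes starting at offset fit in the buffer.
func (m *memory) inRange(offset, byteCount uint32) bool {
	return uint64(offset)+uint64(byteCount) <= uint64(len(m.buf))
}

func (m *memory) WriteFloat64Le(offset uint32, v float64) bool {
	if !m.inRange(offset, 8) {
		return false
	}
	binary.LittleEndian.PutUint64(m.buf[offset:], math.Float64bits(v))
	return true
}

func (m *memory) ReadFloat64Le(offset uint32) (float64, bool) {
	if !m.inRange(offset, 8) {
		return 0, false
	}
	return math.Float64frombits(binary.LittleEndian.Uint64(m.buf[offset:])), true
}

func main() {
	mem := NewMemory(16)
	fmt.Println(mem.WriteFloat64Le(8, 1.5)) // in range
	fmt.Println(mem.WriteFloat64Le(9, 1.5)) // out of range: rejected, no panic
	fmt.Println(mem.ReadFloat64Le(8))
}
```

Because validation lives in one place (`inRange`), an out-of-range write is reported to the caller instead of corrupting state or panicking.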
   137  
   138  ## API Design
   139  
   140  ### Why is `context.Context` inconsistent?
   141  
   142  It may seem strange that only certain APIs have an initial `context.Context`
   143  parameter. We originally had a `context.Context` for anything that might be
   144  traced, but it turned out to be only useful for lifecycle and host functions.
   145  
   146  For instruction-scoped aspects like memory updates, a context parameter is too
   147  fine-grained and also invisible in practice. For example, most users will use
   148  the compiler engine, and its memory, global or table access will never use Go's
   149  context.
   150  
   151  ### Why does `api.ValueType` map to uint64?
   152  
   153  WebAssembly allows functions to be defined either by the guest or the host,
   154  with signatures expressed as WebAssembly types. For example, `i32` is a 32-bit
   155  type which might be interpreted as signed. Function signatures can have zero or
   156  more parameters or results even if WebAssembly 1.0 allows up to one result.
   157  
   158  The guest can export functions, so that the host can call it. In the case of
   159  wazero, the host is Go and an exported function can be called via
   160  `api.Function`. `api.Function` allows users to supply parameters and read
   161  results as a slice of uint64. For example, if there are no results, an empty
   162  slice is returned. The user can learn the signature via `FunctionDescription`,
   163  which returns the `api.ValueType` corresponding to each parameter or result.
   164  `api.ValueType` defines the mapping of WebAssembly types to `uint64` values for
   165  reasons described in this section. The special case of `v128` is also mentioned
   166  below.
   167  
   168  wazero maps each value type to a uint64 value because it holds the largest
   169  type in WebAssembly 1.0 (i64). A slice allows you to express an empty list (e.g. a
   170  nullary signature), for example a start function.
   171  
   172  Here's an example of calling a function, noting this syntax works for both a
   173  signature `(param i32 i32) (result i32)` and `(param i64 i64) (result i64)`:
   174  ```go
   175  x, y := uint64(1), uint64(2)
   176  results, err := mod.ExportedFunction("add").Call(ctx, x, y)
   177  if err != nil {
   178  	log.Panicln(err)
   179  }
   180  fmt.Printf("%d + %d = %d\n", x, y, results[0])
   181  ```
   182  
   183  WebAssembly does not define an encoding strategy for host defined parameters or
   184  results. This means the encoding rules above are defined by wazero instead. To
   185  address this, we both clarified the mapping in `api.ValueType` and added helper
   186  functions like `api.EncodeF64`. This gives users the conversions typical in Go
   187  programming, and utilities to avoid ambiguity and edge cases around casting.
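To make the "edge cases around casting" concrete, here is a small sketch of why explicit helpers are useful. The functions below are local to this example and named hypothetically; they illustrate the kind of conversion a helper such as `api.EncodeF64` makes explicit, and are not a copy of wazero's code.

```go
package main

import (
	"fmt"
	"math"
)

// encodeI32 zero-extends the 32-bit pattern so the upper 32 bits stay clear.
func encodeI32(v int32) uint64 { return uint64(uint32(v)) }

// decodeI32 reinterprets the low 32 bits as a signed integer.
func decodeI32(v uint64) int32 { return int32(uint32(v)) }

// encodeF64 stores the IEEE 754 bit pattern rather than converting the number.
func encodeF64(v float64) uint64 { return math.Float64bits(v) }

// decodeF64 is the inverse of encodeF64.
func decodeF64(v uint64) float64 { return math.Float64frombits(v) }

func main() {
	neg := int32(-1)
	// A plain cast sign-extends, which sets all 64 bits.
	fmt.Printf("%#x\n", uint64(neg)) // 0xffffffffffffffff
	// The helper keeps only the 32-bit pattern.
	fmt.Printf("%#x\n", encodeI32(neg))    // 0xffffffff
	fmt.Println(decodeI32(encodeI32(neg))) // -1

	// A plain cast of a float truncates it; the helper round-trips it.
	f := 3.9
	fmt.Println(uint64(f), decodeF64(encodeF64(f))) // 3 3.9
}
```

The two integer encodings of `-1` differ in their upper 32 bits, which is exactly the sort of ambiguity that documented helpers remove.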
   188  
   189  Alternatively, we could have defined a byte buffer based approach and a binary
   190  encoding of value types in and out. For example, an empty byte slice would mean
   191  no values, while a non-empty could use a binary encoding for supported values.
   192  This could work, but it is more difficult for the normal case of i32 and i64.
   193  It also shares a struggle with the current approach, which is that value types
   194  were added after WebAssembly 1.0 and not all of them have an encoding. More on
   195  this below.
   196  
   197  In summary, wazero chose an approach for signature mapping because there was
   198  none, and the one we chose biases towards simplicity with integers and handles
   199  the rest with documentation and utilities.
   200  
   201  #### Post 1.0 value types
   202  
   203  Value types added after WebAssembly 1.0 stressed the current model, as some
   204  have no encoding or are larger than 64 bits. While problematic, these value
   205  types are not commonly used in exported (extern) functions. However, some
   206  decisions were made and detailed below.
   207  
   208  For example `externref` has no guest representation. wazero chose to map
   209  references to uint64 as that's the largest value needed to encode a pointer on
   210  supported platforms. While there are two reference types, `externref` and
   211  `funcref`, the latter is an internal detail of function tables, and the former
   212  is rarely if ever used in function signatures as of the end of 2022.
   213  
   214  The only value larger than 64 bits is used for SIMD (`v128`). Vectorizing via
   215  host functions is not used as of the end of 2022. Even if it were, it would be
   216  inefficient vs guest vectorization due to host function overhead. In other
   217  words, the `v128` value type is unlikely to be in an exported function
   218  signature. That it requires two uint64 values to encode is an internal detail
   219  and not worth changing the exported function interface `api.Function`, as doing
   220  so would break all users.
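For readers wondering what "two uint64 values" means in practice, the following sketch splits a 128-bit value into a low and a high half. The function names and the low-then-high ordering are assumptions made for this example, not a statement about wazero's internal layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitV128 encodes 16 little-endian bytes as two uint64 values (low, high).
// The name and the lo/hi ordering are assumptions made for this sketch.
func splitV128(v [16]byte) (lo, hi uint64) {
	return binary.LittleEndian.Uint64(v[:8]), binary.LittleEndian.Uint64(v[8:])
}

// joinV128 is the inverse of splitV128.
func joinV128(lo, hi uint64) (v [16]byte) {
	binary.LittleEndian.PutUint64(v[:8], lo)
	binary.LittleEndian.PutUint64(v[8:], hi)
	return v
}

func main() {
	var v [16]byte
	v[0], v[8] = 1, 2 // lowest byte of each 64-bit half
	lo, hi := splitV128(v)
	fmt.Println(lo, hi, joinV128(lo, hi) == v) // 1 2 true
}
```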
   221  
   222  ### Interfaces, not structs
   223  
   224  All exported types in public packages, regardless of configuration vs runtime, are interfaces. The primary benefits are
   225  internal flexibility and avoiding people accidentally mis-initializing by instantiating the types on their own vs using
   226  the `NewXxx` constructor functions. In other words, there's less support load when things can't be done incorrectly.
   227  
   228  Here's an example:
   229  ```go
   230  rt := &RuntimeConfig{} // not initialized properly (fields are nil which shouldn't be)
   231  rt := RuntimeConfig{} // not initialized properly (should be a pointer)
   232  rt := wazero.NewRuntimeConfig() // initialized properly
   233  ```
   234  
   235  There are a few drawbacks to this, notably some work for maintainers.
   236  * Interfaces are decoupled from the structs implementing them, which means the signature has to be repeated twice.
   237  * Interfaces have to be documented, and guarded at time of use, to make clear that 3rd party implementations aren't supported.
   238  * As of Go 1.21, interfaces are still [not well supported](https://github.com/golang/go/issues/5860) in godoc.
   239  
   240  ## Config
   241  
   242  wazero configures scopes such as Runtime and Module using `XxxConfig` types. For example, `RuntimeConfig` configures
   243  `Runtime` and `ModuleConfig` configures `Module` (instantiation). In all cases, config types begin with defaults and can be
   244  customized by a user, e.g., selecting features or a module name override.
   245  
   246  ### Why don't we make each configuration setting return an error?
   247  No config types create resources that would need to be closed, nor do they return errors on use. This helps reduce
   248  resource leaks, and makes chaining easier. It makes it possible to parse configuration (e.g. from YAML) independent
   249  of validating it.
   250  
   251  Instead of:
   252  ```go
   253  cfg, err = cfg.WithFS(fs)
   254  if err != nil {
   255    return err
   256  }
   257  cfg, err = cfg.WithName(name)
   258  if err != nil {
   259    return err
   260  }
   261  mod, err = rt.InstantiateModuleWithConfig(ctx, code, cfg)
   262  if err != nil {
   263    return err
   264  }
   265  ```
   266  
   267  There's only one call site to handle errors:
   268  ```go
   269  cfg = cfg.WithFS(fs).WithName(name)
   270  mod, err = rt.InstantiateModuleWithConfig(ctx, code, cfg)
   271  if err != nil {
   272    return err
   273  }
   274  ```
   275  
   276  This allows users one place to look for errors, and also the benefit that if anything internally opens a resource, but
   277  errs, there's nothing they need to close. In other words, users don't need to track which resources need closing on
   278  partial error, as that is handled internally by the only code that can read configuration fields.
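The deferred-validation idea can be modelled in a few lines. In this sketch, `moduleConfig`, `WithArgs`, `instantiate` and the "no empty argument" rule are all hypothetical: they only show setters that never fail and a single function that validates everything.

```go
package main

import (
	"errors"
	"fmt"
)

// moduleConfig is a hypothetical config: setters never fail, they only record.
type moduleConfig struct {
	name string
	args []string
}

func newModuleConfig() *moduleConfig { return &moduleConfig{} }

// WithName returns a copy with the name set. No validation happens here.
func (c *moduleConfig) WithName(name string) *moduleConfig {
	ret := *c
	ret.name = name
	return &ret
}

// WithArgs returns a copy with the arguments set. No validation happens here.
func (c *moduleConfig) WithArgs(args ...string) *moduleConfig {
	ret := *c
	ret.args = append([]string(nil), args...)
	return &ret
}

// instantiate is the single place where configuration is validated, and where
// resources would be opened (and closed again on failure).
func instantiate(c *moduleConfig) error {
	if c.name == "" {
		return errors.New("name is empty")
	}
	for i, a := range c.args {
		if a == "" {
			return fmt.Errorf("args[%d] is empty", i)
		}
	}
	return nil
}

func main() {
	bad := newModuleConfig().WithName("guest").WithArgs("a", "")
	fmt.Println(instantiate(bad)) // the invalid argument surfaces only here

	good := bad.WithArgs("a", "b")
	fmt.Println(instantiate(good))
}
```

Since the setters cannot fail, a config can be assembled from parsed input first and checked later, at the one call that returns an error.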
   279  
   280  ### Why is configuration immutable?
   281  While it seems certain scopes like `Runtime` won't repeat within a process, they do, possibly in different goroutines.
   282  For example, some users create a new runtime for each module, and some re-use the same base module configuration with
   283  only small updates (e.g. the name) for each instantiation. Making configuration immutable allows them to be safely used in
   284  any goroutine.
   285  
   286  Since configs are immutable, changes apply via the return value, similar to `append` on a slice.
   287  
   288  For example, both of these are the same sort of error:
   289  ```go
   290  append(slice, element) // bug as only the return value has the updated slice.
   291  cfg.WithName(next) // bug as only the return value has the updated name.
   292  ```
   293  
   294  Here's an example of correct use: re-assigning explicitly or via chaining.
   295  ```go
   296  cfg = cfg.WithName(name) // explicit
   297  
   298  mod, err = rt.InstantiateModuleWithConfig(ctx, code, cfg.WithName(name)) // implicit
   299  if err != nil {
   300    return err
   301  }
   302  ```
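The goroutine-safety claim follows from copy-on-write. Below is a minimal sketch with a hypothetical `config` type: each `WithXxx` call copies the receiver (including its map), so a shared base config can be read from many goroutines while each derives its own variant.

```go
package main

import (
	"fmt"
	"sync"
)

// config is a hypothetical immutable configuration value.
type config struct {
	name string
	env  map[string]string
}

// WithName returns a copy; the receiver is never modified.
func (c *config) WithName(name string) *config {
	ret := *c
	ret.name = name
	return &ret
}

// WithEnv copies the map before adding to it, so that configs derived from
// the same base never share mutable state.
func (c *config) WithEnv(key, value string) *config {
	ret := *c
	ret.env = make(map[string]string, len(c.env)+1)
	for k, v := range c.env {
		ret.env[k] = v
	}
	ret.env[key] = value
	return &ret
}

func main() {
	base := (&config{}).WithEnv("LANG", "C")

	names := make([]string, 4)
	var wg sync.WaitGroup
	for i := range names {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine derives its own config from the shared base.
			derived := base.WithName(fmt.Sprintf("module-%d", i)).WithEnv("ID", fmt.Sprint(i))
			names[i] = derived.name
		}(i)
	}
	wg.Wait()

	// The base is unchanged: no name, and only the one env entry.
	fmt.Println(names, base.name == "", len(base.env))
}
```

Note that a shallow struct copy alone would not be enough here: without copying `env`, derived configs would write to the same map.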
   303  
   304  ### Why isn't configuration assigned with option types?
   305  The option pattern is a familiar one in Go. For example, someone defines a type `func(x X) error` and uses it to update
   306  the target. You could imagine wazero choosing to make `ModuleConfig` from options vs chaining fields.
   307  
   308  e.g. instead of:
   309  ```go
   310  type ModuleConfig interface {
   311  	WithName(string) ModuleConfig
   312  	WithFS(fs.FS) ModuleConfig
   313  }
   314  
   315  type moduleConfig struct {
   316  	name string
   317  	fs fs.FS
   318  }
   319  
   320  func (c *moduleConfig) WithName(name string) ModuleConfig {
   321      ret := *c // copy
   322      ret.name = name
   323      return &ret
   324  }
   325  
   326  func (c *moduleConfig) WithFS(fs fs.FS) ModuleConfig {
   327      ret := *c // copy
   328      ret.setFS("/", fs)
   329      return &ret
   330  }
   331  
   332  config := r.NewModuleConfig().WithFS(fs)
   333  configDerived := config.WithName("name")
   334  ```
   335  
   336  Alternatively, an option function type could be defined, with each config method refactored into a name-prefixed option function:
   337  ```go
   338  type ModuleConfig interface {
   339  }
   340  type moduleConfig struct {
   341  	name string
   342  	fs   fs.FS
   343  }
   344  
   345  type ModuleConfigOption func(c *moduleConfig)
   346  
   347  func ModuleConfigName(name string) ModuleConfigOption {
   348  	return func(c *moduleConfig) {
   349  		c.name = name
   350  	}
   351  }
   352  
   353  func ModuleConfigFS(fs fs.FS) ModuleConfigOption {
   354  	return func(c *moduleConfig) {
   355  		c.fs = fs
   356  	}
   357  }
   358  
   359  func (r *runtime) NewModuleConfig(opts ...ModuleConfigOption) ModuleConfig {
   360  	ret := newModuleConfig() // defaults; assumed to return *moduleConfig
   361  	for _, opt := range opts {
   362  		opt(ret)
   363  	}
   364  	return ret
   365  }
   366  
   367  func (c *moduleConfig) WithOptions(opts ...ModuleConfigOption) ModuleConfig {
   368  	ret := *c // copy base config
   369  	for _, opt := range opts {
   370  		opt(&ret)
   371  	}
   372  	return &ret
   373  }
   374  
   375  config := r.NewModuleConfig(ModuleConfigFS(fs))
   376  configDerived := config.WithOptions(ModuleConfigName("name"))
   377  ```
   378  
   379  wazero took the path of the former design primarily due to:
   380  * interfaces provide natural namespaces for their methods, which is more direct than functions with name prefixes.
   381  * parsing config into function callbacks is more direct vs parsing config into a slice of functions to do the same.
   382  * in either case derived config is needed and the options pattern is more awkward to achieve that.
   383  
   384  There are other reasons such as test and debug being simpler without options: the above list is constrained to conserve
   385  space. It is accepted that the options pattern is common in Go, which is the main reason for documenting this decision.
   386  
   387  ### Why aren't config types deeply structured?
   388  wazero's configuration types cover the two main scopes of WebAssembly use:
   389  * `RuntimeConfig`: This is the broadest scope, so applies also to compilation
   390    and instantiation. e.g. This controls the WebAssembly Specification Version.
   391  * `ModuleConfig`: This affects modules instantiated after compilation and what
   392    resources are allowed. e.g. This defines how or if STDOUT is captured. This
   393    also allows sub-configuration of `FSConfig`.
   394  
   395  These default to a flat definition each, with lazy sub-configuration only after
   396  proven to be necessary. A flat structure is easier to work with and is also
   397  easy to discover. Unlike the option pattern described earlier, more
   398  configuration in the interface doesn't taint the package namespace, only
   399  `ModuleConfig`.
   400  
   401  We default to a flat structure to encourage simplicity. If we eagerly broke out
   402  all possible configurations into sub-types (e.g. ClockConfig), it would be hard
   403  to notice configuration sprawl. By keeping the config flat, it is easy to see
   404  the cognitive load we may be adding to our users.
   405  
   406  In other words, discomfort adding more configuration is a feature, not a bug.
   407  We should only add new configuration rarely, and before doing so, ensure it
   408  will be used. In fact, this is why we support using context fields for
   409  experimental configuration. By letting users practice, we can find out if a
   410  configuration was a good idea or not before committing to it, and potentially
   411  sprawling our types.
   412  
   413  In reflection, this approach worked well for the nearly 1.5 year period leading
   414  to version 1.0. We've only had to create a single sub-configuration, `FSConfig`,
   415  and it was well understood why when it occurred.
   416  
   417  ## Why does `ModuleConfig.WithStartFunctions` default to `_start`?
   418  
   419  We formerly had functions like `StartWASICommand` that would verify
   420  preconditions and start WASI's `_start` command. However, this caused confusion
   421  because many languages compiled in a WASI dependency, and many did so
   422  inconsistently.
   423  
   424  The conflict is that exported functions need to use features the language
   425  runtime provides, such as garbage collection. There's a "chicken-egg problem"
   426  where `_start` needs to complete in order for exported behavior to work.
   427  
   428  For example, unlike `GOOS=wasip1` in Go 1.21, TinyGo's "wasi" target supports
   429  function exports. So, the only way to use FFI style is via the "wasi" target.
   430  Not explicitly calling `_start` before using an ABI such as wapc-go would crash, due
   431  to setup not happening (e.g. to implement `panic`). Other embedders such as
   432  Envoy also called `_start` for the same reason. To avoid a common problem for
   433  users unaware of WASI, and also to simplify normal use of WASI (e.g. `main`),
   434  we added `_start` to `ModuleConfig.WithStartFunctions`.
   435  
   436  In cases of multiple initializers, such as in wapc-go, users can override this
   437  to add the others *after* `_start`. Users who want to explicitly control
   438  `_start`, such as some of our unit tests, can clear the start functions and
   439  remove it.
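The default, override and clear cases can be sketched as follows. This is a model under assumptions, not wazero's code: `moduleConfig`, `instantiate` and the export map are hypothetical, `wapc_init` is used only as an illustrative second initializer, and skipping start functions that the module does not export is assumed for the sketch.

```go
package main

import "fmt"

// moduleConfig is a hypothetical stand-in for ModuleConfig.
type moduleConfig struct {
	startFunctions []string
}

// newModuleConfig defaults the start functions to "_start".
func newModuleConfig() *moduleConfig {
	return &moduleConfig{startFunctions: []string{"_start"}}
}

// WithStartFunctions replaces the list; passing nothing clears it.
func (c *moduleConfig) WithStartFunctions(names ...string) *moduleConfig {
	ret := *c
	ret.startFunctions = append([]string(nil), names...)
	return &ret
}

// instantiate calls each start function in order, skipping names the module
// does not export. exports maps a function name to its implementation.
func instantiate(c *moduleConfig, exports map[string]func()) {
	for _, name := range c.startFunctions {
		if fn, ok := exports[name]; ok {
			fn()
		}
	}
}

func main() {
	var calls []string
	exports := map[string]func(){
		"_start":    func() { calls = append(calls, "_start") },
		"wapc_init": func() { calls = append(calls, "wapc_init") },
	}

	instantiate(newModuleConfig(), exports) // default: only _start
	instantiate(newModuleConfig().WithStartFunctions("_start", "wapc_init"), exports)
	instantiate(newModuleConfig().WithStartFunctions(), exports) // cleared: nothing runs
	fmt.Println(calls)                                           // [_start _start wapc_init]
}
```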
   440  
   441  This decision was made in 2022, and holds true in 2023, even with the
   442  introduction of "wasix". It holds because "wasix" is backwards compatible with
   443  "wasip1". In the future, there will be other ways to start applications, which
   444  may not be backwards compatible with "wasip1".
   445  
   446  Most notably WASI "Preview 2" is not implemented in a way compatible with
   447  wasip1. Its start function is likely to be different, and defined in the
   448  wasi-cli "world". When the design settles, and it is implemented by compilers,
   449  wazero will attempt to support "wasip2". However, it won't do so in a way that
   450  breaks existing compilers.
   451  
   452  In other words, we won't remove `_start` if "wasip2" continues a path of an
   453  alternate function name. If we did, we'd break existing users despite our
   454  compatibility promise saying we don't. The most likely case is that when we
   455  build in something incompatible with "wasip1", that start function will be
   456  added to the start functions list in addition to `_start`.
   457  
   458  See http://wasix.org
   459  See https://github.com/WebAssembly/wasi-cli
   460  
   461  ## Runtime == Engine+Store
   462  wazero defines a single user-type which combines the specification concept of `Store` with the unspecified `Engine`
   463  which manages them.
   464  
   465  ### Why not multi-store?
   466  Multi-store isn't supported as the extra tier complicates lifecycle and locking. Moreover, in practice it is unusual for
   467  there to be an engine that has multiple stores which have multiple modules. More often, it is the case that there is
   468  either 1 engine with 1 store and multiple modules, or 1 engine with many stores, each having 1 non-host module. In the worst
   469  case, a user can use multiple runtimes until "multi-store" is better understood.
   470  
   471  If later, we have demand for multiple stores, that can be accomplished by overload. e.g. `Runtime.InstantiateInStore` or
   472  `Runtime.Store(name) Store`.
   473  
   474  ## wazeroir
   475  wazero's intermediate representation (IR) is called `wazeroir`. Lowering into an IR provides us a faster interpreter
   476  and a closer-to-assembly representation for use by our compiler.
   477  
   478  ### Intermediate Representation (IR) design
   479  `wazeroir`'s initial design borrowed heavily from the defunct `microwasm` format (a.k.a. LightbeamIR). Notably,
   480  `wazeroir` doesn't have block operations: this simplifies the implementation.
   481  
   482  Note: `microwasm` was never specified formally, and only exists in a historical codebase of wasmtime:
   483  https://github.com/bytecodealliance/wasmtime/blob/v0.29.0/crates/lightbeam/src/microwasm.rs
   484  
   485  ## Exit
   486  
   487  ### Why do we only return a `sys.ExitError` on a non-zero exit code?
   488  
   489  It is reasonable to think an exit error should be returned, even if the code is
   490  success (zero). Even on success, the module is no longer functional. For
   491  example, function exports would error later. However, wazero does not. The only
   492  time `sys.ExitError` is returned is on error (non-zero).
   493  
   494  This decision was to improve performance and ergonomics for guests that both
   495  use WASI (have a `_start` function), and also allow custom exports.
   496  Specifically, Rust, TinyGo and normal wasi-libc, don't exit the module during
   497  `_start`. If they did, it would invalidate their function exports. This means
   498  it is unlikely most compilers will change this behavior.
   499  
   500  `GOOS=wasip1` from Go 1.21 does exit during `_start`. However, it doesn't
   501  support other exports besides `_start`, and `_start` is not defined to be
   502  called multiple times anyway.
   503  
   504  Since `sys.ExitError` is not always returned, we added `Module.IsClosed` for
   505  defensive checks. This helps integrators avoid calling functions which will
   506  always fail.
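The interaction between a silent zero exit and the defensive check can be modelled like this. Everything here is hypothetical (`exitError`, `module`, `runStart`, `call`); the sketch only shows that the module is closed in both cases, while an error is surfaced only for a non-zero code.

```go
package main

import (
	"errors"
	"fmt"
)

// exitError is a hypothetical stand-in for sys.ExitError.
type exitError struct{ code uint32 }

func (e *exitError) Error() string { return fmt.Sprintf("module exited with code %d", e.code) }

// module is a hypothetical guest module that can be closed by an exit.
type module struct {
	closed bool
}

func (m *module) IsClosed() bool { return m.closed }

// runStart simulates `_start` exiting with the given code. The module is
// closed either way, but an error is returned only for a non-zero code.
func (m *module) runStart(exitCode uint32) error {
	m.closed = true
	if exitCode != 0 {
		return &exitError{code: exitCode}
	}
	return nil
}

// call simulates invoking an exported function.
func (m *module) call(name string) error {
	if m.closed {
		return errors.New("module closed: cannot call " + name)
	}
	return nil
}

func main() {
	m := &module{}
	fmt.Println(m.runStart(0)) // <nil>, even though the module exited

	// Without an error to inspect, a defensive check avoids a doomed call.
	if m.IsClosed() {
		fmt.Println("skipping call: module is closed")
	} else {
		fmt.Println(m.call("add"))
	}
}
```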
   507  
   508  ### Why panic with `sys.ExitError` after a host function exits?
   509  
   510  Currently, the only portable way to stop processing code is via panic. For
   511  example, WebAssembly "trap" instructions, such as divide by zero, are
   512  implemented via panic. This ensures code isn't executed after it.
   513  
   514  When code reaches the WASI `proc_exit` function, we need to stop processing.
   515  Regardless of the exit code, any code invoked after exit would be in an
   516  inconsistent state. This is likely why unreachable instructions are sometimes
   517  inserted after exit: https://github.com/emscripten-core/emscripten/issues/12322
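The panic-based approach can be illustrated in plain Go. In this sketch, `procExit`, `guestMain` and `call` are hypothetical names: a host function panics with an exit error, the guest code after it never runs, and the caller recovers the panic into an ordinary error while re-raising anything else.

```go
package main

import "fmt"

// exitError is a hypothetical stand-in for sys.ExitError.
type exitError struct{ code uint32 }

func (e *exitError) Error() string { return fmt.Sprintf("exit code %d", e.code) }

// procExit models a host function like WASI proc_exit: it never returns.
func procExit(code uint32) {
	panic(&exitError{code: code})
}

// guestMain models guest code. The line after procExit must not run.
func guestMain(trace *[]string) {
	*trace = append(*trace, "before exit")
	procExit(3)
	*trace = append(*trace, "after exit") // never reached
}

// call runs fn and converts an exit panic into an ordinary error. Any other
// panic is re-raised so that real bugs are not hidden.
func call(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			if exit, ok := r.(*exitError); ok {
				err = exit
				return
			}
			panic(r)
		}
	}()
	fn()
	return nil
}

func main() {
	var trace []string
	err := call(func() { guestMain(&trace) })
	fmt.Println(trace, err) // [before exit] exit code 3
}
```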
   518  
   519  ## WASI
   520  
   521  Unfortunately, [WASI Snapshot Preview 1](https://github.com/WebAssembly/WASI/blob/snapshot-01/phases/snapshot/docs.md) is not formally defined enough, and has APIs with ambiguous semantics.
   522  This section describes how wazero interprets and implements the semantics of several WASI APIs that may be interpreted differently by different wasm runtimes.
   523  Those APIs may affect the portability of a WASI application.
   524  
   525  ### Why don't we attempt to pass wasi-testsuite on user-defined `fs.FS`?
   526  
   527  While most cases work fine on an `os.File` based implementation, we won't
   528  promise wasi-testsuite compatibility on user defined wrappers of `os.DirFS`.
   529  The only option for real systems is to use our `sysfs.FS`.
   530  
   531  There are a lot of areas where Windows behaves differently, despite the
   532  `os.File` abstraction. This goes well beyond file locking concerns (e.g.
   533  `EBUSY` errors on open files). For example, errors like `ACCESS_DENIED` aren't
   534  properly mapped to `EPERM`. There are trickier parts too. `FileInfo.Sys()`
   535  doesn't return enough information to build inodes needed for WASI. To rebuild
   536  them requires the full path to the underlying file, not just its directory
   537  name, and there's no way for us to get that information. At one point we tried,
   538  but in practice things became tangled and functionality such as read-only
   539  wrappers became untenable. Finally, there are version-specific behaviors which
   540  are difficult to maintain even in our own code. For example, Go 1.20 opens
   541  files in a different way than versions before it.
   542  
   543  ### Why aren't WASI rules enforced?
   544  
   545  The [snapshot-01](https://github.com/WebAssembly/WASI/blob/snapshot-01/phases/snapshot/docs.md) version of WASI has a
   546  number of rules for a "command module", but only the memory export rule is enforced. If a "_start" function exists, it
   547  is enforced to be the correct signature and succeed, but the export itself isn't enforced. It follows that this means
   548  exports are not required to be contained to a "_start" function invocation. Finally, the "__indirect_function_table"
   549  export is also not enforced.
   550  
   551  The reason for the exceptions is that implementations aren't following the rules. For example, TinyGo doesn't export
   552  "__indirect_function_table", so crashing on this would make wazero unable to run TinyGo modules. Similarly, modules
   553  loaded by wapc-go don't always define a "_start" function. Since "snapshot-01" is not a proper version, and certainly
   554  not a W3C recommendation, there's no sense in breaking users over matters like this.
   555  
   556  ### Why is I/O configuration not coupled to WASI?
   557  
   558  WebAssembly System Interfaces (WASI) is a formalization of a practice that can be done anyway: Define a host function to
   559  access a system interface, such as writing to STDOUT. WASI stalled at snapshot-01 and as of early 2023, is being
   560  rewritten entirely.
   561  
   562  This instability implies a need to transition between WASI specs, which places wazero in a position that requires
   563  decoupling. For example, if code uses two different functions to call `fd_write`, the underlying configuration must be
   564  centralized and decoupled. Otherwise, calls using the same file descriptor number will end up writing to different
   565  places.
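
The sketch below illustrates the point with hypothetical names (`sysContext`, `fdWriteV1`, `fdWriteV2`), none of which
are wazero APIs. It assumes two spec versions both import `fd_write`, and shows that resolving the descriptor against
one shared table keeps their output in the same place.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// sysContext is a hypothetical stand-in for centralized system configuration:
// one file descriptor table shared by every ABI that needs it.
type sysContext struct {
	files map[uint32]io.Writer
}

// fdWriteV1 and fdWriteV2 stand in for the same function imported from two
// different spec versions. Both resolve fd against the same sysContext.
func fdWriteV1(c *sysContext, fd uint32, p []byte) (int, error) { return write(c, fd, p) }

func fdWriteV2(c *sysContext, fd uint32, p []byte) (int, error) { return write(c, fd, p) }

func write(c *sysContext, fd uint32, p []byte) (int, error) {
	w, ok := c.files[fd]
	if !ok {
		return 0, fmt.Errorf("bad file descriptor: %d", fd)
	}
	return w.Write(p)
}

// captureBoth writes through both import paths and returns what fd 1 received.
func captureBoth() string {
	var stdout bytes.Buffer
	c := &sysContext{files: map[uint32]io.Writer{1: &stdout}}
	fdWriteV1(c, 1, []byte("hello "))
	fdWriteV2(c, 1, []byte("world"))
	return stdout.String()
}

func main() {
	fmt.Println(captureBoth())
}
```

Had each version owned its own table, the two writes to descriptor 1 could have ended up in different buffers.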
   566  
   567  In short, wazero defined system configuration in `ModuleConfig`, not a WASI type. This allows end-users to switch from
   568  one spec to another with minimal impact. This has other helpful benefits, as centralized resources are simpler to close
   569  coherently (ex via `Module.Close`).
   570  
   571  In retrospect, this worked well as more ABIs became usable in wazero. For example, `GOOS=js GOARCH=wasm` code uses the
   572  same `ModuleConfig` (and `FSConfig`) WASI uses, and in compatible ways.
   573  
   574  ### Background on `ModuleConfig` design
   575  
   576  WebAssembly 1.0 (20191205) specifies some aspects to control isolation between modules ([sandboxing](https://en.wikipedia.org/wiki/Sandbox_(computer_security))).
   577  For example, `wasm.Memory` has size constraints and each instance of it is isolated from each other. While `wasm.Memory`
   578  can be shared, by exporting it, it is not exported by default. In fact a WebAssembly Module (Wasm) has no memory by
   579  default.
   580  
   581  While memory is defined in WebAssembly 1.0 (20191205), many aspects are not. Let's use `exec.Cmd` as an example, because
   582  a WebAssembly System Interfaces (WASI) command is implemented as a module with a `_start` function, and in many
   583  ways acts similar to a process with a `main` function.
   584  
   585  To capture "hello world" written to the console (stdout a.k.a. file descriptor 1) in `exec.Cmd`, you would set the
   586  `Stdout` field accordingly, perhaps to a buffer. In WebAssembly 1.0 (20191205), the only way to perform something like
   587  this is via a host function (ex `HostModuleFunctionBuilder`) and internally copy memory corresponding to that string
   588  to a buffer.
   589  
   590  WASI implements system interfaces with host functions. Concretely, to write to console, a WASI command `Module` imports
   591  "fd_write" from "wasi_snapshot_preview1" and calls it with the `fd` parameter set to 1 (STDOUT).
   592  
   593  The [snapshot-01](https://github.com/WebAssembly/WASI/blob/snapshot-01/phases/snapshot/docs.md) version of WASI has no
   594  means to declare configuration, although its function definitions imply configuration: for example, whether fd 1 should
   595  exist and, if so, where it should write. Moreover, snapshot-01 was last updated in late 2020 and the specification is being
   596  completely rewritten as of early 2022. This means WASI as defined by "snapshot-01" will not clarify aspects like which
   597  file descriptors are required. While it is possible a subsequent version may, it is too early to tell as no version of
   598  WASI has reached a stage near W3C recommendation. Even if it did, module authors are not required to only use WASI to
   599  write to console, as they can define their own host functions, such as they did before WASI existed.
   600  
   601  wazero aims to serve Go developers as a primary function, and help them transition between WASI specifications. In
   602  order to do this, we have to allow top-level configuration. To ensure isolation by default, `ModuleConfig` has `WithXXX`
   603  functions that override defaults of no-op or empty. One `ModuleConfig` instance is used regardless of how many times the same WASI
   604  functions are imported. The nil defaults allow safe concurrency in these situations, as well as lower the cost when they
   605  are never used. Finally, a one-to-one mapping with `Module` allows the module to close the `ModuleConfig` instead of
   606  confusing users with another API to close.
   607  
   608  Naming, defaults and validation rules of aspects like `STDIN` and `Environ` are intentionally similar to other Go
   609  libraries such as `exec.Cmd` or `syscall.Setenv`, and differences called out where helpful. For example, there's no goal
   610  to emulate any operating system primitive specific to Windows (such as a 'c:\' drive). Moreover, certain defaults
   611  working with real system calls are neither relevant nor safe to inherit: For example, `exec.Cmd` defaults to read STDIN
   612  from a real file descriptor ("/dev/null"). Defaulting to this, vs reading `io.EOF`, would be unsafe as it can exhaust
   613  file descriptors if resources aren't managed properly. In other words, blind copying of defaults isn't wise as it can
   614  violate isolation or endanger the embedding process. In summary, we try to be similar to normal Go code, but often need
   615  to act differently and document that `ModuleConfig` is more about emulating, not necessarily performing real system calls.
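
To make the STDIN example concrete, here is a minimal sketch of a default that reads `io.EOF` without touching a real
file descriptor. The type name `eofReader` and the helper `readAllStdin` are hypothetical, chosen only for this
illustration.

```go
package main

import (
	"fmt"
	"io"
)

// eofReader is a hypothetical default STDIN: every read reports io.EOF, so no
// host file descriptor is opened or held.
type eofReader struct{}

func (eofReader) Read([]byte) (int, error) { return 0, io.EOF }

// readAllStdin drains stdin the way a guest's runtime might at startup.
func readAllStdin(stdin io.Reader) (int, error) {
	b, err := io.ReadAll(stdin)
	return len(b), err
}

func main() {
	n, err := readAllStdin(eofReader{})
	fmt.Println(n, err)
}
```

Code that reads STDIN to completion sees an empty stream, and there is nothing for the embedder to close.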
   616  
   617  ## File systems
   618  
   619  ### Motivation on `sys.FS`
   620  
   621  The `sys.FS` abstraction in wazero was created because of limitations in
   622  `fs.FS`, and `fs.File` in Go. Compilers targeting `wasip1` may access
   623  functionality that writes new files. The ability to overcome this was requested
   624  even before wazero was named this, via issue #21 in March 2021.
   625  
   626  A month later, golang/go#45757 was raised by someone else on the same topic. As
   627  of July 2023, this has not resolved to a writeable file system abstraction.
   628  
   629  Over the next year more use cases accumulated, consolidated in March 2022 into
   630  #390. This closed in January 2023 with a milestone of providing more
   631  functionality, limited to users giving a real directory. This didn't yet expose
   632  a file abstraction for general purpose use. Internally, this used `os.File`.
   633  However, a wasm module instance is a virtual machine. Only supporting `os.File`
   634  breaks sand-boxing use cases. Moreover, `os.File` is not an interface. Even
   635  though this abstracts functionality, it does not allow interception use cases.
   636  
   637  Hence, a few days later in January 2023, we had more issues asking to expose an
   638  abstraction, #1013 and later #1532, on use cases like masking access to files.
   639  In other words, the use case requests never stopped, and aren't solved by
   640  exposing only real files.
   641  
   642  In summary, the primary motivation for exposing a replacement for `fs.FS` and
   643  `fs.File` was around repetitive use case requests for years, around
   644  interception and the ability to create new files, both virtual and real files.
   645  While some use cases are solved with real files, not all are. Regardless, an
   646  interface approach is necessary to ensure users can intercept I/O operations.
   647  
   648  ### Why doesn't `sys.File` have a `Fd()` method?
   649  
   650  There are many features we could expose. We could make File expose underlying
   651  file descriptors in case they are supported, for integration of system calls
   652  that accept multiple ones, namely `poll` for multiplexing. This special case is
   653  described in a subsequent section.
   654  
   655  As noted above, users have been asking for a file abstraction for over two
   656  years, and a common answer was to wait. Making users wait is a problem,
   657  especially so long. Good reasons to make people wait are stabilization. Edge
   658  case features are not a great reason to hold abstractions from users.
   659  
   660  Another reason is implementation difficulty. Go did not attempt to abstract
   661  file descriptors. For example, unlike `fs.ReadFile` there is no `fs.FdFile`
   662  interface. Most likely, this is because file descriptors are an implementation
   663  detail of common features. Programming languages, including Go, do not require
   664  end users to know about file descriptors. Types such as `fs.File` can be used
   665  without any knowledge of them. Implementations may or may not have file
   666  descriptors. For example, in Go, `os.DirFS` has underlying file descriptors
   667  while `embed.FS` does not.
   668  
   669  Despite this, some may want to expose a non-standard interface because
   670  `os.File` has `Fd() uintptr` to return a file descriptor. Mainly, this is
   671  handy to integrate with `syscall` package functions (on `GOOS` values that
   672  declare them). Notice, though that `uintptr` is unsafe and not an abstraction.
   673  Close inspection will find some `os.File` types internally use `poll.FD`
   674  instead, yet this is not possible to use abstractly because that type is not
   675  exposed. For example, `plan9` uses a different type than `poll.FD`. In other
   676  words, even in real files, `Fd()` is not wholly portable, despite it being
   677  useful on many operating systems with the `syscall` package.
   678  
   679  The reasons above for why Go doesn't abstract an `FdFile` interface are a subset of
   680  reasons why `sys.File` does not. If we exposed `File.Fd()` we not only would
   681  have to declare all the edge cases that Go describes including impact of
   682  finalizers, we would have to describe these in terms of virtualized files.
   683  Then, we would have to reason with this value vs our existing virtualized
   684  `sys.FileTable`, mapping whatever type we return to keys in that table, also
   685  in consideration of garbage collection impact. The combination of issues like
   686  this could lead down a path of not implementing a file system abstraction at
   687  all, and instead a weak key mapped abstraction of the `syscall` package. Once
   688  we finished with all the edge cases, we would have lost context of the original
   689  reason why we started: simply to allow file write access!
   690  
   691  When wazero attempts to do more than the Go programming language team does, it
   692  has to be carefully evaluated, to:
   693  * Be possible to implement at least for `os.File` backed files
   694  * Not be confusing or cognitively hard for virtual file systems and normal use.
   695  * Affordable: custom code is solely the responsibility of the core team, a much
   696    smaller group of individuals than who maintain the Go programming language.
   697  
   698  Due to problems well known in Go, consideration of the end users who constantly
   699  ask for basic file system functionality, and the difficulty virtualizing file
   700  descriptors at multiple levels, we don't expose `Fd()` and likely won't ever
   701  expose `Fd()` on `sys.File`.
   702  
   703  ### Why does `sys.File` have a `Poll()` method, while `sys.FS` does not?
   704  
   705  wazero exposes `File.Poll` which allows one-at-a-time poll use cases,
   706  requested by multiple users. This not only includes abstract tests such as
   707  Go 1.21 `GOOS=wasip1`, but real use cases including python and container2wasm
   708  repls, as well as listen sockets. The main use case is non-blocking poll on a
   709  single file. Being a single file, this has no risk of problems such as
   710  head-of-line blocking, even when emulated.
   711  
   712  The main use case of multi-poll are bidirectional network services, something
   713  not used in `GOOS=wasip1` standard libraries, but could be in the future.
   714  Moving forward without a multi-poller allows wazero to expose its file system
   715  abstraction instead of continuing to hold it back for edge cases. We'll
   716  continue discussion below regardless, as rationale was requested.
   717  
   718  You can loop through multiple `sys.File`, using `File.Poll` to see if an event
   719  is ready, but there is a head-of-line blocking problem. If a long timeout is
   720  used, bad luck could have a file that has nothing to read or write before one
   721  that does. This could cause more blocking than necessary, even if you could
   722  poll the others just after with a zero timeout. What's worse than this is if
   723  unlimited blocking was used (`timeout=-1`). The host implementations could use
   724  goroutines to avoid this, but interrupting a "forever" poll is problematic. All
   725  of these are reasons to consider a multi-poll API, but do not require exporting
   726  `File.Fd()`.
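
For illustration, a loop over single-file polls might look like the sketch below. The `poller` interface and
`fakeFile` type are hypothetical simplifications (the real `File.Poll` takes more parameters). Polling each file with
a zero timeout avoids waiting on an idle file while a later one is ready, at the price of busy-looping if the caller
repeats the scan.

```go
package main

import "fmt"

// poller is a hypothetical, simplified view of a single-file poll.
type poller interface {
	Poll(timeoutMillis int32) (ready bool, err error)
}

// fakeFile is a test double whose readiness is fixed up front.
type fakeFile struct {
	ready bool
}

func (f *fakeFile) Poll(int32) (bool, error) { return f.ready, nil }

// firstReady polls each file once with a zero timeout and returns the index of
// the first one that is ready, or -1 if none are. Because no call blocks, an
// idle file early in the slice cannot delay a ready file later in it.
func firstReady(files []poller) (int, error) {
	for i, f := range files {
		ready, err := f.Poll(0)
		if err != nil {
			return -1, err
		}
		if ready {
			return i, nil
		}
	}
	return -1, nil
}

func main() {
	files := []poller{&fakeFile{}, &fakeFile{ready: true}}
	fmt.Println(firstReady(files))
}
```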
   727  
   728  Should multi-poll become critical, `sys.FS` could expose a `Poll` function
   729  like below, despite it being non-portable and complicated, if even possible, to
   730  implement on all platforms and virtual file systems.
   731  ```go
   732  ready, errno := fs.Poll([]sys.PollFile{{f1, sys.POLLIN}, {f2, sys.POLLOUT}}, timeoutMillis)
   733  ```
   734  
   735  A real filesystem could handle this by using an approach like the internal
   736  `unix.Poll` function in Go, passing file descriptors on unix platforms, or
   737  returning `sys.ENOSYS` for unsupported operating systems. Implementation for
   738  virtual files could have a strategy around timeout to avoid the worst case of
   739  head-of-line blocking (unlimited timeout).
   740  
   741  Let's remember that when designing abstractions, it is not best to add an
   742  interface for everything. Certainly, Go doesn't, as evidenced by them not
   743  exposing `poll.FD` in `os.File`! Such a multi-poll could be limited to
   744  built-in filesystems in the wazero repository, avoiding complexity of trying to
   745  support and test this abstractly. This would still permit multiplexing for CLI
   746  users, and also permit single file polling as exists now.
   747  
   748  ### Why doesn't wazero implement the working directory?
   749  
   750  An early design of wazero's API included a `WithWorkDirFS` which allowed
   751  control over which file a relative path such as "./config.yml" resolved to,
   752  independent of the root file system. This intended to help separate concerns
   753  like mutability of files, but it didn't work and was removed.
   754  
   755  Compilers that target wasm act differently with regard to the working
   756  directory. For example, while `GOOS=js` uses host functions to track the
   757  working directory, WASI host functions do not. wasi-libc, used by TinyGo,
   758  tracks working directory changes in compiled wasm instead: initially "/" until
   759  code calls `chdir`. Zig assumes the first pre-opened file descriptor is the
   760  working directory.
   761  
   762  The only place wazero can standardize a layered concern is via a host function.
   763  Since WASI doesn't use host functions to track the working directory, we can't
   764  standardize the storage and initial value of it.
   765  
   766  Meanwhile, code may be able to affect the working directory by compiling
   767  `chdir` into their main function, using an argument or ENV for the initial
   768  value (possibly `PWD`). Those unable to control the compiled code should only
   769  use absolute paths in configuration.
   770  
   771  See
   772  * https://github.com/golang/go/blob/go1.20/src/syscall/fs_js.go#L324
   773  * https://github.com/WebAssembly/wasi-libc/pull/214#issue-673090117
   774  * https://github.com/ziglang/zig/blob/53a9ee699a35a3d245ab6d1dac1f0687a4dcb42c/src/main.zig#L32
   775  
   776  ### Why ignore the error returned by io.Reader when n > 0?
   777  
   778  Per https://pkg.go.dev/io#Reader, if we receive an error, any bytes read should
   779  be processed first. At the syscall abstraction (`fd_read`), the caller is the
   780  processor, so we can't process the bytes inline and also return the error (as
   781  `EIO`).
   782  
   783  Let's assume we want to return the bytes read on error to the caller. This
   784  implies we at least temporarily ignore the error alongside them. The choice
   785  remaining is whether to persist the error returned with the read until a
   786  possible next call, or ignore the error.
   787  
   788  If we persist an error returned, it would be coupled to a file descriptor, but
   789  effectively it is boolean as this case coerces to `EIO`. If we track a "last
   790  error" on a file descriptor, it could be complicated for a couple reasons
   791  including whether the error is transient or permanent, or if the error would
   792  apply to any FD operation, or just read. Finally, there may never be a
   793  subsequent read as perhaps the bytes leading up to the error are enough to
   794  satisfy the processor.
   795  
   796  This decision boils down to whether or not to track an error bit per file
   797  descriptor or not. If not, the assumption is that a subsequent operation would
   798  also error, this time without reading any bytes.
   799  
   800  The current opinion is to go with the simplest path, which is to return the
   801  bytes read and ignore the error if there were any. Assume a subsequent
   802  operation will err if it needs to. This helps reduce the complexity of the code
   803  in wazero and also accommodates the scenario where the bytes read are enough to
   804  satisfy its processor.
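
A minimal sketch of this choice follows. The names `readOnce`, `flakyReader` and `newFlaky` are hypothetical and not
taken from wazero's code. The reader returns data and an error from the same call. The wrapper reports only the bytes,
and the error surfaces on the next call, which reads nothing.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

var errBoom = errors.New("boom")

// flakyReader is a hypothetical reader that returns its remaining data
// together with an error, as io.Reader permits.
type flakyReader struct {
	data []byte
}

func (r *flakyReader) Read(p []byte) (int, error) {
	n := copy(p, r.data)
	r.data = r.data[n:]
	return n, errBoom
}

func newFlaky(data string) *flakyReader { return &flakyReader{data: []byte(data)} }

// readOnce returns the bytes read and drops the error if there were any bytes.
// No error bit is tracked: a later call is assumed to fail on its own.
func readOnce(r io.Reader, buf []byte) (int, error) {
	n, err := r.Read(buf)
	if n > 0 {
		return n, nil
	}
	return 0, err
}

func main() {
	r := newFlaky("abc")
	buf := make([]byte, 8)
	fmt.Println(readOnce(r, buf)) // bytes are returned, error is ignored
	fmt.Println(readOnce(r, buf)) // nothing read, so the error is returned
}
```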
   805  
   806  ### File descriptor allocation strategy
   807  
   808  File descriptor allocation currently uses a strategy similar to the one implemented
   809  by unix systems: when opening a file, the lowest unused number is picked.
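
The sketch below shows one way to pick the lowest unused number. The `fileTable` type here is a hypothetical
slice-backed table written for this example. It is not wazero's internal implementation, which may use a different
data structure.

```go
package main

import "fmt"

// fileTable is a hypothetical descriptor table: the index is the descriptor
// and a nil element means the descriptor is unused.
type fileTable struct {
	files []any
}

// insert stores f at the lowest unused descriptor and returns that descriptor.
func (t *fileTable) insert(f any) int {
	for fd, existing := range t.files {
		if existing == nil {
			t.files[fd] = f
			return fd
		}
	}
	t.files = append(t.files, f)
	return len(t.files) - 1
}

// remove frees fd so that a later insert can reuse it.
func (t *fileTable) remove(fd int) {
	if fd >= 0 && fd < len(t.files) {
		t.files[fd] = nil
	}
}

func main() {
	tbl := &fileTable{}
	for _, name := range []string{"stdin", "stdout", "stderr", "a.txt", "b.txt"} {
		fmt.Println(name, tbl.insert(name))
	}
	tbl.remove(3)
	fmt.Println("c.txt", tbl.insert("c.txt")) // reuses descriptor 3
}
```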
   810  
   811  The WASI standard documents that programs cannot expect that file descriptor
   812  numbers will be allocated with a lowest-first strategy, and they should instead
   813  assume the values will be random. Since _random_ is a very imprecise concept in
   814  computers, we technically satisfy this requirement with the descriptor
   815  allocation strategy we use in Wazero. We could imagine adding more _randomness_
   816  to the descriptor selection process, however this should never be used as a
   817  security measure to prevent applications from guessing the next file number so
   818  there are no strong incentives to complicate the logic.
   819  
   820  ### Why does `FSConfig.WithDirMount` not match behaviour with `os.DirFS`?
   821  
   822  It may seem that we should require any feature that seems like a standard
   823  library in Go, to behave the same way as the standard library. Doing so would
   824  present least surprise to Go developers. In the case of how we handle
   825  filesystems, we break from that as it is incompatible with the expectations of
   826  WASI, the most commonly implemented filesystem ABI.
   827  
   828  The main reason is that `os.DirFS` is a virtual filesystem abstraction while
   829  WASI is an abstraction over syscalls. For example, the signature of `fs.Open`
   830  does not permit use of flags. This creates conflict on what default behaviors
   831  to take when Go implemented `os.DirFS`. On the other hand, `path_open` can pass
   832  flags, and in fact tests require them to be honored in specific ways. This
   833  extends beyond WASI as even `GOOS=js GOARCH=wasm` compiled code requires
   834  certain flags passed to `os.OpenFile` which are impossible to pass due to the
   835  signature of `fs.FS`.
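
The signature gap can be seen with the standard library alone. In this sketch, the helper names (`createViaDirFS`,
`createViaOpenFile`, `compare`) are hypothetical. It assumes a writable temporary directory and compares opening a
missing file through `fs.FS` with opening it through `os.OpenFile` using `O_CREATE`.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// createViaDirFS opens name through fs.FS. Open accepts only a name, so there
// is no way to ask for the file to be created.
func createViaDirFS(dir, name string) error {
	f, err := os.DirFS(dir).Open(name)
	if err != nil {
		return err
	}
	return f.Close()
}

// createViaOpenFile passes flags, as a syscall-level API such as path_open can.
func createViaOpenFile(dir, name string) error {
	f, err := os.OpenFile(filepath.Join(dir, name), os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	return f.Close()
}

// compare reports whether each approach could open a file that does not exist
// yet in a fresh temporary directory.
func compare() (viaDirFS, viaOpenFile bool, err error) {
	dir, err := os.MkdirTemp("", "flags")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	errDirFS := createViaDirFS(dir, "a.txt")
	if errDirFS != nil && !errors.Is(errDirFS, fs.ErrNotExist) {
		return false, false, errDirFS
	}
	errOpenFile := createViaOpenFile(dir, "b.txt")
	return errDirFS == nil, errOpenFile == nil, nil
}

func main() {
	fmt.Println(compare())
}
```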
   836  
   837  This conflict requires us to choose what to be more compatible with, and which
   838  type of user to surprise the least. We assume there will be more developers
   839  compiling code to wasm than developers of custom filesystem plugins, and those
   840  compiling code to wasm will be better served if we are compatible with WASI.
   841  Hence on conflict, we prefer WASI behavior vs the behavior of `os.DirFS`.
   842  
   843  Meanwhile, it is possible that Go will one day compile to `GOOS=wasi` in
   844  addition to `GOOS=js`. When there is shared stake in WASI, we expect gaps like
   845  these to be easier to close.
   846  
   847  See https://github.com/WebAssembly/wasi-testsuite
   848  See https://github.com/golang/go/issues/58141
   849  
   850  ## Why is our `Readdir` function more like Go's `os.File` than POSIX `readdir`?
   851  
   852  At one point we attempted to move from a bulk `Readdir` function to something
   853  more like the POSIX `DIR` struct, exposing functions like `telldir`, `seekdir`
   854  and `readdir`. However, we chose the design more like `os.File.Readdir`,
   855  because it performs and fits wasip1 better.
   856  
   857  ### wasip1/wasix
   858  
   859  `fd_readdir` in wasip1 (and so also wasix) is like `getdents` in Linux, not
   860  `readdir` in POSIX. `getdents` is more like Go's `os.File.Readdir`.
   861  
   862  We currently have an internal type `sys.DirentCache` which only is used by
   863  wasip1 or wasix. When `HostModuleBuilder` adds support for instantiation state,
   864  we could move this to the `wasi_snapshot_preview1` package. Meanwhile, all
   865  filesystem code is internal anyway, so this special-case is acceptable.
   866  
   867  ### wasip2
   868  
   869  `directory-entry-stream` in wasi-filesystem preview2 is defined in component
   870  model, not an ABI, but in wasmtime it is a consuming iterator. A consuming
   871  iterator is easy to support with anything (like `Readdir(1)`), even if it is
   872  inefficient as you can neither bulk read nor skip. The implementation of the
   873  preview1 adapter (uses preview2) confirms this. They use a dirent cache similar
   874  in some ways to our `sysfs.DirentCache`. As there is no seek concept in
   875  preview2, they interpret the cookie as numeric and read on repeat entries when
   876  a cache wasn't available. Note: we currently do not skip-read like this as it
   877  risks buffering large directories, and no user has requested entries before the
   878  cache, yet.
   879  
   880  Regardless, wasip2 is not complete until the end of 2023. We can defer design
   881  discussion until after it is stable and after the reference impl wasmtime
   882  implements it.
   883  
   884  See
   885   * https://github.com/WebAssembly/wasi-filesystem/blob/ef9fc87c07323a6827632edeb6a7388b31266c8e/example-world.md#directory_entry_stream
   886   * https://github.com/bytecodealliance/wasmtime/blob/b741f7c79d72492d17ab8a29c8ffe4687715938e/crates/wasi/src/preview2/preview2/filesystem.rs#L286-L296
   887   * https://github.com/bytecodealliance/preview2-prototyping/blob/e4c04bcfbd11c42c27c28984948d501a3e168121/crates/wasi-preview1-component-adapter/src/lib.rs#L2131-L2137
   888   * https://github.com/bytecodealliance/preview2-prototyping/blob/e4c04bcfbd11c42c27c28984948d501a3e168121/crates/wasi-preview1-component-adapter/src/lib.rs#L936
   889  
   890  ### wasip3
   891  
   892  `directory-entry-stream` is documented to change significantly in wasip3 moving
   893  from synchronous to asynchronous streams. This is dramatically different than
   894  POSIX `readdir` which is synchronous.
   895  
   896  Regardless, wasip3 is not complete until after wasip2, which means 2024 or
   897  later. We can defer design discussion until after it is stable and after the
   898  reference impl wasmtime implements it.
   899  
   900  See
   901   * https://github.com/WebAssembly/WASI/blob/ddfe3d1dda5d1473f37ecebc552ae20ce5fd319a/docs/WitInWasi.md#Streams
   902   * https://docs.google.com/presentation/d/1MNVOZ8hdofO3tI0szg_i-Yoy0N2QPU2C--LzVuoGSlE/edit#slide=id.g1270ef7d5b6_0_662
   903  
   904  ### How do we implement `Pread` with an `fs.File`?
   905  
   906  `ReadAt` is the Go equivalent to `pread`: it does not affect, and is not
   907  affected by, the underlying file offset. Unfortunately, `io.ReaderAt` is not
   908  implemented by all `fs.File`. For example, as of Go 1.19, `embed.openFile` does
   909  not.
   910  
   911  The initial implementation of `fd_pread` instead used `Seek`. To avoid a
   912  regression, we fall back to `io.Seeker` when `io.ReaderAt` is not supported.
   913  
   914  This requires obtaining the initial file offset, seeking to the intended read
   915  offset, and resetting the file offset to the initial state. If this final seek
   916  fails, the file offset is left in an undefined state. This is not thread-safe.
   917  
   918  While seeking per read seems expensive, the common case of `embed.openFile` is
   919  only accessing a single int64 field, which is cheap.
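
The fallback described above can be sketched as follows. The function `pread` and the `seekOnly` wrapper are
hypothetical names for this example. The wrapper hides `ReadAt` so that the seek-based path is the one exercised.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// pread reads into buf at off. It prefers io.ReaderAt and otherwise falls back
// to seeking, restoring the original offset afterwards. Not thread-safe.
func pread(f io.Reader, buf []byte, off int64) (int, error) {
	if ra, ok := f.(io.ReaderAt); ok {
		return ra.ReadAt(buf, off)
	}
	s, ok := f.(io.Seeker)
	if !ok {
		return 0, errors.New("pread: neither io.ReaderAt nor io.Seeker")
	}
	// Remember the current offset so it can be restored after the read.
	initial, err := s.Seek(0, io.SeekCurrent)
	if err != nil {
		return 0, err
	}
	if _, err = s.Seek(off, io.SeekStart); err != nil {
		return 0, err
	}
	n, readErr := f.Read(buf)
	// If this final seek fails, the offset is left in an undefined state.
	if _, err = s.Seek(initial, io.SeekStart); err != nil {
		return n, err
	}
	return n, readErr
}

// seekOnly exposes Read and Seek but not ReadAt.
type seekOnly struct{ r *strings.Reader }

func (s seekOnly) Read(p []byte) (int, error)                { return s.r.Read(p) }
func (s seekOnly) Seek(off int64, whence int) (int64, error) { return s.r.Seek(off, whence) }

func newSeekOnly(s string) seekOnly { return seekOnly{strings.NewReader(s)} }

func main() {
	f := newSeekOnly("hello world")
	buf := make([]byte, 5)
	n, err := pread(f, buf, 6)
	fmt.Println(string(buf[:n]), err)
	pos, _ := f.Seek(0, io.SeekCurrent)
	fmt.Println("offset after pread:", pos)
}
```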
   920  
   921  ### Pre-opened files
   922  
   923  WASI includes `fd_prestat_get` and `fd_prestat_dir_name` functions used to
   924  learn any directory paths for file descriptors open at initialization time.
   925  
   926  For example, `__wasilibc_register_preopened_fd` scans any file descriptors past
   927  STDERR (2) and invokes `fd_prestat_dir_name` to learn any path prefixes they
   928  correspond to. Zig's `preopensAlloc` does similar. These pre-open functions are
   929  not used again after initialization.
   930  
   931  wazero supports stdio pre-opens followed by any mounts e.g. `.:/`. The guest
   932  path is a directory and its name, e.g. "/" is returned by `fd_prestat_dir_name`
   933  for file descriptor 3 (STDERR+1). The first longest match wins on multiple
   934  pre-opens, which allows a path like "/tmp" to match regardless of order vs "/".
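
A small sketch of "first longest match wins" is below. The `preopen` type and `resolve` function are hypothetical and
written for this example, and matching on path-segment boundaries is an assumption made here.

```go
package main

import (
	"fmt"
	"strings"
)

// preopen is a hypothetical pairing of a descriptor with its guest path.
type preopen struct {
	fd        uint32
	guestPath string
}

// resolve returns the pre-open whose guest path is the longest prefix of path.
// On equal lengths, the first one in the slice wins.
func resolve(preopens []preopen, path string) (preopen, bool) {
	var best preopen
	found := false
	for _, p := range preopens {
		if !hasPathPrefix(path, p.guestPath) {
			continue
		}
		if !found || len(p.guestPath) > len(best.guestPath) {
			best, found = p, true
		}
	}
	return best, found
}

// hasPathPrefix reports whether prefix matches path on a segment boundary, so
// "/tmp" matches "/tmp/a.txt" but not "/tmpfile".
func hasPathPrefix(path, prefix string) bool {
	if prefix == "/" {
		return strings.HasPrefix(path, "/")
	}
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

func main() {
	preopens := []preopen{{3, "/"}, {4, "/tmp"}}
	for _, path := range []string{"/tmp/a.txt", "/etc/hosts"} {
		p, _ := resolve(preopens, path)
		fmt.Println(path, "resolves to fd", p.fd)
	}
}
```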
   935  
   936  See
   937   * https://github.com/WebAssembly/wasi-libc/blob/a02298043ff551ce1157bc2ee7ab74c3bffe7144/libc-bottom-half/sources/preopens.c
   938   * https://github.com/ziglang/zig/blob/9cb06f3b8bf9ea6b5e5307711bc97328762d6a1d/lib/std/fs/wasi.zig#L50-L53
   939  
   940  ### fd_prestat_dir_name
   941  
   942  `fd_prestat_dir_name` is a WASI function to return the path of the pre-opened
   943  directory of a file descriptor. It has the following three parameters, and the
   944  third `path_len` has ambiguous semantics.
   945  
   946  * `fd`: a file descriptor
   947  * `path`: the offset for the result path
   948  * `path_len`: In wazero, `FdPrestatDirName` writes the result path string to
   949    `path` offset for the exact length of `path_len`.
   950  
   951  Wasmer considers `path_len` to be the maximum length instead of the exact
   952  length that should be written.
   953  See https://github.com/wasmerio/wasmer/blob/3463c51268ed551933392a4063bd4f8e7498b0f6/lib/wasi/src/syscalls/mod.rs#L764
   954  
   955  The semantics in wazero follows that of wasmtime.
   956  See https://github.com/bytecodealliance/wasmtime/blob/2ca01ae9478f199337cf743a6ab543e8c3f3b238/crates/wasi-common/src/snapshots/preview_1.rs#L578-L582
   957  
   958  Their semantics match when `path_len` == the length of `path`, so in practice
   959  this difference won't matter much.
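
To make the two interpretations concrete, here is a sketch using two hypothetical helpers, `writeExact` and
`writeMax`. Neither is code from wazero, wasmtime or Wasmer, and how each rejects a mismatched `path_len` is an
assumption made for illustration. The point is only that both write the same bytes when `path_len` equals the length
of the path.

```go
package main

import "fmt"

// writeExact treats pathLen as the exact number of bytes to write.
// Assumption: it fails when name has fewer than pathLen bytes.
func writeExact(dst []byte, name string, pathLen uint32) (int, bool) {
	if uint64(pathLen) > uint64(len(name)) || uint64(pathLen) > uint64(len(dst)) {
		return 0, false
	}
	return copy(dst, name[:pathLen]), true
}

// writeMax treats pathLen as the capacity available for the result.
// Assumption: it fails when name does not fit in pathLen bytes.
func writeMax(dst []byte, name string, pathLen uint32) (int, bool) {
	if uint64(len(name)) > uint64(pathLen) || len(name) > len(dst) {
		return 0, false
	}
	return copy(dst, name), true
}

func main() {
	const name = "/tmp"
	for _, pathLen := range []uint32{2, 4, 8} {
		a, b := make([]byte, 16), make([]byte, 16)
		n1, ok1 := writeExact(a, name, pathLen)
		n2, ok2 := writeMax(b, name, pathLen)
		fmt.Printf("path_len=%d exact=%q,%v max=%q,%v\n", pathLen, a[:n1], ok1, b[:n2], ok2)
	}
}
```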
   960  
   961  ## fd_readdir
   962  
   963  ### Why does "wasi_snapshot_preview1" require dot entries when POSIX does not?
   964  
   965  In October 2019, the WASI project knew requiring dot entries ("." and "..") was
   966  not documented in preview1, not required by POSIX and problematic to synthesize.
   967  For example, Windows runtimes backed by `FindNextFileW` could not return these.
   968  A year later, the tag representing WASI preview 1 (`snapshot-01`) was made.
   969  This did not include the requested change of making dot entries optional.
   970  
   971  The `phases/snapshot/docs.md` document was altered in subsequent years in
   972  significant ways, often in lock-step with wasmtime or wasi-libc. In January
   973  2022, `sock_accept` was added to `phases/snapshot/docs.md`, a document later
   974  renamed to `legacy/preview1/docs.md`.
   975  
   976  As a result, the ABI and behavior remained unstable: The `snapshot-01` tag was
   977  not an effective basis of portability. A test suite was requested well before
   978  this tag, in April 2019. Meanwhile, compliance had no meaning. Developers had
   979  to track changes to the latest doc, while clarifying with wasi-libc or wasmtime
   980  behavior. This lack of stability could have permitted a fix to the dot entries
   981  problem, just as it permitted changes desired by other users.
   982  
   983  In November 2022, the wasi-testsuite project began and started solidifying
   984  expectations. This quickly led to changes in runtimes and the spec doc. WASI
   985  began importing tests from wasmtime as required behaviors for all runtimes.
   986  Some changes implied changes to wasi-libc. For example, `readdir` began to
   987  imply inode fan-outs, which caused performance regressions. Most notably a
   988  test merged in January required dot entries. Tests were merged without running
   989  against any runtime, and even when run ad-hoc only against Linux. Hence,
   990  portability issues mentioned over three years earlier did not trigger any
   991  failure until wazero (which tests Windows) noticed.
   992  
   993  In the same month, wazero requested to revert this change primarily because
   994  Go does not return them from `os.ReadDir`, and materializing them is
   995  complicated due to tests also requiring inodes. Moreover, they are discarded by
   996  not just Go, but other common programming languages. This was rejected by the
   997  WASI lead for preview1, but considered for the completely different ABI named
   998  preview2.
   999  
  1000  In February 2023, the WASI chair declared that the new rule requiring preview1 to
  1001  return dot entries "was decided by the subgroup as a whole", citing meeting
  1002  notes. According to these notes, the WASI lead stated incorrectly that POSIX
  1003  conformance required returning dot entries, something POSIX explicitly says is
  1004  optional. In other words, he said filtering them out would make Preview1
  1005  non-conforming, and asked if anyone objects to this. The co-chair was noted to
  1006  say "Because there are existing P1 programs, we shouldn’t make changes like
  1007  this." No others were recorded to say anything.
  1008  
  1009  In summary, preview1 was changed retrospectively to require dot entries and
  1010  preview2 was changed to require their absence. This rule was reverse engineered
  1011  from wasmtime tests, and affirmed on two false premises:
  1012  
  1013  * POSIX compliance requires dot entries
  1014    * POSIX literally says these are optional
  1015  * WASI cannot make changes because there are existing P1 programs.
  1016    * Changes to Preview 1 happened before and after this topic.
  1017  
  1018  As of June 2023, wasi-testsuite still only runs on Linux, so compliance of this
  1019  rule on Windows is left to runtimes to decide to validate. The preview2 adapter
  1020  uses fake cookies zero and one to refer to dot dirents, uses a real inode for
  1021  the dot(".") entry and zero inode for dot-dot("..").
  1022  
  1023  See https://github.com/WebAssembly/wasi-filesystem/issues/3
  1024  See https://github.com/WebAssembly/WASI/tree/snapshot-01
  1025  See https://github.com/WebAssembly/WASI/issues/9
  1026  See https://github.com/WebAssembly/WASI/pull/458
  1027  See https://github.com/WebAssembly/wasi-testsuite/pull/32
  1028  See https://github.com/WebAssembly/wasi-libc/pull/345
  1029  See https://github.com/WebAssembly/wasi-testsuite/issues/52
  1030  See https://github.com/WebAssembly/WASI/pull/516
  1031  See https://github.com/WebAssembly/meetings/blob/main/wasi/2023/WASI-02-09.md#should-preview1-fd_readdir-filter-out--and-
  1032  See https://github.com/bytecodealliance/preview2-prototyping/blob/e4c04bcfbd11c42c27c28984948d501a3e168121/crates/wasi-preview1-component-adapter/src/lib.rs#L1026-L1041
  1033  
  1034  ### Why are dot (".") and dot-dot ("..") entries problematic?
  1035  
  1036  When reading a directory, dot (".") and dot-dot ("..") entries are problematic.
  1037  For example, Go does not return them from `os.ReadDir`, and materializing them
  1038  is complicated (at least dot-dot is).
  1039  
A directory entry has stat information in it. The stat information includes
the inode, which is used for comparing file equivalence. In the simple case of
dot, we could materialize a special entry to expose the same info as stat on
the fd would return. However, doing this and not doing dot-dot would cause
confusion, and dot-dot is far more tricky. To back-fill inode information about
a parent directory would be costly and subtle. For example, the pre-open
(mount) of the directory may be different than its logical parent. This is easy
to understand when considering the common case of mounting "/" and "/tmp" as
pre-opens. To implement ".." from "/tmp" requires information from a separate
pre-open, which includes state to even know the difference. There are easier
edge cases as well, such as the decision to not return ".." from a root path.
In any case, this should start to explain that faking entries when the
underlying stdlib doesn't return them is tricky and requires quite a lot of
state.
  1053  
Another issue is around the `Dirent.Off` value of a directory entry, sometimes
called a "cookie" in Linux man pages. When the host operating system or
library function does not return dot entries, to support functions such as
`seekdir`, you still need a value for `Dirent.Off`. Naively, you can synthesize
these by choosing sequential offsets zero and one. However, POSIX strictly says
offsets should be treated opaquely. The backing filesystem could use these to
represent real entries. For example, a directory with one entry could use zero
as the `Dirent.Off` value. If you also used zero for the "." dirent, there
would be a clash. This means if you synthesize `Dirent.Off` for any entry, you
need to synthesize this value for all entries. In practice, the simplest way is
using an incrementing number, such as done in the WASI preview2 adapter.
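
The "synthesize for all entries" rule can be sketched as follows. This is a minimal illustration, not wazero's implementation: the `dirent` type and `withDotEntries` function are hypothetical names, and using the slice index as the offset is an assumption made for simplicity.

```go
package main

import "fmt"

// dirent is a hypothetical, simplified directory entry for this sketch.
type dirent struct {
	Name string
	Off  uint64 // synthesized offset ("cookie"), never the host's value
	Ino  uint64 // zero when unknown
}

// withDotEntries prepends "." and ".." to names, then assigns every entry an
// incrementing Off so that synthesized and real entries can never clash.
func withDotEntries(dirIno uint64, names []string) []dirent {
	entries := make([]dirent, 0, len(names)+2)
	entries = append(entries, dirent{Name: ".", Ino: dirIno}, dirent{Name: ".."})
	for _, n := range names {
		entries = append(entries, dirent{Name: n})
	}
	for i := range entries {
		entries[i].Off = uint64(i) // assumption: the index doubles as the offset
	}
	return entries
}

func main() {
	for _, e := range withDotEntries(42, []string{"a.txt", "b.txt"}) {
		fmt.Printf("off=%d name=%q ino=%d\n", e.Off, e.Name, e.Ino)
	}
}
```

Because no offset reported by the host is reused, a backing filesystem that happens to use zero for a real entry cannot collide with the synthesized "." entry.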
  1065  
Working around these issues causes expense to all users of wazero, so we'd
then look to see if that would be justified or not. However, the most common
compilers involved in end user questions, as of early 2023, are TinyGo, Rust
and Zig. All of these compile code which ignores dot and dot-dot entries. In
other words, faking these entries would not only add complexity to our
codebase, but it would also add unnecessary overhead as the values aren't
commonly used.
  1072  
The final reason why we might do this is an end user or a specification
requiring us to. As of early 2023, no end user has raised concern over Go and
by extension wazero not returning dot and dot-dot. The snapshot-01 spec of WASI
does not mention anything on this point. Also, POSIX has the following to say,
which summarizes to "these are optional":
  1078  
  1079  > The readdir() function shall not return directory entries containing empty names. If entries for dot or dot-dot exist, one entry shall be returned for dot and one entry shall be returned for dot-dot; otherwise, they shall not be returned.
  1080  
  1081  Unfortunately, as described above, the WASI project decided in early 2023 to
  1082  require dot entries in both the spec and the wasi-testsuite. For only this
  1083  reason, wazero adds overhead to synthesize dot entries despite it being
  1084  unnecessary for most users.
  1085  
  1086  See https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
  1087  See https://github.com/golang/go/blob/go1.20/src/os/dir_unix.go#L108-L110
  1088  See https://github.com/bytecodealliance/preview2-prototyping/blob/e4c04bcfbd11c42c27c28984948d501a3e168121/crates/wasi-preview1-component-adapter/src/lib.rs#L1026-L1041
  1089  
  1090  ### Why don't we pre-populate an inode for the dot-dot ("..") entry?
  1091  
  1092  We only populate an inode for dot (".") because wasi-testsuite requires it, and
  1093  we likely already have it (because we cache it). We could attempt to populate
  1094  one for dot-dot (".."), but chose not to.
  1095  
  1096  Firstly, wasi-testsuite does not require the inode of dot-dot, possibly because
  1097  the wasip2 adapter doesn't populate it (but we don't really know why).
  1098  
  1099  The only other reason to populate it would be to avoid wasi-libc's stat fanout
  1100  when it is missing. However, wasi-libc explicitly doesn't fan-out to lstat on
  1101  the ".." entry on a zero ino.
  1102  
  1103  Fetching dot-dot's inode despite the above not only doesn't help wasi-libc, but
  1104  it also hurts languages that don't use it, such as Go. These languages would
  1105  pay a stat syscall penalty even if they don't need the inode. In fact, Go
  1106  discards both dot entries!
  1107  
  1108  In summary, there are no significant upsides in attempting to pre-fetch
  1109  dot-dot's inode, and there are downsides to doing it anyway.
  1110  
  1111  See
  1112   * https://github.com/WebAssembly/wasi-libc/blob/bd950eb128bff337153de217b11270f948d04bb4/libc-bottom-half/cloudlibc/src/libc/dirent/readdir.c#L87-L94
  1113   * https://github.com/WebAssembly/wasi-testsuite/blob/main/tests/rust/src/bin/fd_readdir.rs#L108
  1114   * https://github.com/bytecodealliance/preview2-prototyping/blob/e4c04bcfbd11c42c27c28984948d501a3e168121/crates/wasi-preview1-component-adapter/src/lib.rs#L1037
  1115  
  1116  ### Why don't we require inodes to be non-zero?
  1117  
  1118  We don't require a non-zero value for `Dirent.Ino` because doing so can prevent
  1119  a real one from resolving later via `Stat_t.Ino`.
  1120  
  1121  We define `Ino` like `d_ino` in POSIX which doesn't special-case zero. It can
  1122  be zero for a few reasons:
  1123  
  1124  * The file is not a regular file or directory.
  1125  * The underlying filesystem does not support inodes. e.g. embed:fs
  1126  * A directory doesn't include inodes, but a later stat can. e.g. Windows
  1127  * The backend is based on wasi-filesystem (a.k.a wasip2), which has
  1128    `directory_entry.inode` optional, and might remove it entirely.
  1129  
  1130  There are other downsides to returning a zero inode in widely used compilers:
  1131  
  1132  * File equivalence utilities, like `os.SameFile` will not work.
  1133  * wasi-libc's `wasip1` mode will call `lstat` and attempt to retrieve a
  1134    non-zero value (unless the entry is named "..").
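
The cost of a zero inode for such readers can be sketched as below. This models the behavior described in the list above; it is not wasi-libc's code, and the `entry` type and function names are hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical directory entry holding only what this sketch needs.
type entry struct {
	Name string
	Ino  uint64
}

// needsLstat reports whether a reader that insists on a non-zero inode would
// issue a follow-up lstat for this entry. The ".." entry is exempt.
func needsLstat(e entry) bool {
	return e.Ino == 0 && e.Name != ".."
}

// countLstats returns how many extra lstat calls a listing would trigger.
func countLstats(entries []entry) (n int) {
	for _, e := range entries {
		if needsLstat(e) {
			n++
		}
	}
	return n
}

func main() {
	listing := []entry{{".", 7}, {"..", 0}, {"real.txt", 9}, {"virtual.txt", 0}}
	fmt.Println(countLstats(listing))
}
```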
  1135  
  1136  A new compiler may accidentally skip a `Dirent` with a zero `Ino` if emulating
  1137  a non-POSIX function and re-using `Dirent.Ino` for `d_fileno`.
  1138  
  1139  * Linux `getdents` doesn't define `d_fileno` must be non-zero
  1140  * BSD `getdirentries` is implementation specific. For example, OpenBSD will
  1141    return dirents with a zero `d_fileno`, but Darwin will skip them.
  1142  
  1143  The above shouldn't be a problem, even in the case of BSD, because `wasip1` is
  1144  defined more in terms of `getdents` than `getdirentries`. The bottom half of
  1145  either should treat `wasip1` (or any similar ABI such as wasix or wasip2) as a
  1146  different operating system and either use different logic that doesn't skip, or
  1147  synthesize a fake non-zero `d_fileno` when `d_ino` is zero.
  1148  
  1149  However, this has been a problem. Go's `syscall.ParseDirent` utility is shared
  1150  for all `GOOS=unix`. For simplicity, this abstracts `direntIno` with data from
  1151  `d_fileno` or `d_ino`, and drops if either are zero, even if `d_fileno` is the
  1152  only field with zero explicitly defined. This led to a change to special case
  1153  `GOOS=wasip1` as otherwise virtual files would be unconditionally skipped.
  1154  
  1155  In practice, this problem is rather unique due to so many compilers relying on
  1156  wasi-libc, which tolerates a zero inode. For example, while issues were
  1157  reported about the performance regression when wasi-libc began doing a fan-out
  1158  on zero `Dirent.Ino`, no issues were reported about dirents being dropped as a
  1159  result.
  1160  
  1161  In summary, rather than complicating implementation and forcing non-zero inodes
  1162  for a rare case, we permit zero. We instead document this topic thoroughly, so
  1163  that emerging compilers can re-use the research and reference it on conflict.
  1164  We also document that `Ino` should be non-zero, so that users implementing that
  1165  field will attempt to get it.
  1166  
  1167  See
  1168   * https://github.com/WebAssembly/wasi-filesystem/pull/81
  1169   * https://github.com/WebAssembly/wasi-libc/blob/bd950eb128bff337153de217b11270f948d04bb4/libc-bottom-half/cloudlibc/src/libc/dirent/readdir.c#L87-L94
  1170   * https://linux.die.net/man/3/getdents
  1171   * https://www.unix.com/man-page/osx/2/getdirentries/
  1172   * https://man.openbsd.org/OpenBSD-5.4/getdirentries.2
  1173   * https://github.com/golang/go/blob/go1.20/src/syscall/dirent.go#L60-L102
  1174   * https://go-review.googlesource.com/c/go/+/507915
  1175  
  1176  ## sys.Walltime and Nanotime
  1177  
  1178  The `sys` package has two function types, `Walltime` and `Nanotime` for real
  1179  and monotonic clock exports. The naming matches conventions used in Go.
  1180  
  1181  ```go
  1182  func time_now() (sec int64, nsec int32, mono int64) {
  1183  	sec, nsec = walltime()
  1184  	return sec, nsec, nanotime()
  1185  }
  1186  ```
  1187  
Splitting functions for wall and monotonic time allows implementations to
choose whether to implement the clock once (as in Go), or split them out.
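
As a minimal sketch of the split, the two function types can be backed by Go's single clock. The type shapes below are redeclared locally and are an assumption modeled on the `time_now` signature above; consult the `sys` package for the authoritative definitions. `realClocks` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// Walltime and Nanotime are local stand-ins for the function types discussed
// above, redeclared so this sketch is self-contained.
type (
	Walltime func() (sec int64, nsec int32)
	Nanotime func() int64
)

// realClocks derives both readings from Go's clock, which carries wall and
// monotonic time in the same value.
func realClocks() (Walltime, Nanotime) {
	base := time.Now()
	wall := func() (int64, int32) {
		t := time.Now()
		return t.Unix(), int32(t.Nanosecond())
	}
	// time.Since uses the monotonic reading embedded in base.
	nano := func() int64 { return int64(time.Since(base)) }
	return wall, nano
}

func main() {
	wall, nano := realClocks()
	sec, nsec := wall()
	fmt.Println(sec, nsec, nano())
}
```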
  1190  
Each can be configured with a `ClockResolution`, although it is usually
incorrect as detailed in a sub-heading below. The only reason for exposing this
is to satisfy WASI:
  1194  
  1195  See https://github.com/WebAssembly/wasi-clocks
  1196  
  1197  ### Why default to fake time?
  1198  
  1199  WebAssembly has an implicit design pattern of capabilities based security. By
  1200  defaulting to a fake time, we reduce the chance of timing attacks, at the cost
  1201  of requiring configuration to opt-into real clocks.
  1202  
See https://gruss.cc/files/fantastictimers.pdf for example attacks.
  1204  
  1205  ### Why does fake time increase on reading?
  1206  
Both the fake nanotime and walltime increase by 1ms on reading. Particularly in
the case of nanotime, this prevents spinning. For example, when Go compiles
`time.Sleep` using `GOOS=js GOARCH=wasm`, nanotime is used in a loop. If that
never increases, the goroutine is mistaken for being busy. This would be worse
if a compiler implements sleep using nanotime, yet doesn't check for spinning!
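
The effect can be sketched with a fake clock and a sleep implemented as a loop on nanotime. Both `newFakeNanotime` and `spinSleep` are hypothetical names for illustration; whether the first reading is zero or 1ms is an arbitrary choice in this sketch.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// newFakeNanotime returns a clock that advances by 1ms on every read.
func newFakeNanotime() func() int64 {
	var t int64
	return func() int64 {
		return atomic.AddInt64(&t, int64(time.Millisecond))
	}
}

// spinSleep loops on nanotime until d has elapsed and returns the number of
// clock reads it took. With a clock that never advances, it would never return.
func spinSleep(nanotime func() int64, d time.Duration) (reads int) {
	start := nanotime()
	reads = 1
	for {
		reads++
		if nanotime()-start >= int64(d) {
			return reads
		}
	}
}

func main() {
	fmt.Println(spinSleep(newFakeNanotime(), 5*time.Millisecond))
}
```

Because each read advances the clock, the loop terminates after a bounded number of reads without any real time passing.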
  1212  
  1213  ### Why not `time.Clock`?
  1214  
  1215  wazero can't use `time.Clock` as a plugin for clock implementation as it is
  1216  only substitutable with build flags (`faketime`) and conflates wall and
  1217  monotonic time in the same call.
  1218  
Go's `time.Clock` had monotonic time added after the fact. For portability with
prior APIs, a decision was made to combine readings into the same API call.
  1221  
  1222  See https://go.googlesource.com/proposal/+/master/design/12914-monotonic.md
  1223  
  1224  WebAssembly time imports do not have the same concern. In fact even Go's
  1225  imports for clocks split walltime from nanotime readings.
  1226  
  1227  See https://github.com/golang/go/blob/go1.20/misc/wasm/wasm_exec.js#L243-L255
  1228  
Finally, Go's clock is not an interface. WebAssembly users who want determinism
or security need to be able to substitute an alternative clock implementation
for the host process one.
  1232  
  1233  ### `ClockResolution`
  1234  
  1235  A clock's resolution is hardware and OS dependent so requires a system call to retrieve an accurate value.
  1236  Go does not provide a function for getting resolution, so without CGO we don't have an easy way to get an actual
  1237  value. For now, we return fixed values of 1us for realtime and 1ns for monotonic, assuming that realtime clocks are
  1238  often lower precision than monotonic clocks. In the future, this could be improved by having OS+arch specific assembly
  1239  to make syscalls.
  1240  
For example, Go implements time.Now for linux-amd64 with this [assembly](https://github.com/golang/go/blob/go1.20/src/runtime/time_linux_amd64.s).
Because retrieving resolution is not generally called often, unlike getting time, it could be appropriate to only
implement the fallback logic that does not use VDSO (executing syscalls in user mode). The syscall number for
clock_getres on linux-amd64 is 229 and should be usable. See https://pkg.go.dev/syscall#pkg-constants.
  1245  
If implementing similar for Windows, [mingw](https://github.com/mirror/mingw-w64/blob/6a0e9165008f731bccadfc41a59719cf7c8efc02/mingw-w64-libraries/winpthreads/src/clock.c#L77)
is often a good source to find the Windows API calls that correspond
to a POSIX method.
  1249  
  1250  Writing assembly would allow making syscalls without CGO, but comes with the cost that it will require implementations
  1251  across many combinations of OS and architecture.
  1252  
  1253  ## sys.Nanosleep
  1254  
  1255  All major programming languages have a `sleep` mechanism to block for a
  1256  duration. Sleep is typically implemented by a WASI `poll_oneoff` relative clock
  1257  subscription.
  1258  
  1259  For example, the below ends up calling `wasi_snapshot_preview1.poll_oneoff`:
  1260  
  1261  ```zig
  1262  const std = @import("std");
  1263  pub fn main() !void {
  1264      std.time.sleep(std.time.ns_per_s * 5);
  1265  }
  1266  ```
  1267  
  1268  Besides Zig, this is also the case with TinyGo (`-target=wasi`) and Rust
  1269  (`--target wasm32-wasi`). This isn't the case with Go (`GOOS=js GOARCH=wasm`),
  1270  though. In the latter case, wasm loops on `sys.Nanotime`.
  1271  
  1272  We decided to expose `sys.Nanosleep` to allow overriding the implementation
  1273  used in the common case, even if it isn't used by Go, because this gives an
  1274  easy and efficient closure over a common program function. We also documented
  1275  `sys.Nanotime` to warn users that some compilers don't optimize sleep.
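
A minimal sketch of why a single sleep hook is convenient: the same call site can block for real or merely record the request, for example in tests. The `Nanosleep` type is redeclared locally as an assumption about its shape, and `recordingSleep` is a hypothetical helper that is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// Nanosleep is a local stand-in for the function type discussed above.
type Nanosleep func(ns int64)

// realSleep blocks the calling goroutine for ns nanoseconds.
func realSleep(ns int64) { time.Sleep(time.Duration(ns)) }

// recordingSleep returns a Nanosleep that accumulates requested durations
// instead of blocking, plus a function to read the total.
func recordingSleep() (Nanosleep, func() int64) {
	var total int64
	return func(ns int64) { total += ns }, func() int64 { return total }
}

func main() {
	var sleep Nanosleep = realSleep
	sleep(1000) // blocks for about a microsecond

	sleep, total := recordingSleep()
	sleep(5_000_000_000) // returns immediately
	fmt.Println(time.Duration(total()))
}
```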
  1276  
  1277  ## sys.Osyield
  1278  
  1279  We expose `sys.Osyield`, to allow users to control the behavior of WASI's
  1280  `sched_yield` without a new build of wazero. This is mainly for parity with
  1281  all other related features which we allow users to implement, including
  1282  `sys.Nanosleep`. Unlike others, we don't provide an out-of-box implementation
  1283  primarily because it will cause performance problems when accessed.
  1284  
For example, the below implementation uses CGO, which might result in a 1us
delay per invocation depending on the platform.

See https://github.com/golang/go/issues/19409#issuecomment-284788196

```go
import _ "unsafe" // required by go:linkname

//go:noescape
//go:linkname osyield runtime.osyield
func osyield()
```
  1294  
  1295  In practice, a request to customize this is unlikely to happen until other
  1296  thread based functions are implemented. That said, as of early 2023, there are
  1297  a few signs of implementation interest and cross-referencing:
  1298  
  1299  See https://github.com/WebAssembly/stack-switching/discussions/38
  1300  See https://github.com/WebAssembly/wasi-threads#what-can-be-skipped
  1301  See https://slinkydeveloper.com/Kubernetes-controllers-A-New-Hope/
  1302  
  1303  ## sys.Stat_t
  1304  
  1305  We expose `stat` information as `sys.Stat_t`, like `syscall.Stat_t` except
  1306  defined without build constraints. For example, you can use `sys.Stat_t` on
  1307  `GOOS=windows` which doesn't define `syscall.Stat_t`.
  1308  
  1309  The first use case of this is to return inodes from `fs.FileInfo` without
  1310  relying on platform-specifics. For example, a user could return `*sys.Stat_t`
  1311  from `info.Sys()` and define a non-zero inode for a virtual file, or map a
  1312  real inode to a virtual one.
  1313  
  1314  Notable choices per field are listed below, where `sys.Stat_t` is unlike
  1315  `syscall.Stat_t` on `GOOS=linux`, or needs clarification. One common issue
  1316  not repeated below is that numeric fields are 64-bit when at least one platform
  1317  defines it that large. Also, zero values are equivalent to nil or absent.
  1318  
* `Dev` and `Ino` (`Inode`) are both defined unsigned as they are defined
  opaque, and most `syscall.Stat_t` also define them unsigned. There are
  separate sections in this document discussing the impact of zero in `Ino`.
  1322  * `Mode` is defined as a `fs.FileMode` even though that is not defined in POSIX
  1323    and will not map to all possible values. This is because the current use is
  1324    WASI, which doesn't define any types or features not already supported. By
  1325    using `fs.FileMode`, we can re-use routine experience in Go.
  1326  * `NLink` is unsigned because it is defined that way in `syscall.Stat_t`: there
  1327    can never be less than zero links to a file. We suggest defaulting to 1 in
  1328    conversions when information is not knowable because at least that many links
  1329    exist.
  1330  * `Size` is signed because it is defined that way in `syscall.Stat_t`: while
  1331    regular files and directories will always be non-negative, irregular files
  1332    are possibly negative or not defined. Notably sparse files are known to
  1333    return negative values.
  1334  * `Atim`, `Mtim` and `Ctim` are signed because they are defined that way in
  1335    `syscall.Stat_t`: Negative values are time before 1970. The resolution is
  1336    nanosecond because that's the maximum resolution currently supported in Go.
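
The `info.Sys()` use case and the `NLink` default can be sketched as below. `statT` is a hypothetical, trimmed-down stand-in for the struct described above, and `statFromInfo` is an illustrative helper rather than wazero's conversion code.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// statT is a hypothetical, platform-independent subset of stat fields.
type statT struct {
	Ino   uint64      // zero means unknown
	Mode  fs.FileMode // re-uses Go's file mode
	NLink uint64
	Size  int64
	Mtim  int64 // epoch nanoseconds
}

// statFromInfo returns the *statT a virtual file exposed via info.Sys(), or
// derives one from the portable fs.FileInfo fields otherwise.
func statFromInfo(info fs.FileInfo) *statT {
	if st, ok := info.Sys().(*statT); ok {
		return st
	}
	return &statT{
		Mode:  info.Mode(),
		NLink: 1, // not knowable here, but at least one link exists
		Size:  info.Size(),
		Mtim:  info.ModTime().UnixNano(),
	}
}

// statDot applies statFromInfo to the current directory.
func statDot() (*statT, error) {
	info, err := os.Stat(".")
	if err != nil {
		return nil, err
	}
	return statFromInfo(info), nil
}

func main() {
	st, err := statDot()
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	fmt.Println(st.NLink, st.Mode.IsDir(), st.Ino)
}
```

Here the host's `info.Sys()` is not a `*statT`, so the inode stays zero and the link count defaults to one.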
  1337  
  1338  ### Why do we use `sys.EpochNanos` instead of `time.Time` or similar?
  1339  
  1340  To simplify documentation, we defined a type alias `sys.EpochNanos` for int64.
  1341  `time.Time` is a data structure, and we could have used this for
  1342  `syscall.Stat_t` time values. The most important reason we do not is conversion
  1343  penalty deriving time from common types.
  1344  
The most common ABI used is `wasip1`. This, and compatible ABIs such as `wasix`,
encode timestamps in memory as a 64-bit number. If we used `time.Time`, we
would have to convert an underlying type like `syscall.Timespec` to `time.Time`
only to later have to call `.UnixNano()` to convert it back to a 64-bit number.
  1349  
In the future, the component model module "wasi-filesystem" may represent stat
timestamps with a type shared with "wasi-clocks", abstractly structured similar
to `time.Time`. However, component model intentionally does not define an ABI.
It is likely that the canonical ABI for timestamp will be in two parts, but it
is not required for it to be intermediately represented this way. A utility
like `syscall.NsecToTimespec` could split an int64 so that it could be written
to memory as 96 bits (int64, int32), without allocating a struct.
  1357  
  1358  Finally, some may confuse epoch nanoseconds with 32-bit epoch seconds. While
  1359  32-bit epoch seconds has "The year 2038" problem, epoch nanoseconds has
  1360  "The Year 2262" problem, which is even less concerning for this library. If
  1361  the Go programming language and wazero exist in the 2200's, we can make a major
  1362  version increment to adjust the `sys.EpochNanos` approach. Meanwhile, we have
  1363  faster code.
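
Both points, the two-part split and the upper bound of the representation, can be checked with a short sketch. `EpochNanos` is redeclared locally, and `split` and `lastRepresentableYear` are hypothetical helpers written for this example.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// EpochNanos is a local alias mirroring the one described above.
type EpochNanos = int64

// split divides epoch nanoseconds into seconds and a non-negative nanosecond
// remainder, without allocating a struct.
func split(ns EpochNanos) (sec int64, nsec int32) {
	sec = ns / 1e9
	rem := ns % 1e9
	if rem < 0 { // times before 1970: borrow one second
		sec--
		rem += 1e9
	}
	return sec, int32(rem)
}

// lastRepresentableYear returns the UTC year of the largest int64 nanosecond
// timestamp.
func lastRepresentableYear() int {
	return time.Unix(0, math.MaxInt64).UTC().Year()
}

func main() {
	fmt.Println(split(1_500_000_000))
	fmt.Println(split(-1))
	fmt.Println(lastRepresentableYear())
}
```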
  1364  
  1365  ## poll_oneoff
  1366  
  1367  `poll_oneoff` is a WASI API for waiting for I/O events on multiple handles.
  1368  It is conceptually similar to the POSIX `poll(2)` syscall.
  1369  The name is not `poll`, because it references [“the fact that this function is not efficient
  1370  when used repeatedly with the same large set of handles”][poll_oneoff].
  1371  
  1372  We chose to support this API in a handful of cases that work for regular files
  1373  and standard input. We currently do not support other types of file descriptors such
  1374  as socket handles.
  1375  
  1376  ### Clock Subscriptions
  1377  
  1378  As detailed above in [sys.Nanosleep](#sysnanosleep), `poll_oneoff` handles
  1379  relative clock subscriptions. In our implementation we use `sys.Nanosleep()`
  1380  for this purpose in most cases, except when polling for interactive input
  1381  from `os.Stdin` (see more details below).
  1382  
  1383  ### FdRead and FdWrite Subscriptions
  1384  
  1385  When subscribing a file descriptor (except `Stdin`) for reads or writes,
  1386  the implementation will generally return immediately with success, unless
  1387  the file descriptor is unknown. The file descriptor is not checked further
  1388  for new incoming data. Any timeout is cancelled, and the API call is able
  1389  to return, unless there are subscriptions to `Stdin`: these are handled
  1390  separately.
  1391  
  1392  ### FdRead and FdWrite Subscription to Stdin
  1393  
Subscribing `Stdin` for reads (writes make no sense and cause an error)
requires extra care: wazero allows configuring a custom reader for `Stdin`.
  1396  
  1397  In general, if a custom reader is found, the behavior will be the same
  1398  as for regular file descriptors: data is assumed to be present and
  1399  a success is written back to the result buffer.
  1400  
  1401  However, if the reader is detected to read from `os.Stdin`,
  1402  a special code path is followed, invoking `sysfs.poll()`.
  1403  
  1404  `sysfs.poll()` is a wrapper for `poll(2)` on POSIX systems,
  1405  and it is emulated on Windows.
  1406  
  1407  ### Poll on POSIX
  1408  
On POSIX systems, `poll(2)` waits for incoming data on a file
descriptor, blocking until either data becomes available or the timeout
expires.

Usage of `sysfs.poll()` is currently reserved for standard input, because:

1. it is really only necessary to handle interactive input: otherwise,
   there is no way in Go to peek from Standard Input without actually
   reading (and thus consuming) from it;

2. if `Stdin` is connected to a pipe, it is ok in most cases to return
   with success immediately;

3. `sysfs.poll()` is currently a blocking call, irrespective of goroutines,
   because the underlying syscall is; thus, it is better to limit its usage.

So, if the subscription is for `os.Stdin` and the handle is detected
to correspond to an interactive session, then `sysfs.poll()` will be
invoked with the `Stdin` handle *and* the timeout.
  1428  
  1429  This also means that in this specific case, the timeout is uninterruptible,
  1430  unless data becomes available on `Stdin` itself.
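
The rules from the subscription sections above can be summarized as a small decision function. This is only a reading aid: the `subscription` type, its fields and the `decide` function are hypothetical, and the unknown file descriptor case is left out.

```go
package main

import "fmt"

// action is what the sketch decides to do with one subscription.
type action string

const (
	reportReady  action = "report ready immediately"
	reportError  action = "report an error"
	blockingPoll action = "block in poll with the timeout"
)

// subscription is a hypothetical summary of one FdRead or FdWrite request.
type subscription struct {
	Stdin       bool // the subscription targets standard input
	Write       bool // FdWrite rather than FdRead
	HostStdin   bool // the configured reader is the host's os.Stdin
	Interactive bool // os.Stdin is an interactive session, not a pipe or file
}

// decide applies the rules described in the sections above.
func decide(s subscription) action {
	switch {
	case !s.Stdin:
		return reportReady // regular descriptors are assumed ready
	case s.Write:
		return reportError // writing to Stdin makes no sense
	case s.HostStdin && s.Interactive:
		return blockingPoll // the only case that really waits
	default:
		return reportReady // custom reader or piped input
	}
}

func main() {
	fmt.Println(decide(subscription{}))
	fmt.Println(decide(subscription{Stdin: true, Write: true}))
	fmt.Println(decide(subscription{Stdin: true, HostStdin: true}))
	fmt.Println(decide(subscription{Stdin: true, HostStdin: true, Interactive: true}))
}
```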
  1431  
  1432  ### Select on Windows
  1433  
  1434  On Windows `sysfs.poll()` cannot be delegated to a single
  1435  syscall, because there is no single syscall to handle sockets,
  1436  pipes and regular files.
  1437  
  1438  Instead, we emulate its behavior for the cases that are currently
  1439  of interest.
  1440  
  1441  - For regular files, we _always_ report them as ready, as
  1442  [most operating systems do anyway][async-io-windows].
  1443  
  1444  - For pipes, we invoke [`PeekNamedPipe`][peeknamedpipe]
  1445  for each file handle we detect is a pipe open for reading.
  1446  We currently ignore pipes open for writing.
  1447  
- Notably, we also include support for sockets using the [WinSock
implementation of `poll`][wsapoll], but instead
of relying on the timeout argument of the `WSAPoll` function,
we set a 0-duration timeout so that it behaves like a peek.
  1452  
  1453  This way, we can check for regular files all at once,
  1454  at the beginning of the function, then we poll pipes and
  1455  sockets periodically using a cancellable `time.Tick`,
  1456  which plays nicely with the rest of the Go runtime.
  1457  
  1458  ### Impact of blocking
  1459  
  1460  Because this is a blocking syscall, it will also block the carrier thread of
  1461  the goroutine, preventing any means to support context cancellation directly.
  1462  
There are ways to obviate this issue. We outline here one idea, which is
however not currently implemented. A common approach to support context
cancellation is to add a signal file descriptor to the set, e.g. the read-end
of a pipe or an eventfd on Linux. When the context is canceled, we may unblock
a Select call by writing to the fd, causing it to return immediately. This
however requires a bit of housekeeping to hide the "special" FD from the
end-user.
  1469  
  1470  [poll_oneoff]: https://github.com/WebAssembly/wasi-poll#why-is-the-function-called-poll_oneoff
  1471  [async-io-windows]: https://tinyclouds.org/iocp_links
  1472  [peeknamedpipe]: https://learn.microsoft.com/en-us/windows/win32/api/namedpipeapi/nf-namedpipeapi-peeknamedpipe
  1473  [wsapoll]: https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsapoll
  1474  
  1475  ## Signed encoding of integer global constant initializers
  1476  
  1477  wazero treats integer global constant initializers signed as their interpretation is not known at declaration time. For
  1478  example, there is no signed integer [value type](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#value-types%E2%91%A0).
  1479  
  1480  To get at the problem, let's use an example.
  1481  ```
  1482  (global (export "start_epoch") i64 (i64.const 1620216263544))
  1483  ```
  1484  
  1485  In both signed and unsigned LEB128 encoding, this value is the same bit pattern. The problem is that some numbers are
  1486  not. For example, 16256 is `807f` encoded as unsigned, but `80ff00` encoded as signed.
  1487  
  1488  While the specification mentions uninterpreted integers are in abstract [unsigned values](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#integers%E2%91%A0),
  1489  the binary encoding is clear that they are encoded [signed](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#integers%E2%91%A4).
  1490  
  1491  For consistency, we go with signed encoding in the special case of global constant initializers.
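
The difference between the two encodings can be reproduced with a short sketch. The function names are hypothetical and this is not wazero's encoder; it only illustrates the LEB128 rules referenced above.

```go
package main

import "fmt"

// encodeUint64 encodes v as unsigned LEB128.
func encodeUint64(v uint64) []byte {
	var buf []byte
	for {
		b := byte(v & 0x7f)
		v >>= 7
		if v == 0 {
			return append(buf, b)
		}
		buf = append(buf, b|0x80) // more groups follow
	}
}

// encodeInt64 encodes v as signed LEB128.
func encodeInt64(v int64) []byte {
	var buf []byte
	for {
		b := byte(v & 0x7f)
		v >>= 7 // arithmetic shift preserves the sign
		// Stop once the remaining bits are only sign extension of bit 6.
		if (v == 0 && b&0x40 == 0) || (v == -1 && b&0x40 != 0) {
			return append(buf, b)
		}
		buf = append(buf, b|0x80)
	}
}

func main() {
	// Same bytes in both encodings.
	fmt.Printf("%x %x\n", encodeUint64(1620216263544), encodeInt64(1620216263544))
	// Prints "807f 80ff00": the signed form needs an extra byte for the sign.
	fmt.Printf("%x %x\n", encodeUint64(16256), encodeInt64(16256))
}
```

The extra byte appears whenever bit 6 of the final 7-bit group is set, because a signed decoder would otherwise read the value as negative.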
  1492  
  1493  ## Implementation limitations
  1494  
  1495  WebAssembly 1.0 (20191205) specification allows runtimes to [limit certain aspects of Wasm module or execution](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#a2-implementation-limitations).
  1496  
  1497  wazero limitations are imposed pragmatically and described below.
  1498  
  1499  ### Number of functions in a module
  1500  
The possible number of function instances in [a module](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#module-instances%E2%91%A0) is not specified in the WebAssembly specifications since [`funcaddr`](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#syntax-funcaddr) corresponding to a function instance in a store can be an arbitrary number.
wazero limits the maximum function instances to 2^27 as even that number would occupy 1GB in function pointers.

We chose this limit because we _believe_ that all use cases are fine with it, and because we have no way to test wazero runtimes under these unusual circumstances.
  1505  
  1506  ### Number of function types in a store
  1507  
  1508  There's no limitation on the number of function types in [a store](https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#store%E2%91%A0) according to the spec. In wazero implementation, we assign each function type to a unique ID, and choose to use `uint32` to represent the IDs.
  1509  Therefore the maximum number of function types a store can have is limited to 2^27 as even that number would occupy 512MB just to reference the function types.
  1510  
  1511  This is due to the same reason for the limitation on the number of functions above.
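
The arithmetic behind both limits is small enough to check directly. The constants below restate the numbers from the text, and the 8-byte pointer size assumes a 64-bit platform; the function names are made up for this sketch.

```go
package main

import "fmt"

const (
	limit       = 1 << 27 // 2^27, the limit discussed above
	pointerSize = 8       // bytes per pointer, assuming a 64-bit platform
	typeIDSize  = 4       // bytes per uint32 function type ID
)

// functionPointerBytes is the memory needed for 2^27 function pointers.
func functionPointerBytes() uint64 { return limit * pointerSize }

// functionTypeIDBytes is the memory needed for 2^27 uint32 type IDs.
func functionTypeIDBytes() uint64 { return limit * typeIDSize }

func main() {
	fmt.Println(functionPointerBytes() == 1<<30) // 1 GiB
	fmt.Println(functionTypeIDBytes() == 1<<29)  // 512 MiB
}
```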
  1512  
  1513  ### Number of values on the stack in a function
  1514  
While the spec does not clarify a limitation of function stack values, wazero limits this to 2^27 = 134,217,728.
The reason is that we internally represent all the values as 64-bit integers regardless of their types (including f32, f64), and 2^27 values means
1 GiB = (2^30) bytes. 1 GiB is reasonable for most applications [as we see a Goroutine has 250 MB as a limit on the stack for 32-bit arch](https://github.com/golang/go/blob/go1.20/src/runtime/proc.go#L152-L159), considering that WebAssembly is (currently) a 32-bit environment.
  1518  
  1519  All the functions are statically analyzed at module instantiation phase, and if a function can potentially reach this limit, an error is returned.
  1520  
  1521  ### Number of globals in a module
  1522  
Theoretically, a module can declare globals (including imports) up to 2^32 times. However, wazero limits this to 2^27 (134,217,728) per module.
That is because internally we store globals in a slice with pointer types (meaning 8 bytes on 64-bit platforms), and therefore 2^27 globals
means that we have a 1 GiB slice, which seems large enough for most applications.
  1526  
  1527  ### Number of tables in a module
  1528  
While the spec says that a module can have up to 2^32 tables, wazero limits this to 2^27 = 134,217,728.
One of the reasons is even that number would occupy 1GB in the pointers to tables alone. Not only that, we access the tables slice by
table index using a 32-bit signed offset in the compiler implementation, which means that the table index of 2^27 can reach 2^27 * 8 (pointer size on 64-bit machines) = 2^30 offsets in bytes.
  1532  
  1533  We _believe_ that all use cases are fine with the limitation, but also note that we have no way to test wazero runtimes under these unusual circumstances.
  1534  
  1535  If a module reaches this limit, an error is returned at the compilation phase.
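
The signed-offset constraint can be checked with a short sketch. `byteOffset` and `fitsInt32` are hypothetical helpers, and the 8-byte pointer size assumes a 64-bit machine as in the text.

```go
package main

import (
	"fmt"
	"math"
)

const pointerSize = 8 // bytes per pointer on 64-bit machines

// byteOffset returns the byte offset of the table pointer at index.
func byteOffset(index uint32) uint64 { return uint64(index) * pointerSize }

// fitsInt32 reports whether offset is representable as a signed 32-bit value.
func fitsInt32(offset uint64) bool { return offset <= math.MaxInt32 }

func main() {
	fmt.Println(byteOffset(1<<27), fitsInt32(byteOffset(1<<27)))
	fmt.Println(byteOffset(1<<28), fitsInt32(byteOffset(1<<28)))
}
```

Doubling the limit to 2^28 would produce an offset of 2^31 bytes, which no longer fits in a signed 32-bit value.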
  1536  
  1537  ## Compiler engine implementation
  1538  
  1539  See [compiler/RATIONALE.md](internal/engine/compiler/RATIONALE.md).
  1540  
  1541  ## Golang patterns
  1542  
  1543  ### Hammer tests
  1544  Code that uses concurrency primitives, such as locks or atomics, should include "hammer tests", which run large loops
  1545  inside a bounded amount of goroutines, run by half that many `GOMAXPROCS`. These are named consistently "hammer", so
  1546  they are easy to find. The name inherits from some existing tests in [golang/go](https://github.com/golang/go/search?q=hammer&type=code).
  1547  
  1548  Here is an annotated description of the key pieces of a hammer test:
  1549  1. `P` declares the count of goroutines to use, defaulting to 8 or 4 if `testing.Short`.
  1550     * Half this amount are the cores used, and 4 is less than a modern laptop's CPU. This allows multiple "hammer" tests to run in parallel.
2. `N` declares the scale of work (loop) per goroutine, defaulting to a value that finishes in ~0.1s on a modern laptop.
  1552     * When in doubt, try 1000 or 100 if `testing.Short`
  1553     * Remember, there are multiple hammer tests and CI nodes are slow. Slower tests hurt feedback loops.
  1554  3. `defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P/2))` makes goroutines switch cores, testing visibility of shared data.
  1555  4. To ensure goroutines execute at the same time, block them with `sync.WaitGroup`, initialized to `Add(P)`.
  1556     * `sync.WaitGroup` internally uses `runtime_Semacquire` not available in any other library.
  1557     * `sync.WaitGroup.Add` with a negative value can unblock many goroutines at the same time, e.g. without a for loop.
  1558  5. Track goroutines progress via `finished := make(chan int)` where each goroutine in `P` defers `finished <- 1`.
  1559     1. Tests use `require.XXX`, so `recover()` into `t.Fail` in a `defer` function before `finished <- 1`.
  1560        * This makes it easier to spot larger concurrency problems as you see each failure, not just the first.
  1561     2. After the `defer` function, await unblocked, then run the stateful function `N` times in a normal loop.
  1562        * This loop should trigger shared state problems as locks or atomics are contended by `P` goroutines.
  1563  6. After all `P` goroutines launch, atomically release all of them with `WaitGroup.Add(-P)`.
  1564  7. Block the runner on goroutine completion, by (`<-finished`) for each `P`.
  1565  8. When all goroutines complete, `return` if `t.Failed()`, otherwise perform follow-up state checks.
  1566  
  1567  This is implemented in wazero in [hammer.go](internal/testing/hammer/hammer.go)
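
The steps above can be sketched outside of a test as follows. This is a simplified illustration rather than the code in hammer.go: it counts panics instead of calling `t.Fail`, and `hammer` and `hammerCounter` are names chosen for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// hammer runs fn N times in each of P goroutines and returns how many
// goroutines panicked.
func hammer(P, N int, fn func()) int {
	// Step 3: use half as many cores as goroutines, restoring the old value.
	defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P / 2))

	var start sync.WaitGroup
	start.Add(P) // step 4: goroutines block until the count reaches zero
	var failed int32
	finished := make(chan int) // step 5
	for p := 0; p < P; p++ {
		go func() {
			defer func() {
				if recover() != nil { // step 5.1: record each failure
					atomic.AddInt32(&failed, 1)
				}
				finished <- 1
			}()
			start.Wait() // step 5.2: await unblocked, then loop
			for n := 0; n < N; n++ {
				fn()
			}
		}()
	}
	start.Add(-P) // step 6: release all goroutines at once
	for p := 0; p < P; p++ {
		<-finished // step 7
	}
	return int(atomic.LoadInt32(&failed)) // step 8 is left to the caller
}

// hammerCounter hammers an atomic counter, returning failures and the total.
func hammerCounter(P, N int) (int, int64) {
	var counter int64
	failures := hammer(P, N, func() { atomic.AddInt64(&counter, 1) })
	return failures, atomic.LoadInt64(&counter)
}

func main() {
	fmt.Println(hammerCounter(8, 1000))
}
```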
  1568  
  1569  ### Lock-free, cross-goroutine observations of updates
  1570  
How to achieve cross-goroutine reads of a variable is not explicitly defined in https://go.dev/ref/mem. wazero uses
atomics to implement this following unofficial practice. For example, a `Close` operation can be guarded to happen only
once via compare-and-swap (CAS) against a zero value. When we use this pattern, we consistently use atomics to both
read and update the same numeric field.
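
A minimal sketch of the guarded `Close` pattern follows. The `resource` type and `closeConcurrently` helper are hypothetical; the point is only that the same field is read and written exclusively through atomics.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// resource is a hypothetical type whose Close must take effect only once.
type resource struct {
	closed uint32 // 0 = open, 1 = closed; only accessed via atomics
}

// Close returns true for the single caller that wins the compare-and-swap.
func (r *resource) Close() bool {
	if !atomic.CompareAndSwapUint32(&r.closed, 0, 1) {
		return false // someone else already closed it
	}
	// Release underlying state here; this runs at most once.
	return true
}

// IsClosed reads the same field with an atomic load.
func (r *resource) IsClosed() bool { return atomic.LoadUint32(&r.closed) == 1 }

// closeConcurrently calls Close from n goroutines and counts the winners.
func closeConcurrently(n int) int32 {
	r := &resource{}
	var winners int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if r.Close() {
				atomic.AddInt32(&winners, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&winners)
}

func main() {
	fmt.Println(closeConcurrently(100))
}
```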
  1575  
  1576  In lieu of formal documentation, we infer this pattern works from other sources (besides tests):
  1577   * `sync.WaitGroup` by definition must support calling `Add` from other goroutines. Internally, it uses atomics.
  1578   * rsc in golang/go#5045 writes "atomics guarantee sequential consistency among the atomic variables".
  1579  
  1580  See https://github.com/golang/go/blob/go1.20/src/sync/waitgroup.go#L64
  1581  See https://github.com/golang/go/issues/5045#issuecomment-252730563
  1582  See https://www.youtube.com/watch?v=VmrEG-3bWyM