# The gVisor Virtual Filesystem

## Implementation Notes

### Reference Counting

Filesystem, Dentry, Mount, MountNamespace, and FileDescription are all
reference-counted. Mount and MountNamespace are exclusively VFS-managed; when
their reference count reaches zero, VFS releases their resources. Filesystem and
FileDescription management is shared between VFS and filesystem implementations;
when their reference count reaches zero, VFS notifies the implementation by
calling `FilesystemImpl.Release()` or `FileDescriptionImpl.Release()`
respectively and then releases VFS-owned resources. Dentries are exclusively
managed by filesystem implementations; reference count changes are abstracted
through DentryImpl, which should release resources when the reference count
reaches zero.
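
The shared-management case can be illustrated with a minimal sketch. Every name below (`sharedObject`, `implReleaser`, `fakeImpl`, the `events` log) is hypothetical and exists only for illustration; it is not gVisor's actual code. It assumes only what the paragraph above states: when the last reference is dropped, the implementation's `Release()` is called first, and VFS-owned resources are released afterwards.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// implReleaser is a hypothetical stand-in for the Release method of
// FilesystemImpl or FileDescriptionImpl.
type implReleaser interface {
	Release()
}

// sharedObject is a hypothetical object whose management is shared between
// VFS and a filesystem implementation.
type sharedObject struct {
	refs   int64
	impl   implReleaser
	events []string // records release order, for illustration only
}

// fakeImpl is a hypothetical implementation half that logs its release.
type fakeImpl struct{ obj *sharedObject }

func (f *fakeImpl) Release() {
	f.obj.events = append(f.obj.events, "impl released")
}

// newSharedObject returns an object holding one initial reference.
func newSharedObject() *sharedObject {
	o := &sharedObject{refs: 1}
	o.impl = &fakeImpl{obj: o}
	return o
}

func (o *sharedObject) IncRef() { atomic.AddInt64(&o.refs, 1) }

// DecRef drops a reference. On the last drop it notifies the implementation
// and then releases the (simulated) VFS-owned resources.
func (o *sharedObject) DecRef() {
	n := atomic.AddInt64(&o.refs, -1)
	if n < 0 {
		panic("DecRef called without a matching reference")
	}
	if n == 0 {
		o.impl.Release()
		o.events = append(o.events, "vfs released")
	}
}

func main() {
	o := newSharedObject()
	o.IncRef()
	o.DecRef()
	fmt.Println(len(o.events)) // 0: one reference is still held
	o.DecRef()
	fmt.Println(o.events) // [impl released vfs released]
}
```

The sketch keeps the counter on the VFS side. For Dentry the situation is reversed: the counter itself lives behind DentryImpl, so VFS never sees it directly.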

Filesystem references are held by:

-   Mount: Each referenced Mount holds a reference on the mounted Filesystem.

Dentry references are held by:

-   FileDescription: Each referenced FileDescription holds a reference on the
    Dentry through which it was opened, via `FileDescription.vd.dentry`.

-   Mount: Each referenced Mount holds a reference on its mount point and on the
    mounted filesystem root. The mount point is mutable (`mount(MS_MOVE)`).

Mount references are held by:

-   FileDescription: Each referenced FileDescription holds a reference on the
    Mount on which it was opened, via `FileDescription.vd.mount`.

-   Mount: Each referenced Mount holds a reference on its parent, which is the
    mount containing its mount point.

-   VirtualFilesystem: A reference is held on each Mount that has been connected
    to a mount point, but not yet umounted.

MountNamespace and FileDescription references are held by users of VFS. The
expectation is that each `kernel.Task` holds a reference on its corresponding
MountNamespace, and each file descriptor holds a reference on its represented
FileDescription.

Notes:

-   Dentries do not hold a reference on their owning Filesystem. Instead, all
    uses of a Dentry occur in the context of a Mount, which holds a reference on
    the relevant Filesystem (see e.g. the VirtualDentry type). As a corollary,
    when releasing references on both a Dentry and its corresponding Mount, the
    Dentry's reference must be released first (because releasing the Mount's
    reference may release the last reference on the Filesystem, whose state may
    be required to release the Dentry reference).
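
The ordering rule in the note above can be sketched as follows. All types here (`filesystem`, `mount`, `dentry`, `virtualDentry`) are simplified, hypothetical stand-ins rather than the real VFS types, and the `released` flag and the error return exist only to make the failure mode visible. The dentry keeps an uncounted pointer to its filesystem, mirroring the statement that Dentries hold no reference on it.

```go
package main

import (
	"errors"
	"fmt"
)

// filesystem is a hypothetical stand-in for a reference-counted Filesystem.
type filesystem struct {
	refs     int
	released bool
}

func (fs *filesystem) decRef() {
	fs.refs--
	if fs.refs == 0 {
		fs.released = true
	}
}

// mount holds a counted reference on its filesystem.
type mount struct {
	refs int
	fs   *filesystem
}

func (m *mount) decRef() {
	m.refs--
	if m.refs == 0 {
		m.fs.decRef() // may drop the last filesystem reference
	}
}

// dentry points at its filesystem without holding a counted reference.
type dentry struct {
	refs int
	fs   *filesystem
}

// decRef fails if the filesystem state it depends on is already gone.
func (d *dentry) decRef() error {
	if d.fs.released {
		return errors.New("filesystem released before dentry")
	}
	d.refs--
	return nil
}

// virtualDentry pairs a mount with a dentry, each holding one reference.
type virtualDentry struct {
	mount  *mount
	dentry *dentry
}

func newVirtualDentry() virtualDentry {
	fs := &filesystem{refs: 1}
	return virtualDentry{
		mount:  &mount{refs: 1, fs: fs},
		dentry: &dentry{refs: 1, fs: fs},
	}
}

// decRef releases the dentry reference first, then the mount reference.
func (vd virtualDentry) decRef() error {
	err := vd.dentry.decRef()
	vd.mount.decRef()
	return err
}

func main() {
	good := newVirtualDentry()
	fmt.Println(good.decRef()) // <nil>

	bad := newVirtualDentry()
	bad.mount.decRef()               // wrong order: mount first
	fmt.Println(bad.dentry.decRef()) // filesystem released before dentry
}
```

Bundling both releases into a single `decRef` on the pair is one way to make the correct order the default, so that callers cannot get it wrong by accident.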

### The Inheritance Pattern

Filesystem, Dentry, and FileDescription are all concepts featuring both state
that must be shared between VFS and filesystem implementations, and operations
that are implementation-defined. To facilitate this, each of these three
concepts follows the same pattern, shown below for Dentry:

```go
// Dentry represents a node in a filesystem tree.
type Dentry struct {
  // VFS-required dentry state.
  parent *Dentry
  // ...

  // impl is the DentryImpl associated with this Dentry. impl is immutable.
  // This should be the last field in Dentry.
  impl DentryImpl
}

// Init must be called before first use of d.
func (d *Dentry) Init(impl DentryImpl) {
  d.impl = impl
}

// Impl returns the DentryImpl associated with d.
func (d *Dentry) Impl() DentryImpl {
  return d.impl
}

// DentryImpl contains implementation-specific details of a Dentry.
// Implementations of DentryImpl should contain their associated Dentry by
// value as their first field.
type DentryImpl interface {
  // VFS-required implementation-defined dentry operations.
  IncRef()
  // ...
}
```

This construction, which is essentially a type-safe analogue to Linux's
`container_of` pattern, has the following properties:

-   VFS works almost exclusively with pointers to Dentry rather than DentryImpl
    interface objects, such as in the type of `Dentry.parent`. This avoids
    interface method calls (which are somewhat expensive to perform, and defeat
    inlining and escape analysis), reduces the size of VFS types (since an
    interface object is two pointers in size), and allows pointers to be loaded
    and stored atomically using `sync/atomic`. Implementation-defined behavior
    is accessed via `Dentry.impl` when required.

-   Filesystem implementations can access the implementation-defined state
    associated with objects of VFS types by type-asserting or type-switching
    (e.g. `Dentry.Impl().(*myDentry)`). Type assertions to a concrete type
    require only an equality comparison of the interface object's type pointer
    to a static constant, and are consequently very fast.

-   Filesystem implementations can access the VFS state associated with objects
    of implementation-defined types directly.

-   VFS and implementation-defined state for a given type occupy the same
    object, minimizing memory allocations and maximizing memory locality. `impl`
    is the last field in `Dentry`, and `Dentry` is the first field in
    `DentryImpl` implementations, for similar reasons: this tends to cause
    fetching of the `Dentry.impl` interface object to also fetch `DentryImpl`
    fields, either because they are in the same cache line or via next-line
    prefetching.
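
From the implementation side, the properties above might look like the following sketch. `Dentry` and `DentryImpl` are trimmed copies of the declarations shown earlier so that the example compiles on its own; `myDentry`, its `refs` and `name` fields, `newMyDentry`, and `parentName` are hypothetical names invented for illustration.

```go
package main

import "fmt"

// Dentry and DentryImpl repeat the pattern above in trimmed form.
type Dentry struct {
	parent *Dentry
	impl   DentryImpl // last field
}

func (d *Dentry) Init(impl DentryImpl) { d.impl = impl }
func (d *Dentry) Impl() DentryImpl     { return d.impl }

type DentryImpl interface {
	IncRef()
}

// myDentry is a hypothetical filesystem-specific dentry. It embeds the VFS
// Dentry by value as its first field, so both halves share one allocation.
type myDentry struct {
	vfsd Dentry
	refs int64
	name string
}

// IncRef implements DentryImpl.
func (d *myDentry) IncRef() { d.refs++ }

// newMyDentry allocates a dentry and links the VFS half back to the
// implementation half.
func newMyDentry(name string, parent *myDentry) *myDentry {
	d := &myDentry{name: name, refs: 1}
	if parent != nil {
		d.vfsd.parent = &parent.vfsd // implementation touches VFS state directly
	}
	d.vfsd.Init(d)
	return d
}

// parentName starts from a plain *Dentry, as VFS would pass it, and recovers
// the implementation-defined state with a type assertion.
func parentName(vfsd *Dentry) string {
	if vfsd.parent == nil {
		return ""
	}
	return vfsd.parent.Impl().(*myDentry).name
}

func main() {
	root := newMyDentry("/", nil)
	child := newMyDentry("etc", root)
	child.vfsd.Impl().IncRef() // VFS-side call through the interface
	fmt.Println(parentName(&child.vfsd), child.refs) // / 2
}
```

The unchecked assertion in `parentName` panics if a dentry from a different implementation is passed in. A filesystem that can encounter foreign dentries would presumably use the two-value form, `impl, ok := vfsd.Impl().(*myDentry)`, instead.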