# The gVisor Virtual Filesystem

THIS PACKAGE IS CURRENTLY EXPERIMENTAL AND NOT READY OR ENABLED FOR PRODUCTION
USE. For the filesystem implementation currently used by gVisor, see the `fs`
package.

## Implementation Notes

### Reference Counting

Filesystem, Dentry, Mount, MountNamespace, and FileDescription are all
reference-counted. Mount and MountNamespace are exclusively VFS-managed; when
their reference count reaches zero, VFS releases their resources. Filesystem and
FileDescription management is shared between VFS and filesystem implementations;
when their reference count reaches zero, VFS notifies the implementation by
calling `FilesystemImpl.Release()` or `FileDescriptionImpl.Release()`
respectively and then releases VFS-owned resources. Dentries are exclusively
managed by filesystem implementations; reference count changes are abstracted
through DentryImpl, which should release resources when the reference count
reaches zero.

Filesystem references are held by:

- Mount: Each referenced Mount holds a reference on the mounted Filesystem.

Dentry references are held by:

- FileDescription: Each referenced FileDescription holds a reference on the
  Dentry through which it was opened, via `FileDescription.vd.dentry`.

- Mount: Each referenced Mount holds a reference on its mount point and on the
  mounted filesystem root. The mount point is mutable (`mount(MS_MOVE)`).

Mount references are held by:

- FileDescription: Each referenced FileDescription holds a reference on the
  Mount on which it was opened, via `FileDescription.vd.mount`.

- Mount: Each referenced Mount holds a reference on its parent, which is the
  mount containing its mount point.

- VirtualFilesystem: A reference is held on each Mount that has been connected
  to a mount point, but not yet unmounted.
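
The shared-management rule for Filesystem described above can be illustrated
with a minimal sketch. This is not the package's actual code: the types below
are simplified stand-ins, and `NewFilesystem`, `NewMount`, `toyImpl`, the
`released` flag, and the panics on misuse are all hypothetical choices made for
illustration. The sketch shows a Mount holding a reference on its Filesystem,
and the implementation being notified via `Release()` only when the last
reference is dropped.

```go
package main

import "sync/atomic"

// FilesystemImpl stands in for the implementation-defined half of a
// Filesystem. Only Release is modeled in this sketch.
type FilesystemImpl interface {
	Release()
}

// Filesystem is a simplified stand-in for the VFS-owned half.
type Filesystem struct {
	refs     int64 // accessed atomically
	impl     FilesystemImpl
	released bool // illustrative marker for "VFS-owned resources released"
}

// NewFilesystem (hypothetical) returns a Filesystem holding one reference,
// owned by the caller.
func NewFilesystem(impl FilesystemImpl) *Filesystem {
	return &Filesystem{refs: 1, impl: impl}
}

// IncRef takes an additional reference.
func (fs *Filesystem) IncRef() {
	if atomic.AddInt64(&fs.refs, 1) <= 1 {
		panic("IncRef on a Filesystem with no references")
	}
}

// DecRef drops a reference. On the last one, the implementation is notified
// first, and VFS-owned state is released afterwards.
func (fs *Filesystem) DecRef() {
	n := atomic.AddInt64(&fs.refs, -1)
	if n < 0 {
		panic("DecRef without a matching reference")
	}
	if n == 0 {
		fs.impl.Release()
		fs.released = true
	}
}

// Mount is a simplified stand-in that holds a reference on its Filesystem
// for as long as the Mount itself is referenced.
type Mount struct {
	refs int64 // accessed atomically
	fs   *Filesystem
}

// NewMount (hypothetical) takes its own reference on fs.
func NewMount(fs *Filesystem) *Mount {
	fs.IncRef()
	return &Mount{refs: 1, fs: fs}
}

// DecRef drops a reference on the Mount; the last one also drops the
// Mount's reference on the Filesystem.
func (m *Mount) DecRef() {
	n := atomic.AddInt64(&m.refs, -1)
	if n < 0 {
		panic("DecRef without a matching reference")
	}
	if n == 0 {
		m.fs.DecRef()
	}
}

// toyImpl is a toy FilesystemImpl that counts Release calls.
type toyImpl struct{ releases int }

func (t *toyImpl) Release() { t.releases++ }
```

In this sketch, a caller that creates a Filesystem, mounts it with `NewMount`,
and then drops its own reference leaves the Filesystem alive; `Release()` runs
only when the Mount's last reference is dropped as well.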

MountNamespace and FileDescription references are held by users of VFS. The
expectation is that each `kernel.Task` holds a reference on its corresponding
MountNamespace, and each file descriptor holds a reference on its represented
FileDescription.

Notes:

- Dentries do not hold a reference on their owning Filesystem. Instead, all
  uses of a Dentry occur in the context of a Mount, which holds a reference on
  the relevant Filesystem (see e.g. the VirtualDentry type). As a corollary,
  when releasing references on both a Dentry and its corresponding Mount, the
  Dentry's reference must be released first (because releasing the Mount's
  reference may release the last reference on the Filesystem, whose state may
  be required to release the Dentry reference).

### The Inheritance Pattern

Filesystem, Dentry, and FileDescription are all concepts featuring both state
that must be shared between VFS and filesystem implementations, and operations
that are implementation-defined. To facilitate this, each of these three
concepts follows the same pattern, shown below for Dentry:

```go
// Dentry represents a node in a filesystem tree.
type Dentry struct {
	// VFS-required dentry state.
	parent *Dentry
	// ...

	// impl is the DentryImpl associated with this Dentry. impl is immutable.
	// This should be the last field in Dentry.
	impl DentryImpl
}

// Init must be called before first use of d.
func (d *Dentry) Init(impl DentryImpl) {
	d.impl = impl
}

// Impl returns the DentryImpl associated with d.
func (d *Dentry) Impl() DentryImpl {
	return d.impl
}

// DentryImpl contains implementation-specific details of a Dentry.
// Implementations of DentryImpl should contain their associated Dentry by
// value as their first field.
type DentryImpl interface {
	// VFS-required implementation-defined dentry operations.
	IncRef()
	// ...
}
```

This construction, which is essentially a type-safe analogue to Linux's
`container_of` pattern, has the following properties:

- VFS works almost exclusively with pointers to Dentry rather than DentryImpl
  interface objects, such as in the type of `Dentry.parent`. This avoids
  interface method calls (which are somewhat expensive to perform, and defeat
  inlining and escape analysis), reduces the size of VFS types (since an
  interface object is two pointers in size), and allows pointers to be loaded
  and stored atomically using `sync/atomic`. Implementation-defined behavior
  is accessed via `Dentry.impl` when required.

- Filesystem implementations can access the implementation-defined state
  associated with objects of VFS types by type-asserting or type-switching
  (e.g. `Dentry.Impl().(*myDentry)`). Type assertions to a concrete type
  require only an equality comparison of the interface object's type pointer
  to a static constant, and are consequently very fast.

- Filesystem implementations can access the VFS state associated with objects
  of implementation-defined types directly.

- VFS and implementation-defined state for a given type occupy the same
  object, minimizing memory allocations and maximizing memory locality. `impl`
  is the last field in `Dentry`, and `Dentry` is the first field in
  `DentryImpl` implementations, for similar reasons: this tends to cause
  fetching of the `Dentry.impl` interface object to also fetch `DentryImpl`
  fields, either because they are in the same cache line or via next-line
  prefetching.

## Future Work

- Most `mount(2)` features, and unmounting, are incomplete.

- VFS1 filesystems are not directly compatible with VFS2.
  It may be possible to implement shims that implement `vfs.FilesystemImpl`
  for `fs.MountNamespace`, `vfs.DentryImpl` for `fs.Dirent`, and
  `vfs.FileDescriptionImpl` for `fs.File`, which may be adequate for
  filesystems that are not performance-critical (e.g. sysfs); however, it is
  not clear that this will be less effort than simply porting the filesystems
  in question. Practically speaking, the following filesystems will probably
  need to be ported or made compatible through a shim to evaluate filesystem
  performance on realistic workloads:

  - devfs/procfs/sysfs, which will realistically be necessary to execute
    most applications. (Note that procfs and sysfs do not support hard
    links, so they do not require the complexity of separate inode objects.
    Also note that Linux's /dev is actually a variant of tmpfs called
    devtmpfs.)

  - tmpfs. This should be relatively straightforward: copy/paste memfs,
    store regular file contents in pgalloc-allocated memory instead of
    `[]byte`, and add support for file timestamps. (In fact, it probably
    makes more sense to convert memfs to tmpfs and not keep the former.)

  - A remote filesystem, either lisafs (if it is ready by the time that
    other benchmarking prerequisites are) or v9fs (aka 9P, aka gofers).

  - epoll files.

  Filesystems that will need to be ported before switching to VFS2, but can
  probably be skipped for early testing:

  - overlayfs, which is needed for (at least) synthetic mount points.

  - Support for host ttys.

  - timerfd files.

  Filesystems that can probably be dropped:

  - ashmem, which is far too incomplete to use.

  - binder, which is similarly far too incomplete to use.

- Save/restore. For instance, it is unclear if the current implementation of
  the `state` package supports the inheritance pattern described above.

- Many features that were previously implemented by VFS must now be
  implemented by individual filesystems (though, in most cases, this should
  consist of calls to hooks or libraries provided by `vfs` or other packages).
  This includes, but is not necessarily limited to:

  - Block and character device special files

  - Inotify

  - File locking

  - `O_ASYNC`