## EXT(2/3/4) File System

This is a filesystem driver which supports ext2, ext3 and ext4 filesystems.
Linux has specialized drivers for each variant but none which supports all. This
library takes advantage of ext's backward compatibility and understands the
internal organization of on-disk structures to support all variants.

This driver implementation diverges from the Linux implementations in being more
forgiving about versioning. For instance, if a filesystem contains both extent
based inodes and classical block map based inodes, this driver will not complain
and will interpret them both correctly, whereas in Linux this would be an issue.
This blurs the line between the three ext fs variants.

Ext2 is considered deprecated as of Red Hat Enterprise Linux 7, and ext3 has
been superseded by ext4, which brings large performance gains. Thus it is
recommended to upgrade older filesystem images to ext4 using e2fsprogs for
better performance.

### Read Only

This driver currently only allows read only operations. A lot of the design
decisions are based on this feature. There are plans to implement write (the
process for which is documented in the future work section).

### Performance

One of the biggest wins of this driver is that it talks directly to the
underlying block device (or whatever persistent storage is being used), instead
of making expensive RPCs to a gofer.

Another advantage is that ext fs supports fast concurrent reads. Currently the
device is represented using an `io.ReaderAt`, which allows for concurrent reads.
All reads are passed directly to the device driver, which intelligently serves
the read requests in the optimal order. There is no congestion due to locking
while reading at the filesystem level.

Reads are optimized further in the way file data is transferred over to user
memory. Ext fs copies file data directly from disk into user memory with no
additional allocations on the way. We can only get faster by preloading file
data into memory (see future work section).

The internal structures used to represent files, inodes and file descriptors use
a lot of inheritance. With the level of indirection that an interface adds with
an internal pointer, it can quickly fragment a structure across memory. As this
runs alongside a full blown kernel (which is memory intensive), having a
fragmented struct might hurt performance. Hence these internal structures,
though interfaced, are tightly packed in memory using the same inheritance
pattern that pkg/sentry/vfs uses. The pkg/sentry/fsimpl/ext/disklayout package
makes an exception to this pattern for reasons documented in the package.

### Security

This driver also intends to help sandbox the container better by reducing the
surface of the host kernel that the application touches. It prevents the
application from exploiting vulnerabilities in the host filesystem driver. All
`io.ReaderAt.ReadAt()` calls are translated to `pread(2)` calls, which are
passed directly to the device driver in the kernel. Hence this reduces the
surface for attack.

The application cannot affect any host filesystems other than the one passed
via block device by the user.

### Future Work

#### Write

To support write operations we would need to modify the block device underneath.
Currently, the driver does not modify the device at all, not even for updating
the access times for reads. Modifying the filesystem incorrectly can corrupt it
and render it unreadable for other correct ext(x) drivers. Hence caution must be
exercised while modifying metadata structures.

Ext4 specifically is built for performance and has added a lot of complexity as
to how metadata structures are modified. For instance, files are organized via
an extent tree which must be balanced, and file data blocks must be placed in
the same extent as much as possible to increase locality. Such properties must
be maintained while modifying the tree.

Ext filesystems rely heavily on locality, which plays a big role in their
performance. The block allocation algorithm in Linux does a good job of keeping
related data together. This behavior must be maintained as much as possible,
else we might end up degrading the filesystem performance over time.

Ext4 also supports a wide variety of features which are specialized for varying
use cases. Implementing all of them can get difficult very quickly.

Ext(x) checksums all its metadata structures to check for corruption, so any
modification of a metadata struct must be accompanied by re-checksumming the
struct. Linux filesystem drivers also order on-disk updates intelligently to not
corrupt the filesystem and also remain performant. The in-memory metadata
structures must be kept in sync with what is on disk.

There is also replication of some important structures across the filesystem.
All replicas must be updated when their original copy is updated. There is also
provision for snapshotting which must be kept in mind, although it should not
affect this implementation unless we allow users to create filesystem snapshots.

Ext3 introduced journaling, and ext4 uses jbd2 for its journal. The journal must
be updated appropriately.

#### Performance

To improve performance we should implement a buffer cache, and optionally, read
ahead for small files. While doing so we must also keep in mind the memory usage
and have a reasonable cap on how much file data we want to hold in memory.

#### Features

Our current implementation will work with most ext4 filesystems for read-only
purposes. However, the following features are not supported yet:

- Journal
- Snapshotting
- Extended Attributes
- Hash Tree Directories
- Meta Block Groups
- Multiple Mount Protection
- Bigalloc