# go-sandbox

[![GoDoc](https://godoc.org/github.com/criyle/go-sandbox?status.svg)](https://godoc.org/github.com/criyle/go-sandbox) [![Go Report Card](https://goreportcard.com/badge/github.com/criyle/go-sandbox)](https://goreportcard.com/report/github.com/criyle/go-sandbox) [![Release](https://img.shields.io/github/v/tag/criyle/go-sandbox)](https://github.com/criyle/go-sandbox/releases/latest)

The original goal was to replicate [uoj-judger/run_program](https://github.com/vfleaking/uoj) in Go using [libseccomp](https://github.com/pkg/seccomp/libseccomp-golang). Over time, it has also adopted newer technologies, including Linux namespaces and cgroups.

The idea of rootfs and interval CPU usage checking comes from [syzoj/judge-v3](https://github.com/syzoj/judge-v3), and the pooled pre-forked container comes from [vijos/jd4](https://github.com/vijos/jd4).

If you are looking for a sandbox implementation exposed via a REST / gRPC API, please check [go-judge](https://github.com/criyle/go-judge).

Notice: Only works on Linux, since ptrace, unshare, and cgroup are available only on Linux.

## Build & Install

- Install the latest Go compiler from [golang/download](https://golang.org/dl/)
- Install the libseccomp library (for Ubuntu): `apt install libseccomp-dev`
- Build & install: `go install github.com/criyle/go-sandbox/...`

## Technologies

### libseccomp + ptrace (improved UOJ sandbox)

1. Restricted computing resources by POSIX rlimit: Time & Memory (Stack) & Output
2. Restricted syscall access (by libseccomp & ptrace)
3. Restricted file access (read & write & access & exec), evaluated by UOJ FileSet

Improvements:

1. Precise resource limits (s -> ms, MB -> KB)
2. More architectures (arm32, arm64)
3. Allow multiple traced programs in different threads
4. Allow pipes as input / output files

Default file access syscall check:

- check file read / write: `open`, `openat`
- check file read: `readlink`, `readlinkat`
- check file write: `unlink`, `unlinkat`, `chmod`, `rename`
- check file access: `stat`, `lstat`, `access`, `faccessat`
- check file exec: `execve`, `execveat`

### Linux namespace + cgroup

1. Unshare & bind mount rootfs based on hostfs (eliminated ptrace)
2. Use Linux Control Groups to limit & account CPU & memory (eliminated wait4.rusage)
3. Container tech with execveat memfd, sethostname, setdomainname

## Design

### Result Status

- Normal (no error)
- Program Error
  - Resource Limit Exceeded
    - Time
    - Memory
    - Output
  - Unauthorized Access
    - Disallowed Syscall
  - Runtime Error
    - Signalled
      - `SIGXCPU` / `SIGKILL` are treated as TimeLimitExceeded (by rlimit or caller kill)
      - `SIGXFSZ` is treated as OutputLimitExceeded (by rlimit)
      - `SIGSYS` is treated as Disallowed Syscall (by seccomp)
      - Potential runtime errors include `SIGSEGV` (segmentation fault)
    - Nonzero Exit Status
- Program Runner Error

### Result Structure

``` go
type Result struct {
    Status            // result status
    ExitStatus int    // exit status (signal number if signalled)
    Error      string // potential detailed error message (for program runner error)

    Time   time.Duration // used user CPU time (underlying type int64 in ns)
    Memory Size          // used user memory (underlying type uint64 in bytes)
    // metrics for the program runner
    SetUpTime   time.Duration
    RunningTime time.Duration
}
```

### Runner Interface

A configured runner runs the program. The `Context` is used for cancellation (for example, to enforce a time limit) and should not be nil.

``` go
type Runner interface {
    Run(context.Context) <-chan runner.Result
}
```

### Pre-forked Container Protocol

1. Pre-fork container to run programs inside
2. Unix socket to pass fd inside / outside

Container / Host Communication Protocol (single thread):

- ping (alive check):
  - reply: pong
- conf (set configuration):
  - reply: pong
- open (open files in given mode inside container):
  - send: []OpenCmd
  - reply: "success", file fds / "error"
- delete (unlink file / rmdir dir inside container):
  - send: path
  - reply: "finished" / "error"
- reset (clean up container for later use (clear workdir / tmp)):
  - send:
  - reply: "success"
- execve (execute file inside container):
  - send: argv, env, rLimits, fds
  - reply:
    - success: "success", pid
    - failed: "failed"
  - send (success): "init_finished" (as cmd)
    - reply: "finished" / send: "kill" (as cmd)
    - send: "kill" (as cmd) / reply: "finished"
  - reply:

Any socket-related error causes the container to exit (along with all processes inside it).

### Pre-forked Container Environment

The restricted container environment is accessed through the RPC interface defined by the protocol above.

Provides:

- File access
  - Open: create / access files
  - Delete: remove file
- Management
  - Ping: alive check
  - Reset: remove temporary files
  - Destroy: destroy the container environment
- Run program
  - Execve: execute program with given parameters

``` go
type Environment interface {
    Ping() error
    Open([]OpenCmd) ([]*os.File, error)
    Delete(p string) error
    Reset() error
    Execve(context.Context, ExecveParam) <-chan runner.Result
    Destroy() error
}
```

## Packages (/pkg)

- seccomp: provides seccomp type definitions
- libseccomp: provides utility functions that wrap libseccomp
- forkexec: fork-exec that provides mount, unshare, ptrace, seccomp, capset before exec
- memfd: reads a regular file and creates a sealed memfd for its contents
- unixsocket: send / recv oob msg from a unix socket
- cgroup: creates cgroup directories and collects resource usage / limits
- mount: provides utility functions that wrap the mount syscall
- rlimit: provides utility functions that define rlimit syscall
- pipe: provides a wrapper to collect all content written through a pipe

## Packages

- cmd/runprog/config: defines arch & language specific trace conditions for the ptrace runner from UOJ
- container: creates pre-forked container to run programs inside
- runner: interface to run program
  - ptrace: wrapper to call forkexec and ptracer
    - filehandler: an example implementation of UOJ file set
  - unshare: wrapper to call forkexec and unshared namespaces
- ptracer: ptrace tracer that provides syscall trap filter context

## Executable

- runprog: safely run program by unshare / ptrace / pre-forked containers

## Configurations

- config/config.go: all configs for running specs (similar to UOJ)

## Kernel Versions

- 5.19: `memory.peak` in cgroup v2
- 4.15: cgroup v2
- 4.14: SECCOMP_RET_KILL_PROCESS
- 4.6: CLONE_NEWCGROUP
- 3.19: execveat()
- 3.17: seccomp, memfd_create
- 3.10: CentOS 7
- 3.8: CLONE_NEWUSER without CAP_SYS_ADMIN, CAP_SETUID, CAP_SETGID
- 3.5: prctl(PR_SET_NO_NEW_PRIVS)
- 2.6.36: prlimit64

## Benchmarks

### ForkExec

```bash
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/forkexec
BenchmarkSimpleFork-4                  12409        996096 ns/op
BenchmarkUnsharePid-4                  10000       1065168 ns/op
BenchmarkUnshareUser-4                 10000       1061770 ns/op
BenchmarkUnshareUts-4                  10000       1056558 ns/op
BenchmarkUnshareCgroup-4               10000       1049446 ns/op
BenchmarkUnshareIpc-4                    709      16114052 ns/op
BenchmarkUnshareMount-4                  745      16207754 ns/op
BenchmarkUnshareNet-4                   3643       3492924 ns/op
BenchmarkFastUnshareMountPivot-4         612      20967318 ns/op
BenchmarkUnshareAll-4                    837      14047995 ns/op
BenchmarkUnshareMountPivot-4             488      24198331 ns/op
PASS
ok      github.com/criyle/go-sandbox/pkg/forkexec    147.186s
```

### Container

```bash
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/container
BenchmarkContainer-4        5907       2062070 ns/op
PASS
ok      github.com/criyle/go-sandbox/container    21.763s
```

### Cgroup

```bash
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/cgroup
BenchmarkCgroup-4       50283        245094 ns/op
PASS
ok      github.com/criyle/go-sandbox/pkg/cgroup    14.744s
```

### Socket

Blocking:

```bash
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/unixsocket
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkBaseline-8             12170148          1048 ns/op
BenchmarkGoroutine-8             2658846          4910 ns/op
BenchmarkChannel-8               8454133          1431 ns/op
BenchmarkChannelBuffed-8         8767264          1357 ns/op
BenchmarkChannelBuffed4-8        9670935          1230 ns/op
BenchmarkEmptyGoroutine-8       34927512         342.8 ns/op
PASS
ok      github.com/criyle/go-sandbox/pkg/unixsocket    83.669s
```

Non-blocking:

```bash
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/unixsocket
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkBaseline-8             11609772          1001 ns/op
BenchmarkGoroutine-8             2470767          4788 ns/op
BenchmarkChannel-8               8488646          1427 ns/op
BenchmarkChannelBuffed-8         8876050          1345 ns/op
BenchmarkChannelBuffed4-8        9813187          1212 ns/op
BenchmarkEmptyGoroutine-8       34852828         342.2 ns/op
PASS
ok      github.com/criyle/go-sandbox/pkg/unixsocket    81.679s
```