github.com/nya3jp/tast@v0.0.0-20230601000426-85c8e4d83a9b/docs/codelab_2.md (about) 1 # Tast Codelab #2: kernel.LogMount (go/tast-codelab-2) 2 3 > This document assumes that you've already gone through [Codelab #1]. 4 5 This codelab follows the creation of a Tast test that verifies that the kernel 6 logs a message to its ring buffer when a filesystem is mounted. In doing so, 7 we'll learn about the following: 8 9 * creating new contexts to limit the running time of operations 10 * declaring separate helper functions 11 * deciding when to run external commands vs. using standard Go code 12 * using channels to communicate between goroutines 13 * waiting for asynchronous events 14 * cleaning up after ourselves 15 16 ## Getting started 17 18 Before writing any code, let's think about the basic approach that we should use 19 to test this. We can create a filesystem within an on-disk file and then use a 20 [loopback mount]. When the filesystem is mounted, the kernel will asynchronously 21 write a message like this to its ring buffer: 22 23 ``` 24 [124273.844282] EXT4-fs (loop4): mounted filesystem without journal. Opts: (null) 25 ``` 26 27 The kernel ring buffer can be viewed using the [dmesg] command. One option would 28 be to mount the filesystem, sleep a bit, and then run `dmesg` and look for the 29 expected message, but that approach is problematic for multiple reasons: 30 31 1. The kernel might take longer than expected to log the message, resulting in 32 our test being flaky. 33 2. If other messages are being logged quickly while our test is running, the 34 `mounted filesystem` message might be pushed out of the buffer before we 35 check it, resulting in our test being flaky. 36 3. The ring buffer may already contain a `mounted filesystem` message from an 37 earlier mount, resulting in our test not detecting failures. 38 39 Luckily, in Linux 3.5.0 and later, `dmesg --follow` can be used to tail new log 40 messages. If we start `dmesg --follow` before mounting the filesystem, that 41 resolves the first two concerns above. For the third concern, we can use `dmesg 42 --clear` to clear the buffer beforehand — that doesn't completely eliminate the 43 risk of seeing a stale message, but it at least makes it less likely. 44 45 We also need to think about how we're going to consume `dmesg --follow`'s output 46 in our test. Go doesn't provide much standard support for non-blocking I/O, but 47 it does make it easy to run code concurrently in a separate [goroutine] and pass 48 data between goroutines using [channels]. (There are more helpful links about 49 concurrency in Go in the [Concurrency] section of the [Writing Tests] doc.) We 50 can start a goroutine that reads the output from the `dmesg` process as long as 51 it's running, and writes each line to a channel that's read by the main 52 goroutine. [select] then lets us to perform [non-blocking operations on 53 channels]. 54 55 Now that we've thought through how the test should work, let's start writing 56 some code. 57 58 [Codelab #1]: codelab_1.md 59 [loopback mount]: https://en.wikipedia.org/wiki/Loop_device 60 [dmesg]: https://linux.die.net/man/1/dmesg 61 [goroutine]: https://tour.golang.org/concurrency/1 62 [channels]: https://tour.golang.org/concurrency/2 63 [Concurrency]: https://chromium.googlesource.com/chromiumos/platform/tast/+/HEAD/docs/writing_tests.md#Concurrency 64 [Writing Tests]: https://chromium.googlesource.com/chromiumos/platform/tast/+/HEAD/docs/writing_tests.md 65 [select]: https://tour.golang.org/concurrency/5 66 [non-blocking operations on channels]: https://gobyexample.com/non-blocking-channel-operations 67 68 ## Test setup 69 70 We'll skip over boilerplate like import statements and the test metadata since 71 those are already covered by [Codelab #1], but you can see these portions of the 72 test in the full listing at the bottom of this document. Instead, let's start 73 with the beginning of our test function: 74 75 ```go 76 func LogMount(ctx context.Context, s *testing.State) { 77 // Use a shortened context for test operations to reserve time for cleanup. 78 shortCtx, shortCancel := ctxutil.Shorten(ctx, 15*time.Second) 79 defer shortCancel() 80 ``` 81 82 When the test exits, we need to perform various cleanup tasks. `defer` is the 83 standard way of doing this. Some of this cleanup, like running the `umount` 84 command, may require a context, though — if the test times out, the original 85 context will be expired by the time that the deferred functions are executed, 86 resulting in them not working correctly. A common way to resolve this is by 87 deriving a new context with a slightly-shorter deadline from the original one 88 that was passed to the test function. Then, we can use the shorter context 89 throughout the test while saving the longer original context for cleanup. The 90 [ctxutil.Shorten] function makes it easy to derive a new context with a shorter 91 deadline. 92 93 We also create a temporary directory that we'll use to hold both the filesystem 94 and the mount point that we use for it: 95 96 ```go 97 // Create a temp dir in /tmp to ensure that we don't leave stale mounts in 98 // Tast's temp dir if we're interrupted. 99 td, err := ioutil.TempDir("/tmp", "tast.kernel.LogMount.") 100 if err != nil { 101 s.Fatal("Failed to create temp dir: ", err) 102 } 103 defer os.RemoveAll(td) 104 ``` 105 106 We explicitly ask for the temp dir to be created under `/tmp`. Tast sets 107 `TMPDIR` explicitly for tests, and if we're interrupted before we can unmount 108 the filesystem, leaving a mount in `TMPDIR` can cause problems when Tast tries 109 to clean it up later. 110 111 [ctxutil.Shorten]: https://godoc.org/chromium.googlesource.com/chromiumos/platform/tast.git/src/go.chromium.org/tast/core/ctxutil#Shorten 112 113 ## Creating the filesystem 114 115 Next, we need to create the on-disk filesystem so we can mount it. Since we only 116 need to do this once, we could just put the code in the main test function. 117 However, arguments can be made for declaring a separate helper function: 118 119 * `defer` can be used to run code when the helper function returns. 120 * Creating a filesystem requires many steps and is a self-contained operation, 121 so declaring a helper function may make the main test function more 122 readable. 123 124 Go allows top-level identifiers to be declared in any order, so we'll put this 125 helper at the end of the file, after the main test function. This makes it 126 easier for readers to see the big picture without getting bogged down in the 127 details. 128 129 Top-level functions, variables, and constants are shared across all tests in the 130 `kernel` package, so we'll also use a descriptive name that's hopefully unlikely 131 to conflict with other tests: 132 133 ```go 134 // makeFilesystem creates an ext4 filesystem of the requested size (in bytes) at path p. 135 func makeFilesystem(ctx context.Context, p string, size int64) error { 136 ``` 137 138 This function's signature deserves some discussion. It takes a context, which is 139 needed for running the `mkfs.ext4` command, and it returns an `error`. Instead 140 of returning an `error`, we could have made it take a `*testing.State` argument 141 and report errors directly using the `Fatal` method, but there are reasons to 142 avoid this: 143 144 * `testing.State` carries a lot of additional information that this function 145 doesn't need. It's often better to practice [information hiding] and give a 146 function only the information that it needs. 147 * If errors are reported directly using `testing.State`'s `Error`, `Fatal`, 148 etc. methods, the caller loses the ability to decide how to deal with 149 errors. 150 * When errors are reported directly by helper functions, the overall flow of 151 the test becomes harder to see. Every call to a helper function could 152 potentially result in a fatal test error. 153 154 As a result, we choose to instead return an `error` value from this function. 155 156 Next, we need to create a new file: 157 158 ```go 159 f, err := os.Create(p) 160 if err != nil { 161 return err 162 } 163 164 // Clean up if we get an error mid-initialization. 165 toClose := f 166 defer func() { 167 if toClose != nil { 168 toClose.Close() 169 } 170 }() 171 ``` 172 173 The `toClose` variable is also worth discussing. The usual pattern in Go is to 174 create an [os.File] and then immediately defer a call to its `Close` method. 175 Here, we want to use `f` to write to the file but then close it before running 176 `mkfs.ext4` to initialize the filesystem. Various things can go wrong while 177 we're writing the file, though, and we don't want to need to remember to call 178 `f.Close` before returning in each of those cases. 179 180 To support automatically closing `f` if an error is encountered while still 181 being able to manually close it midway through the function, we declare a 182 `toClose` variable and defer a function that will close it if it's non-`nil`. If 183 we end up closing `f` manually, we can set `toClose` to `nil` so that the 184 deferred function becomes a no-op. This pattern is useful when performing 185 initialization that requires multiple steps: if initialization is interrupted, 186 cleanup will be automatically performed. 187 188 ```go 189 // Seek to the end of the requested size and write a byte. 190 if _, err := f.Seek(size-1, 0); err != nil { 191 return err 192 } 193 if _, err := f.Write([]byte{0xff}); err != nil { 194 return err 195 } 196 toClose = nil // disarm cleanup 197 if err := f.Close(); err != nil { 198 return err 199 } 200 201 return testexec.CommandContext(ctx, "mkfs.ext4", p).Run(testexec.DumpLogOnError) 202 } 203 ``` 204 205 The final thing of note in this function is the use of standard Go code like 206 `os.Create`, `f.Seek`, and `f.Write` instead of e.g. calling the `dd` command to 207 write a file. When the work being performed is straightforward, standard code 208 is usually preferable, as it both makes it easier to see the exact operations 209 that are being performed and provides vastly better error-reporting when 210 something goes wrong. 211 212 [information hiding]: https://en.wikipedia.org/wiki/Information_hiding 213 [os.File]: https://golang.org/pkg/os/#File 214 215 ## Starting dmesg 216 217 Let's also create a helper function that starts the `dmesg` process and reads 218 its output: 219 220 ```go 221 // streamDmesg clears the kernel ring buffer and then starts a dmesg process and 222 // asynchronously copies all log messages to a channel. The caller is responsible 223 // for killing and waiting on the returned process. 224 func streamDmesg(ctx context.Context) (*testexec.Cmd, <-chan string, error) { 225 ``` 226 227 This function returns a handle that should be used to kill and clean up the 228 already-started `dmesg` process, a [unidirectional channel] for reading log 229 messages, and an error. We don't [name the return arguments] since their purpose 230 seems clear, but we still document what the function does, and (most 231 importantly) the contract that the caller is responsible for stopping the 232 process. 233 234 Next, we'll run `dmesg --clear` synchronously to get rid of old messages and 235 then start `dmesg --follow` asychronously with its stdout automatically copied 236 to a pipe named `stdout`: 237 238 ```go 239 // Clear the buffer first so we don't see stale messages. 240 if err := testexec.CommandContext(ctx, "dmesg", "--clear").Run( 241 testexec.DumpLogOnError); err != nil { 242 return nil, nil, errors.Wrap(err, "failed to clear log buffer") 243 } 244 245 // Start a dmesg process that writes messages to stdout as they're logged. 246 cmd := testexec.CommandContext(ctx, "dmesg", "--follow") 247 stdout, err := cmd.StdoutPipe() 248 if err != nil { 249 return nil, nil, err 250 } 251 if err := cmd.Start(); err != nil { 252 return nil, nil, errors.Wrap(err, "failed to start dmesg") 253 } 254 ``` 255 256 [errors.Wrap] is used to attach additional context to some of the returned 257 errors. 258 259 Note the difference in formatting between the `error` values that we're 260 returning here and the `s.Fatal` message that appeared in the main test 261 function. Go `error` values should always contain lowercase phrases without any 262 punctuation, while it's the convention in Tast for logging and error messages 263 (i.e. those written via `testing.State` or [testing.ContextLog]) to be 264 capitalized, again without any punctuation. See the [Formatting] section of the 265 [Writing Tests] document for more discussion. 266 267 Finally, we need to read the output from `dmesg`: 268 269 ```go 270 // Start a goroutine that just passes lines from dmesg to a channel. 271 ch := make(chan string) 272 go func() { 273 defer close(ch) 274 275 // Writes msg to ch and returns true if more messages should be written. 276 writeMsg := func(ctx context.Context, msg string) bool { 277 select { 278 case ch <- msg: 279 return true 280 case ctx.Done(): 281 return false 282 } 283 } 284 285 // The Scan method will return false once the dmesg process is killed. 286 sc := bufio.NewScanner(stdout) 287 for sc.Scan() { 288 if !writeMsg(ctx, sc.Text()) { 289 break 290 } 291 } 292 // Don't bother checking sc.Err(). The test will already fail if the expected 293 // message isn't seen. 294 }() 295 296 return cmd, ch, nil 297 } 298 ``` 299 300 We create a channel for passing log messages and then start a goroutine that 301 uses [bufio.Scanner] to read from the stdout pipe one line at a time. We loop 302 over the lines and copy each to the channel. 303 304 The channel is unbuffered, so writes to it will block if there isn't a reader on 305 the other end. The reader in this scenario is the main test function, and we 306 don't want our goroutine to block indefinitely if the test exits before it's 307 read everything that we're trying to write to the channel. 308 309 To handle this case, we define a nested `writeMsg` function that performs a 310 blocking `select` statement which waits until either channel has been available 311 for write operations or `ctx` context deadline has been reached (indicating that 312 the test completed). 313 314 [unidirectional channel]: https://gobyexample.com/channel-directions 315 [name the return arguments]: https://github.com/golang/go/wiki/CodeReviewComments#named-result-parameters 316 [testing.ContextLog]: https://godoc.org/chromium.googlesource.com/chromiumos/platform/tast.git/src/go.chromium.org/tast/core/testing#ContextLog 317 [errors.Wrap]: https://godoc.org/chromium.googlesource.com/chromiumos/platform/tast.git/src/go.chromium.org/tast/core/errors#Wrap 318 [Formatting]: https://chromium.googlesource.com/chromiumos/platform/tast/+/HEAD/docs/writing_tests.md#Formatting 319 [bufio.Scanner]: https://golang.org/pkg/bufio/#Scanner 320 321 ## Finishing the test 322 323 Let's go back to the main test function now and make use of the helper functions 324 that we just added. First, we'll create a 1-megabyte filesystem named `fs.bin` 325 in the temporary directory that we created earlier: 326 327 ```go 328 src := filepath.Join(td, "fs.bin") 329 s.Log("Creating filesystem at ", src) 330 if err := makeFilesystem(shortCtx, src, 1024*1024); err != nil { 331 s.Fatal("Failed creating filesystem: ", err) 332 } 333 ``` 334 335 We're using `s.Log` to communicate what we're going to do before doing it, which 336 is helpful both for people reading the code (i.e. these messages can take the 337 place of comments) and for people looking at test logs. It's especially useful 338 to log before performing an operation that could take a while to complete, as 339 these messages are essential for investigating timeouts. 340 341 Next, we start the `dmesg --follow` process: 342 343 ```go 344 s.Log("Starting dmesg") 345 dmesgCmd, dmesgCh, err := streamDmesg(shortCtx) 346 if err != nil { 347 s.Fatal("Failed to start dmesg: ", err) 348 } 349 defer dmesgCmd.Wait() 350 defer dmesgCmd.Kill() 351 ``` 352 353 We defer `Wait` and `Kill` calls to ensure that the process is stopped when the 354 main test function returns. `defer` statements are executed in last-in, 355 first-out order, so these calls will send `SIGKILL` to the process before 356 waiting on its exit status. 357 358 Now that we're reading log messages from the kernel, we can create a mount point 359 and mount the filesystem: 360 361 ```go 362 dst := filepath.Join(td, "mnt") 363 s.Logf("Mounting %v at %v", src, dst) 364 if err := os.Mkdir(dst, 0755); err != nil { 365 s.Fatal("Failed to create mount point: ", err) 366 } 367 if err := testexec.CommandContext(shortCtx, "mount", "-o", "loop", src, dst).Run(); err != nil { 368 s.Fatal("Failed to mount filesystem: ", err) 369 } 370 defer func() { 371 s.Log("Unmounting ", dst) 372 if err := testexec.CommandContext(ctx, "umount", "-f", dst).Run(); err != nil { 373 s.Error("Failed to unmount filesystem: ", err) 374 } 375 }() 376 ``` 377 378 Immediately after the `mount` command succeeds using the shortened context, we 379 defer a function that will run `umount` using the full-length context. 380 381 Finally, we need to wait for the expected log message to show up on the channel 382 that the goroutine started by `streamDmesg` is writing to. We expect the message 383 to be logged reasonably quickly, so we derive a new context with a 15-second 384 timeout: 385 386 ```go 387 // The message shouldn't take long to show up, so derive a short context for it. 388 watchCtx, watchCancel := context.WithTimeout(shortCtx, 15*time.Second) 389 defer watchCancel() 390 ``` 391 392 Now we just need to see the `mounted filesystem` message from the kernel. We log 393 a message to make it clear what we're doing and again use `select` to avoid 394 blocking indefinitely: 395 396 ```go 397 // We expect to see a message like "[124273.844282] EXT4-fs (loop4): mounted filesystem without journal. Opts: (null)". 398 const expMsg = "mounted filesystem" 399 s.Logf("Watching for %q in dmesg output", expMsg) 400 WatchLoop: 401 for { 402 select { 403 case msg := <-dmesgCh: 404 s.Logf("Got message %q", msg) 405 if strings.Contains(msg, expMsg) { 406 break WatchLoop 407 } 408 case <-watchCtx.Done(): 409 s.Fatalf("Didn't see %q in dmesg: %v", expMsg, watchCtx.Err()) 410 } 411 } 412 } 413 ``` 414 415 This time, we have a loop that repeatedly waits either for `dmesgCh` to become 416 readable or for `watchCtx` to expire (indicating that 15 seconds have passed). 417 If we see the expected message, we break out of the loop using a `WatchLoop` 418 [label] that we defined earlier. 419 420 [label]: https://golang.org/ref/spec#Break_statements 421 422 ## Full code 423 424 Here's a full listing of the test's code: 425 426 ```go 427 // Copyright <copyright_year> The ChromiumOS Authors 428 // Use of this source code is governed by a BSD-style license that can be 429 // found in the LICENSE file. 430 431 package kernel 432 433 import ( 434 "bufio" 435 "context" 436 "io/ioutil" 437 "os" 438 "path/filepath" 439 "strings" 440 "time" 441 442 "go.chromium.org/tast-tests/cros/common/testexec" 443 "go.chromium.org/tast/core/ctxutil" 444 "go.chromium.org/tast/core/errors" 445 "go.chromium.org/tast/core/testing" 446 ) 447 448 func init() { 449 testing.AddTest(&testing.Test{ 450 Func: LogMount, 451 Desc: "Checks that the kernel logs a message when a filesystem is mounted", 452 Contacts: []string{"me@chromium.org", "tast-users@chromium.org"}, 453 Attr: []string{"group:mainline", "informational"}, 454 }) 455 } 456 457 func LogMount(ctx context.Context, s *testing.State) { 458 // Use a shortened context for test operations to reserve time for cleanup. 459 shortCtx, shortCancel := ctxutil.Shorten(ctx, 15*time.Second) 460 defer shortCancel() 461 462 // Create a temp dir in /tmp to ensure that we don't leave stale mounts in 463 // Tast's temp dir if we're interrupted. 464 td, err := ioutil.TempDir("/tmp", "tast.kernel.LogMount.") 465 if err != nil { 466 s.Fatal("Failed to create temp dir: ", err) 467 } 468 defer os.RemoveAll(td) 469 470 src := filepath.Join(td, "fs.bin") 471 s.Log("Creating filesystem at ", src) 472 if err := makeFilesystem(shortCtx, src, 1024*1024); err != nil { 473 s.Fatal("Failed creating filesystem: ", err) 474 } 475 476 s.Log("Starting dmesg") 477 dmesgCmd, dmesgCh, err := streamDmesg(shortCtx) 478 if err != nil { 479 s.Fatal("Failed to start dmesg: ", err) 480 } 481 defer dmesgCmd.Wait() 482 defer dmesgCmd.Kill() 483 484 dst := filepath.Join(td, "mnt") 485 s.Logf("Mounting %v at %v", src, dst) 486 if err := os.Mkdir(dst, 0755); err != nil { 487 s.Fatal("Failed to create mount point: ", err) 488 } 489 if err := testexec.CommandContext(shortCtx, "mount", "-o", "loop", src, dst).Run(); err != nil { 490 s.Fatal("Failed to mount filesystem: ", err) 491 } 492 defer func() { 493 s.Log("Unmounting ", dst) 494 if err := testexec.CommandContext(ctx, "umount", "-f", dst).Run(); err != nil { 495 s.Error("Failed to unmount filesystem: ", err) 496 } 497 }() 498 499 // The message shouldn't take long to show up, so derive a short context for it. 500 watchCtx, watchCancel := context.WithTimeout(shortCtx, 15*time.Second) 501 defer watchCancel() 502 503 // We expect to see a message like "[124273.844282] EXT4-fs (loop4): mounted filesystem without journal. Opts: (null)". 504 const expMsg = "mounted filesystem" 505 s.Logf("Watching for %q in dmesg output", expMsg) 506 WatchLoop: 507 for { 508 select { 509 case msg := <-dmesgCh: 510 s.Logf("Got message %q", msg) 511 if strings.Contains(msg, expMsg) { 512 break WatchLoop 513 } 514 case <-watchCtx.Done(): 515 s.Fatalf("Didn't see %q in dmesg: %v", expMsg, watchCtx.Err()) 516 } 517 } 518 } 519 520 // makeFilesystem creates an ext4 filesystem of the requested size (in bytes) at path p. 521 func makeFilesystem(ctx context.Context, p string, size int64) error { 522 f, err := os.Create(p) 523 if err != nil { 524 return err 525 } 526 527 // Clean up if we get an error mid-initialization. 528 toClose := f 529 defer func() { 530 if toClose != nil { 531 toClose.Close() 532 } 533 }() 534 535 // Seek to the end of the requested size and write a byte. 536 if _, err := f.Seek(size-1, 0); err != nil { 537 return err 538 } 539 if _, err := f.Write([]byte{0xff}); err != nil { 540 return err 541 } 542 toClose = nil // disarm cleanup 543 if err := f.Close(); err != nil { 544 return err 545 } 546 547 return testexec.CommandContext(ctx, "mkfs.ext4", p).Run(testexec.DumpLogOnError) 548 } 549 550 // streamDmesg clears the kernel ring buffer and then starts a dmesg process and 551 // asynchronously copies all log messages to a channel. The caller is responsible 552 // for killing and waiting on the returned process. 553 func streamDmesg(ctx context.Context) (*testexec.Cmd, <-chan string, error) { 554 // Clear the buffer first so we don't see stale messages. 555 if err := testexec.CommandContext(ctx, "dmesg", "--clear").Run( 556 testexec.DumpLogOnError); err != nil { 557 return nil, nil, errors.Wrap(err, "failed to clear log buffer") 558 } 559 560 // Start a dmesg process that writes messages to stdout as they're logged. 561 cmd := testexec.CommandContext(ctx, "dmesg", "--follow") 562 stdout, err := cmd.StdoutPipe() 563 if err != nil { 564 return nil, nil, err 565 } 566 if err := cmd.Start(); err != nil { 567 return nil, nil, errors.Wrap(err, "failed to start dmesg") 568 } 569 570 // Start a goroutine that just passes lines from dmesg to a channel. 571 ch := make(chan string) 572 go func() { 573 defer close(ch) 574 575 // Writes msg to ch and returns true if more messages should be written. 576 writeMsg := func(ctx context.Context, msg string) bool { 577 select { 578 case ch <- msg: 579 return true 580 case ctx.Done(): 581 return false 582 } 583 } 584 585 // The Scan method will return false once the dmesg process is killed. 586 sc := bufio.NewScanner(stdout) 587 for sc.Scan() { 588 if !writeMsg(ctx, sc.Text()) { 589 break 590 } 591 } 592 // Don't bother checking sc.Err(). The test will already fail if the expected 593 // message isn't seen. 594 }() 595 596 return cmd, ch, nil 597 } 598 ```