github.com/spotify/syslog-redirector-golang@v0.0.0-20140320174030-4859f03d829a/blog/content/race-detector.article (about) 1 Introducing the Go Race Detector 2 26 Jun 2013 3 Tags: concurrency, technical 4 5 Dmitry Vyukov 6 7 Andrew Gerrand 8 9 * Introduction 10 11 [[http://en.wikipedia.org/wiki/Race_condition][Race conditions]] are among the 12 most insidious and elusive programming errors. They typically cause erratic and 13 mysterious failures, often long after the code has been deployed to production. 14 While Go's concurrency mechanisms make it easy to write clean concurrent code, 15 they don't prevent race conditions. Care, diligence, and testing are required. 16 And tools can help. 17 18 We're happy to announce that Go 1.1 includes a 19 [[http://golang.org/doc/articles/race_detector.html][race detector]], 20 a new tool for finding race conditions in Go code. 21 It is currently available for Linux, OS X, and Windows systems 22 with 64-bit x86 processors. 23 24 The race detector is based on the C/C++ 25 [[https://code.google.com/p/thread-sanitizer/][ThreadSanitizer runtime library]], 26 which has been used to detect many errors in Google's internal code base and in 27 [[http://www.chromium.org/][Chromium]]. 28 The technology was integrated with Go in September 2012; since then it has detected 29 [[https://code.google.com/p/go/issues/list?can=1&q=ThreadSanitizer][42 races]] 30 in the standard library. It is now part of our continuous build process, 31 where it continues to catch race conditions as they arise. 32 33 * How it works 34 35 The race detector is integrated with the go tool chain. When the 36 `-race` command-line flag is set, the compiler instruments all memory accesses 37 with code that records when and how the memory was accessed, while the runtime 38 library watches for unsynchronized accesses to shared variables. 39 When such "racy" behavior is detected, a warning is printed. 40 (See [[https://code.google.com/p/thread-sanitizer/wiki/Algorithm][this article]] 41 for the details of the algorithm.) 42 43 Because of its design, the race detector can detect race conditions only when 44 they are actually triggered by running code, which means it's important to run 45 race-enabled binaries under realistic workloads. 46 However, race-enabled binaries can use ten times the CPU and memory, so it is 47 impractical to enable the race detector all the time. 48 One way out of this dilemma is to run some tests with the race detector 49 enabled. Load tests and integration tests are good candidates, since they tend 50 to exercise concurrent parts of the code. 51 Another approach using production workloads is to deploy a single race-enabled 52 instance within a pool of running servers. 53 54 * Using the race detector 55 56 The race detector is fully integrated with the Go tool chain. 57 To build your code with the race detector enabled, just add the 58 `-race` flag to the command line: 59 60 $ go test -race mypkg // test the package 61 $ go run -race mysrc.go // compile and run the program 62 $ go build -race mycmd // build the command 63 $ go install -race mypkg // install the package 64 65 To try out the race detector for yourself, fetch and run this example program: 66 67 $ go get -race code.google.com/p/go.blog/support/racy 68 $ racy 69 70 * Examples 71 72 Here are two examples of real issues caught by the race detector. 73 74 ** Example 1: Timer.Reset 75 76 The first example is a simplified version of an actual bug found by the race 77 detector. It uses a timer to print a message after a random duration between 0 78 and 1 second. It does so repeatedly for five seconds. 79 It uses [[http://golang.org/pkg/time/#AfterFunc][`time.AfterFunc`]] to create a 80 [[http://golang.org/pkg/time/#Timer][`Timer`]] for the first message and then 81 uses the [[http://golang.org/pkg/time/#Timer.Reset][`Reset`]] method to 82 schedule the next message, re-using the `Timer` each time. 83 84 .play -numbers race-detector/timer.go /func main/,$ 85 86 This looks like reasonable code, but under certain circumstances it fails in a surprising way: 87 88 panic: runtime error: invalid memory address or nil pointer dereference 89 [signal 0xb code=0x1 addr=0x8 pc=0x41e38a] 90 91 goroutine 4 [running]: 92 time.stopTimer(0x8, 0x12fe6b35d9472d96) 93 src/pkg/runtime/ztime_linux_amd64.c:35 +0x25 94 time.(*Timer).Reset(0x0, 0x4e5904f, 0x1) 95 src/pkg/time/sleep.go:81 +0x42 96 main.func·001() 97 race.go:14 +0xe3 98 created by time.goFunc 99 src/pkg/time/sleep.go:122 +0x48 100 101 What's going on here? Running the program with the race detector enabled is more illuminating: 102 103 ================== 104 WARNING: DATA RACE 105 Read by goroutine 5: 106 main.func·001() 107 race.go:14 +0x169 108 109 Previous write by goroutine 1: 110 main.main() 111 race.go:15 +0x174 112 113 Goroutine 5 (running) created at: 114 time.goFunc() 115 src/pkg/time/sleep.go:122 +0x56 116 timerproc() 117 src/pkg/runtime/ztime_linux_amd64.c:181 +0x189 118 ================== 119 120 The race detector shows the problem: an unsynchronized read and write of the 121 variable `t` from different goroutines. If the initial timer duration is very 122 small, the timer function may fire before the main goroutine has assigned a 123 value to `t` and so the call to `t.Reset` is made with a nil `t`. 124 125 To fix the race condition we change the code to read and write the variable 126 `t` only from the main goroutine: 127 128 .play -numbers race-detector/timer-fixed.go /func main/,/^}/ 129 130 Here the main goroutine is wholly responsible for setting and resetting the 131 `Timer` `t` and a new reset channel communicates the need to reset the timer in 132 a thread-safe way. 133 134 A simpler but less efficient approach is to 135 [[http://play.golang.org/p/kuWTrY0pS4][avoid reusing timers]]. 136 137 ** Example 2: ioutil.Discard 138 139 The second example is more subtle. 140 141 The `ioutil` package's 142 [[http://golang.org/pkg/io/ioutil/#Discard][`Discard`]] object implements 143 [[http://golang.org/pkg/io/#Writer][`io.Writer`]], 144 but discards all the data written to it. 145 Think of it like `/dev/null`: a place to send data that you need to read but 146 don't want to store. 147 It is commonly used with [[http://golang.org/pkg/io/#Copy][`io.Copy`]] 148 to drain a reader, like this: 149 150 io.Copy(ioutil.Discard, reader) 151 152 Back in July 2011 the Go team noticed that using `Discard` in this way was 153 inefficient: the `Copy` function allocates an internal 32 kB buffer each time it 154 is called, but when used with `Discard` the buffer is unnecessary since we're 155 just throwing the read data away. 156 We thought that this idiomatic use of `Copy` and `Discard` should not be so costly. 157 158 The fix was simple. 159 If the given `Writer` implements a `ReadFrom` method, a `Copy` call like this: 160 161 io.Copy(writer, reader) 162 163 is delegated to this potentially more efficient call: 164 165 writer.ReadFrom(reader) 166 167 We 168 [[https://code.google.com/p/go/source/detail?r=13faa632ba3a#][added a ReadFrom method]] 169 to Discard's underlying type, which has an internal buffer that is shared 170 between all its users. 171 We knew this was theoretically a race condition, but since all writes to the 172 buffer should be thrown away we didn't think it was important. 173 174 When the race detector was implemented it immediately 175 [[https://code.google.com/p/go/issues/detail?id=3970][flagged this code]] as racy. 176 Again, we considered that the code might be problematic, but decided that the 177 race condition wasn't "real". 178 To avoid the "false positive" in our build we implemented 179 [[https://code.google.com/p/go/source/detail?r=1e55cf10aa4f][a non-racy version]] 180 that is enabled only when the race detector is running. 181 182 But a few months later [[http://bradfitz.com/][Brad]] encountered a 183 [[https://code.google.com/p/go/issues/detail?id=4589][frustrating and strange bug]]. 184 After a few days of debugging, he narrowed it down to a real race condition 185 caused by `ioutil.Discard`. 186 187 Here is the known-racy code in `io/ioutil`, where `Discard` is a 188 `devNull` that shares a single buffer between all of its users. 189 190 .code race-detector/blackhole.go /var blackHole/,/^}/ 191 192 Brad's program includes a `trackDigestReader` type, which wraps an `io.Reader` 193 and records the hash digest of what it reads. 194 195 type trackDigestReader struct { 196 r io.Reader 197 h hash.Hash 198 } 199 200 func (t trackDigestReader) Read(p []byte) (n int, err error) { 201 n, err = t.r.Read(p) 202 t.h.Write(p[:n]) 203 return 204 } 205 206 For example, it could be used to compute the SHA-1 hash of a file while reading it: 207 208 tdr := trackDigestReader{r: file, h: sha1.New()} 209 io.Copy(writer, tdr) 210 fmt.Printf("File hash: %x", tdr.h.Sum(nil)) 211 212 In some cases there would be nowhere to write the data—but still a need to hash 213 the file—and so `Discard` would be used: 214 215 io.Copy(ioutil.Discard, tdr) 216 217 But in this case the `blackHole` buffer isn't just a black hole; it is a 218 legitimate place to store the data between reading it from the source 219 `io.Reader` and writing it to the `hash.Hash`. 220 With multiple goroutines hashing files simultaneously, each sharing the same 221 `blackHole` buffer, the race condition manifested itself by corrupting the data 222 between reading and hashing. 223 No errors or panics occurred, but the hashes were wrong. Nasty! 224 225 func (t trackDigestReader) Read(p []byte) (n int, err error) { 226 // the buffer p is blackHole 227 n, err = t.r.Read(p) 228 // p may be corrupted by another goroutine here, 229 // between the Read above and the Write below 230 t.h.Write(p[:n]) 231 return 232 } 233 234 The bug was finally 235 [[https://code.google.com/p/go/source/detail?r=4b61f121966b][fixed]] 236 by giving a unique buffer to each use of `ioutil.Discard`, eliminating the race 237 condition on the shared buffer. 238 239 * Conclusions 240 241 The race detector is a powerful tool for checking the correctness of concurrent 242 programs. It will not issue false positives, so take its warnings seriously. 243 But it is only as good as your tests; you must make sure they thoroughly 244 exercise the concurrent properties of your code so that the race detector can 245 do its job. 246 247 What are you waiting for? Run `"go`test`-race"` on your code today!