github.com/spotify/syslog-redirector-golang@v0.0.0-20140320174030-4859f03d829a/blog/content/race-detector.article (about)

     1  Introducing the Go Race Detector
     2  26 Jun 2013
     3  Tags: concurrency, technical
     4  
     5  Dmitry Vyukov
     6  
     7  Andrew Gerrand
     8  
     9  * Introduction
    10  
    11  [[http://en.wikipedia.org/wiki/Race_condition][Race conditions]] are among the
    12  most insidious and elusive programming errors. They typically cause erratic and
    13  mysterious failures, often long after the code has been deployed to production.
    14  While Go's concurrency mechanisms make it easy to write clean concurrent code,
    15  they don't prevent race conditions. Care, diligence, and testing are required.
    16  And tools can help.
    17  
    18  We're happy to announce that Go 1.1 includes a
    19  [[http://golang.org/doc/articles/race_detector.html][race detector]],
    20  a new tool for finding race conditions in Go code.
    21  It is currently available for Linux, OS X, and Windows systems
    22  with 64-bit x86 processors.
    23  
    24  The race detector is based on the C/C++
    25  [[https://code.google.com/p/thread-sanitizer/][ThreadSanitizer runtime library]],
    26  which has been used to detect many errors in Google's internal code base and in
    27  [[http://www.chromium.org/][Chromium]].
    28  The technology was integrated with Go in September 2012; since then it has detected
    29  [[https://code.google.com/p/go/issues/list?can=1&q=ThreadSanitizer][42 races]]
    30  in the standard library. It is now part of our continuous build process,
    31  where it continues to catch race conditions as they arise.
    32  
    33  * How it works
    34  
    35  The race detector is integrated with the go tool chain. When the
    36  `-race` command-line flag is set, the compiler instruments all memory accesses
    37  with code that records when and how the memory was accessed, while the runtime
    38  library watches for unsynchronized accesses to shared variables.
    39  When such "racy" behavior is detected, a warning is printed.
    40  (See [[https://code.google.com/p/thread-sanitizer/wiki/Algorithm][this article]]
    41  for the details of the algorithm.)
    42  
    43  Because of its design, the race detector can detect race conditions only when
    44  they are actually triggered by running code, which means it's important to run
    45  race-enabled binaries under realistic workloads.
    46  However, race-enabled binaries can use ten times the CPU and memory, so it is
    47  impractical to enable the race detector all the time.
    48  One way out of this dilemma is to run some tests with the race detector
    49  enabled. Load tests and integration tests are good candidates, since they tend
    50  to exercise concurrent parts of the code.
    51  Another approach using production workloads is to deploy a single race-enabled
    52  instance within a pool of running servers.
    53  
    54  * Using the race detector
    55  
    56  The race detector is fully integrated with the Go tool chain.
    57  To build your code with the race detector enabled, just add the
    58  `-race` flag to the command line:
    59  
    60  	$ go test -race mypkg    // test the package
    61  	$ go run -race mysrc.go  // compile and run the program
    62  	$ go build -race mycmd   // build the command
    63  	$ go install -race mypkg // install the package
    64  
    65  To try out the race detector for yourself, fetch and run this example program:
    66  
    67  	$ go get -race code.google.com/p/go.blog/support/racy
    68  	$ racy
    69  
    70  * Examples
    71  
    72  Here are two examples of real issues caught by the race detector.
    73  
    74  ** Example 1: Timer.Reset
    75  
    76  The first example is a simplified version of an actual bug found by the race
    77  detector. It uses a timer to print a message after a random duration between 0
    78  and 1 second. It does so repeatedly for five seconds.
    79  It uses [[http://golang.org/pkg/time/#AfterFunc][`time.AfterFunc`]] to create a
    80  [[http://golang.org/pkg/time/#Timer][`Timer`]] for the first message and then
    81  uses the [[http://golang.org/pkg/time/#Timer.Reset][`Reset`]] method to
    82  schedule the next message, re-using the `Timer` each time.
    83  
    84  .play -numbers race-detector/timer.go /func main/,$
    85  
    86  This looks like reasonable code, but under certain circumstances it fails in a surprising way:
    87  
    88  	panic: runtime error: invalid memory address or nil pointer dereference
    89  	[signal 0xb code=0x1 addr=0x8 pc=0x41e38a]
    90  
    91  	goroutine 4 [running]:
    92  	time.stopTimer(0x8, 0x12fe6b35d9472d96)
    93  		src/pkg/runtime/ztime_linux_amd64.c:35 +0x25
    94  	time.(*Timer).Reset(0x0, 0x4e5904f, 0x1)
    95  		src/pkg/time/sleep.go:81 +0x42
    96  	main.func·001()
    97  		race.go:14 +0xe3
    98  	created by time.goFunc
    99  		src/pkg/time/sleep.go:122 +0x48
   100  
   101  What's going on here? Running the program with the race detector enabled is more illuminating:
   102  
   103  	==================
   104  	WARNING: DATA RACE
   105  	Read by goroutine 5:
   106  	  main.func·001()
   107  	     race.go:14 +0x169
   108  
   109  	Previous write by goroutine 1:
   110  	  main.main()
   111  	      race.go:15 +0x174
   112  
   113  	Goroutine 5 (running) created at:
   114  	  time.goFunc()
   115  	      src/pkg/time/sleep.go:122 +0x56
   116  	  timerproc()
   117  	     src/pkg/runtime/ztime_linux_amd64.c:181 +0x189
   118  	==================
   119  
   120  The race detector shows the problem: an unsynchronized read and write of the
   121  variable `t` from different goroutines. If the initial timer duration is very
   122  small, the timer function may fire before the main goroutine has assigned a
   123  value to `t` and so the call to `t.Reset` is made with a nil `t`.
   124  
   125  To fix the race condition we change the code to read and write the variable
   126  `t` only from the main goroutine:
   127  
   128  .play -numbers race-detector/timer-fixed.go /func main/,/^}/
   129  
   130  Here the main goroutine is wholly responsible for setting and resetting the
   131  `Timer` `t` and a new reset channel communicates the need to reset the timer in
   132  a thread-safe way.
   133  
   134  A simpler but less efficient approach is to
   135  [[http://play.golang.org/p/kuWTrY0pS4][avoid reusing timers]].
   136  
   137  ** Example 2: ioutil.Discard
   138  
   139  The second example is more subtle.
   140  
   141  The `ioutil` package's
   142  [[http://golang.org/pkg/io/ioutil/#Discard][`Discard`]] object implements 
   143  [[http://golang.org/pkg/io/#Writer][`io.Writer`]],
   144  but discards all the data written to it.
   145  Think of it like `/dev/null`: a place to send data that you need to read but
   146  don't want to store.
   147  It is commonly used with [[http://golang.org/pkg/io/#Copy][`io.Copy`]]
   148  to drain a reader, like this:
   149  
   150  	io.Copy(ioutil.Discard, reader)
   151  
   152  Back in July 2011 the Go team noticed that using `Discard` in this way was
   153  inefficient: the `Copy` function allocates an internal 32 kB buffer each time it
   154  is called, but when used with `Discard` the buffer is unnecessary since we're
   155  just throwing the read data away.
   156  We thought that this idiomatic use of `Copy` and `Discard` should not be so costly.
   157  
   158  The fix was simple.
   159  If the given `Writer` implements a `ReadFrom` method, a `Copy` call like this:
   160  
   161  	io.Copy(writer, reader)
   162  
   163  is delegated to this potentially more efficient call:
   164  
   165  	writer.ReadFrom(reader)
   166  
   167  We
   168  [[https://code.google.com/p/go/source/detail?r=13faa632ba3a#][added a ReadFrom method]]
   169  to Discard's underlying type, which has an internal buffer that is shared
   170  between all its users.
   171  We knew this was theoretically a race condition, but since all writes to the
   172  buffer should be thrown away we didn't think it was important.
   173  
   174  When the race detector was implemented it immediately
   175  [[https://code.google.com/p/go/issues/detail?id=3970][flagged this code]] as racy.
   176  Again, we considered that the code might be problematic, but decided that the
   177  race condition wasn't "real".
   178  To avoid the "false positive" in our build we implemented
   179  [[https://code.google.com/p/go/source/detail?r=1e55cf10aa4f][a non-racy version]]
   180  that is enabled only when the race detector is running.
   181  
   182  But a few months later [[http://bradfitz.com/][Brad]] encountered a
   183  [[https://code.google.com/p/go/issues/detail?id=4589][frustrating and strange bug]].
   184  After a few days of debugging, he narrowed it down to a real race condition
   185  caused by `ioutil.Discard`.
   186  
   187  Here is the known-racy code in `io/ioutil`, where `Discard` is a
   188  `devNull` that shares a single buffer between all of its users.
   189  
   190  .code race-detector/blackhole.go /var blackHole/,/^}/
   191  
   192  Brad's program includes a `trackDigestReader` type, which wraps an `io.Reader`
   193  and records the hash digest of what it reads.
   194  
   195  	type trackDigestReader struct {
   196  		r io.Reader
   197  		h hash.Hash
   198  	}
   199  
   200  	func (t trackDigestReader) Read(p []byte) (n int, err error) {
   201  		n, err = t.r.Read(p)
   202  		t.h.Write(p[:n])
   203  		return
   204  	}
   205  
   206  For example, it could be used to compute the SHA-1 hash of a file while reading it:
   207  
   208  	tdr := trackDigestReader{r: file, h: sha1.New()}
   209  	io.Copy(writer, tdr)
   210  	fmt.Printf("File hash: %x", tdr.h.Sum(nil))
   211  
   212  In some cases there would be nowhere to write the data—but still a need to hash
   213  the file—and so `Discard` would be used:
   214  
   215  	io.Copy(ioutil.Discard, tdr)
   216  
   217  But in this case the `blackHole` buffer isn't just a black hole; it is a
   218  legitimate place to store the data between reading it from the source
   219  `io.Reader` and writing it to the `hash.Hash`.
   220  With multiple goroutines hashing files simultaneously, each sharing the same
   221  `blackHole` buffer, the race condition manifested itself by corrupting the data
   222  between reading and hashing.
   223  No errors or panics occurred, but the hashes were wrong. Nasty!
   224  
   225  	func (t trackDigestReader) Read(p []byte) (n int, err error) {
   226  		// the buffer p is blackHole
   227  		n, err = t.r.Read(p)
   228  		// p may be corrupted by another goroutine here,
   229  		// between the Read above and the Write below
   230  		t.h.Write(p[:n])
   231  		return
   232  	}
   233  
   234  The bug was finally
   235  [[https://code.google.com/p/go/source/detail?r=4b61f121966b][fixed]]
   236  by giving a unique buffer to each use of `ioutil.Discard`, eliminating the race
   237  condition on the shared buffer.
   238  
   239  * Conclusions
   240  
   241  The race detector is a powerful tool for checking the correctness of concurrent
   242  programs. It will not issue false positives, so take its warnings seriously.
   243  But it is only as good as your tests; you must make sure they thoroughly
   244  exercise the concurrent properties of your code so that the race detector can
   245  do its job.
   246  
   247  What are you waiting for? Run `"go`test`-race"` on your code today!