github.com/xushiwei/go@v0.0.0-20130601165731-2b9d83f45bc9/doc/articles/race_detector.html

<!--{
	"Title": "Data Race Detector",
	"Template": true
}-->

<h2 id="Introduction">Introduction</h2>

<p>
Data races are among the most common and hardest-to-debug bugs in concurrent systems.
A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write.
See <a href="/ref/mem/">The Go Memory Model</a> for details.
</p>

<p>
Here is an example of a data race that can lead to crashes and memory corruption:
</p>

<pre>
func main() {
	c := make(chan bool)
	m := make(map[string]string)
	go func() {
		m["1"] = "a" // First conflicting access.
		c &lt;- true
	}()
	m["2"] = "b" // Second conflicting access.
	&lt;-c
	for k, v := range m {
		fmt.Println(k, v)
	}
}
</pre>

<h2 id="Usage">Usage</h2>

<p>
To help diagnose such bugs, Go includes a built-in data race detector.
To use it, add the <code>-race</code> flag to the go command:
</p>

<pre>
$ go test -race mypkg    // to test the package
$ go run -race mysrc.go  // to run the source file
$ go build -race mycmd   // to build the command
$ go install -race mypkg // to install the package
</pre>

<h2 id="Report_Format">Report Format</h2>

<p>
When the race detector finds a data race in the program, it prints a report.
The report contains stack traces for conflicting accesses, as well as stacks where the involved goroutines were created.
Here is an example:
</p>

<pre>
WARNING: DATA RACE
Read by goroutine 185:
  net.(*pollServer).AddFD()
      src/pkg/net/fd_unix.go:89 +0x398
  net.(*pollServer).WaitWrite()
      src/pkg/net/fd_unix.go:247 +0x45
  net.(*netFD).Write()
      src/pkg/net/fd_unix.go:540 +0x4d4
  net.(*conn).Write()
      src/pkg/net/net.go:129 +0x101
  net.func·060()
      src/pkg/net/timeout_test.go:603 +0xaf

Previous write by goroutine 184:
  net.setWriteDeadline()
      src/pkg/net/sockopt_posix.go:135 +0xdf
  net.setDeadline()
      src/pkg/net/sockopt_posix.go:144 +0x9c
  net.(*conn).SetDeadline()
      src/pkg/net/net.go:161 +0xe3
  net.func·061()
      src/pkg/net/timeout_test.go:616 +0x3ed

Goroutine 185 (running) created at:
  net.func·061()
      src/pkg/net/timeout_test.go:609 +0x288

Goroutine 184 (running) created at:
  net.TestProlongTimeout()
      src/pkg/net/timeout_test.go:618 +0x298
  testing.tRunner()
      src/pkg/testing/testing.go:301 +0xe8
</pre>

<h2 id="Options">Options</h2>

<p>
The <code>GORACE</code> environment variable sets race detector options.
The format is:
</p>

<pre>
GORACE="option1=val1 option2=val2"
</pre>

<p>
The options are:
</p>

<ul>
<li>
<code>log_path</code> (default <code>stderr</code>): The race detector writes
its report to a file named <code>log_path.<em>pid</em></code>.
The special names <code>stdout</code>
and <code>stderr</code> cause reports to be written to standard output and
standard error, respectively.
</li>

<li>
<code>exitcode</code> (default <code>66</code>): The exit status to use when
exiting after a detected race.
</li>

<li>
<code>strip_path_prefix</code> (default <code>""</code>): Strip this prefix
from all reported file paths, to make reports more concise.
</li>

<li>
<code>history_size</code> (default <code>1</code>): The per-goroutine memory
access history holds <code>32K * 2**history_size</code> elements.
Increasing this value can avoid a "failed to restore the stack" error in reports, at the
cost of increased memory usage.
</li>
</ul>

<p>
Example:
</p>

<pre>
$ GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race
</pre>

<h2 id="Excluding_Tests">Excluding Tests</h2>

<p>
When you build with the <code>-race</code> flag, the <code>go</code> command defines an additional
<a href="/pkg/go/build/#hdr-Build_Constraints">build tag</a>, <code>race</code>.
You can use the tag to exclude some code and tests when running the race detector.
Some examples:
</p>

<pre>
// +build !race

package foo

// The test contains a data race. See issue 123.
func TestFoo(t *testing.T) {
	// ...
}

// The test fails under the race detector due to timeouts.
func TestBar(t *testing.T) {
	// ...
}

// The test takes too long under the race detector.
func TestBaz(t *testing.T) {
	// ...
}
</pre>

<h2 id="How_To_Use">How To Use</h2>

<p>
To start, run your tests using the race detector (<code>go test -race</code>).
The race detector only finds races that happen at runtime, so it can't find
races in code paths that are not executed.
If your tests have incomplete coverage,
you may find more races by running a binary built with <code>-race</code> under a realistic
workload.
</p>

<h2 id="Typical_Data_Races">Typical Data Races</h2>

<p>
Here are some typical data races. All of them can be detected with the race detector.
</p>

<h3 id="Race_on_loop_counter">Race on loop counter</h3>

<pre>
func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i &lt; 5; i++ {
		go func() {
			fmt.Println(i) // Not the 'i' you are looking for.
			wg.Done()
		}()
	}
	wg.Wait()
}
</pre>

<p>
The variable <code>i</code> in the function literal is the same variable used by the loop, so
the read in the goroutine races with the loop increment.
(This program typically prints 55555, not 01234.)
The program can be fixed by making a copy of the variable:
</p>

<pre>
func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i &lt; 5; i++ {
		go func(j int) {
			fmt.Println(j) // Good. Read local copy of the loop counter.
			wg.Done()
		}(i)
	}
	wg.Wait()
}
</pre>

<h3 id="Accidentally_shared_variable">Accidentally shared variable</h3>

<pre>
// ParallelWrite writes data to file1 and file2, returns the errors.
func ParallelWrite(data []byte) chan error {
	res := make(chan error, 2)
	f1, err := os.Create("file1")
	if err != nil {
		res &lt;- err
	} else {
		go func() {
			// This err is shared with the main goroutine,
			// so the write races with the write below.
			_, err = f1.Write(data)
			res &lt;- err
			f1.Close()
		}()
	}
	f2, err := os.Create("file2") // The second conflicting write to err.
	if err != nil {
		res &lt;- err
	} else {
		go func() {
			_, err = f2.Write(data)
			res &lt;- err
			f2.Close()
		}()
	}
	return res
}
</pre>

<p>
The fix is to introduce new variables in the goroutines (note the use of <code>:=</code>):
</p>

<pre>
			...
			_, err := f1.Write(data)
			...
			_, err := f2.Write(data)
			...
</pre>

<h3 id="Unprotected_global_variable">Unprotected global variable</h3>

<p>
If the following code is called from several goroutines, it leads to races on the <code>service</code> map.
Concurrent reads and writes of the same map are not safe:
</p>

<pre>
// service must be initialized before use; writing to a nil map panics.
var service = make(map[string]net.Addr)

func RegisterService(name string, addr net.Addr) {
	service[name] = addr
}

func LookupService(name string) net.Addr {
	return service[name]
}
</pre>

<p>
To make the code safe, protect the accesses with a mutex:
</p>

<pre>
var (
	service   = make(map[string]net.Addr)
	serviceMu sync.Mutex
)

func RegisterService(name string, addr net.Addr) {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	service[name] = addr
}

func LookupService(name string) net.Addr {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	return service[name]
}
</pre>

<h3 id="Primitive_unprotected_variable">Primitive unprotected variable</h3>

<p>
Data races can happen on variables of primitive types as well (<code>bool</code>, <code>int</code>, <code>int64</code>, etc.),
as in this example:
</p>

<pre>
type Watchdog struct{ last int64 }

func (w *Watchdog) KeepAlive() {
	w.last = time.Now().UnixNano() // First conflicting access.
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			// Second conflicting access.
			if w.last &lt; time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}
</pre>

<p>
Even such "innocent" data races can lead to hard-to-debug problems caused by
non-atomicity of the memory accesses,
interference with compiler optimizations,
or reordering of memory accesses by the processor.
</p>

<p>
A typical fix for this race is to use a channel or a mutex.
To preserve the lock-free behavior, one can also use the
<a href="/pkg/sync/atomic/"><code>sync/atomic</code></a> package.
</p>

<pre>
type Watchdog struct{ last int64 }

func (w *Watchdog) KeepAlive() {
	atomic.StoreInt64(&amp;w.last, time.Now().UnixNano())
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			if atomic.LoadInt64(&amp;w.last) &lt; time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}
</pre>

<h2 id="Supported_Systems">Supported Systems</h2>

<p>
The race detector runs on <code>darwin/amd64</code>, <code>linux/amd64</code>, and <code>windows/amd64</code>.
</p>

<h2 id="Runtime_Overheads">Runtime Overhead</h2>

<p>
The cost of race detection varies by program, but for a typical program, memory
usage may increase by 5-10x and execution time by 2-20x.
</p>