
     1  More Research Problems of Implementing Go
     2  
     3  Dmitry Vyukov
     4  Google
     5  
     6  http://golang.org/
     7  
     8  * About Go
     9  
    10  Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.
    11  
    12  Design began in late 2007.
    13  
    14  - Robert Griesemer, Rob Pike, Ken Thompson
    15  - Russ Cox, Ian Lance Taylor
    16  
    17  Became open source in November 2009.
    18  
    19  Developed entirely in the open; very active community.
    20  Language stable as of Go 1, early 2012.
    21  Work continues.
    22  
    23  * Motivation for Go
    24  
    25  .image research2/datacenter.jpg
    26  
    27  * Motivation for Go
    28  
    29  Started as an answer to software problems at Google:
    30  
    31  - multicore processors
    32  - networked systems
    33  - massive computation clusters
    34  - scale: 10⁷⁺ lines of code
    35  - scale: 10³⁺ programmers
    36  - scale: 10⁶⁺ machines (design point)
    37  
    38  Deployed: parts of YouTube, dl.google.com, Blogger, Google Code, Google Fiber, ...
    39  
    40  * Go
    41  
    42  A simple but powerful and fun language.
    43  
    44  - start with C, remove complex parts
    45  - add interfaces, concurrency
    46  - also: garbage collection, closures, reflection, strings, ...
    47  
    48  For more background on design:
    49  
    50  - [[http://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html][Less is exponentially more]]
    51  - [[http://talks.golang.org/2012/splash.article][Go at Google: Language Design in the Service of Software Engineering]]
    52  
    53  * Research and Go
    54  
    55  Go is designed for building production systems at Google.
    56  
    57  - Goal: make that job easier, faster, better.
    58  - Non-goal: break new ground in programming language research
    59  
    60  Plenty of research questions about how to implement Go well.
    61  
    62  - Concurrency
    63  - Scheduling
    64  - Garbage collection
    65  - Race and deadlock detection
    66  - Testing of the implementation
    67  
    68  
    69  
    70  
    71  * Concurrency
    72  
    73  .image research2/busy.jpg
    74  
    75  * Concurrency
    76  
    77  Go provides two important concepts:
    78  
    79  A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create.
    80  
    81  A channel carries typed messages between goroutines.
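
For readers following along in text form, a minimal program showing both concepts together (an illustrative sketch, separate from the hello.go example played on the next slide):

	// One goroutine sends a typed message over a channel;
	// main blocks on the receive until the message arrives.
	package main

	import "fmt"

	func main() {
		ch := make(chan string)
		go func() { ch <- "hello from a goroutine" }()
		fmt.Println(<-ch)
	}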
    82  
    83  * Concurrency
    84  
    85  .play research2/hello.go
    86  
    87  * Concurrency: CSP
    88  
    89  Channels adopted from Hoare's Communicating Sequential Processes.
    90  
    91  - Orthogonal to rest of language
    92  - Can keep familiar model for computation
    93  - Focus on _composition_ of regular code
    94  
    95  Go _enables_ simple, safe concurrent programming.
    96  It doesn't _forbid_ bad programming.
    97  
    98  Caveat: not purely memory safe; sharing is legal.
    99  Passing a pointer over a channel is idiomatic.
   100  
   101  Experience shows this is practical.
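
As an illustration of that idiom, a small sketch (the job type and both functions are made up for illustration): the send is treated as an ownership transfer, by convention rather than by language enforcement.

	// Sending a pointer hands the value to the receiver by convention:
	// after the send, the producer must not touch the job again.
	type job struct {
		data []byte
	}

	func producer(out chan<- *job) {
		j := &job{data: make([]byte, 1024)}
		out <- j // ownership transferred; j is not used here afterwards
	}

	func consumer(in <-chan *job) {
		j := <-in
		j.data[0] = 1 // safe: the consumer is now the sole owner
	}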
   102  
   103  * Concurrency
   104  
   105  Sequential network address resolution, given a work list:
   106  
   107  .play research2/addr1.go /lookup/+1,/^}/-1
   108  
   109  * Concurrency
   110  
   111  Concurrent network address resolution, given a work list:
   112  
   113  .play research2/addr2.go /lookup/+1,/^}/-1
   114  
   115  * Concurrency
   116  
   117  Select statements: switch for communication.
   118  
   119  .play research2/select.go /select/,/^}/-1
   120  
It is select that makes an efficient implementation difficult.
   122  
   123  * Implementing Concurrency
   124  
   125  Challenge: Make channel communication scale
   126  
   127  - start with one global channel lock
   128  - per-channel locks, locked in address order for multi-channel operations
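
The per-channel approach, reduced to a sketch (the hchan type below is a stand-in, not the runtime's real channel implementation): for an operation on several channels, take the locks in address order, so two goroutines touching the same pair of channels cannot deadlock on the locks themselves.

	import (
		"sync"
		"unsafe"
	)

	// Stand-in for a channel's internal representation.
	type hchan struct {
		lock sync.Mutex
		// ... element queue, waiting goroutines, etc.
	}

	// lockTwo locks both channels in address order.
	func lockTwo(a, b *hchan) {
		if uintptr(unsafe.Pointer(a)) > uintptr(unsafe.Pointer(b)) {
			a, b = b, a
		}
		a.lock.Lock()
		if a != b {
			b.lock.Lock()
		}
	}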
   129  
   130  Research question: lock-free channels?
   131  
   132  
   133  
   134  
   135  * Scheduling
   136  
   137  .image research2/gophercomplex6.jpg
   138  
   139  * Scheduling
   140  
   141  On the one hand we have arbitrary user programs:
   142  
   143  - fine-grained goroutines, coarse-grained goroutines or a mix of both
   144  - computational goroutines, IO-bound goroutines or a mix of both
   145  - arbitrary dynamic communication patterns
   146  - busy, idle, bursty programs
   147  
   148  No user hints!
   149  
   150  * Scheduling
   151  
   152  On the other hand we have complex hardware topology:
   153  
   154  - per-core caches
   155  - caches shared between cores
- cores shared between hyperthreads (HT)
   157  - multiple processors with non-uniform memory access (NUMA)
   158  
   159  * Scheduling
   160  
   161  Challenge: make it all magically work efficiently
   162  
   163  - start with one global lock for all scheduler state
   164  - distributed work-stealing scheduler with per-"processor" state
   165  - integrated network poller into scheduler
   166  - lock-free work queues
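
The work-stealing idea, reduced to a toy sketch (P below is only a stand-in for the real per-processor state, and channels stand in for the lock-free run queues mentioned above): each P prefers local work, then the global queue, then tries to steal from other Ps.

	// Toy work-stealing loop; spinning, parking and fairness are omitted.
	type P struct {
		runq chan func() // local run queue
	}

	func schedule(self *P, others []*P, global chan func()) {
		for {
			// 1. Prefer local work.
			select {
			case g := <-self.runq:
				g()
				continue
			default:
			}
			// 2. Fall back to the global queue, 3. then try to steal.
			select {
			case g := <-global:
				g()
			default:
				for _, p := range others {
					select {
					case g := <-p.runq:
						g()
					default:
					}
				}
			}
		}
	}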
   167  
   168  * Scheduling
   169  
   170  Current scheduler:
   171  
   172   ┌─┐         ┌─┐         ┌─┐         ┌─┐                  ┌─┐
   173   │ │         │ │         │ │         │ │                  │ │
   174   ├─┤         ├─┤         ├─┤         ├─┤                  ├─┤ Global
   175   │ │         │G│         │ │         │ │                  │ │ state
   176   ├─┤         ├─┤         ├─┤         ├─┤                  ├─┤
   177   │G│         │G│         │G│         │ │                  │G│
   178   ├─┤         ├─┤         ├─┤         ├─┤                  ├─┤
   179   │G│         │G│         │G│         │G│                  │G│
   180   └┬┘         └┬┘         └┬┘         └┬┘                  └─┘
   181    │           │           │           │
   182    ↓           ↓           ↓           ↓
   183   ┌─┬──────┐  ┌─┬──────┐  ┌─┬──────┐  ┌─┬──────┐     ┌────┐┌──────┐┌───────┐
   184   │P│mcache│  │P│mcache│  │P│mcache│  │P│mcache│     │heap││timers││netpoll│
   185   └┬┴──────┘  └┬┴──────┘  └┬┴──────┘  └┬┴──────┘     └────┘└──────┘└───────┘
   186    │           │           │           │
   187    ↓           ↓           ↓           ↓
   188   ┌─┐         ┌─┐         ┌─┐         ┌─┐               ┌─┐ ┌─┐ ┌─┐
   189   │M│         │M│         │M│         │M│               │M│ │M│ │M│
   190   └─┘         └─┘         └─┘         └─┘               └─┘ └─┘ └─┘
   191  
   192  G - goroutine; P - logical processor; M - OS thread (machine)
   193  
   194  * Scheduling
   195  
   196  Want:
   197  
   198  - temporal locality to exploit caches
   199  - spatial locality to exploit NUMA
   200  - schedule mostly LIFO but ensure weak fairness
   201  - allocate local memory and stacks
   202  - scan local memory in GC
   203  - collocate communicating goroutines
   204  - distribute non-communicating goroutines
   205  - distribute timers and network poller
   206  - poll network on the same core where last read was issued
   207  
   208  
   209  
   210  
   211  
   212  * Garbage Collection
   213  
   214  * Garbage Collection
   215  
   216  Garbage collection simplifies APIs.
   217  
   218  - In C and C++, too much API design (and too much programming effort!) is about memory management.
   219  
   220  Fundamental to concurrency: too hard to track ownership otherwise.
   221  
   222  Fundamental to interfaces: memory management details do not bifurcate otherwise-similar APIs.
   223  
   224  Of course, adds cost, latency, complexity in run time system.
   225  
   226  * Garbage Collection
   227  
   228  Plenty of research about garbage collection, mostly in Java context.
   229  
   230  - Parallel stop-the-world
   231  - CMS: concurrent mark-and-sweep, stop-the-world compaction
   232  - G1: region-based incremental copying collector
   233  
   234  Java collectors usually:
   235  
   236  - are generational/incremental because allocation rate is high
   237  - compact memory to support generations
   238  - have pauses because concurrent compaction is tricky and slow
   239  
   240  * Garbage Collection
   241  
   242  But Go is very different!
   243  
   244  - User can avoid lots of allocations by embedding objects:
   245  
   246  	type Point struct {
   247  		X, Y int
   248  	}
   249  	type Rectangle struct {
   250  		Min, Max Point
   251  	}
   252  
- Fewer pointers.
   254  - Lots of stack allocations.
   255  - Interior pointers are allowed:
   256  
   257  	p := &rect.Max
   258  
   259  - Hundreds of thousands of stacks (goroutines)
   260  - No object headers so far
   261  
   262  * Implementing Garbage Collection
   263  
   264  Current GC: stop the world, parallel mark, start the world, concurrent sweep.
   265  Concurrent mark is almost ready.
   266  
   267  Cannot reuse Java GC algorithms directly.
   268  
   269  Research question: what GC algorithm is the best fit for Go?
   270  Do we need generations? Do we need compaction? What are efficient data structures that support interior pointers?
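
For the interior-pointer question, one plausible shape of an answer, sketched with a toy span type (not the runtime's real heap metadata): allocate objects from runs of equally sized slots, so that any interior pointer can be rounded down to the base of its object.

	// Toy span: a run of equally sized object slots.
	type span struct {
		start    uintptr // address of the first slot
		elemSize uintptr // size of every slot in this span
		nelems   uintptr // number of slots
	}

	// objectBase maps any pointer into the span, including an interior
	// pointer such as &rect.Max, to the base address of its object.
	func (s *span) objectBase(p uintptr) (uintptr, bool) {
		if p < s.start || p >= s.start+s.nelems*s.elemSize {
			return 0, false // not in this span
		}
		return s.start + (p-s.start)/s.elemSize*s.elemSize, true
	}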
   271  
   272  
   273  
   274  
   275  * Race and deadlock detection
   276  
   277  .image research2/race.png 160 600
   278  
   279  * Race detection
   280  
Based on the ThreadSanitizer runtime, which originally targeted mainly C/C++.
A traditional happens-before race detector based on vector clocks (the devil is in the details!).
   283  Works fine for Go, except:
   284  
   285   $ go run -race lots_of_goroutines.go
   286   race: limit on 8192 simultaneously alive goroutines is exceeded, dying
   287  
Research question: a race detector that efficiently supports hundreds of thousands of goroutines?
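
To make the scaling problem concrete, a much-simplified sketch of the happens-before check (not ThreadSanitizer's real data structures): every goroutine carries a vector clock, so detector state grows with the number of goroutines.

	// Simplified vector clock: goroutine id -> logical time.
	type vectorClock map[int]uint64

	// happensBefore reports whether an event recorded by goroutine g with
	// clock ev is ordered before an observer whose current view is obs.
	func happensBefore(g int, ev, obs vectorClock) bool {
		return ev[g] <= obs[g]
	}

	// The problem: vector clocks have an entry per goroutine, and Go
	// programs routinely run hundreds of thousands of them.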
   289  
   290  * Deadlock detection
   291  
   292  Deadlock on mutexes due to lock order inversion:
   293  
   294   // thread 1                       // thread 2
   295   pthread_mutex_lock(&m1);          pthread_mutex_lock(&m2);
   296   pthread_mutex_lock(&m2);          pthread_mutex_lock(&m1);
   297   ...                               ...
   298   pthread_mutex_unlock(&m2);        pthread_mutex_unlock(&m1);
   299   pthread_mutex_unlock(&m1);        pthread_mutex_unlock(&m2);
   300  
   301  Lock order inversions are easy to detect:
   302  
   303  - build "M1 is locked under M2" relation.
   304  - if it becomes cyclic, there is a potential deadlock.
   305  - whenever a new edge is added to the graph, do DFS to find cycles.
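
The same recipe as a sketch in Go (hypothetical detector code, not an existing tool's API): record a "locked under" edge on every nested acquisition and run a DFS to see whether the new edge closes a cycle.

	import "sync"

	// lockGraph records "b was acquired while a was held" edges.
	type lockGraph struct {
		edges map[*sync.Mutex]map[*sync.Mutex]bool
	}

	// addEdge adds held -> acquiring and reports whether the new edge
	// creates a cycle, i.e. a potential deadlock.
	func (g *lockGraph) addEdge(held, acquiring *sync.Mutex) bool {
		if g.edges == nil {
			g.edges = make(map[*sync.Mutex]map[*sync.Mutex]bool)
		}
		if g.edges[held] == nil {
			g.edges[held] = make(map[*sync.Mutex]bool)
		}
		g.edges[held][acquiring] = true
		// A path from acquiring back to held closes a cycle.
		return g.reachable(acquiring, held, make(map[*sync.Mutex]bool))
	}

	// reachable is a plain DFS over the lock-order graph.
	func (g *lockGraph) reachable(from, to *sync.Mutex, seen map[*sync.Mutex]bool) bool {
		if from == to {
			return true
		}
		if seen[from] {
			return false
		}
		seen[from] = true
		for next := range g.edges[from] {
			if g.reachable(next, to, seen) {
				return true
			}
		}
		return false
	}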
   306  
   307  * Deadlock detection
   308  
   309  Go has channels and mutexes. Channels are semaphores. A mutex can be unlocked in
   310  a different goroutine, so it is essentially a binary semaphore too.
   311  
   312  Deadlock example:
   313  
	// Parallel file tree walk.
	func worker(pendingItems chan os.FileInfo) {
		for f := range pendingItems {
			if f.IsDir() {
				filepath.Walk(f.Name(), func(path string, info os.FileInfo, err error) error {
					pendingItems <- info
					return nil
				})
			} else {
				visit(f)
			}
		}
	}
   326  
The pendingItems channel has limited capacity, so all workers can block on the send to pendingItems.
   328  
   329  * Deadlock detection
   330  
   331  Another deadlock example:
   332  
 var (
 	c   = make(chan T, 100)
 	mtx sync.RWMutex
 )
   337   
   338   // goroutine 1      // goroutine 2         // goroutine 3
   339   // does send        // does receive        // "resizes" the channel
   340   mtx.RLock()         mtx.RLock()            mtx.Lock()
   341   c <- v              v := <-c               tmp := make(chan T, 200)
   342   mtx.RUnlock()       mtx.RUnlock()          copyAll(c, tmp)
   343                                              c = tmp
   344                                              mtx.Unlock()
   345  
RWMutex is fair to both readers and writers: once a writer is waiting, new readers are not allowed to enter the critical section.
Goroutine 1 blocks on the channel send; then goroutine 3 blocks on mtx.Lock; then goroutine 2 blocks on mtx.RLock.
   348  
   349  * Deadlock detection
   350  
   351  Research question: how to detect deadlocks on semaphores?
   352  
   353  No known theory to date.
   354  
   355  
   356  
   357  
   358  * Testing of the implementation
   359  
   360  .image research2/gopherswrench.jpg 240 405
   361  
   362  * Testing of the implementation
   363  
   364  So now we have a new language with several complex implementations:
   365  
   366  - lexer
   367  - parser
   368  - transformation and optimization passes
   369  - code generation
   370  - linker
   371  - channel and map operations
   372  - garbage collector
   373  - ...
   374  
   375  *How*do*you*test*it?*
   376  
   377  * Testing of the implementation
   378  
   379  Csmith is a tool that generates random C programs that statically and dynamically conform to the C99 standard.
   380  
   381  .image research2/csmith.png
   382  
   383  * Testing of the implementation
   384  
Gosmith is a tool that generates random Go programs that statically and dynamically conform to the Go specification.

It turned out to be much simpler than for C: no undefined behavior lurking around every corner!
   388  
- no uninitialized variables
- no multiple mutations of the same variable between sequence points (x[i++] = --i)
- no UB on signed overflow
- in total, C has 191 kinds of undefined behavior and 52 kinds of unspecified behavior
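
For instance, the first and third items simply cannot happen in a Go program (a small standalone example; the output is shown in the comment):

	// Both lines below are fully defined in Go: a variable without an
	// initializer gets its zero value, and signed overflow wraps around.
	package main

	import (
		"fmt"
		"math"
	)

	func main() {
		var x int32 // zero value, never garbage
		y := int32(math.MaxInt32)
		fmt.Println(x, y+1) // prints: 0 -2147483648
	}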
   393  
   394  * Testing of the implementation
   395  
But it generates uninteresting programs from the execution point of view: most of them deadlock or crash on a nil dereference.
   397  
   398  Trophies:
   399  
   400  - 31 bugs in gc compiler
   401  - 18 bugs in gccgo compiler
   402  - 5 bugs in llgo compiler
   403  - 1 bug in gofmt
   404  - 3 bugs in the spec
   405  
   406  .image research2/emoji.png
   407  
   408  * Testing of the implementation
   409  
   410  Research question: how to generate random *interesting*concurrent* Go programs?
   411  
   412  Must:
   413  
   414  - create and wait for goroutines
   415  - communicate over channels
   416  - protect data with mutexes (reader-writer)
   417  - pass data ownership between goroutines (explicitly and implicitly)
   418  
   419  Must not:
   420  
   421  - deadlock
   422  - cause data races
   423  - have non-deterministic results
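
A hand-written example of the kind of program such a generator would need to emit, satisfying both lists above (an illustration of the target, not Gosmith output):

	// Goroutines, channel communication, a reader-writer mutex, explicit
	// ownership transfer, no deadlock, no races, and a deterministic result.
	package main

	import (
		"fmt"
		"sync"
	)

	func main() {
		var (
			mu    sync.RWMutex
			total int
		)
		ch := make(chan []int) // ownership of each slice moves with the send
		var wg sync.WaitGroup
		for i := 0; i < 4; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for batch := range ch {
					sum := 0
					for _, v := range batch {
						sum += v
					}
					mu.Lock()
					total += sum
					mu.Unlock()
				}
			}()
		}
		for i := 0; i < 8; i++ {
			ch <- []int{i, i + 1} // the sender never touches the slice again
		}
		close(ch)
		wg.Wait()
		mu.RLock()
		fmt.Println(total) // always prints 64
		mu.RUnlock()
	}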
   424  
   425  
   426  
   427  
   428  
   429  * Research and Go
   430  
   431  Plenty of research questions about how to implement Go well.
   432  
   433  - Concurrency
   434  - Scheduling
   435  - Garbage collection
   436  - Race and deadlock detection
   437  - Testing of the implementation
   438  - [Polymorphism]
   439  - [Program translation]
   440