Go at Google: Language Design in the Service of Software Engineering

Rob Pike
Google, Inc.
@rob_pike
http://golang.org/s/plusrob
http://golang.org

* Abstract

(This is a modified version of the keynote talk given by Rob Pike
at the SPLASH 2012 conference in Tucson, Arizona, on October 25, 2012.)

The Go programming language was conceived in late 2007 as an answer to
some of the problems we were seeing while developing software infrastructure
at Google.
The computing landscape today is almost unrelated to the environment
in which the languages being used, mostly C++, Java, and Python, had
been created.
The problems introduced by multicore processors, networked systems,
massive computation clusters, and the web programming model were being
worked around rather than addressed head-on.
Moreover, the scale has changed: today's server programs comprise tens
of millions of lines of code, are worked on by hundreds or even
thousands of programmers, and are updated literally every day.
To make matters worse, build times, even on large compilation
clusters, have stretched to many minutes, even hours.

Go was designed and developed to make working in this environment more
productive.
Besides its better-known aspects such as built-in concurrency and
garbage collection, Go's design considerations include rigorous
dependency management, the adaptability of software architecture as
systems grow, and robustness across the boundaries between components.

This article explains how these issues were addressed while building
an efficient, compiled programming language that feels lightweight and
pleasant.
Examples and explanations will be taken from the real-world problems
faced at Google.

* Introduction

Go is a compiled, concurrent, garbage-collected, statically typed language
developed at Google.
It is an open source project: Google
imports the public repository rather than the other way around.

Go is efficient, scalable, and productive. Some programmers find it fun
to work in; others find it unimaginative, even boring.
In this article we
will explain why those are not contradictory positions.
Go was designed to address the problems faced in software development
at Google, which led to a language that is not a breakthrough research language
but is nonetheless an excellent tool for engineering large software projects.

* Go at Google

Go is a programming language designed by Google to help solve Google's problems, and Google has big problems.

The hardware is big and the software is big.
There are many millions of lines of software, with servers mostly in C++
and lots of Java and Python for the other pieces.
Thousands of engineers work on the code,
at the "head" of a single tree comprising all the software,
so from day to day there are significant changes to all levels of the tree.
A large
[[http://google-engtools.blogspot.com/2011/06/build-in-cloud-accessing-source-code.html][custom-designed distributed build system]]
makes development at this scale feasible, but it's still big.

And of course, all this software runs on zillions of machines, which are treated as a modest number of independent, networked compute clusters.

.image splash/datacenter.jpg

In short, development at Google is big, can be slow, and is often clumsy. But it _is_ effective.

The goals of the Go project were to eliminate the slowness and clumsiness of software development at Google,
and thereby to make the process more productive and scalable.
The language was designed by and for people who write (and read and debug and maintain) large software systems.

Go's purpose is therefore _not_ to do research into programming language design;
it is to improve the working environment for its designers and their coworkers.
Go is more about software engineering than programming language research.
Or to rephrase, it is about language design in the service of software engineering.

But how can a language help software engineering?
The rest of this article is an answer to that question.

* Pain points

When Go launched, some claimed it was missing particular features or methodologies that were regarded as _de_rigueur_ for a modern language.
How could Go be worthwhile in the absence of these facilities?
Our answer is that the properties Go _does_ have address the issues that make large-scale software development difficult.
These issues include:

- slow builds
- uncontrolled dependencies
- each programmer using a different subset of the language
- poor program understanding (code hard to read, poorly documented, and so on)
- duplication of effort
- cost of updates
- version skew
- difficulty of writing automatic tools
- cross-language builds

Individual features of a language don't address these issues.
A larger view of software engineering is required, and
in the design of Go we tried to focus on solutions to _these_ problems.

As a simple, self-contained example, consider the representation of program structure.
Some observers objected to Go's C-like block structure with braces, preferring the use of spaces for indentation, in the style of Python or Haskell.
However, we have had extensive experience tracking down build and test failures caused by cross-language builds where a Python snippet embedded in another language,
for instance through a SWIG invocation,
is subtly and _invisibly_ broken by a change in the indentation of the surrounding code.
Our position is therefore that, although using spaces for indentation is nice for small programs, it doesn't scale well,
and the bigger and more heterogeneous the code base, the more trouble it can cause.
It is better to forgo convenience for safety and dependability, so Go has brace-bounded blocks.

* Dependencies in C and C++

A more substantial illustration of scaling and other issues arises in the handling of package dependencies.
We begin the discussion with a review of how they work in C and C++.

ANSI C, first standardized in 1989, promoted the idea of `#ifndef` "guards" in the standard header files.
The idea, which is ubiquitous now, is that each header file be bracketed with a conditional compilation clause so that the file may be included multiple times without error.
For instance, the Unix header file `<sys/stat.h>` looks schematically like this:

	/* Large copyright and licensing notice */
	#ifndef _SYS_STAT_H_
	#define _SYS_STAT_H_
	/* Types and other definitions */
	#endif

The intent is that the C preprocessor reads in the file but disregards the contents on
the second and subsequent
readings of the file.
The symbol `_SYS_STAT_H_`, defined the first time the file is read, "guards" the invocations that follow.

This design has some nice properties, most importantly that each header file can safely `#include`
all its dependencies, even if other header files will also include them.
If that rule is followed, it permits orderly code that, for instance, sorts the `#include`
clauses alphabetically.

But it scales very badly.

In 1984, a compilation of `ps.c`, the source to the Unix `ps` command, was observed
to `#include` `<sys/stat.h>` 37 times by the time all the preprocessing had been done.
Even though the contents are discarded 36 times while doing so, most C
implementations would open the file, read it, and scan it all 37 times.
Without great cleverness, in fact, that behavior is required by the potentially
complex macro semantics of the C preprocessor.

The effect on software is the gradual accumulation of `#include` clauses in C programs.
It won't break a program to add them, and it's very hard to know when they are no
longer needed.
Deleting a `#include` and compiling the program again isn't even sufficient to test that,
since another `#include` might itself contain a `#include` that pulls it in anyway.

Technically speaking, it does not have to be like that.
Realizing the long-term problems with the use of `#ifndef` guards, the designers
of the Plan 9 libraries took a different, non-ANSI-standard approach.
In Plan 9, header files were forbidden from containing further `#include` clauses; all
`#includes` were required to be in the top-level C file.
This required some discipline, of course (the programmer was required to list
the necessary dependencies exactly once, in the correct order), but documentation
helped and in practice it worked very well.
The result was that, no matter how many dependencies a C source file had,
each `#include` file was read exactly once when compiling that file.
And, of course, it was also easy to see if an `#include` was necessary by taking
it out: the edited program would compile if and only if the dependency was unnecessary.

The most important result of the Plan 9 approach was much faster compilation: the amount of
I/O the compilation requires can be dramatically less than when compiling a program
using libraries with `#ifndef` guards.

Outside of Plan 9, though, the "guarded" approach is accepted practice for C and C++.
In fact, C++ exacerbates the problem by using the same approach at finer granularity.
By convention, C++ programs are usually structured with one header file per class, or perhaps
a small set of related classes, a grouping much smaller than, say, `<stdio.h>`.
The dependency tree is therefore much more intricate, reflecting not library dependencies but the full type hierarchy.
Moreover, C++ header files usually contain real code (type, method, and template
declarations), not just the simple constants and function signatures typical of a C header file.
Thus not only does C++ push more to the compiler, what it pushes is harder to compile,
and each invocation of the compiler must reprocess this information.
When building a large C++ binary, the compiler might be taught thousands of times how to
represent a string by processing the header file `<string>`.
(For the record, around 1984 Tom Cargill observed that the use of the
C preprocessor for dependency management would be a long-term liability for C++ and
should be addressed.)

The construction of a single C++ binary at Google can open and read hundreds of individual header files
tens of thousands of times.
In 2007, build engineers at Google instrumented the compilation of a major Google binary.
The binary was built from about two thousand files that, if simply concatenated together, totaled 4.2 megabytes.
By the time the `#includes` had been expanded, over 8 gigabytes were being delivered to the input of the compiler, a blow-up of about 2000 bytes for every C++ source byte.

As another data point, in 2003 Google's build system was moved from a single Makefile to a per-directory design
with better-managed, more explicit dependencies.
A typical binary shrank about 40% in file size, just from having more accurate dependencies recorded.
Even so, the properties of C++ (or C for that matter) make it impractical to verify those dependencies automatically,
and today we still do not have an accurate understanding of the dependency requirements
of large Google C++ binaries.

The consequence of these uncontrolled dependencies and massive scale is that it is
impractical to build Google server binaries on a single computer, so
a large distributed compilation system was created.
With this system, involving many machines, much caching, and
much complexity (the build system is a large program in its own right), builds at
Google are practical, if still cumbersome.

Even with the distributed build system, a large Google build can still take many minutes.
That 2007 binary took 45 minutes using a precursor distributed build system; today's
version of the same program takes 27 minutes, but of course the program and its
dependencies have grown in the interim.
The engineering effort required to scale up the build system has barely been able
to stay ahead of the growth of the software it is constructing.

* Enter Go

When builds are slow, there is time to think.
The origin myth for Go states that it was during one of those 45-minute builds
that Go was conceived. It was believed to be worth trying to design a new language
suitable for writing large Google programs such as web servers,
with software engineering considerations that would improve the quality
of life of Google programmers.

Although the discussion so far has focused on dependencies,
there are many other issues that need attention.
The primary considerations for any language to succeed in this context are:

- It must work at scale, for large programs with large numbers of dependencies, with large teams of programmers working on them.

- It must be familiar, roughly C-like. Programmers working at Google are early in their careers and are most familiar with procedural languages, particularly from the C family. The need to get programmers productive quickly in a new language means that the language cannot be too radical.

- It must be modern. C, C++, and to some extent Java are quite old, designed before the advent of multicore machines, networking, and web application development. There are features of the modern world that are better met by newer approaches, such as built-in concurrency.

With that background, then, let us look at the design of Go from a software engineering perspective.

* Dependencies in Go

Since we've taken a detailed look at dependencies in C and C++, a good place to start
our tour is to see how Go handles them.
Dependencies are defined, syntactically and semantically, by the language.
They are explicit, clear, and "computable", which is to say, easy to write tools to analyze.

The syntax is that, after the `package` clause (the subject of the next section),
each source file may have one or more import statements, comprising the
`import` keyword and a string constant identifying the package to be imported
into this source file (only):

	import "encoding/json"

The first step to making Go scale, dependency-wise, is that the _language_ defines
that unused dependencies are a compile-time error (not a warning, an _error_).
If the source file imports a package it does not use, the program will not compile.
This guarantees by construction that the dependency tree for any Go program
is precise, that it has no extraneous edges. That, in turn, guarantees that no
extra code will be compiled when building the program, which minimizes
compilation time.

There's another step, this time in the implementation of the compilers, that
goes even further to guarantee efficiency.
Consider a Go program with three packages and this dependency graph:

- package `A` imports package `B`;
- package `B` imports package `C`;
- package `A` does _not_ import package `C`.

This means that package `A` uses `C` only transitively through its use of `B`;
that is, no identifiers from `C` are mentioned in the source code of `A`,
even if some of the items `A` is using from `B` do mention `C`.
For instance, package `A` might reference a `struct` type defined in `B` that has a field with
a type defined in `C` but that `A` does not reference itself.
As a motivating example, imagine that `A` imports a formatted I/O package
`B` that uses a buffered I/O implementation provided by `C`, but that `A` does
not itself invoke buffered I/O.

To build this program, first, `C` is compiled;
a package's dependencies must be built before the packages that depend on them.
Then `B` is compiled; finally `A` is compiled, and then the program can be linked.

When `A` is compiled, the compiler reads the object file for `B`, not its source code.
That object file for `B` contains all the type information necessary for the compiler
to execute the

	import "B"

clause in the source code for `A`. That information includes whatever information
about `C` that clients of `B` will need at compile time.
In other words, when `B` is compiled, the generated object file includes type
information for all dependencies of `B` that affect the public interface of `B`.

This design has the important
effect that when the compiler executes an import clause,
_it_opens_exactly_one_file_, the object file identified by the string in the import clause.
This is, of course, reminiscent of the Plan 9 C (as opposed to ANSI C)
approach to dependency management, except that, in effect, the compiler
writes the header file when the Go source file is compiled.
The process is more automatic and even
more efficient than in Plan 9 C, though: the data being read when evaluating the import is just
"exported" data, not general program source code. The effect on overall
compilation time can be huge, and scales well as
the code base grows. The time to execute the dependency graph, and
hence to compile, can be exponentially less than in the "include of
include file" model of C and C++.

It's worth mentioning that this general approach to dependency management
is not original; the ideas go back to the 1970s and flow through languages like
Modula-2 and Ada. In the C family, Java has elements of this approach.

To make compilation even more efficient, the object file is arranged so the export
data is the first thing in the file, so the compiler can stop reading as soon
as it reaches the end of that section.

This approach to dependency management is the single biggest reason
why Go compilations are faster than C or C++ compilations.
Another factor is that Go places the export data in the object file; some
languages require the author to write or the compiler to
generate a second file with that information. That's twice as many files
to open. In Go there is only one file to open to import a package.
Also, the single-file approach means that the export data (or header
file, in C/C++) can never go out of date relative to the object file.

For the record, we measured the compilation of a large Google program
written in Go to see how the source code fanout compared to the C++
analysis done earlier. We found a fanout of about 40X, roughly
fifty times better than the 2000X of C++ (as well as being simpler and hence faster
to process), but it's still bigger than we expected. There are two reasons for
this.
First, we found a bug: the Go compiler was generating a substantial
amount of data in the export section that did not need to be there. Second,
the export data uses a verbose encoding that could be improved.
We plan to address these issues.

Nonetheless, a fiftyfold reduction in work turns minutes into seconds,
and coffee breaks into interactive builds.

Another feature of the Go dependency graph is that it has no cycles.
The language defines that there can be no circular imports in the graph,
and the compiler and linker both check that they do not exist.
Although they are occasionally useful, circular imports introduce
significant problems at scale.
They require the compiler to deal with larger sets of source files
all at once, which slows down incremental builds.
More important, in our experience, when such imports are allowed they end up
entangling huge swaths of the source tree into large subpieces that are
difficult to manage independently, bloating binaries and complicating
initialization, testing, refactoring, releasing, and other tasks of
software development.

The lack of circular imports causes occasional annoyance but keeps the tree clean,
forcing a clear demarcation between packages. As with many of the
design decisions in Go, it forces the programmer to think earlier about a
larger-scale issue (in this case, package boundaries) that, if left until
later, may never be addressed satisfactorily.

Through the design of the standard library, great effort was spent on controlling
dependencies. It can be better to copy a little code than to pull in a big
library for one function. (A test in the system build complains if new core
dependencies arise.) Dependency hygiene trumps code reuse.
One example of this in practice is that
the (low-level) `net` package has its own integer-to-decimal conversion routine
to avoid depending on the bigger and dependency-heavy formatted I/O package.
Another is that the string conversion package `strconv` has a private implementation
of the definition of 'printable' characters rather than pulling in the large Unicode
character class tables; that `strconv` honors the Unicode standard is verified by the
package's tests.

* Packages

The design of Go's package system combines some of the properties of libraries,
name spaces, and modules into a single construct.

Every Go source file, for instance `"encoding/json/json.go"`, starts with a package clause, like this:

	package json

where `json` is the "package name", a simple identifier.
Package names are usually concise.

To use a package, the importing source file identifies it by its _package_path_
in the import clause.
The meaning of "path" is not specified by the language, but in
practice and by convention it is the slash-separated directory path of the
source package in the repository, here:

	import "encoding/json"

Then the package name (as distinct from path) is used to qualify items from
the package in the importing source file:

	var dec = json.NewDecoder(reader)

This design provides clarity.
One may always tell from its syntax whether a name is local to the package: `Name` vs. `pkg.Name`.
(More on this later.)

For our example, the package path is `"encoding/json"` while the package name is `json`.
Outside the standard repository, the convention is to place the
project or company name at the root of the name space:

	import "google/base/go/log"

It's important to recognize that package _paths_ are unique,
but there is no such requirement for package _names_.
The path must uniquely identify the package to be imported, while the
name is just a convention for how clients of the package can refer to its
contents.
The package name need not be unique and can be overridden
in each importing source file by providing a local identifier in the
import clause. These two imports both reference packages that
call themselves `package` `log`, but to import them in a single source
file one must be (locally) renamed:

	import "log"                          // Standard package
	import googlelog "google/base/go/log" // Google-specific package

Every company might have its own `log` package, but
there is no need to make the package name unique.
Quite the opposite: Go style suggests keeping package names short and clear
and obvious in preference to worrying about collisions.

Another example: there are many `server` packages in Google's code base.

* Remote packages

An important property of Go's package system is that the package path,
being in general an arbitrary string, can be co-opted to refer to remote
repositories by having it identify the URL of the site serving the repository.

Here is how to use the `doozer` package from `github`. The `go` `get` command
uses the `go` build tool to fetch the repository from the site and install it.
Once installed, it can be imported and used like any regular package.

	$ go get github.com/4ad/doozer # Shell command to fetch package

	import "github.com/4ad/doozer" // Doozer client's import statement

	var client doozer.Conn // Client's use of package

It's worth noting that the `go` `get` command downloads dependencies
recursively, a property made possible only because the dependencies are
explicit.
Also, the allocation of the space of import paths is delegated to URLs,
which makes the naming of packages decentralized and therefore scalable,
in contrast to centralized registries used by other languages.

* Syntax

Syntax is the user interface of a programming language. Although it has
limited effect on the semantics of the language, which is arguably the
more important component, syntax determines the readability and hence
clarity of the language. Also, syntax is critical to tooling: if the language
is hard to parse, automated tools are hard to write.

Go was therefore designed with clarity and tooling in mind, and has
a clean syntax.
Compared to other languages in the C family, its
grammar is modest in size, with only 25 keywords (C99 has
37; C++11 has 84; the numbers continue to grow).
More important,
the grammar is regular and therefore easy to parse (mostly; there
are a couple of quirks we might have fixed but didn't discover early
enough).
Unlike C and Java and especially C++, Go can be parsed without
type information or a symbol table;
there is no type-specific context. The grammar is
easy to reason about and therefore tools are easy to write.

One of the details of Go's syntax that surprises C programmers is that
the declaration syntax is closer to Pascal's than to C's.
The declared name appears before the type and there are more keywords:

	var fn func([]int) int
	type T struct { a, b int }

as compared to C's

	int (*fn)(int[]);
	struct T { int a, b; };

Declarations introduced by keyword are easier to parse both for people and
for computers, and having the type syntax not be the expression syntax
as it is in C has a significant effect on parsing: it adds grammar
but eliminates ambiguity.
But there is a nice side effect, too: for initializing declarations,
one can drop the `var` keyword and just take the type of the variable
from that of the expression. These two declarations are equivalent;
the second is shorter and idiomatic:

	var buf *bytes.Buffer = bytes.NewBuffer(x) // explicit
	buf := bytes.NewBuffer(x)                  // derived

There is a blog post at [[http://golang.org/s/decl-syntax][golang.org/s/decl-syntax]] with more detail about the syntax of declarations in Go and
why it is so different from C.

Function syntax is straightforward for simple functions.
This example declares the function `Abs`, which accepts a single
parameter `x` of type `T` and returns a single `float64` value:

	func Abs(x T) float64

A method is just a function with a special parameter, its _receiver_,
which can be passed to the function using the standard "dot" notation.
Method declaration syntax places the receiver in parentheses before the
function name. Here is the same function, now as a method of type `T`:

	func (x T) Abs() float64

And here is a variable holding a closure that takes a `T` argument; Go has first-class
functions and closures:

	negAbs := func(x T) float64 { return -Abs(x) }

Finally, Go functions can return multiple values. A common case is to
return the function result and an `error` value as a pair, like this:

	func ReadByte() (c byte, err error)

	c, err := ReadByte()
	if err != nil { ... }

We'll talk more about errors later.

One feature missing from Go is default function arguments.
This was a deliberate
simplification. Experience tells us that defaulted arguments make it
too easy to patch over API design flaws by adding more arguments,
resulting in too many arguments with interactions that are
difficult to disentangle or even understand.
The lack of default arguments requires more functions or methods to be defined,
as one function cannot hold the entire interface,
but that leads to a clearer API that is easier to understand.
Those functions all need separate names, too, which makes it clear
which combinations exist, as well as encouraging more
thought about naming, a critical aspect of clarity and readability.

One mitigating factor for the lack of default arguments is that Go
has easy-to-use, type-safe support for variadic functions.

* Naming

Go takes an unusual approach to defining the _visibility_ of an identifier,
the ability for a client of a package to use the item named by the identifier.
Rather than using, for instance, `private` and `public` keywords, in Go the name itself
carries the information: the case of the initial letter of the identifier
determines the visibility. If the initial character is an upper-case letter,
the identifier is _exported_ (public); otherwise it is not:

- upper-case initial letter: `Name` is visible to clients of the package
- otherwise: `name` (or `_Name`) is not visible to clients of the package

This rule applies to variables, types, functions, methods, constants, fields...
everything. That's all there is to it.

This was not an easy design decision.
We spent over a year struggling to
define the notation to specify an identifier's visibility.
Once we settled on using the case of the name, we soon realized it had
become one of the most important properties of the language.
The name is, after all, what clients of the package use; putting
the visibility in the name rather than its type means that it's always
clear when looking at an identifier whether it is part of the public API.
After using Go for a while, it feels burdensome when going back to
other languages that require looking up the declaration to discover
this information.
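
The case rule is easy to observe from inside a single program, because a package such as `encoding/json` sits outside the caller's package and can therefore see only exported identifiers. The following is a minimal sketch; the `User` type, its fields, and the `encodeUser` helper are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical type; its upper-case initial letter exports it.
type User struct {
	Name  string // exported: visible to other packages, including encoding/json
	email string // unexported: visible only inside package main
}

// encodeUser marshals u to JSON. encoding/json inspects u from outside
// package main, so it can read only the exported field Name.
func encodeUser(u User) (string, error) {
	b, err := json.Marshal(u)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encodeUser(User{Name: "gopher", email: "gopher@example.com"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // prints {"Name":"gopher"}
}
```

Nothing in the declaration of `email` says "private"; its lower-case initial letter alone keeps it out of the encoded output.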

The result is, again, clarity: the program source text expresses the
programmer's meaning simply.

Another simplification is that Go has a very compact scope hierarchy:

- universe (predeclared identifiers such as `int` and `string`)
- package (all the source files of a package live at the same scope)
- file (for imported package names only; not very important in practice)
- function (the usual)
- block (the usual)

There is no scope for name space or class or other wrapping
construct. Names come from very few places in Go, and all names
follow the same scope hierarchy: at any given location in the source,
an identifier denotes exactly one language object, independent of how
it is used. (The only exception is statement labels, the targets of `break`
statements and the like; they always have function scope.)

This has consequences for clarity. Notice for instance that methods
declare an explicit receiver and that it must be used to access fields and
methods of the type. There is no implicit `this`. That is, one always
writes

	rcvr.Field

(where `rcvr` is whatever name is chosen for the receiver variable)
so all the elements of the type always appear lexically bound to
a value of the receiver type. Similarly, a package qualifier is always present
for imported names; one writes `io.Reader`, not `Reader`.
Not only is this clear, it frees up the identifier `Reader` as a useful
name to be used in any package. There are in fact multiple exported
identifiers in the standard library with name `Reader`, or `Printf`
for that matter, yet which one is being referred to is always unambiguous.

Finally, these rules combine to guarantee that, other than the top-level
predeclared names such as `int`, (the first component of) every name is
always declared in the current package.

In short, names are local.
In C, C++, or Java the name `y` could refer to anything.
In Go, `y` (or even `Y`) is always defined within the package,
while the interpretation of `x.Y` is clear: find `x` locally, `Y` belongs to it.

These rules provide an important property for scaling because they guarantee
that adding an exported name to a package can never break a client
of that package. The naming rules decouple packages, providing
scaling, clarity, and robustness.

There is one more aspect of naming to be mentioned: method lookup
is always by name only, not by signature (type) of the method.
In other words, a single type can never have two methods with the same name.
Given a method `x.M`, there's only ever one `M` associated with `x`.
Again, this makes it easy to identify which method is referred to given
only the name.
It also makes the implementation of method invocation simple.

* Semantics

The semantics of Go statements are generally C-like. Go is a compiled, statically typed,
procedural language with pointers and so on. By design, it should feel
familiar to programmers accustomed to languages in the C family.
When launching a new language
it is important that the target audience be able to learn it quickly; rooting Go
in the C family helps make sure that young programmers, most of whom
know Java, JavaScript, and maybe C, will find Go easy to learn.

That said, Go makes many small changes to C semantics, mostly in the
service of robustness.
These include:

- there is no pointer arithmetic
- there are no implicit numeric conversions
- array bounds are always checked
- there are no type aliases (after `type`X`int`, `X` and `int` are distinct types, not aliases)
- `++` and `--` are statements, not expressions
- assignment is not an expression
- it is legal (encouraged even) to take the address of a stack variable
- and many more

There are some much bigger changes too, stepping far from the traditional
C, C++, and even Java models. These include linguistic support for:

- concurrency
- garbage collection
- interface types
- reflection
- type switches

The following sections provide brief discussions of two of these topics in Go,
concurrency and garbage collection,
mostly from a software engineering perspective.
For a full discussion of the language semantics and uses see the many
resources on the [[http://golang.org]] web site.

* Concurrency

Concurrency is important to the modern computing environment with its
multicore machines running web servers with multiple clients,
what might be called the typical Google program.
This kind of software is not especially well served by C++ or Java,
which lack sufficient concurrency support at the language level.

Go embodies a variant of CSP (Hoare's Communicating Sequential Processes)
with first-class channels.
CSP was chosen partly due to familiarity (one of us had worked on
predecessor languages that built on CSP's ideas), but also because
CSP has the property that it is easy to add to a procedural programming
model without profound changes to that model.
That is, given a C-like language, CSP can be added to the language
in a mostly orthogonal way, providing extra expressive power without
constraining the language's other uses. In short, the rest of the
language can remain "ordinary".
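
As an illustration of what "mostly orthogonal" means in practice, the following sketch (ours, not from the talk) builds a two-stage pipeline from goroutines connected by channels. The function names `generate`, `square`, and `sumSquares` are hypothetical. Each stage is ordinary sequential code; the only concurrency-specific constructs are the `go` statement, channel sends and receives, and `close`.

```go
package main

import "fmt"

// generate sends 1..n on a new channel from its own goroutine,
// then closes the channel to signal that no more values are coming.
func generate(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i
		}
	}()
	return out
}

// square receives each value from in and sends its square on a new channel.
// The loop body is ordinary sequential code.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- v * v
		}
	}()
	return out
}

// sumSquares connects the two stages and adds up what comes out.
func sumSquares(n int) int {
	total := 0
	for v := range square(generate(n)) {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumSquares(4)) // 1 + 4 + 9 + 16 = 30
}
```

Because the channels are unbuffered, each send waits for the matching receive, so the stages synchronize with each other without explicit locks.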

The approach is thus the composition of independently executing
functions of otherwise regular procedural code.

The resulting language allows us to couple concurrency with computation
smoothly. Consider a web server that must verify security certificates for
each incoming client call; in Go it is easy to construct the software using
CSP to manage the clients as independently executing procedures but
to have the full power of an efficient compiled language available for
the expensive cryptographic calculations.

In summary, CSP is practical for Go and for Google. When writing
a web server, the canonical Go program, the model is a great fit.

There is one important caveat: Go is not purely memory safe in the presence
of concurrency. Sharing is legal and passing a pointer over a channel is idiomatic
(and efficient), so nothing in the language prevents two goroutines from
accessing the same memory at the same time.

Some concurrency and functional programming experts are disappointed
that Go does not take a write-once approach to value semantics
in the context of concurrent computation, that Go is not more like
Erlang for example.
Again, the reason is largely about familiarity and suitability for the
problem domain. Go's concurrent features work well in a context
familiar to most programmers.
Go _enables_ simple, safe concurrent
programming but does not _forbid_ bad programming.
We compensate by convention, training programmers to think
about message passing as a version of ownership control. The motto is,
"Don't communicate by sharing memory, share memory by communicating."

Our limited experience with programmers new to both Go and concurrent
programming shows that this is a practical approach. Programmers
enjoy the simplicity that support for concurrency brings to network
software, and simplicity engenders robustness.
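
The ownership convention behind the motto can be sketched in a few lines. In this hypothetical example (the `Buffer` type and the `upper` and `process` functions are illustrative, not from any library), a pointer is passed over a channel and the sender promises not to use it afterwards, so the receiver can modify the data without a lock. The language does not enforce that promise; running such code under the race detector (`go run -race`) is one way to catch violations.

```go
package main

import (
	"fmt"
	"strings"
)

// Buffer is a hypothetical unit of work; by convention, whichever
// goroutine most recently received the pointer is its only user.
type Buffer struct {
	data []string
}

// upper receives buffers, modifies them in place, and passes them on.
// No mutex is needed because each *Buffer has exactly one owner at a time.
func upper(in <-chan *Buffer, out chan<- *Buffer) {
	for b := range in {
		for i, s := range b.data {
			b.data[i] = strings.ToUpper(s)
		}
		out <- b // hand ownership to the receiver
	}
	close(out)
}

// process sends each buffer through upper and collects the results.
func process(bufs []*Buffer) []*Buffer {
	in := make(chan *Buffer)
	out := make(chan *Buffer)
	go upper(in, out)
	go func() {
		for _, b := range bufs {
			in <- b // after this send, this goroutine does not touch *b again
		}
		close(in)
	}()
	var done []*Buffer
	for b := range out {
		done = append(done, b)
	}
	return done
}

func main() {
	res := process([]*Buffer{
		{data: []string{"share", "memory"}},
		{data: []string{"by", "communicating"}},
	})
	for _, b := range res {
		fmt.Println(strings.Join(b.data, " "))
	}
}
```

With a single `upper` goroutine the buffers come back in the order they were sent; the pointer itself is what is communicated, so no data is copied.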

* Garbage collection

For a systems language, garbage collection can be a controversial feature,
yet we spent very little time deciding that Go would be a
garbage-collected language.
Go has no explicit memory-freeing operation: the only way allocated
memory returns to the pool is through the garbage collector.

It was an easy decision to make because memory management
has a profound effect on the way a language works in practice.
In C and C++, too much programming effort is spent on memory allocation
and freeing.
The resulting designs tend to expose details of memory management
that could well be hidden; conversely, memory considerations
limit how those designs can be used. By contrast, garbage collection makes interfaces
easier to specify.

Moreover, in a concurrent object-oriented language it's almost essential
to have automatic memory management because the ownership of a piece
of memory can be tricky to manage as it is passed around among concurrent
executions. It's important to separate behavior from resource management.

The language is much easier to use because of garbage collection.

Of course, garbage collection brings significant costs: general overhead,
latency, and complexity of the implementation. Nonetheless, we believe
that the benefits, which are mostly felt by the programmer, outweigh
the costs, which are largely borne by the language implementer.

Experience with Java in particular as a server language has made some
people nervous about garbage collection in a user-facing system.
The overheads are uncontrollable, latencies can be large, and much
parameter tuning is required for good performance.
Go, however, is different. Properties of the language mitigate some of these
concerns. Not all of them of course, but some.
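
Before turning to those properties, here is a minimal sketch (ours, not from the talk) of the earlier claim that garbage collection makes interfaces easier to specify. The linked-list type `node` and the helpers `push` and `values` are hypothetical. `push` returns the address of what starts out as a local variable, which is legal in Go, and neither function has to say who is responsible for freeing memory.

```go
package main

import "fmt"

// node is a hypothetical singly linked list element.
type node struct {
	val  int
	next *node
}

// push returns a new head for the list. n starts out as a local variable,
// but because its address is returned, the compiler's escape analysis
// places it on the heap. Nothing in the signature says who frees it: Go has
// no free operation, and the collector reclaims the node once it is unreachable.
func push(head *node, v int) *node {
	n := node{val: v, next: head}
	return &n
}

// values walks the list from head and returns the stored values in order.
func values(head *node) []int {
	var out []int
	for n := head; n != nil; n = n.next {
		out = append(out, n.val)
	}
	return out
}

func main() {
	var list *node
	for i := 1; i <= 3; i++ {
		list = push(list, i)
	}
	fmt.Println(values(list)) // [3 2 1]
}
```

In C, the same `push` would also need a documented rule for which caller eventually frees each node; here the signature is the whole contract.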

The key point is that Go gives the programmer tools to limit allocation
by controlling the layout of data structures. Consider this simple
type definition of a data structure containing a buffer (array) of bytes:

	type X struct {
		a, b, c int
		buf [256]byte
	}

In Java, the `buf` field would require a second allocation and accesses
to it a second level of indirection. In Go, however, the buffer is allocated
in a single block of memory along with the containing struct and no
indirection is required. For systems programming, this design can perform
better as well as reduce the number
of items known to the collector. At scale it can make a significant
difference.

As a more direct example, in Go it is easy and efficient to provide
second-order allocators, for instance an arena allocator that allocates
a large array of structs and links them together with a free list.
Libraries that repeatedly use many small structures like this can,
with modest prearrangement, generate no garbage yet
be efficient and responsive.

Although Go is a garbage collected language, therefore, a knowledgeable
programmer can limit the pressure placed on the collector and thereby
improve performance. (Also, the Go installation comes with good tools
for studying the dynamic memory performance of a running program.)

To give the programmer this flexibility, Go must support
what we call _interior_pointers_ to objects
allocated in the heap. The `X.buf` field in the example above lives
within the struct but it is legal to capture the address of this inner field,
for instance to pass it to an I/O routine. In Java, as in many garbage-collected
languages, it is not possible to construct an interior pointer like this,
but in Go it is idiomatic.
This design point affects which collection algorithms can be used,
and may make them more difficult, but after careful thought we decided
that it was necessary to allow interior pointers because of the benefits
to the programmer and the ability to reduce pressure on the (perhaps
harder to implement) collector.
So far, our experience comparing similar Go and Java programs shows
that use of interior pointers can have a significant effect on total arena size,
latency, and collection times.

In summary, Go is garbage collected but gives the programmer
some tools to control collection overhead.

The garbage collector remains an active area of development.
The current design is a parallel mark-and-sweep collector and there remain
opportunities to improve its performance or perhaps even its design.
(The language specification does not mandate any particular implementation
of the collector.)
Still, if the programmer takes care to use memory wisely,
the current implementation works well for production use.

* Composition not inheritance

Go takes an unusual approach to object-oriented programming, allowing
methods on any type, not just classes, but without any form of type-based inheritance
like subclassing.
This means there is no type hierarchy.
This was an intentional design choice.
Although type hierarchies have been used to build much successful
software, it is our opinion that the model has been overused and that it
is worth taking a step back.

Instead, Go has _interfaces_, an idea that has been discussed at length elsewhere (see
[[http://research.swtch.com/interfaces]]
for example), but here is a brief summary.

In Go an interface is _just_ a set of methods. For instance, here is the definition
of the `Hash` interface from the standard library.

	type Hash interface {
		Write(p []byte) (n int, err error)
		Sum(b []byte) []byte
		Reset()
		Size() int
		BlockSize() int
	}

All data types that implement these methods satisfy this interface implicitly;
there is no `implements` declaration.
That said, interface satisfaction is statically checked at compile time,
so despite this decoupling, interfaces are type-safe.

A type will usually satisfy many interfaces, each corresponding
to a subset of its methods. For example, any type that satisfies the `Hash`
interface also satisfies the `Writer` interface:

	type Writer interface {
		Write(p []byte) (n int, err error)
	}

This fluidity of interface satisfaction encourages a different approach
to software construction. But before explaining that, we should explain
why Go does not have subclassing.

Object-oriented programming provides a powerful insight: that the
_behavior_ of data can be generalized independently of the
_representation_ of that data.
The model works best when the behavior (method set) is fixed,
but once you subclass a type and add a method,
_the_behaviors_are_no_longer_identical_.
If instead the set of behaviors is fixed, such as in Go's statically
defined interfaces, the uniformity of behavior enables data and
programs to be composed uniformly, orthogonally, and safely.

One extreme example is the Plan 9 kernel, in which all system data items
implemented exactly the same interface, a file system API defined
by 14 methods.
This uniformity permitted a level of object composition seldom
achieved in other systems, even today.
Examples abound.
Here's one: A system could import (in Plan 9 terminology) a TCP
stack to a computer that didn't have TCP or even Ethernet, and over that network
connect to a machine with a different CPU architecture, import its `/proc` tree,
and run a local debugger to do breakpoint debugging of the remote process.
This sort of operation was workaday on Plan 9, nothing special at all.
The ability to do such things fell out of the design; it required no special
arrangement (and was all done in plain C).

We argue that this compositional style of system construction has been
neglected by the languages that push for design by type hierarchy.
Type hierarchies result in brittle code.
The hierarchy must be designed early, often as the first step of
designing the program, and early decisions can be difficult to change once
the program is written.
As a consequence, the model encourages early overdesign as the
programmer tries to predict every possible use the software might
require, adding layers of type and abstraction just in case.
This is upside down.
The way pieces of a system interact should adapt as it grows,
not be fixed at the dawn of time.

Go therefore encourages _composition_ over inheritance, using
simple, often one-method interfaces to define trivial behaviors
that serve as clean, comprehensible boundaries between components.

Consider the `Writer` interface shown above, which is defined in
package `io`: Any item that has a `Write` method with this
signature works well with the complementary `Reader` interface:

	type Reader interface {
		Read(p []byte) (n int, err error)
	}

These two complementary methods allow type-safe chaining
with rich behaviors, like generalized Unix pipes.
Files, buffers, networks,
encryptors, compressors, image encoders, and so on can all be
connected together.
The `Fprintf` formatted I/O routine takes an `io.Writer` rather than,
as in C, a `FILE*`.
The formatted printer has no knowledge of what it is writing to; it may
be an image encoder that is in turn writing to a compressor that
is in turn writing to an encryptor that is in turn writing to a network
connection.

Interface composition is a different style of programming, and
people accustomed to type hierarchies need to adjust their thinking to
do it well, but the result is an adaptability of
design that is harder to achieve through type hierarchies.

Note too that the elimination of the type hierarchy also eliminates
a form of dependency hierarchy.
Interface satisfaction allows the program to grow organically without
predetermined contracts.
And it is a linear form of growth; a change to an interface affects
only the immediate clients of that interface; there is no subtree to update.
The lack of `implements` declarations disturbs some people but
it enables programs to grow naturally, gracefully, and safely.

Go's interfaces have a major effect on program design.
One place we see this is in the use of functions that take interface
arguments. These are _not_ methods; they are functions.
Some examples should illustrate their power.
`ReadAll` returns a byte slice (array) holding all the data that can
be read from an `io.Reader`:

	func ReadAll(r io.Reader) ([]byte, error)

Wrappers—functions that take an interface and return an interface—are
also widespread.
Here are some prototypes.
`LoggingReader` logs every `Read` call on the incoming `Reader`.
`LimitingReader` stops reading after `n` bytes.
`ErrorInjector` aids testing by simulating I/O errors.
And there are many more.

	func LoggingReader(r io.Reader) io.Reader
	func LimitingReader(r io.Reader, n int64) io.Reader
	func ErrorInjector(r io.Reader) io.Reader

The designs are nothing like hierarchical, subtype-inherited methods.
They are looser (even _ad_hoc_), organic, decoupled, independent, and therefore scalable.

* Errors

Go does not have an exception facility in the conventional sense,
that is, there is no control structure associated with error handling.
(Go does provide mechanisms for handling exceptional situations
such as division by zero. A pair of built-in functions
called `panic` and `recover` allow the programmer to protect
against such things. However, these functions
are intentionally clumsy, rarely used, and not integrated
into the library the way, say, Java libraries use exceptions.)

The key language feature for error handling is a pre-defined
interface type called `error` that represents a value that has an
`Error` method returning a string:

	type error interface {
		Error() string
	}

Libraries use the `error` type to return a description of the error.
Combined with the ability for functions to return multiple
values, it's easy to return the computed result along with an
error value, if any.
For instance, the equivalent
to C's `getchar` does not return an out-of-band value at EOF,
nor does it throw an exception; it just returns an `error` value
alongside the character, with a `nil` `error` value signifying success.
Here is the signature of the `ReadByte` method of the buffered
I/O package's `bufio.Reader` type:

	func (b *Reader) ReadByte() (c byte, err error)

This is a clear and simple design, easily understood.
Errors are just values and programs compute with
them as they would compute with values of any other type.

It was a deliberate choice not to incorporate exceptions in Go.
Although a number of critics disagree with this decision, there
are several reasons we believe it makes for better software.

First, there is nothing truly exceptional about errors in computer programs.
For instance, the inability to open a file is a common issue that
does not deserve special linguistic constructs; `if` and `return` are fine.

	f, err := os.Open(fileName)
	if err != nil {
		return err
	}

Second, if errors use special control structures, error handling distorts
the control flow for a program that handles errors.
The Java-like style of `try-catch-finally` blocks interlaces multiple overlapping flows
of control that interact in complex ways.
Although Go, by contrast, makes it more
verbose to check errors, the explicit design keeps the flow of control
straightforward—literally.

There is no question the resulting code can be longer,
but the clarity and simplicity of such code offsets its verbosity.
Explicit error checking forces the programmer to think about
errors—and deal with them—when they arise. Exceptions make
it too easy to _ignore_ them rather than _handle_ them, passing
the buck up the call stack until it is too late to fix the problem or
diagnose it well.

* Tools

Software engineering requires tools.
Every language operates in an environment with other languages
and myriad tools to compile, edit, debug, profile, test, and run programs.

Go's syntax, package system, naming conventions, and other features
were designed to make tools easy to write, and the library
includes a lexer, parser, and type checker for the language.

Tools to manipulate Go programs are so easy to write that
many such tools have been created,
some with interesting consequences for software engineering.

The best known of these is `gofmt`, the Go source code formatter.
From the beginning of the project, we intended Go programs
to be formatted by machine, eliminating an entire class of argument
between programmers: how do I lay out my code?
`Gofmt` is run on all Go programs we write, and most of the open
source community uses it too.
It is run as a "presubmit" check for the code repositories to
make sure that all checked-in Go programs are formatted the same.

`Gofmt` is often cited by users as one of Go's best features even
though it is not part of the language.
The existence and use of `gofmt` means that
from the beginning, the community has always
seen Go code as `gofmt` formats it, so Go programs have a single
style that is now familiar to everyone. Uniform presentation
makes code easier to read and therefore faster to work on.
Time not spent on formatting is time saved.
`Gofmt` also affects scalability: since all code looks the same,
teams find it easier to work together or with others' code.

`Gofmt` enabled another class of tools that we did not foresee as clearly.
The program works by parsing the source code and reformatting it
from the parse tree itself.
This makes it possible to _edit_ the parse tree before formatting it,
so a suite of automatic refactoring tools sprang up.
These are easy to write, can be semantically rich because they work
directly on the parse tree, and automatically produce canonically
formatted code.

The first example was a `-r` (rewrite) flag on `gofmt` itself, which
uses a simple pattern-matching language to enable expression-level
rewrites. For instance, one day we introduced a default value for the
right-hand side of a slice expression: the length itself.
The entire Go source tree was updated to use this default with the single
command:

	gofmt -r 'a[b:len(a)] -> a[b:]'

A key point about this transformation is that, because the input and
output are both in the canonical format, the only changes made to
the source code are semantic ones.

A similar but more intricate process allowed `gofmt` to be used to
update the tree when the language no longer required semicolons
as statement terminators if the statement ended at a newline.

Another important tool is `gofix`, which runs tree-rewriting modules
written in Go itself; because the modules are ordinary Go code, they are
capable of more advanced refactorings.
The `gofix` tool allowed us to make sweeping changes to APIs and language
features leading up to the release of Go 1, including a change to the syntax
for deleting entries from a map, a radically different API for manipulating
time values, and many more.
As these changes rolled out, users could update all their code by running
the simple command:

	gofix

Note that these tools allow us to _update_ code even if the old code still
works.
As a result, Go repositories are easy to keep up to date as libraries evolve.
Old APIs can be deprecated quickly and automatically so only one version
of the API needs to be maintained.
For example, we recently changed Go's protocol buffer implementation to use
"getter" functions, which were not in the interface before.
We ran `gofix` on _all_ of Google's Go code to update all programs that
use protocol buffers, and now there is only one version of the API in use.
Similar sweeping changes to the C++ or Java libraries are almost infeasible
at the scale of Google's code base.

The existence of a parsing package in the standard Go library has enabled
a number of other tools as well.
Examples include the `go` tool, which
manages program construction including acquiring packages from
remote repositories;
the `godoc` document extractor;
a program to verify that the API compatibility contract is maintained as
the library is updated; and many more.

Although tools like these are rarely mentioned in the context of language
design, they are an integral part of a language's ecosystem, and the fact
that Go was designed with tooling in mind has had a huge effect on the
development of the language, its libraries, and its community.

* Conclusion

Go's use is growing inside Google.

Several big user-facing services use it, including `youtube.com` and `dl.google.com`
(the download server that delivers Chrome, Android and other downloads),
as well as our own [[http://golang.org][golang.org]].
And of course many small ones do, mostly
built using Google App Engine's native support for Go.

Many other companies use Go as well; the list is very long, but a few of the
better known are:

- BBC Worldwide
- Canonical
- Heroku
- Nokia
- SoundCloud

It looks like Go is meeting its goals. Still, it's too early to declare it a success.
We don't have enough experience yet, especially with big programs (millions
of lines of code), to know whether the attempts to build a scalable language
have paid off. All the indicators are positive though.

On a smaller scale, some minor things aren't quite right and might get
tweaked in a later (Go 2?) version of the language. For instance, there are
too many forms of variable declaration syntax, programmers are
easily confused by the behavior of nil values inside non-nil interfaces,
and there are many library and interface details that could use another
round of design.
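
The nil-interface confusion mentioned above is easy to reproduce. In this sketch (ours, not from the talk), `MyErr`, `mayFail`, and `mayFailFixed` are hypothetical names. An interface value is nil only when both its dynamic type and its value are nil, so returning a nil `*MyErr` as an `error` yields an interface that does not compare equal to `nil`.

```go
package main

import "fmt"

// MyErr is a hypothetical error type with a pointer receiver.
type MyErr struct{ msg string }

func (e *MyErr) Error() string { return e.msg }

// mayFail returns a nil *MyErr converted to error. The resulting interface
// holds the dynamic type *MyErr and a nil pointer value, so it is not nil.
func mayFail() error {
	var p *MyErr // nil pointer
	return p     // non-nil interface containing a nil pointer
}

// mayFailFixed returns a nil interface value on success.
func mayFailFixed() error {
	return nil
}

func main() {
	fmt.Println(mayFail() == nil)      // false
	fmt.Println(mayFailFixed() == nil) // true
}
```

The usual remedy, shown in `mayFailFixed`, is to return a literal `nil` of the interface type rather than a typed nil pointer.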

It's worth noting, though, that `gofix` and `gofmt` gave us the opportunity to
fix many other problems during the leadup to Go version 1.
Go as it is today is therefore much closer to what the designers wanted
than it would have been without these tools, which were themselves
enabled by the language's design.

Not everything was fixed, though. We're still learning (but the language
is frozen for now).

A significant weakness of the language is that the implementation still
needs work. The compilers' generated code and the performance of the
runtime in particular should be better, and work continues on them.
There is progress already; in fact some benchmarks show a
doubling of performance with the development version today compared
to the first release of Go version 1 early in 2012.

* Summary

Software engineering guided the design of Go.
More than most general-purpose
programming languages, Go was designed to address a set of software engineering
issues that we had been exposed to in the construction of large server software.
Offhand, that might make Go sound rather dull and industrial, but in fact
the focus on clarity, simplicity and composability throughout the design
instead resulted in a productive, fun language that many programmers
find expressive and powerful.

The properties that led to that include:

- Clear dependencies
- Clear syntax
- Clear semantics
- Composition over inheritance
- Simplicity provided by the programming model (garbage collection, concurrency)
- Easy tooling (the `go` tool, `gofmt`, `godoc`, `gofix`)

If you haven't tried Go already, we suggest you do.


.link http://golang.org http://golang.org

.image splash/appenginegophercolor.jpg