github.com/graybobo/golang.org-package-offline-cache@v0.0.0-20200626051047-6608995c132f/x/blog/content/slices.article

     1  Arrays, slices (and strings): The mechanics of 'append'
     2  26 Sep 2013
     3  Tags: array, slice, string, copy, append
     4  
     5  Rob Pike
     6  
     7  * Introduction
     8  
     9  One of the most common features of procedural programming languages is
    10  the concept of an array.
    11  Arrays seem like simple things but there are many questions that must be
    12  answered when adding them to a language, such as:
    13  
    14  - fixed-size or variable-size?
    15  - is the size part of the type?
    16  - what do multidimensional arrays look like?
    17  - does the empty array have meaning?
    18  
    19  The answers to these questions affect whether arrays are just
    20  a feature of the language or a core part of its design.
    21  
    22  In the early development of Go, it took about a year to decide the answers
    23  to these questions before the design felt right.
    24  The key step was the introduction of _slices_, which built on fixed-size
    25  _arrays_ to give a flexible, extensible data structure.
    26  To this day, however, programmers new to Go often stumble over the way slices
    27  work, perhaps because experience from other languages has colored their thinking.
    28  
    29  In this post we'll attempt to clear up the confusion.
    30  We'll do so by building up the pieces to explain how the `append` built-in function
    31  works, and why it works the way it does.
    32  
    33  * Arrays
    34  
    35  Arrays are an important building block in Go, but like the foundation of a building
    36  they are often hidden below more visible components.
    37  We must talk about them briefly before we move on to the more interesting,
    38  powerful, and prominent idea of slices.
    39  
    40  Arrays are not often seen in Go programs because
    41  the size of an array is part of its type, which limits its expressive power.
    42  
    43  The declaration
    44  
    45  .code slices/prog010.go /var buffer/
    46  
    47  declares the variable `buffer`, which holds 256 bytes.
    48  The type of `buffer` includes its size, `[256]byte`.
    49  An array with 512 bytes would be of the distinct type `[512]byte`.
    50  
    51  The data associated with an array is just that: an array of elements.
    52  Schematically, our buffer looks like this in memory,
    53  
    54  	buffer: byte byte byte ... 256 times ... byte byte byte
    55  
    56  That is, the variable holds 256 bytes of data and nothing else. We can
    57  access its elements with the familiar indexing syntax, `buffer[0]`, `buffer[1]`,
    58  and so on through `buffer[255]`. (The index range 0 through 255 covers
    59  256 elements.) Attempting to index `buffer` with a value outside this
    60  range will crash the program.
    61  
    62  There is a built-in function called `len` that returns the number of elements
    63  of an array or slice and also of a few other data types.
    64  For arrays, it's obvious what `len` returns.
    65  In our example, `len(buffer)` returns the fixed value 256.
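As a small illustration of the points so far, here is a sketch that declares the same kind of 256-byte array and asks for its length. The helper `lastIndex` is hypothetical, introduced only for this example.

```go
package main

import "fmt"

// buffer is an array; its size, 256, is part of its type.
var buffer [256]byte

// lastIndex is a hypothetical helper for this sketch: it returns the
// largest valid index of a 256-byte array, which is len(a)-1.
func lastIndex(a [256]byte) int {
	return len(a) - 1
}

func main() {
	buffer[0] = 'a'
	buffer[255] = 'z'

	fmt.Println(len(buffer))       // 256
	fmt.Println(lastIndex(buffer)) // 255

	// A [512]byte is a different type; passing one to lastIndex
	// would be rejected by the compiler.
	var bigger [512]byte
	fmt.Println(len(bigger)) // 512
}
```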
    66  
    67  Arrays have their place—they are a good representation of a transformation
    68  matrix for instance—but their most common purpose in Go is to hold storage
    69  for a slice.
    70  
    71  * Slices: The slice header
    72  
    73  Slices are where the action is, but to use them well one must understand
    74  exactly what they are and what they do.
    75  
    76  A slice is a data structure describing a contiguous section of an array
    77  stored separately from the slice variable itself.
    78  _A_slice_is_not_an_array_.
    79  A slice _describes_ a piece of an array.
    80  
    81  Given our `buffer` array variable from the previous section, we could create
    82  a slice that describes elements 100 through 150 (to be precise, 100 through 149,
    83  inclusive) by _slicing_ the array:
    84  
    85  .code slices/prog010.go /var slice/
    86  
    87  In that snippet we used the full variable declaration to be explicit.
    88  The variable `slice` has type `[]byte`, pronounced "slice of bytes",
    89  and is initialized from the array, called
    90  `buffer`, by slicing elements 100 (inclusive) through 150 (exclusive).
    91  The more idiomatic syntax would drop the type, which is set by the initializing expression:
    92  
    93  	var slice = buffer[100:150]
    94  
    95  Inside a function we could use the short declaration form,
    96  
    97  	slice := buffer[100:150]
    98  
    99  What exactly is this slice variable?
   100  It's not quite the full story, but for now think of a
   101  slice as a little data structure with two elements: a length and a pointer to an element
   102  of an array.
   103  You can think of it as being built like this behind the scenes:
   104  
   105  	type sliceHeader struct {
   106  		Length        int
   107  		ZerothElement *byte
   108  	}
   109  
   110  	slice := sliceHeader{
   111  		Length:        50,
   112  		ZerothElement: &buffer[100],
   113  	}
   114  
   115  Of course, this is just an illustration.
   116  Despite what this snippet says, that `sliceHeader` struct is not visible
   117  to the programmer, and the type
   118  of the element pointer depends on the type of the elements,
   119  but this gives the general idea of the mechanics.
   120  
   121  So far we've used a slice operation on an array, but we can also slice a slice, like this:
   122  
   123  	slice2 := slice[5:10]
   124  
   125  Just as before, this operation creates a new slice, in this case with elements
   126  5 through 9 (inclusive) of the original slice, which means elements
   127  105 through 109 of the original array.
   128  The underlying `sliceHeader` struct for the `slice2` variable looks like
   129  this:
   130  
   131  	slice2 := sliceHeader{
   132  		Length:        5,
   133  		ZerothElement: &buffer[105],
   134  	}
   135  
   136  Notice that this header still points to the same underlying array, stored in
   137  the `buffer` variable.
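One way to convince yourself that both headers describe the same array is to write through the inner slice and read the result from the array. The sketch below does that; the helper `writeThrough` is hypothetical.

```go
package main

import "fmt"

var buffer [256]byte

// writeThrough is a hypothetical helper: it slices buffer twice, writes
// one byte through the inner slice, and reports the lengths of both
// slices along with the array element that was changed.
func writeThrough() (outerLen, innerLen int, changed byte) {
	slice := buffer[100:150] // elements 100 through 149
	slice2 := slice[5:10]    // elements 105 through 109 of buffer
	slice2[0] = 'x'          // same storage as buffer[105]
	return len(slice), len(slice2), buffer[105]
}

func main() {
	outerLen, innerLen, changed := writeThrough()
	fmt.Println(outerLen, innerLen, string(changed)) // 50 5 x
}
```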
   138  
   139  We can also _reslice_, which is to say slice a slice and store the result back in
   140  the original slice structure. After
   141  
   142  	slice = slice[5:10]
   143  
   144  the `sliceHeader` structure for the `slice` variable looks just like it did for the `slice2`
   145  variable.
   146  You'll see reslicing used often, for example to truncate a slice. This statement drops
   147  the first and last elements of our slice:
   148  
   149  	slice = slice[1:len(slice)-1]
   150  
   151  [Exercise: Write out what the `sliceHeader` struct looks like after this assignment.]
   152  
   153  You'll often hear experienced Go programmers talk about the "slice header"
   154  because that really is what's stored in a slice variable.
   155  For instance, when you call a function that takes a slice as an argument, such as
   156  [[http://golang.org/pkg/bytes/#IndexRune][bytes.IndexRune]], that header is
   157  what gets passed to the function.
   158  In this call,
   159  
   160  	slashPos := bytes.IndexRune(slice, '/')
   161  
   162  the `slice` argument that is passed to the `IndexRune` function is, in fact,
   163  a "slice header".
   164  
   165  There's one more data item in the slice header, which we talk about below,
   166  but first let's see what the existence of the slice header means when you
   167  program with slices.
   168  
   169  * Passing slices to functions
   170  
   171  It's important to understand that even though a slice contains a pointer,
   172  it is itself a value.
   173  Under the covers, it is a struct value holding a pointer and a length.
   174  It is _not_ a pointer to a struct.
   175  
   176  This matters.
   177  
   178  When we called `IndexRune` in the previous example,
   179  it was passed a _copy_ of the slice header.
   180  That behavior has important ramifications.
   181  
   182  Consider this simple function:
   183  
   184  .code slices/prog010.go /^func/,/^}/
   185  
   186  It does just what its name implies, iterating over the indices of a slice
   187  (using a `for` `range` loop), incrementing its elements.
   188  
   189  Try it:
   190  
   191  .play -edit slices/prog010.go /^func main/,/^}/
   192  
   193  (You can edit and re-execute these runnable snippets if you want to explore.)
   194  
   195  Even though the slice _header_ is passed by value, the header includes
   196  a pointer to elements of an array, so both the original slice header
   197  and the copy of the header passed to the function describe the same
   198  array.
   199  Therefore, when the function returns, the modified elements can
   200  be seen through the original slice variable.
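The file named in the directives above is not reproduced here, so the following is only a sketch of what such a function could look like. The name `AddOneToEachElement` and the choice of `buffer[10:20]` are assumptions made for the example.

```go
package main

import "fmt"

var buffer [256]byte

// AddOneToEachElement (an assumed name) increments every element of the
// slice it is given. It receives a copy of the slice header, but that
// copy points at the same array as the caller's slice.
func AddOneToEachElement(slice []byte) {
	for i := range slice {
		slice[i]++
	}
}

func main() {
	slice := buffer[10:20]
	for i := range slice {
		slice[i] = byte(i)
	}
	fmt.Println("before", slice) // before [0 1 2 3 4 5 6 7 8 9]
	AddOneToEachElement(slice)
	fmt.Println("after", slice) // after [1 2 3 4 5 6 7 8 9 10]
}
```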
   201  
   202  The argument to the function really is a copy, as this example shows:
   203  
   204  .play -edit slices/prog020.go /^func/,$
   205  
   206  Here we see that the _contents_ of a slice argument can be modified by a function,
   207  but its _header_ cannot.
   208  The length stored in the `slice` variable is not modified by the call to the function,
   209  since the function is passed a copy of the slice header, not the original.
   210  Thus if we want to write a function that modifies the header, we must return it as a result
   211  parameter, just as we have done here.
   212  The `slice` variable is unchanged but the returned value has the new length,
   213  which is then stored in `newSlice`.
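A minimal sketch of the idea, assuming a function named `SubtractOneFromLength` (the name and the 50-element slice are illustrative, not taken from the referenced file):

```go
package main

import "fmt"

// SubtractOneFromLength (an assumed name) shortens its own copy of the
// header and returns it. The slice must not be empty.
func SubtractOneFromLength(slice []byte) []byte {
	return slice[0 : len(slice)-1]
}

func main() {
	buffer := [256]byte{}
	slice := buffer[0:50]
	fmt.Println("before: len(slice) =", len(slice)) // 50
	newSlice := SubtractOneFromLength(slice)
	fmt.Println("after:  len(slice) =", len(slice))       // still 50
	fmt.Println("after:  len(newSlice) =", len(newSlice)) // 49
}
```

The caller's `slice` keeps its length because the function only ever saw a copy of the header.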
   214  
   215  * Pointers to slices: Method receivers
   216  
   217  Another way to have a function modify the slice header is to pass a pointer to it.
   218  Here's a variant of our previous example that does this:
   219  
   220  .play -edit slices/prog030.go /^func/,$
   221  
   222  It seems clumsy in that example, especially dealing with the extra level of indirection
   223  (a temporary variable helps),
   224  but there is one common case where you see pointers to slices.
   225  It is idiomatic to use a pointer receiver for a method that modifies a slice.
   226  
   227  Let's say we wanted to have a method on a slice that truncates it at the final slash.
   228  We could write it like this:
   229  
   230  .play -edit slices/prog040.go /^type/,$
   231  
   232  If you run this example you'll see that it works properly, updating the slice in the caller.
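Here is a sketch of such a method, assuming the type is called `path` and the method `TruncateAtFinalSlash`; the method name and the sample path are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

type path []byte

// TruncateAtFinalSlash (an assumed name) shortens the path so that it
// ends just before its last '/'. The pointer receiver lets the method
// overwrite the caller's slice header.
func (p *path) TruncateAtFinalSlash() {
	i := bytes.LastIndexByte(*p, '/')
	if i >= 0 {
		*p = (*p)[0:i]
	}
}

func main() {
	pathName := path("/usr/bin/tso")
	pathName.TruncateAtFinalSlash()
	fmt.Printf("%s\n", pathName) // /usr/bin
}
```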
   233  
   234  [Exercise: Change the type of the receiver to be a value rather
   235  than a pointer and run it again. Explain what happens.]
   236  
   237  On the other hand, if we wanted to write a method for `path` that upper-cases
   238  the ASCII letters in the path (parochially ignoring non-English names), the method's receiver could
   239  be a value because a value receiver still points to the same underlying array.
   240  
   241  .play -edit slices/prog050.go /^type/,$
   242  
   243  Here the `ToUpper` method uses two variables in the `for` `range` construct
   244  to capture the index and slice element.
   245  This form of loop avoids writing `p[i]` multiple times in the body.
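A sketch of a value-receiver `ToUpper` along these lines might read as follows; the body is an assumption, since the referenced file is not shown here.

```go
package main

import "fmt"

type path []byte

// ToUpper upper-cases the ASCII letters of the path in place. A value
// receiver is enough: the copied header still points at the same array.
func (p path) ToUpper() {
	for i, b := range p {
		if 'a' <= b && b <= 'z' {
			p[i] = b + 'A' - 'a'
		}
	}
}

func main() {
	pathName := path("/usr/bin/tso")
	pathName.ToUpper()
	fmt.Printf("%s\n", pathName) // /USR/BIN/TSO
}
```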
   246  
   247  [Exercise: Convert the `ToUpper` method to use a pointer receiver and see if its behavior changes.]
   248  
   249  [Advanced exercise: Convert the `ToUpper` method to handle Unicode letters, not just ASCII.]
   250  
   251  * Capacity
   252  
   253  Look at the following function that extends its argument slice of `ints` by one element:
   254  
   255  .code slices/prog060.go /^func Extend/,/^}/
   256  
   257  (Why does it need to return the modified slice?) Now run it:
   258  
   259  .play -edit slices/prog060.go /^func main/,/^}/
   260  
   261  See how the slice grows until... it doesn't.
   262  
   263  It's time to talk about the third component of the slice header: its _capacity_.
   264  Besides the array pointer and length, the slice header also stores its capacity:
   265  
   266  	type sliceHeader struct {
   267  		Length        int
   268  		Capacity      int
   269  		ZerothElement *byte
   270  	}
   271  
   272  The `Capacity` field records how much space the underlying array actually has; it is the maximum
   273  value the `Length` can reach.
   274  Trying to grow the slice beyond its capacity will step beyond the limits of the array and will trigger a panic.
   275  
   276  After our example slice is created by
   277  
   278  	slice := iBuffer[0:0]
   279  
   280  its header looks like this:
   281  
   282  	slice := sliceHeader{
   283  		Length:        0,
   284  		Capacity:      10,
   285  		ZerothElement: &iBuffer[0],
   286  	}
   287  
   288  The `Capacity` field is equal to the length of the underlying array,
   289  minus the index in the array of the first element of the slice (zero in this case).
   290  If you want to inquire what the capacity is for a slice, use the built-in function `cap`:
   291  
   292  	if cap(slice) == len(slice) {
   293  		fmt.Println("slice is full!")
   294  	}
   295  
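To see the capacity limit in action, here is a sketch of a deliberately fragile `Extend` over a ten-element array. The body of `Extend` is an assumption, and `tryExtend` is a hypothetical wrapper that turns the panic into a boolean so the program can keep running.

```go
package main

import "fmt"

var iBuffer [10]int

// Extend (an assumed, deliberately fragile version) grows the slice by
// one element within its existing array. It panics when the slice is
// already at capacity.
func Extend(slice []int, element int) []int {
	n := len(slice)
	slice = slice[0 : n+1]
	slice[n] = element
	return slice
}

// tryExtend is a hypothetical wrapper that reports whether Extend
// succeeded instead of letting the panic stop the program.
func tryExtend(slice []int, element int) (result []int, ok bool) {
	defer func() {
		if recover() != nil {
			result, ok = slice, false
		}
	}()
	return Extend(slice, element), true
}

func main() {
	slice := iBuffer[0:0]
	fmt.Println(len(slice), cap(slice)) // 0 10

	// Capacity is the array length minus the slice's starting index.
	fmt.Println(cap(iBuffer[3:5])) // 7

	for i := 0; i < 12; i++ {
		var ok bool
		if slice, ok = tryExtend(slice, i); !ok {
			fmt.Println("slice is full at length", len(slice))
			break
		}
	}
}
```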
   296  * Make
   297  
   298  What if we want to grow the slice beyond its capacity?
   299  You can't!
   300  By definition, the capacity is the limit to growth.
   301  But you can achieve an equivalent result by allocating a new array, copying the data over, and modifying
   302  the slice to describe the new array.
   303  
   304  Let's start with allocation.
   305  We could use the `new` built-in function to allocate a bigger array
   306  and then slice the result,
   307  but it is simpler to use the `make` built-in function instead.
   308  It allocates a new array and
   309  creates a slice header to describe it, all at once.
   310  The `make` function takes three arguments: the type of the slice, its initial length, and its capacity, which is the
   311  length of the array that `make` allocates to hold the slice data.
   312  This call creates a slice of length 10 with room for 5 more (15-10), as you can see by running it:
   313  
   314  .play -edit slices/prog070.go /slice/,/fmt/
   315  
   316  This snippet doubles the capacity of our `int` slice but keeps its length the same:
   317  
   318  .play -edit slices/prog080.go /slice/,/OMIT/
   319  
   320  After running this code the slice has much more room to grow before needing another reallocation.
   321  
   322  When creating slices, it's often true that the length and capacity will be the same.
   323  The `make` built-in has a shorthand for this common case.
   324  The capacity argument defaults to the length, so you can leave it out
   325  to set them both to the same value.
   326  After
   327  
   328  	gophers := make([]Gopher, 10)
   329  
   330  the `gophers` slice has both its length and capacity set to 10.
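The three uses of `make` described in this section can be sketched together. The helper `doubleCapacity` is hypothetical, and it copies with an explicit loop to match the description above.

```go
package main

import "fmt"

// doubleCapacity is a hypothetical helper: it allocates a new slice with
// the same length and twice the capacity, then copies the elements over
// with an explicit loop.
func doubleCapacity(slice []int) []int {
	newSlice := make([]int, len(slice), 2*cap(slice))
	for i := range slice {
		newSlice[i] = slice[i]
	}
	return newSlice
}

func main() {
	slice := make([]int, 10, 15)
	fmt.Println(len(slice), cap(slice)) // 10 15

	slice = doubleCapacity(slice)
	fmt.Println(len(slice), cap(slice)) // 10 30

	// With the capacity omitted, it equals the length.
	short := make([]int, 10)
	fmt.Println(len(short), cap(short)) // 10 10
}
```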
   331  
   332  * Copy
   333  
   334  When we doubled the capacity of our slice in the previous section,
   335  we wrote a loop to copy the old data to the new slice.
   336  Go has a built-in function, `copy`, to make this easier.
   337  Its arguments are two slices, and it copies the data from the right-hand argument to the left-hand argument.
   338  Here's our example rewritten to use `copy`:
   339  
   340  .play -edit slices/prog090.go /newSlice/,/newSlice/
   341  
   342  The `copy` function is smart.
   343  It only copies what it can, paying attention to the lengths of both arguments.
   344  In other words, the number of elements it copies is the minimum of the lengths of the two slices.
   345  This can save a little bookkeeping.
   346  Also, `copy` returns an integer value, the number of elements it copied, although it's not always worth checking.
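A short sketch of both behaviors: `copy` stopping at the shorter of the two lengths, and `copy` standing in for a hand-written loop when growing a slice. The helper `grow` is hypothetical, and its `capacity` argument must be at least `len(slice)`.

```go
package main

import "fmt"

// grow is a hypothetical helper: it returns a copy of slice backed by a
// new array of the given capacity, which must be at least len(slice).
func grow(slice []int, capacity int) []int {
	newSlice := make([]int, len(slice), capacity)
	copy(newSlice, slice)
	return newSlice
}

func main() {
	src := []int{1, 2, 3, 4, 5}
	dst := make([]int, 3)

	// Only three elements fit in dst, so only three are copied.
	n := copy(dst, src)
	fmt.Println(n, dst) // 3 [1 2 3]

	bigger := grow(src, 10)
	fmt.Println(len(bigger), cap(bigger), bigger) // 5 10 [1 2 3 4 5]
}
```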
   347  
   348  The `copy` function also gets things right when source and destination overlap, which means it can be used to shift
   349  items around in a single slice.
   350  Here's how to use `copy` to insert a value into the middle of a slice.
   351  
   352  .code slices/prog100.go /Insert/,/^}/
   353  
   354  There are a couple of things to notice in this function.
   355  First, of course, it must return the updated slice because its length has changed.
   356  Second, it uses a convenient shorthand.
   357  The expression
   358  
   359  	slice[i:]
   360  
   361  means exactly the same as
   362  
   363  	slice[i:len(slice)]
   364  
   365  Also, although we haven't used the trick yet, we can leave out the first element of a slice expression too;
   366  it defaults to zero. Thus
   367  
   368  	slice[:]
   369  
   370  just means the slice itself, which is useful when slicing an array.
   371  This expression is the shortest way to say "a slice describing all the elements of the array":
   372  
   373  	array[:]
   374  
   375  Now that's out of the way, let's run our `Insert` function.
   376  
   377  .play -edit slices/prog100.go /make/,/OMIT/
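Since the referenced file is not reproduced here, the following is a sketch of an `Insert` built as described: reslice to grow by one, shift the tail with an overlapping `copy`, then store the value. The body and the sample values are assumptions.

```go
package main

import "fmt"

// Insert places value at index, shifting later elements up by one.
// The index must be in range and the slice must have spare capacity;
// otherwise the reslice below panics.
func Insert(slice []int, index, value int) []int {
	// Grow the slice by one element.
	slice = slice[0 : len(slice)+1]
	// Shift the upper part up to open a hole; copy handles the overlap.
	copy(slice[index+1:], slice[index:])
	slice[index] = value
	return slice
}

func main() {
	slice := make([]int, 10, 20) // capacity > length leaves room to insert
	for i := range slice {
		slice[i] = i
	}
	fmt.Println(slice) // [0 1 2 3 4 5 6 7 8 9]
	slice = Insert(slice, 5, 99)
	fmt.Println(slice) // [0 1 2 3 4 99 5 6 7 8 9]
}
```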
   378  
   379  * Append: An example
   380  
   381  A few sections back, we wrote an `Extend` function that extends a slice by one element.
   382  It was buggy, though, because if the slice's capacity was too small, the function would
   383  crash.
   384  (Our `Insert` example has the same problem.)
   385  Now we have the pieces in place to fix that, so let's write a robust implementation of
   386  `Extend` for integer slices.
   387  
   388  .code slices/prog110.go /func Extend/,/^}/
   389  
   390  In this case it's especially important to return the slice, since when it reallocates
   391  the resulting slice describes a completely different array.
   392  Here's a little snippet to demonstrate what happens as the slice fills up:
   393  
   394  .play -edit slices/prog110.go /START/,/END/
   395  
   396  Notice the reallocation when the initial array of size 5 is filled up.
   397  Both the capacity and the address of the zeroth element change when the new array is allocated.
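A robust `Extend` can be sketched as below. The growth policy (double the length and add one) is an assumption chosen so that a zero-capacity slice can still grow; other policies would work as well.

```go
package main

import "fmt"

// Extend appends element to slice, reallocating when the slice is full.
func Extend(slice []int, element int) []int {
	n := len(slice)
	if n == cap(slice) {
		// Full: allocate a larger array. Doubling and adding one
		// means even a zero-capacity slice can grow.
		newSlice := make([]int, n, 2*n+1)
		copy(newSlice, slice)
		slice = newSlice
	}
	slice = slice[0 : n+1]
	slice[n] = element
	return slice
}

func main() {
	slice := make([]int, 0, 5)
	for i := 0; i < 10; i++ {
		slice = Extend(slice, i)
		fmt.Printf("len=%d cap=%d slice=%v\n", len(slice), cap(slice), slice)
		// The address changes whenever a new array is allocated.
		fmt.Println("address of 0th element:", &slice[0])
	}
}
```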
   398  
   399  With the robust `Extend` function as a guide we can write an even nicer function that lets
   400  us extend the slice by multiple elements.
   401  To do this, we use Go's ability to turn a list of function arguments into a slice when the
   402  function is called.
   403  That is, we use Go's variadic function facility.
   404  
   405  Let's call the function `Append`.
   406  For the first version, we can just call `Extend` repeatedly so the mechanism of the variadic function is clear.
   407  The signature of `Append` is this:
   408  
   409  	func Append(slice []int, items ...int) []int
   410  
   411  What that says is that `Append` takes one argument, a slice, followed by zero or more
   412  `int` arguments.
   413  Those arguments are exactly a slice of `int` as far as the implementation
   414  of `Append` is concerned, as you can see:
   415  
   416  .code slices/prog120.go /Append/,/^}/
   417  
   418  Notice the `for` `range` loop iterating over the elements of the `items` argument, which has implied type `[]int`.
   419  Also notice the use of the blank identifier `_` to discard the index in the loop, which we don't need in this case.
   420  
   421  Try it:
   422  
   423  .play -edit slices/prog120.go /START/,/END/
   424  
   425  Another new technique in this example is that we initialize the slice by writing a composite literal,
   426  which consists of the type of the slice followed by its elements in braces:
   427  
   428  .code slices/prog120.go /slice := /
   429  
   430  The `Append` function is interesting for another reason.
   431  Not only can we append elements, we can append a whole second slice
   432  by "exploding" the slice into arguments using the `...` notation at the call site:
   433  
   434  .play -edit slices/prog130.go /START/,/END/
   435  
   436  Of course, we can make `Append` more efficient by allocating no more than once,
   437  building on the innards of `Extend`:
   438  
   439  .code slices/prog140.go /Append/,/^}/
   440  
   441  Here, notice how we use `copy` twice, once to move the slice data to the newly
   442  allocated memory, and then to copy the appended items to the end of the old data.
   443  
   444  Try it; the behavior is the same as before:
   445  
   446  .play -edit slices/prog140.go /START/,/END/
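For readers without the referenced file at hand, here is a sketch of a single-allocation `Append` with the signature given earlier. The body and the amount of headroom (1.5 times the new size, plus one) are assumptions for illustration.

```go
package main

import "fmt"

// Append adds the items to the slice, allocating at most once.
func Append(slice []int, items ...int) []int {
	n := len(slice)
	total := n + len(items)
	if total > cap(slice) {
		// Reallocate with some headroom (1.5x here, an arbitrary choice).
		newSlice := make([]int, total, total*3/2+1)
		copy(newSlice, slice)
		slice = newSlice
	}
	slice = slice[:total]
	copy(slice[n:], items)
	return slice
}

func main() {
	slice := []int{0, 1, 2, 3, 4}
	slice = Append(slice, 5, 6, 7, 8)
	fmt.Println(slice) // [0 1 2 3 4 5 6 7 8]

	// A second slice can be passed by expanding it with "...".
	slice2 := []int{55, 66, 77}
	slice = Append(slice, slice2...)
	fmt.Println(slice) // [0 1 2 3 4 5 6 7 8 55 66 77]
}
```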
   447  
   448  * Append: The built-in function
   449  
   450  And so we arrive at the motivation for the design of the `append` built-in function.
   451  It does exactly what our `Append` example does, with equivalent efficiency, but it
   452  works for any slice type.
   453  
   454  A weakness of Go is that any generic-type operations must be provided by the
   455  run-time. Some day that may change, but for now, to make working with slices
   456  easier, Go provides a built-in generic `append` function.
   457  It works the same as our `int` slice version, but for _any_ slice type.
   458  
   459  Remember, since the slice header is always updated by a call to `append`, you need
   460  to save the returned slice after the call.
   461  In fact, the compiler won't let you call append without saving the result.
   462  
   463  Here are some one-liners intermingled with print statements. Try them, edit them and explore:
   464  
   465  .play -edit slices/prog150.go /START/,/END/
   466  
   467  It's worth taking a moment to think about the final one-liner of that example in detail to understand
   468  how the design of slices makes it possible for this simple call to work correctly.
   469  
   470  There are lots more examples of `append`, `copy`, and other ways to use slices
   471  on the community-built
   472  [[https://golang.org/wiki/SliceTricks]["Slice Tricks" Wiki page]].
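A few typical uses of the built-in `append` are sketched below. These are illustrative one-liners, not a copy of the referenced file, and the helper `clone` is hypothetical.

```go
package main

import "fmt"

// clone is a hypothetical helper that copies a slice of int by
// appending its elements to a nil slice.
func clone(slice []int) []int {
	return append([]int(nil), slice...)
}

func main() {
	slice := []int{1, 2, 3}
	slice2 := []int{55, 66, 77}

	slice = append(slice, 4) // add one item
	fmt.Println(slice)       // [1 2 3 4]

	slice = append(slice, slice2...) // add one slice to another
	fmt.Println(slice)               // [1 2 3 4 55 66 77]

	slice3 := clone(slice) // independent copy
	slice3[0] = 100
	fmt.Println(slice[0], slice3[0]) // 1 100

	slice = append(slice, slice...) // append a slice to itself
	fmt.Println(slice)              // [1 2 3 4 55 66 77 1 2 3 4 55 66 77]
}
```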
   473  
   474  * Nil
   475  
   476  As an aside, with our newfound knowledge we can see what the representation of a `nil` slice is.
   477  Naturally, it is the zero value of the slice header:
   478  
   479  	sliceHeader{
   480  		Length:        0,
   481  		Capacity:      0,
   482  		ZerothElement: nil,
   483  	}
   484  
   485  or just
   486  
   487  	sliceHeader{}
   488  
   489  The key detail is that the element pointer is `nil` too. The slice created by
   490  
   491  	array[0:0]
   492  
   493  has length zero (and maybe even capacity zero) but its pointer is not `nil`, so
   494  it is not a nil slice.
   495  
   496  As should be clear, an empty slice can grow (assuming it has non-zero capacity), but a `nil`
   497  slice has no array to put values in and cannot be resliced to hold even one element.
   498  
   499  That said, a `nil` slice is functionally equivalent to a zero-length slice, even though it points
   500  to nothing.
   501  It has length zero and can be appended to, with allocation.
   502  As an example, look at the one-liner above that copies a slice by appending
   503  to a `nil` slice.
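The difference between a `nil` slice and an empty one can be checked directly. In this sketch the helper `describe` and the four-element `array` are hypothetical.

```go
package main

import "fmt"

var array [4]int

// describe is a hypothetical helper reporting a slice's length,
// capacity, and whether it is nil.
func describe(s []int) (length, capacity int, isNil bool) {
	return len(s), cap(s), s == nil
}

func main() {
	var nilSlice []int
	empty := array[0:0]

	fmt.Println(describe(nilSlice)) // 0 0 true
	fmt.Println(describe(empty))    // 0 4 false

	// Appending to a nil slice allocates an array for it.
	nilSlice = append(nilSlice, 1)
	fmt.Println(len(nilSlice), nilSlice == nil) // 1 false
}
```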
   504  
   505  * Strings
   506  
   507  Now a brief section about strings in Go in the context of slices.
   508  
   509  Strings are actually very simple: they are just read-only slices of bytes with a bit
   510  of extra syntactic support from the language.
   511  
   512  Because they are read-only, there is no need for a capacity (you can't grow them),
   513  but otherwise for most purposes you can treat them just like read-only slices
   514  of bytes.
   515  
   516  For starters, we can index them to access individual bytes:
   517  
   518  	slash := "/usr/ken"[0] // yields the byte value '/'.
   519  
   520  We can slice a string to grab a substring:
   521  
   522  	usr := "/usr/ken"[0:4] // yields the string "/usr"
   523  
   524  It should be obvious now what's going on behind the scenes when we slice a string.
   525  
   526  We can also take a normal slice of bytes and create a string from it with the simple conversion:
   527  
   528  	str := string(slice)
   529  
   530  and go in the reverse direction as well:
   531  
   532  	slice := []byte(usr)
   533  
   534  The array underlying a string is hidden from view; there is no way to access its contents
   535  except through the string. That means that when we do either of these conversions, a
   536  copy of the array must be made.
   537  Go takes care of this, of course, so you don't have to.
   538  After either of these conversions, modifications to
   539  the array underlying the byte slice don't affect the corresponding string.
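The following sketch puts indexing, slicing, and both conversions together, and shows that changing the converted bytes leaves the original string alone. The helper `replaceFirst` is hypothetical.

```go
package main

import "fmt"

// replaceFirst is a hypothetical helper: it converts s to a byte slice
// (which copies the bytes), overwrites the first byte, and converts back.
func replaceFirst(s string, c byte) string {
	b := []byte(s)
	if len(b) > 0 {
		b[0] = c
	}
	return string(b)
}

func main() {
	name := "/usr/ken"
	slash := name[0] // the byte '/'
	usr := name[0:4] // the string "/usr"
	fmt.Println(string(slash), usr) // / /usr

	str := "ken"
	changed := replaceFirst(str, 'B')
	fmt.Println(str, changed) // ken Ben
}
```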
   540  
   541  An important consequence of this slice-like design for strings is that
   542  creating a substring is very efficient.
   543  All that needs to happen
   544  is the creation of a two-word string header. Since the string is read-only, the original
   545  string and the string resulting from the slice operation can share the same array safely.
   546  
   547  A historical note: The earliest implementation of strings always allocated, but when slices
   548  were added to the language, they provided a model for efficient string handling. Some of
   549  the benchmarks saw huge speedups as a result.
   550  
   551  There's much more to strings, of course, but they are a topic for another post.
   552  
   553  * Conclusion
   554  
   555  To understand how slices work, it helps to understand how they are implemented.
   556  There is a little data structure, the slice header, that is the item associated with the slice
   557  variable, and that header describes a section of a separately allocated array.
   558  When we pass slice values around, the header gets copied but the array it points
   559  to is always shared.
   560  
   561  Once you appreciate how they work, slices become not only easy to use, but
   562  powerful and expressive, especially with the help of the `copy` and `append`
   563  built-in functions.
   564  
   565  * More reading
   566  
   567  There's lots to find around the intertubes about slices in Go.
   568  As mentioned earlier,
   569  the [[https://golang.org/wiki/SliceTricks]["Slice Tricks" Wiki page]]
   570  has many examples.
   571  The [[http://blog.golang.org/go-slices-usage-and-internals][Go Slices]] blog post
   572  describes the memory layout details with clear diagrams.
   573  Russ Cox's [[http://research.swtch.com/godata][Go Data Structures]] article includes
   574  a discussion of slices along with some of Go's other internal data structures.
   575  
   576  There is much more material available, but the best way to learn about slices is to use them.