page_title: Best Practices for Writing Dockerfiles
page_description: Hints, tips and guidelines for writing clean, reliable Dockerfiles
page_keywords: Examples, Usage, base image, docker, documentation, dockerfile, best practices, hub, official repo

# Best practices for writing Dockerfiles

## Overview

Docker can build images automatically by reading the instructions from a
`Dockerfile`, a text file that contains all the commands, in order, needed to
build a given image. `Dockerfile`s adhere to a specific format and use a
specific set of instructions. You can learn the basics on the
[Dockerfile Reference](https://docs.docker.com/reference/builder/) page. If
you’re new to writing `Dockerfile`s, you should start there.

This document covers the best practices and methods recommended by Docker,
Inc. and the Docker community for creating easy-to-use, effective
`Dockerfile`s. We strongly suggest you follow these recommendations (in fact,
if you’re creating an Official Image, you *must* adhere to these practices).

You can see many of these practices and recommendations in action in the [buildpack-deps `Dockerfile`](https://github.com/docker-library/buildpack-deps/blob/master/jessie/Dockerfile).

> **Note:** For more detailed explanations of any of the Dockerfile instructions
> mentioned here, visit the [Dockerfile Reference](https://docs.docker.com/reference/builder/) page.

## General guidelines and recommendations

### Containers should be ephemeral

The container produced by the image your `Dockerfile` defines should be as
ephemeral as possible. By “ephemeral,” we mean that it can be stopped and
destroyed and a new one built and put in place with an absolute minimum of
set-up and configuration.

### Use [a .dockerignore file](https://docs.docker.com/reference/builder/#the-dockerignore-file)

To speed up the context upload and make `docker build` more efficient, use
a `.dockerignore` file to exclude files or directories from the build
context and the final image. For example, unless `.git` is needed by your build
process or scripts, you should add it to `.dockerignore`, which can save many
megabytes of upload.

### Avoid installing unnecessary packages

In order to reduce complexity, dependencies, file sizes, and build times, you
should avoid installing extra or unnecessary packages just because they
might be “nice to have.” For example, you don’t need to include a text editor
in a database image.

### Run only one process per container

In almost all cases, you should only run a single process in a single
container. Decoupling applications into multiple containers makes it much
easier to scale horizontally and reuse containers. If a service depends on
another service, make use of [container linking](https://docs.docker.com/userguide/dockerlinks/).

### Minimize the number of layers

You need to find the balance between readability (and thus long-term
maintainability) of the `Dockerfile` and minimizing the number of layers it
uses. Be strategic and cautious about the number of layers you use.

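As a rough illustration of that trade-off, the hypothetical fragment below
groups closely related commands into one `RUN` instruction (one layer) while
keeping an unrelated step separate for readability. The package names and the
`/usr/src/app` path are placeholders, not taken from any particular image.

```dockerfile
FROM debian:jessie

# One layer: refresh the package index and install related tools together.
RUN apt-get update && apt-get install -y \
    curl \
    git

# A second layer: an unrelated step kept separate so its purpose stays obvious.
RUN mkdir -p /usr/src/app
```
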
### Sort multi-line arguments

Whenever possible, ease later changes by sorting multi-line arguments
alphanumerically. This will help you avoid duplication of packages and make the
list much easier to update. This also makes PRs a lot easier to read and
review. Adding a space before a backslash (`\`) helps as well.

Here’s an example from the [`buildpack-deps` image](https://github.com/docker-library/buildpack-deps):

    RUN apt-get update && apt-get install -y \
      bzr \
      cvs \
      git \
      mercurial \
      subversion

### Build cache

During the process of building an image, Docker steps through the
instructions in your `Dockerfile`, executing each in the order specified.
As each instruction is examined, Docker looks for an existing image in its
cache that it can reuse, rather than creating a new (duplicate) image.
If you do not want to use the cache at all, you can use the `--no-cache=true`
option on the `docker build` command.

However, if you do let Docker use its cache, then it is very important to
understand when it will, and will not, find a matching image. The basic rules
that Docker follows are outlined below:

* Starting with a base image that is already in the cache, the next
instruction is compared against all child images derived from that base
image to see if one of them was built using the exact same instruction. If
not, the cache is invalidated.

* In most cases, simply comparing the instruction in the `Dockerfile` with one
of the child images is sufficient. However, certain instructions require
a little more examination and explanation.

* In the case of the `ADD` and `COPY` instructions, the contents of the file(s)
being put into the image are examined. Specifically, a checksum is calculated
for the file(s) and then that checksum is used during the cache lookup.
If anything has changed in the file(s), including their metadata,
then the cache is invalidated.

* Aside from the `ADD` and `COPY` instructions, cache checking will not look at
the files in the container to determine a cache match. For example, when
processing a `RUN apt-get -y update` command, the files updated in the container
will not be examined to determine if a cache hit exists. In that case, just
the command string itself will be used to find a match.

Once the cache is invalidated, all subsequent `Dockerfile` commands will
generate new images and the cache will not be used.

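The rules above can be traced through a small, hypothetical `Dockerfile`. The
comments describe what would be compared on a rebuild; the file name
`app.conf` and the paths `/etc/app/` and `/var/lib/app` are assumptions made
for this sketch.

```dockerfile
# Base image: the starting point for the cache lookup.
FROM debian:jessie

# Matched on the instruction string alone. The files this command changes
# inside the image are not examined.
RUN apt-get update && apt-get install -y curl

# Matched on the instruction plus a checksum of app.conf. Editing app.conf
# invalidates the cache from this step onward.
COPY app.conf /etc/app/

# Rebuilt whenever the COPY above was rebuilt, even though this line
# itself did not change.
RUN mkdir -p /var/lib/app
```
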
## The Dockerfile instructions

Below you'll find recommendations for the best way to write the
various instructions available for use in a `Dockerfile`.

### [`FROM`](https://docs.docker.com/reference/builder/#from)

Whenever possible, use current Official Repositories as the basis for your
image. We recommend the [Debian image](https://registry.hub.docker.com/_/debian/)
since it’s very tightly controlled and kept extremely minimal (currently under
100 MB), while still being a full distribution.

### [`RUN`](https://docs.docker.com/reference/builder/#run)

As always, to make your `Dockerfile` more readable, understandable, and
maintainable, put long or complex `RUN` statements on multiple lines separated
with backslashes.

Probably the most common use-case for `RUN` is an application of `apt-get`.
When using `apt-get`, here are a few things to keep in mind:

* Don’t put `RUN apt-get update` on a line by itself. Because the instruction
text never changes, Docker reuses the cached layer even after the referenced
archive has been updated, which can make a subsequent `apt-get install` fail
without explanation.

* Avoid `RUN apt-get upgrade` or `dist-upgrade`, since many of the “essential”
packages from the base images will fail to upgrade inside an unprivileged
container. If a base package is out of date, you should contact its
maintainers. If you know there’s a particular package, `foo`, that needs to be
updated, use `apt-get install -y foo` and it will update automatically.

* Do combine `apt-get update` and `apt-get install` in a single instruction,
such as `RUN apt-get update && apt-get install -y package-bar package-baz package-foo`.
Writing the instruction this way not only makes it easier to read
and maintain, but also, by including `apt-get update`, ensures that the cache
will naturally be busted and the latest versions will be installed with no
further coding or manual intervention required.

* Further natural cache-busting can be realized by version-pinning packages
(e.g., `package-foo=1.3.*`). This will force retrieval of that version
regardless of what’s in the cache.

Writing your `apt-get` code this way will greatly ease maintenance and reduce
failures due to unanticipated changes in required packages.

#### Example

Below is a well-formed `RUN` instruction that demonstrates the above
recommendations. Note that the last package, `s3cmd`, specifies version
`1.1.0*`. If the image previously used an older version, specifying the new one
will cause a cache bust of `apt-get update` and ensure the installation of
the new version (which in this case had a new, required feature).

    RUN apt-get update && apt-get install -y \
        aufs-tools \
        automake \
        btrfs-tools \
        build-essential \
        curl \
        dpkg-sig \
        git \
        iptables \
        libapparmor-dev \
        libcap-dev \
        libsqlite3-dev \
        lxc=1.0* \
        mercurial \
        parallel \
        reprepro \
        ruby1.9.1 \
        ruby1.9.1-dev \
        s3cmd=1.1.0*

Writing the instruction this way also helps you avoid potential duplication of
a given package because it is much easier to read than an instruction like:

    RUN apt-get install -y package-foo && apt-get install -y package-bar

### [`CMD`](https://docs.docker.com/reference/builder/#cmd)

The `CMD` instruction should be used to run the software contained by your
image, along with any arguments. `CMD` should almost always be used in the
form of `CMD ["executable", "param1", "param2"…]`. Thus, if the image is for a
service (Apache, Rails, etc.), you would run something like
`CMD ["apache2","-DFOREGROUND"]`. Indeed, this form of the instruction is
recommended for any service-based image.

In most other cases, `CMD` should be given an interactive shell (bash, python,
perl, etc.), for example, `CMD ["perl", "-de0"]`, `CMD ["python"]`, or
`CMD ["php", "-a"]`. Using this form means that when you execute something like
`docker run -it python`, you’ll get dropped into a usable shell, ready to go.

`CMD` should rarely be used in the manner of `CMD ["param", "param"]` in
conjunction with [`ENTRYPOINT`](https://docs.docker.com/reference/builder/#entrypoint), unless
you and your expected users are already quite familiar with how `ENTRYPOINT`
works.

### [`EXPOSE`](https://docs.docker.com/reference/builder/#expose)

The `EXPOSE` instruction indicates the ports on which a container will listen
for connections. Consequently, you should use the common, traditional port for
your application. For example, an image containing the Apache web server would
use `EXPOSE 80`, while an image containing MongoDB would use `EXPOSE 27017` and
so on.

For external access, your users can execute `docker run` with a flag indicating
how to map the specified port to the port of their choice.
For container linking, Docker provides environment variables for the path from
the recipient container back to the source (for example, `MYSQL_PORT_3306_TCP`).

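As a sketch, an image for a web service might declare its port as shown
below. The image name `my-apache` used in the comments is made up for
illustration; `-p` and `-P` are the `docker run` flags for publishing ports.

```dockerfile
FROM debian:jessie

RUN apt-get update && apt-get install -y apache2

# Declare the traditional HTTP port that the service listens on.
EXPOSE 80

# Users then choose the host-side mapping at run time, for example:
#   docker run -p 8080:80 my-apache   (host port 8080 -> container port 80)
#   docker run -P my-apache           (publish exposed ports on random host ports)
```
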
### [`ENV`](https://docs.docker.com/reference/builder/#env)

In order to make new software easier to run, you can use `ENV` to update the
`PATH` environment variable for the software your container installs. For
example, `ENV PATH /usr/local/nginx/bin:$PATH` will ensure that `CMD ["nginx"]`
just works.

The `ENV` instruction is also useful for providing required environment
variables specific to services you wish to containerize, such as Postgres’s
`PGDATA`.

Lastly, `ENV` can also be used to set commonly used version numbers so that
version bumps are easier to maintain, as seen in the following example:

    ENV PG_MAJOR 9.3
    ENV PG_VERSION 9.3.4
    RUN curl -SL http://example.com/postgres-$PG_VERSION.tar.xz | tar -xJC /usr/src/postgres && …
    ENV PATH /usr/local/postgres-$PG_MAJOR/bin:$PATH

Similar to having constant variables in a program (as opposed to hard-coding
values), this approach lets you change a single `ENV` instruction to
automatically bump the version of the software in your container.

### [`ADD`](https://docs.docker.com/reference/builder/#add) or [`COPY`](https://docs.docker.com/reference/builder/#copy)

Although `ADD` and `COPY` are functionally similar, generally speaking, `COPY`
is preferred. That’s because it’s more transparent than `ADD`. `COPY` only
supports the basic copying of local files into the container, while `ADD` has
some features (like local-only tar extraction and remote URL support) that are
not immediately obvious. Consequently, the best use for `ADD` is local tar file
auto-extraction into the image, as in `ADD rootfs.tar.xz /`.

If you have multiple `Dockerfile` steps that use different files from your
context, `COPY` them individually, rather than all at once. This will ensure that
each step's build cache is only invalidated (forcing the step to be re-run) if the
specifically required files change.

For example:

    COPY requirements.txt /tmp/
    RUN pip install -r /tmp/requirements.txt
    COPY . /tmp/

This ordering results in fewer cache invalidations for the `RUN` step than if
you put the `COPY . /tmp/` before it.

Because image size matters, using `ADD` to fetch packages from remote URLs is
strongly discouraged; you should use `curl` or `wget` instead. That way you can
delete the files you no longer need after they've been extracted and you won't
have to add another layer in your image. For example, you should avoid doing
things like:

    ADD http://example.com/big.tar.xz /usr/src/things/
    RUN tar -xJf /usr/src/things/big.tar.xz -C /usr/src/things
    RUN make -C /usr/src/things all

And instead, do something like:

    RUN mkdir -p /usr/src/things \
        && curl -SL http://example.com/big.tar.xz \
        | tar -xJC /usr/src/things \
        && make -C /usr/src/things all

For other items (files, directories) that do not require `ADD`’s tar
auto-extraction capability, you should always use `COPY`.

### [`ENTRYPOINT`](https://docs.docker.com/reference/builder/#entrypoint)

The best use for `ENTRYPOINT` is as a helper script. Using `ENTRYPOINT` for
other tasks can make your code harder to understand. For example,

    docker run -it official-image bash

is much easier to understand than

    docker run -it --entrypoint bash official-image -i

This is especially true for new Docker users, who might naturally assume the
first command will work fine. In cases where an image uses `ENTRYPOINT` for
anything other than just a wrapper script, that command will fail and the
beginning user will then be forced to learn about `ENTRYPOINT` and
`--entrypoint`.

In order to avoid a situation where commands are run without clear visibility
to the user, make sure your script ends with something like `exec "$@"` (see
[the exec builtin command](http://wiki.bash-hackers.org/commands/builtin/exec)).
After the entrypoint completes, the script will transparently bootstrap the command
invoked by the user, making what has been run clear to the user (for example,
`docker run -it mysql mysqld --some --flags` will transparently run
`mysqld --some --flags` after `ENTRYPOINT` runs `initdb`).

For example, let’s look at the `Dockerfile` for the
[Postgres Official Image](https://github.com/docker-library/postgres).
It refers to the following script:

```bash
#!/bin/bash
set -e

if [ "$1" = 'postgres' ]; then
    chown -R postgres "$PGDATA"

    if [ -z "$(ls -A "$PGDATA")" ]; then
        gosu postgres initdb
    fi

    exec gosu postgres "$@"
fi

exec "$@"
```

That script then gets copied into the container and run via `ENTRYPOINT` on
container startup:

    COPY ./docker-entrypoint.sh /
    ENTRYPOINT ["/docker-entrypoint.sh"]

### [`VOLUME`](https://docs.docker.com/reference/builder/#volume)

The `VOLUME` instruction should be used to expose any database storage area,
configuration storage, or files/folders created by your Docker container. You
are strongly encouraged to use `VOLUME` for any mutable and/or user-serviceable
parts of your image.

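For instance, a database image might mark its data directory as a volume. The
sketch below assumes a hypothetical service that keeps its data in
`/var/lib/mydb`; the path is a placeholder.

```dockerfile
FROM debian:jessie

# Create the data directory before declaring it as a volume.
RUN mkdir -p /var/lib/mydb

# Data written here is stored in a volume rather than in the
# container's writable layer.
VOLUME /var/lib/mydb
```
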
### [`USER`](https://docs.docker.com/reference/builder/#user)

If a service can run without privileges, use `USER` to change to a non-root
user. Start by creating the user and group in the `Dockerfile` with something
like `RUN groupadd -r postgres && useradd -r -g postgres postgres`.

> **Note:** Users and groups in an image get a non-deterministic
> UID/GID in that the “next” UID/GID gets assigned regardless of image
> rebuilds. So, if it’s critical, you should assign an explicit UID/GID.

You should avoid installing or using `sudo` since it has unpredictable TTY and
signal-forwarding behavior that can cause more problems than it solves. If
you absolutely need functionality similar to `sudo` (e.g., initializing the
daemon as root but running it as non-root), you may be able to use
[“gosu”](https://github.com/tianon/gosu).

Lastly, to reduce layers and complexity, avoid switching `USER` back
and forth frequently.

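A minimal sketch of the advice above, assuming a hypothetical service account
named `app` and an arbitrarily chosen UID/GID of 999:

```dockerfile
FROM debian:jessie

# Create a system group and user with explicit IDs so that they stay
# the same across rebuilds (999 is an arbitrary choice for this sketch).
RUN groupadd -r -g 999 app && useradd -r -u 999 -g app app

# Do everything that needs root (package installs, chown) before this line.
USER app

# Runs as the unprivileged user and prints its UID/GID.
CMD ["id"]
```

Switching once, near the end of the file, keeps the number of `USER`
instructions (and layers) to a minimum.
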
### [`WORKDIR`](https://docs.docker.com/reference/builder/#workdir)

For clarity and reliability, you should always use absolute paths for your
`WORKDIR`. Also, you should use `WORKDIR` instead of proliferating
instructions like `RUN cd … && do-something`, which are hard to read,
troubleshoot, and maintain.

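The difference can be sketched as follows. The `/usr/src/app` path and the
`make` targets are assumptions for illustration only.

```dockerfile
FROM debian:jessie
RUN apt-get update && apt-get install -y build-essential
COPY . /usr/src/app

# Harder to follow: every instruction has to repeat the directory change.
# RUN cd /usr/src/app && make
# RUN cd /usr/src/app && make install

# Clearer: set an absolute working directory once.
WORKDIR /usr/src/app
RUN make && make install
```
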
### [`ONBUILD`](https://docs.docker.com/reference/builder/#onbuild)

An `ONBUILD` instruction executes only when another image is later built
`FROM` the image that contains it, so it is useful only for images intended as
a base for other builds. For example, you would use `ONBUILD` for a language
stack image that builds arbitrary user software written in that language within
the `Dockerfile`, as you can see in [Ruby’s `ONBUILD` variants](https://github.com/docker-library/ruby/blob/master/2.1/onbuild/Dockerfile).

Images built from `ONBUILD` should get a separate tag, for example:
`ruby:1.9-onbuild` or `ruby:2.0-onbuild`.

Be careful when putting `ADD` or `COPY` in `ONBUILD`. The “onbuild” image will
fail catastrophically if the new build's context is missing the resource being
added. Adding a separate tag, as recommended above, will help mitigate this by
allowing the `Dockerfile` author to make a choice.

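To make the mechanism concrete, here is a hypothetical base image, imagined as
tagged `my-stack:onbuild`, with a downstream `Dockerfile` shown in comments.
Neither is copied from the Ruby image; the names and paths are assumptions.

```dockerfile
# Base image Dockerfile (built and tagged as my-stack:onbuild).
FROM debian:jessie
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Not executed now; recorded as a trigger that runs at the start of
# any build that uses this image in its FROM line.
ONBUILD COPY . /usr/src/app

# A downstream Dockerfile can then be a single line:
#   FROM my-stack:onbuild
# Building it copies the downstream build context into /usr/src/app.
# If an ONBUILD COPY named a specific file that the context lacks,
# that downstream build would fail.
```
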
## Examples For Official Repositories

These Official Repos have exemplary `Dockerfile`s:

* [Go](https://registry.hub.docker.com/_/golang/)
* [Perl](https://registry.hub.docker.com/_/perl/)
* [Hy](https://registry.hub.docker.com/_/hylang/)
* [Rails](https://registry.hub.docker.com/_/rails)

## Additional Resources:

* [Dockerfile Reference](https://docs.docker.com/reference/builder/#onbuild)
* [More about Base Images](https://docs.docker.com/articles/baseimages/)
* [More about Automated Builds](https://docs.docker.com/docker-hub/builds/)
* [Guidelines for Creating Official Repositories](https://docs.docker.com/docker-hub/official_repos/)