page_title: Docker Security
page_description: Review of the Docker Daemon attack surface
page_keywords: Docker, Docker documentation, security

# Docker Security

There are four major areas to consider when reviewing Docker security:

 - the intrinsic security of the kernel and its support for
   namespaces and cgroups;
 - the attack surface of the Docker daemon itself;
 - loopholes in the container configuration profile, either by default,
   or when customized by users;
 - the "hardening" security features of the kernel and how they
   interact with containers.

## Kernel Namespaces

Docker containers are very similar to LXC containers, and they have
similar security features. When you start a container with
`docker run`, behind the scenes Docker creates a set of namespaces and
control groups for the container.

**Namespaces provide the first and most straightforward form of
isolation**: processes running within a container cannot see, let alone
affect, processes running in another container or in the host
system.
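
You can observe this isolation directly. Since Linux 3.8, the namespaces of each process appear as symbolic links under `/proc/<pid>/ns/`, with targets such as `pid:[4026531836]`. In practice, two processes share a namespace when those targets match. The minimal Python sketch below compares the namespaces of two PIDs, by default the current process and PID 1. It assumes a Linux host with `/proc` mounted, and the helper names are hypothetical, not part of Docker. Inspecting another user's processes generally requires root. Run on the host against the PID of a containerized process, it should report most namespaces as separate.

```python
import os
import re
import sys

# Since Linux 3.8, /proc/<pid>/ns/<name> links resolve to e.g. "pid:[4026531836]".
_NS_TARGET = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_target(target: str) -> tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_TARGET.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_inodes(pid: str) -> dict[str, int]:
    """Map each entry of /proc/<pid>/ns to the inode of the namespace it names."""
    ns_dir = os.path.join("/proc", pid, "ns")
    inodes = {}
    for entry in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_target(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # no permission to inspect this link, or an unknown format
        inodes[entry] = inode
    return inodes


def compare_namespaces(pid_a: str, pid_b: str) -> dict[str, bool]:
    """For every namespace visible for both PIDs, report whether it is shared."""
    a, b = namespace_inodes(pid_a), namespace_inodes(pid_b)
    return {entry: a[entry] == b[entry] for entry in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    first = sys.argv[1] if len(sys.argv) > 1 else "self"
    second = sys.argv[2] if len(sys.argv) > 2 else "1"
    try:
        result = compare_namespaces(first, second)
    except OSError as err:  # e.g. no /proc, or not allowed to list another user's ns/
        print(f"cannot inspect namespaces: {err}")
    else:
        for entry, shared in result.items():
            print(f"{entry:20} {'shared' if shared else 'separate'}")
```

Comparing link targets avoids parsing anything else from `/proc`. The kernel exposes namespace identity precisely through these links.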

**Each container also gets its own network stack**, meaning that a
container doesn't get privileged access to the sockets or interfaces
of another container. Of course, if the host system is set up
accordingly, containers can interact with each other through their
respective network interfaces, just like they can interact with
external hosts. When you specify public ports for your containers or use
[*links*](/userguide/dockerlinks),
IP traffic is allowed between containers. They can ping each other,
send and receive UDP packets, and establish TCP connections, but that can be
restricted if necessary. From a network architecture point of view, all
containers on a given Docker host are sitting on bridge interfaces. This
means that they are just like physical machines connected through a
common Ethernet switch; no more, no less.

How mature is the code providing kernel namespaces and private
networking? Kernel namespaces were introduced [between kernel versions
2.6.15 and 2.6.26](http://lxc.sourceforge.net/index.php/about/kernel-namespaces/).
This means that since July 2008 (the date of the 2.6.26 release),
namespace code has been exercised and scrutinized on a large
number of production systems. And there is more: the design and
inspiration for the namespaces code are even older. Namespaces are
actually an effort to reimplement the features of
[OpenVZ](http://en.wikipedia.org/wiki/OpenVZ) in such a way that they
could be merged within the mainstream kernel. And OpenVZ was initially
released in 2005, so both the design and the implementation are mature.

## Control Groups

Control Groups are another key component of Linux Containers. They
implement resource accounting and limiting. They provide many
useful metrics, but they also help ensure that each container gets
its fair share of memory, CPU, and disk I/O and, more importantly, that a
single container cannot bring the system down by exhausting one of those
resources.
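
For CPU, this fairness comes from relative weights. When several control groups compete for CPU time, each busy group receives time in proportion to its `cpu.shares` value. The kernel default is 1024, and Docker lets you set it per container with `docker run -c`/`--cpu-shares`. The minimal Python sketch below models only that arithmetic. It assumes every listed container is CPU-bound, it is not the kernel scheduler, and the container names are hypothetical.

```python
def cpu_fractions(shares: dict[str, int], busy: set[str]) -> dict[str, float]:
    """Approximate CPU fraction of each busy cgroup under full contention.

    Idle groups do not consume their share; their weight is simply left out,
    so the busy groups split the whole CPU between them.
    """
    weights = {name: shares[name] for name in busy}
    total = sum(weights.values())
    if total == 0:
        return {}
    return {name: weight / total for name, weight in sorted(weights.items())}


if __name__ == "__main__":
    # Hypothetical containers: "web" keeps the default weight, the others get half.
    shares = {"web": 1024, "batch": 512, "db": 512}
    print(cpu_fractions(shares, {"web", "batch", "db"}))
    # {'batch': 0.25, 'db': 0.25, 'web': 0.5}
    print(cpu_fractions(shares, {"web", "db"}))  # batch idle: web ~0.67, db ~0.33
```

Shares are relative weights, not caps. A single busy container on an otherwise idle host can still use all the CPU. Hard limits, such as the memory limit set with `docker run -m`, are what stop one container from exhausting a resource outright.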

So while they do not play a role in preventing one container from
accessing or affecting the data and processes of another container, they
are essential to fend off some denial-of-service attacks. They are
particularly important on multi-tenant platforms, like public and
private PaaS, to guarantee consistent uptime (and performance) even
when some applications start to misbehave.

Control Groups have been around for a while as well: the code was
started in 2006, and initially merged in kernel 2.6.24.

## Docker Daemon Attack Surface

Running containers (and applications) with Docker implies running the
Docker daemon. This daemon currently requires `root` privileges, and you
should therefore be aware of some important details.

First of all, **only trusted users should be allowed to control your
Docker daemon**. This is a direct consequence of some powerful Docker
features. Specifically, Docker allows you to share a directory between
the Docker host and a guest container, and it allows you to do so
without limiting the access rights of the container. This means that you
can start a container where the `/host` directory is the `/` directory
on your host (for example, with `docker run -v /:/host ...`), and the
container will be able to alter your host filesystem without any
restriction. This is similar to how virtualization systems allow
filesystem resource sharing: nothing prevents you from sharing your
root filesystem (or even your root block device) with a virtual machine.

This has a strong security implication: for example, if you instrument Docker
from a web server to provision containers through an API, you should be
even more careful than usual with parameter checking, to make sure that
a malicious user cannot pass crafted parameters causing Docker to create
arbitrary containers.
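
One way to apply this advice is to whitelist what callers may ask for, rather than trying to blacklist dangerous options. The minimal Python sketch below validates a hypothetical provisioning request before it is forwarded to Docker. The field names are loosely modeled on the Remote API's container-create body (`Image`, `Cmd`, `Env`, `HostConfig`), and the image names are placeholders. Adapt both to your API version and service.

```python
# Hypothetical whitelist of images this service is willing to run.
ALLOWED_IMAGES = {"example/web:1.0", "example/worker:1.0"}

# Only these top-level fields may come from the caller. "HostConfig" is
# deliberately absent: it carries options such as Binds, Privileged, CapAdd,
# Devices, and NetworkMode that can expose the host.
ALLOWED_FIELDS = {"Image", "Cmd", "Env"}


def _is_str_list(value) -> bool:
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def validate_create_request(request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request may be forwarded."""
    problems = [f"field not allowed: {key}"
                for key in sorted(set(request) - ALLOWED_FIELDS)]
    if request.get("Image") not in ALLOWED_IMAGES:
        problems.append(f"image not allowed: {request.get('Image')!r}")
    if not _is_str_list(request.get("Cmd", [])):
        problems.append("Cmd must be a list of strings")
    env = request.get("Env", [])
    if not _is_str_list(env) or not all("=" in item for item in env):
        problems.append("Env must be a list of KEY=value strings")
    return problems


if __name__ == "__main__":
    print(validate_create_request({"Image": "example/web:1.0", "Cmd": ["serve"]}))
    # []
    print(validate_create_request(
        {"Image": "example/web:1.0", "HostConfig": {"Binds": ["/:/host"]}}))
    # ['field not allowed: HostConfig']
```

A whitelist fails closed. An option added in a later API version is rejected until someone decides it is safe to accept.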

For this reason, the REST API endpoint (used by the Docker CLI to
communicate with the Docker daemon) changed in Docker 0.5.2, and now
uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the
latter being prone to cross-site request forgery attacks if you happen
to run Docker directly on your local machine, outside of a VM). You can
then use traditional UNIX permission checks to limit access to the
control socket.
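
Connecting to a UNIX socket requires write permission on the socket file, so the socket's permission bits tell you who can drive the daemon. The minimal Python sketch below audits the default socket path `/var/run/docker.sock`. Adjust the path if your daemon listens elsewhere. The audit rules are our own illustration, not something Docker enforces.

```python
import grp
import os
import stat


def audit_socket_mode(mode: int) -> list[str]:
    """Flag permission bits on the daemon socket that widen who controls Docker."""
    findings = []
    if not stat.S_ISSOCK(mode):
        findings.append("not a UNIX socket")
    if mode & stat.S_IWOTH:
        findings.append("world-writable: every local user can control the daemon")
    if mode & stat.S_IWGRP:
        findings.append("group-writable: every member of the owning group is root-equivalent")
    return findings


def audit_socket(path: str = "/var/run/docker.sock") -> None:
    """Print the socket's ownership and mode, followed by any findings."""
    try:
        info = os.stat(path)
    except OSError as err:
        print(f"cannot stat {path}: {err}")
        return
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:  # group ID with no entry in the group database
        group = str(info.st_gid)
    print(f"{path}: mode {stat.filemode(info.st_mode)}, uid {info.st_uid}, group {group}")
    for finding in audit_socket_mode(info.st_mode):
        print(" -", finding)


if __name__ == "__main__":
    audit_socket()
```

A group-writable socket is often intentional, so that a dedicated group can use the CLI without `sudo`. The finding is a reminder that membership in that group must be granted as carefully as root access.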

You can also expose the REST API over HTTP if you explicitly decide to do
so. However, if you do that, keep the security implication described above
in mind and ensure that the API is reachable only from a trusted network
or VPN, or is protected with, e.g., `stunnel` and client SSL
certificates. You can also secure it with [HTTPS and
certificates](/articles/https/).
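
Whatever terminates TLS in front of the daemon (`stunnel`, a reverse proxy, or Docker's own `--tlsverify` mode), the essential property is the same. The handshake must fail unless the client presents a certificate signed by a CA you trust. The minimal Python sketch below builds a server-side context with that property using the standard `ssl` module. The certificate and key paths are placeholders, and relaying the decrypted traffic to the daemon socket is left out.

```python
import ssl
from typing import Optional


def client_cert_tls_context(ca_file: Optional[str] = None,
                            cert_file: Optional[str] = None,
                            key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that rejects clients without a trusted certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED: the handshake aborts if the client sends no certificate,
    # or one that does not chain to a CA loaded below.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signed the client certs
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # server identity
    return ctx


if __name__ == "__main__":
    # Placeholder setup: with real files you would pass their paths here, wrap a
    # listening socket via ctx.wrap_socket(sock, server_side=True), and relay
    # the traffic to the daemon's UNIX socket.
    ctx = client_cert_tls_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

Requiring client certificates turns "who can reach the port" into "who holds a key signed by your CA". That is the same trust boundary the UNIX socket permissions provide locally.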

The daemon is also potentially vulnerable to other inputs, such as image
loading from either disk with `docker load`, or from the network with
`docker pull`. This has been a focus of improvement in the community,
especially for `pull` security. While these overlap, it should be noted
that `docker load` is a mechanism for backup and restore and is not
currently considered a secure mechanism for loading images. As of
Docker 1.3.2, images are extracted in a chrooted subprocess on
Linux/Unix platforms; this is the first step in a wider effort toward
privilege separation.
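
Extraction is sensitive because a crafted archive can contain member names such as `../../etc/cron.d/job`, or symlinks that point outside the extraction directory. The minimal Python sketch below lists such members using the standard `tarfile` module. It is a generic tar audit, not Docker's extraction code.

```python
import posixpath
import sys
import tarfile


def _escapes_root(path: str) -> bool:
    """True if a POSIX path is absolute or climbs above the extraction root."""
    if path.startswith("/"):
        return True
    normalized = posixpath.normpath(path)
    return normalized == ".." or normalized.startswith("../")


def audit_tar_members(archive: tarfile.TarFile) -> list[tuple[str, str]]:
    """Return (member name, reason) for every member that could write outside the root."""
    findings = []
    for member in archive.getmembers():
        if _escapes_root(member.name):
            findings.append((member.name, "path escapes extraction root"))
        elif member.islnk() and _escapes_root(member.linkname):
            # Hard-link targets are archive paths, relative to the root.
            findings.append((member.name, "hard link target escapes extraction root"))
        elif member.issym():
            # Symlink targets are resolved relative to the link's own directory;
            # posixpath.join keeps an absolute target absolute.
            target = posixpath.join(posixpath.dirname(member.name), member.linkname)
            if _escapes_root(target):
                findings.append((member.name, "symlink points outside extraction root"))
    return findings


if __name__ == "__main__":
    for path in sys.argv[1:]:
        try:
            with tarfile.open(path) as archive:
                for name, reason in audit_tar_members(archive):
                    print(f"{path}: {name}: {reason}")
        except (OSError, tarfile.TarError) as err:
            print(f"{path}: cannot read archive: {err}")
```

Ordinary images legitimately contain symlinks with absolute targets, and this check flags them too. That is one reason extracting inside a chroot, where `/` is the image root, is a sturdier defense than filtering names.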

Eventually, it is expected that the Docker daemon will run with restricted
privileges, delegating operations to well-audited sub-processes,
each with its own (very limited) scope of Linux capabilities,
virtual network setup, filesystem management, etc. That is, most likely,
pieces of the Docker engine itself will run inside of containers.

Finally, if you run Docker on a server, it is recommended to run
only Docker on that server and to move all other services into
containers controlled by Docker. Of course, it is fine to keep your
favorite admin tools (probably at least an SSH server), as well as
existing monitoring/supervision processes (e.g., NRPE, collectd, etc).

## Linux Kernel Capabilities

By default, Docker starts containers with a restricted set of
capabilities. What does that mean?

Capabilities turn the binary "root/non-root" dichotomy into a
fine-grained access control system. Processes (like web servers) that
just need to bind to a port below 1024 do not have to run as root: they
can just be granted the `net_bind_service` capability instead. And there
are many other capabilities, for almost all the specific areas where root
privileges are usually needed.

This means a lot for container security; let's see why!

Your average server (bare metal or virtual machine) needs to run a bunch
of processes as root. Those typically include SSH, cron, syslogd;
hardware management tools (e.g., to load modules), network configuration
tools (e.g., to handle DHCP, WPA, or VPNs), and much more. A container is
very different, because almost all of those tasks are handled by the
infrastructure around the container:

 - SSH access will typically be managed by a single server running on
   the Docker host;
 - `cron`, when necessary, should run as a user
   process, dedicated and tailored for the app that needs its
   scheduling service, rather than as a platform-wide facility;
 - log management will also typically be handled by Docker, or by
   third-party services like Loggly or Splunk;
 - hardware management is irrelevant, meaning that you never need to
   run `udevd` or equivalent daemons within
   containers;
 - network management happens outside of the containers, enforcing
   separation of concerns as much as possible, meaning that a container
   should never need to perform `ifconfig`,
   `route`, or `ip` commands (except when a container
   is specifically engineered to behave like a router or firewall, of
   course).

This means that in most cases, containers will not need "real" root
privileges *at all*. And therefore, containers can run with a reduced
capability set, meaning that "root" within a container has far fewer
privileges than the real "root". For instance, it is possible to:

 - deny all "mount" operations;
 - deny access to raw sockets (to prevent packet spoofing);
 - deny access to some filesystem operations, like creating new device
   nodes, changing the owner of files, or altering attributes (including
   the immutable flag);
 - deny module loading;
 - and many others.

This means that even if an intruder manages to escalate to root within a
container, it will be much harder to do serious damage, or to escalate
to the host.

This won't affect regular web apps, but malicious users will find that
the arsenal at their disposal has shrunk considerably! By default, Docker
drops all capabilities except [those
needed](https://github.com/docker/docker/blob/master/daemon/execdriver/native/template/default_template.go),
a whitelist instead of a blacklist approach. You can see a full list of
available capabilities in [Linux
manpages](http://man7.org/linux/man-pages/man7/capabilities.7.html).
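
To check which capabilities a process actually holds, read the `CapEff` (effective) and `CapBnd` (bounding) lines of `/proc/<pid>/status`. They are hexadecimal bit masks indexed by the capability numbers defined in `<linux/capability.h>`. The minimal Python sketch below decodes them into names. From a shell, libcap's `capsh --decode=<mask>` does the same. The table below stops at `AUDIT_READ` (38 entries), so bits defined by newer kernels are reported as unknown.

```python
import sys

# Index == capability number, as defined in <linux/capability.h>.
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ",
]


def decode_caps(mask: int) -> list[str]:
    """Turn a capability bit mask into capability names, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"unknown({bit})")
        bit += 1
    return names


def capabilities_from_status(status_text: str) -> dict[str, list[str]]:
    """Decode every Cap* line (CapInh, CapPrm, CapEff, CapBnd, ...) of a status file."""
    caps = {}
    for line in status_text.splitlines():
        field, _, value = line.partition(":")
        if field.startswith("Cap"):
            caps[field] = decode_caps(int(value.strip(), 16))
    return caps


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        with open(f"/proc/{pid}/status") as status:
            decoded = capabilities_from_status(status.read())
    except OSError as err:
        print(f"cannot read capabilities: {err}")
    else:
        for field in ("CapEff", "CapBnd"):
            print(f"{field}: {', '.join(decoded.get(field, [])) or '(none)'}")
```

Running this in a root shell inside a container and again in a root shell on the host should show how much shorter the container's list is.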

One primary risk with running Docker containers is that the default set
of capabilities and mounts given to a container may provide incomplete
isolation, either independently, or when used in combination with
kernel vulnerabilities.

Docker supports the addition and removal of capabilities (through the
`--cap-add` and `--cap-drop` options of `docker run`), allowing use
of a non-default profile. This may make Docker more secure through
capability removal, or less secure through the addition of capabilities.
The best practice for users would be to remove all capabilities except
those explicitly required for their processes.
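
A common way to follow this practice is to drop everything and add back only what the process needs, for example `docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE ...`. The minimal Python sketch below models how such add and drop lists combine with a default set. It is a simplified model, not Docker's code. The capability sets are short illustrative placeholders (the default template linked above has the real default list), and rejecting a capability that is both added and dropped is our own design choice.

```python
from typing import Iterable

# Illustrative placeholders, not Docker's real lists.
DEFAULT_CAPS = frozenset({"CHOWN", "KILL", "NET_BIND_SERVICE", "SETGID", "SETUID"})
ALL_CAPS = DEFAULT_CAPS | {"NET_ADMIN", "SYS_ADMIN", "SYS_MODULE"}


def effective_caps(add: Iterable[str] = (), drop: Iterable[str] = (),
                   default: frozenset = DEFAULT_CAPS,
                   universe: frozenset = ALL_CAPS) -> set[str]:
    """Combine --cap-add / --cap-drop style lists with a default whitelist."""
    add = {cap.upper() for cap in add}
    drop = {cap.upper() for cap in drop}
    conflicts = (add & drop) - {"ALL"}
    if conflicts:
        raise ValueError(f"both added and dropped: {sorted(conflicts)}")
    unknown = (add | drop) - universe - {"ALL"}
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    base = set(universe) if "ALL" in add else set(default)
    if "ALL" in drop:
        base = set()  # start from nothing; explicit adds below are kept
    return (base - drop) | (add - {"ALL"})


if __name__ == "__main__":
    print(sorted(effective_caps(drop=["ALL"], add=["net_bind_service"])))
    # ['NET_BIND_SERVICE']
    print(sorted(effective_caps(drop=["SETUID", "SETGID"])))
    # ['CHOWN', 'KILL', 'NET_BIND_SERVICE']
```

Starting from an empty set keeps the profile a whitelist. A new default capability never reaches the container unless you ask for it.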

## Other Kernel Security Features

Capabilities are just one of the many security features provided by
modern Linux kernels. It is also possible to leverage existing,
well-known systems like TOMOYO, AppArmor, SELinux, GRSEC, etc. with
Docker.

While Docker currently only enables capabilities, it doesn't interfere
with the other systems. This means that there are many different ways to
harden a Docker host. Here are a few examples.

 - You can run a kernel with GRSEC and PaX. This will add many safety
   checks, both at compile-time and run-time; it will also defeat many
   exploits, thanks to techniques like address randomization. It doesn't
   require Docker-specific configuration, since those security features
   apply system-wide, independent of containers.
 - If your distribution comes with security model templates for
   Docker containers, you can use them out of the box. For instance, we
   ship a template that works with AppArmor and Red Hat comes with SELinux
   policies for Docker. These templates provide an extra safety net (even
   though they overlap greatly with capabilities).
 - You can define your own policies using your favorite access control
   mechanism.

Just like there are many third-party tools to augment Docker containers
with e.g., special network topologies or shared filesystems, you can
expect to see tools to harden existing Docker containers without
affecting Docker's core.

Recent improvements in Linux namespaces will soon make it possible to run
full-featured containers without root privileges, thanks to the new user
namespace. This is covered in detail
[here](http://s3hh.wordpress.com/2013/07/19/creating-and-using-containers-without-privilege/).
Moreover, this will solve the problem caused by sharing filesystems
between host and guest, since the user namespace allows users within
containers (including the root user) to be mapped to other users in the
host system.
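
The mapping itself is simple. For a process in a user namespace, `/proc/<pid>/uid_map` (and likewise `gid_map`) holds lines of the form `<id-inside-ns> <id-outside-ns> <length>`. The minimal Python sketch below translates a container UID into the host UID it corresponds to. The mapping `0 100000 65536` is a hypothetical example, not something Docker configures today.

```python
from typing import Optional


def parse_id_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map/gid_map content into (inside, outside, length) ranges."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            ranges.append((inside, outside, length))
    return ranges


def to_host_id(container_id: int, ranges: list[tuple[int, int, int]]) -> Optional[int]:
    """Return the host ID for a container ID, or None if it is not mapped."""
    for inside, outside, length in ranges:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None


if __name__ == "__main__":
    ranges = parse_id_map("0 100000 65536\n")  # hypothetical mapping
    print(to_host_id(0, ranges))      # 100000: container root is an unprivileged host user
    print(to_host_id(1000, ranges))   # 101000
    print(to_host_id(70000, ranges))  # None: outside every mapped range
```

With such a mapping, files that root inside the container creates in a shared host directory would be owned by UID 100000 on the host. This is why user namespaces address the filesystem-sharing problem described above.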

Today, Docker does not directly support user namespaces, but they
may still be utilized by Docker containers on supported kernels,
by calling the `clone` syscall directly or by using the `unshare`
utility. Using this, some users may find it possible to drop
more capabilities from their process, as user namespaces provide
an artificial capabilities set. Likewise, however, this artificial
capabilities set may require the use of `capsh` to restrict the
user-namespace capabilities set when using `unshare`.

Eventually, it is expected that Docker will have direct, native support
for user namespaces, simplifying the process of hardening containers.

## Conclusions

Docker containers are, by default, quite secure, especially if you take
care of running your processes inside the containers as non-privileged
users (i.e., non-`root`).

You can add an extra layer of safety by enabling AppArmor, SELinux,
GRSEC, or your favorite hardening solution.

Last but not least, if you see interesting security features in other
containerization systems, these are simply kernel features that may
be implemented in Docker as well. We welcome users to submit issues,
pull requests, and communicate via the mailing list.

References:

* [Docker Containers: How Secure Are They? (2013)](http://blog.docker.com/2013/08/containers-docker-how-secure-are-they/)
* [On the Security of Containers (2014)](https://medium.com/@ewindisch/on-the-security-of-containers-2c60ffe25a9e)