<!--[metadata]>
+++
title = "Docker security"
description = "Review of the Docker Daemon attack surface"
keywords = ["Docker, Docker documentation,  security"]
[menu.main]
parent = "smn_administrate"
weight = 2
+++
<![end-metadata]-->

# Docker security

There are four major areas to consider when reviewing Docker security:

 - the intrinsic security of the kernel and its support for
   namespaces and cgroups;
 - the attack surface of the Docker daemon itself;
 - loopholes in the container configuration profile, either by default,
   or when customized by users;
 - the "hardening" security features of the kernel and how they
   interact with containers.

## Kernel namespaces

Docker containers are very similar to LXC containers, and they have
similar security features. When you start a container with
`docker run`, behind the scenes Docker creates a set of namespaces and control
groups for the container.

**Namespaces provide the first and most straightforward form of
isolation**: processes running within a container cannot see, and even
less affect, processes running in another container, or in the host
system.
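This isolation can be observed directly on a Linux host: each process's namespace memberships are exposed as symbolic links under `/proc/<pid>/ns/`, whose targets look like `net:[4026531992]`, and two processes are in the same namespace when those identifiers match. The sketch below, assuming a Linux host with `/proc` mounted, compares two processes this way; the helper names are hypothetical and not part of Docker. Calling `shared_namespaces("self", pid)` on the host with the host PID of a containerized process lists the namespaces it has in common with your shell's process.

```python
import os
import re

# Targets of the links in /proc/<pid>/ns/ look like "net:[4026531992]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid: str = "self") -> dict[str, int]:
    """Map each entry of /proc/<pid>/ns to its inode number (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in os.listdir(ns_dir):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # e.g. permission denied for another user's process
        result[name] = inode
    return result


def shared_namespaces(pid_a: str, pid_b: str) -> set[str]:
    """Namespace entries for which both processes point at the same namespace."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return {name for name in a.keys() & b.keys() if a[name] == b[name]}
```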

**Each container also gets its own network stack**, meaning that a
container doesn't get privileged access to the sockets or interfaces
of another container. Of course, if the host system is set up
accordingly, containers can interact with each other through their
respective network interfaces, just as they can interact with
external hosts. When you specify public ports for your containers or use
[*links*](/userguide/dockerlinks),
IP traffic is allowed between containers. They can ping each other,
send/receive UDP packets, and establish TCP connections, but that can be
restricted if necessary. From a network architecture point of view, all
containers on a given Docker host are sitting on bridge interfaces. This
means that they are just like physical machines connected through a
common Ethernet switch; no more, no less.

How mature is the code providing kernel namespaces and private
networking? Kernel namespaces were introduced [between kernel versions
2.6.15 and 2.6.26](http://lxc.sourceforge.net/index.php/about/kernel-namespaces/).
This means that since July 2008 (date of the 2.6.26 release), namespace
code has been exercised and scrutinized on a large number of production
systems. And there is more: the design and inspiration for the
namespaces code are even older. Namespaces are actually an effort to
reimplement the features of [OpenVZ](http://en.wikipedia.org/wiki/OpenVZ)
in such a way that they could be merged within the mainstream kernel.
And OpenVZ was initially released in 2005, so both the design and the
implementation are pretty mature.

## Control groups

Control Groups are another key component of Linux Containers. They
implement resource accounting and limiting. They provide many
useful metrics, but they also help ensure that each container gets
its fair share of memory, CPU, disk I/O; and, more importantly, that a
single container cannot bring the system down by exhausting one of those
resources.
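A process's control-group membership is listed in `/proc/<pid>/cgroup`, one `hierarchy-ID:controller-list:cgroup-path` entry per line (a cgroup v2 host shows a single `0::path` entry instead). The hypothetical parser below turns that file into a controller-to-path map, which is a quick way to confirm which groups (and therefore which limits) apply to a process inside a container. The exact paths depend on the host's cgroup setup, so none are assumed here.

```python
def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Map each controller name to its cgroup path.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    The cgroup v2 unified entry has an empty controller list; it is
    reported under the key "" here.
    """
    memberships = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=2 keeps any ':' that appears inside the path intact.
        _hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else [""]
        for controller in names:
            memberships[controller] = path
    return memberships


def read_own_cgroups() -> dict[str, str]:
    """Read the calling process's memberships (Linux only)."""
    with open("/proc/self/cgroup") as f:
        return parse_proc_cgroup(f.read())
```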

So while they do not play a role in preventing one container from
accessing or affecting the data and processes of another container, they
are essential to fend off some denial-of-service attacks. They are
particularly important on multi-tenant platforms, like public and
private PaaS, to guarantee a consistent uptime (and performance) even
when some applications start to misbehave.

Control Groups have been around for a while as well: the code was
started in 2006, and initially merged in kernel 2.6.24.

## Docker daemon attack surface

Running containers (and applications) with Docker implies running the
Docker daemon. This daemon currently requires `root` privileges, and you
should therefore be aware of some important details.

First of all, **only trusted users should be allowed to control your
Docker daemon**. This is a direct consequence of some powerful Docker
features. Specifically, Docker allows you to share a directory between
the Docker host and a guest container; and it allows you to do so
without limiting the access rights of the container. This means that you
can start a container where the `/host` directory will be the `/` directory
on your host; and the container will be able to alter your host filesystem
without any restriction. This is similar to how virtualization systems
allow filesystem resource sharing. Nothing prevents you from sharing your
root filesystem (or even your root block device) with a virtual machine.

This has a strong security implication: for example, if you instrument Docker
from a web server to provision containers through an API, you should be
even more careful than usual with parameter checking, to make sure that
a malicious user cannot pass crafted parameters causing Docker to create
arbitrary containers.
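As an illustration of that parameter checking, the sketch below validates a user-supplied container request before anything is forwarded to Docker. The request format (a plain dict with `image`, `command` and `env` keys), the image allowlist, and the policy itself are hypothetical stand-ins for whatever your own API accepts. The idea it shows is to allowlist what users may ask for, so that fields such as host bind mounts (the equivalent of `-v /:/host`) or privileged mode are rejected because they were never permitted, rather than because someone remembered to blacklist them.

```python
ALLOWED_IMAGES = {"example/webapp:1.0", "example/worker:1.0"}  # hypothetical
ALLOWED_KEYS = {"image", "command", "env"}


class RejectedRequest(ValueError):
    """Raised when a user-supplied container request violates policy."""


def validate_request(request: dict) -> dict:
    """Return a sanitized copy of request, or raise RejectedRequest."""
    unknown = set(request) - ALLOWED_KEYS
    if unknown:
        # Anything not explicitly allowed is refused, e.g. "binds"
        # (host directories), "privileged", or extra capabilities.
        raise RejectedRequest(f"unsupported fields: {sorted(unknown)}")
    image = request.get("image")
    if image not in ALLOWED_IMAGES:
        raise RejectedRequest(f"image not allowed: {image!r}")
    command = request.get("command", [])
    if not isinstance(command, list) or not all(isinstance(a, str) for a in command):
        raise RejectedRequest("command must be a list of strings")
    env = request.get("env", {})
    if not isinstance(env, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in env.items()
    ):
        raise RejectedRequest("env must map strings to strings")
    # Build a fresh dict so no unexpected nested objects are passed along.
    return {"image": image, "command": list(command), "env": dict(env)}
```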

For this reason, the REST API endpoint (used by the Docker CLI to
communicate with the Docker daemon) changed in Docker 0.5.2, and now
uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the
latter being prone to cross-site request forgery attacks if you happen
to run Docker directly on your local machine, outside of a VM). You can
then use traditional UNIX permission checks to limit access to the
control socket.
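Because access control then reduces to ordinary file permissions, it is easy to audit. The sketch below, assuming the common default socket path `/var/run/docker.sock` (your daemon may be configured to use another), reports whether the socket is reachable by users other than its owner and group. Keep in mind that membership in the socket's group (often `docker`) is effectively equivalent to root access on the host, so that group deserves the same scrutiny as the permission bits.

```python
import os
import stat

DOCKER_SOCKET = "/var/run/docker.sock"  # assumed default location


def world_accessible(mode: int) -> bool:
    """True if users outside the owner and group may read or write."""
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def audit_socket(path: str = DOCKER_SOCKET) -> list[str]:
    """Return human-readable warnings about the daemon's control socket."""
    st = os.stat(path)
    warnings = []
    if not stat.S_ISSOCK(st.st_mode):
        warnings.append(f"{path} is not a UNIX socket")
    if world_accessible(st.st_mode):
        warnings.append(
            f"{path} is accessible to all users: {stat.filemode(st.st_mode)}"
        )
    if st.st_uid != 0:
        warnings.append(f"{path} is owned by uid {st.st_uid}, not root")
    return warnings
```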

You can also expose the REST API over HTTP if you explicitly decide to do so.
However, if you do that, being aware of the security implications mentioned
above, you should ensure that it will be reachable only from a trusted
network or VPN, or protected with, e.g., `stunnel` and client SSL
certificates. You can also secure it with [HTTPS and
certificates](/articles/https/).
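On the client side, talking to such an endpoint means verifying the daemon's certificate against your own CA and presenting a client certificate. The sketch below uses only Python's standard `ssl` and `http.client` modules and calls the Remote API's `GET /version` endpoint. The host name, the port 2376 (the port conventionally used for Docker over TLS), and the certificate file names are assumptions to adapt to your setup.

```python
import http.client
import json
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that verifies the daemon and optionally presents a client cert."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def docker_version(host: str, ctx: ssl.SSLContext, port: int = 2376) -> dict:
    """Query GET /version on a TLS-protected Docker daemon."""
    conn = http.client.HTTPSConnection(host, port, context=ctx, timeout=10)
    try:
        conn.request("GET", "/version")
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(f"daemon returned HTTP {response.status}")
        return json.loads(response.read())
    finally:
        conn.close()


# Example with hypothetical host and file names:
# ctx = make_client_context("ca.pem", "cert.pem", "key.pem")
# print(docker_version("docker.example.com", ctx))
```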

The daemon is also potentially vulnerable to other inputs, such as image
loading from either disk with `docker load`, or from the network with
`docker pull`. This has been a focus of improvement in the community,
especially for `pull` security. While these overlap, it should be noted
that `docker load` is a mechanism for backup and restore and is not
currently considered a secure mechanism for loading images. As of
Docker 1.3.2, images are extracted in a chrooted subprocess on
Linux/Unix platforms, as the first step in a wider effort toward
privilege separation.
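The chrooted extraction limits what a malicious archive can reach on the host. A related, weaker defense that needs no privileges is to refuse archive members that would land outside the destination directory through absolute paths, `..` components, or links. The sketch below does that with Python's `tarfile` module; it is not how Docker itself extracts images, and it does not replace running untrusted extraction in a confined process. (Recent Python releases also offer `extractall(..., filter="data")`, which applies similar checks.)

```python
import os
import tarfile


def unsafe_members(archive: tarfile.TarFile, dest: str) -> list[str]:
    """Names of members whose path, or link target, escapes dest."""
    root = os.path.realpath(dest)

    def escapes(path: str) -> bool:
        resolved = os.path.realpath(os.path.join(root, path))
        return os.path.commonpath([root, resolved]) != root

    bad = []
    for member in archive.getmembers():
        if escapes(member.name):
            bad.append(member.name)
        elif member.issym():
            # Symlink targets are relative to the link's own directory.
            target = os.path.join(os.path.dirname(member.name), member.linkname)
            if escapes(target):
                bad.append(member.name)
        elif member.islnk() and escapes(member.linkname):
            # Hard link targets are relative to the archive root.
            bad.append(member.name)
        elif member.isdev():
            bad.append(member.name)  # refuse device nodes outright
    return bad


def safe_extract(path: str, dest: str) -> None:
    """Extract only if no member escapes dest (a sketch, not a sandbox)."""
    with tarfile.open(path) as archive:
        bad = unsafe_members(archive, dest)
        if bad:
            raise ValueError(f"refusing to extract: {bad}")
        archive.extractall(dest)
```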

Eventually, it is expected that the Docker daemon will run with restricted
privileges, delegating operations to well-audited subprocesses, each with
its own (very limited) scope of Linux capabilities, virtual network setup,
filesystem management, etc. That is, most likely, pieces of the Docker
engine itself will run inside of containers.

Finally, if you run Docker on a server, it is recommended to run
Docker exclusively on the server, and move all other services into
containers controlled by Docker. Of course, it is fine to keep your
favorite admin tools (probably at least an SSH server), as well as
existing monitoring/supervision processes (e.g., NRPE, collectd, etc.).

## Linux kernel capabilities

By default, Docker starts containers with a restricted set of
capabilities. What does that mean?

Capabilities turn the binary "root/non-root" dichotomy into a
fine-grained access control system. Processes (like web servers) that
just need to bind on a port below 1024 do not have to run as root: they
can just be granted the `net_bind_service` capability instead. And there
are many other capabilities, for almost all the specific areas where root
privileges are usually needed.
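On Linux, a process's capability sets are visible as hexadecimal bitmasks in `/proc/<pid>/status` (the `CapEff`, `CapPrm` and `CapBnd` lines, among others). The hypothetical helper below decodes such a mask into names. Its table covers only the capabilities this article touches on, with bit numbers taken from `linux/capability.h`; a process granted nothing but `CAP_NET_BIND_SERVICE` (bit 10) would show `CapEff: 0000000000000400`.

```python
# Bit numbers from linux/capability.h (a subset, for illustration only).
CAPABILITY_BITS = {
    "CAP_CHOWN": 0,
    "CAP_LINUX_IMMUTABLE": 9,
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_ADMIN": 21,
    "CAP_MKNOD": 27,
}


def decode_caps(hex_mask: str) -> set[str]:
    """Names from the table above whose bit is set in a /proc status mask."""
    mask = int(hex_mask, 16)
    return {name for name, bit in CAPABILITY_BITS.items() if (mask >> bit) & 1}


def effective_caps(status_path: str = "/proc/self/status") -> set[str]:
    """Decode the CapEff line of a process status file (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    raise ValueError("no CapEff line found")
```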

This means a lot for container security; let's see why!

Your average server (bare metal or virtual machine) needs to run a bunch
of processes as root. Those typically include SSH, cron, syslogd;
hardware management tools (e.g., load modules), network configuration
tools (e.g., to handle DHCP, WPA, or VPNs), and much more. A container is
very different, because almost all of those tasks are handled by the
infrastructure around the container:

 - SSH access will typically be managed by a single server running on
   the Docker host;
 - `cron`, when necessary, should run as a user
   process, dedicated and tailored for the app that needs its
   scheduling service, rather than as a platform-wide facility;
 - log management will also typically be handed to Docker, or to
   third-party services like Loggly or Splunk;
 - hardware management is irrelevant, meaning that you never need to
   run `udevd` or equivalent daemons within
   containers;
 - network management happens outside of the containers, enforcing
   separation of concerns as much as possible, meaning that a container
   should never need to perform `ifconfig`,
   `route`, or `ip` commands (except when a container
   is specifically engineered to behave like a router or firewall, of
   course).

This means that in most cases, containers will not need "real" root
privileges *at all*. And therefore, containers can run with a reduced
capability set, meaning that "root" within a container has far fewer
privileges than the real "root". For instance, it is possible to:

 - deny all "mount" operations;
 - deny access to raw sockets (to prevent packet spoofing);
 - deny access to some filesystem operations, like creating new device
   nodes, changing the owner of files, or altering attributes (including
   the immutable flag);
 - deny module loading;
 - and many others.

This means that even if an intruder manages to escalate to root within a
container, it will be much harder to do serious damage, or to escalate
to the host.

This won't affect regular web apps; but malicious users will find that
the arsenal at their disposal has shrunk considerably! By default Docker
drops all capabilities except [those
needed](https://github.com/docker/docker/blob/master/daemon/execdriver/native/template/default_template.go),
a whitelist instead of a blacklist approach. You can see a full list of
available capabilities in [Linux
manpages](http://man7.org/linux/man-pages/man7/capabilities.7.html).

One primary risk with running Docker containers is that the default set
of capabilities and mounts given to a container may provide incomplete
isolation, either independently, or when used in combination with
kernel vulnerabilities.

Docker supports the addition and removal of capabilities, allowing use
of a non-default profile. This may make Docker more secure through
capability removal, or less secure through the addition of capabilities.
The best practice for users would be to remove all capabilities except
those explicitly required for their processes.
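With `docker run`, that practice translates into `--cap-drop=ALL` followed by one `--cap-add` per capability the process actually needs. The hypothetical helper below assembles such a command line. The image name is an assumption, and the sketch insists on capability names without the `CAP_` prefix, as in Docker's own examples (e.g. `NET_BIND_SERVICE`).

```python
import re
import shlex
from typing import Iterable, Sequence

# Capability names in the form used with docker run, e.g. "NET_BIND_SERVICE".
_CAP_NAME = re.compile(r"^[A-Z][A-Z_]*$")


def minimal_cap_run_args(image: str, needed_caps: Iterable[str],
                         command: Sequence[str] = ()) -> list[str]:
    """Build a docker run argv that drops all capabilities, then re-adds needed ones."""
    args = ["docker", "run", "--cap-drop=ALL"]
    for cap in sorted(set(needed_caps)):
        # Reject lowercase names, the CAP_ prefix, and "ALL" in this sketch.
        if not _CAP_NAME.match(cap) or cap.startswith("CAP_") or cap == "ALL":
            raise ValueError(f"unexpected capability name: {cap!r}")
        args.append(f"--cap-add={cap}")
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    argv = minimal_cap_run_args("example/webapp:1.0", {"NET_BIND_SERVICE"})
    # prints: docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE example/webapp:1.0
    print(shlex.join(argv))
```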

## Other kernel security features

Capabilities are just one of the many security features provided by
modern Linux kernels. It is also possible to leverage existing,
well-known systems like TOMOYO, AppArmor, SELinux, GRSEC, etc. with
Docker.

While Docker currently only enables capabilities, it doesn't interfere
with the other systems. This means that there are many different ways to
harden a Docker host. Here are a few examples.

 - You can run a kernel with GRSEC and PaX. This will add many safety
   checks, both at compile-time and run-time; it will also defeat many
   exploits, thanks to techniques like address randomization. It doesn't
   require Docker-specific configuration, since those security features
   apply system-wide, independent of containers.
 - If your distribution comes with security model templates for
   Docker containers, you can use them out of the box. For instance, we
   ship a template that works with AppArmor and Red Hat comes with SELinux
   policies for Docker. These templates provide an extra safety net (even
   though they overlap greatly with capabilities).
 - You can define your own policies using your favorite access control
   mechanism.
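To check which of these profiles actually confines a given process, you can read its security label from `/proc/<pid>/attr/current`. The hypothetical helper below classifies that label. It assumes AppArmor labels look like `profile (mode)` and SELinux contexts like `user:role:type:level`, which covers the common cases but is not an exhaustive parser for either system.

```python
import re

# AppArmor labels look like "docker-default (enforce)".
_APPARMOR = re.compile(r"^(?P<profile>\S+) \((?P<mode>\w+)\)$")


def classify_label(label: str) -> dict:
    """Best-effort classification of a /proc/<pid>/attr/current label."""
    label = label.strip("\x00\n ")  # SELinux ends with NUL, AppArmor with newline
    if label in ("", "unconfined"):
        return {"lsm": None, "label": label}
    m = _APPARMOR.match(label)
    if m:
        return {"lsm": "apparmor", "profile": m["profile"], "mode": m["mode"]}
    # SELinux: user:role:type[:level], where the level may itself contain ':'.
    parts = label.split(":", 3)
    if len(parts) >= 3:
        keys = ("user", "role", "type", "level")
        return {"lsm": "selinux", **dict(zip(keys, parts))}
    return {"lsm": "unknown", "label": label}


def current_label(pid: str = "self") -> dict:
    """Read and classify a process's label (Linux with an active LSM)."""
    with open(f"/proc/{pid}/attr/current") as f:
        return classify_label(f.read())
```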

Just like there are many third-party tools to augment Docker containers
with, e.g., special network topologies or shared filesystems, you can
expect to see tools to harden existing Docker containers without
affecting Docker's core.

Recent improvements in Linux namespaces will soon allow running
full-featured containers without root privileges, thanks to the new user
namespace. This is covered in detail in [this post on creating and using
containers without privilege](http://s3hh.wordpress.com/2013/07/19/creating-and-using-containers-without-privilege/).
Moreover, this will solve the problem caused by sharing filesystems
between host and guest, since the user namespace allows users within
containers (including the root user) to be mapped to other users in the
host system.
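That mapping is described by `/proc/<pid>/uid_map` (and `gid_map`), where each line holds three numbers: the first ID inside the namespace, the corresponding first ID outside it, and the length of the range. The hypothetical helpers below translate a container UID into a host UID under that format. For instance, with a single line `0 100000 65536`, container root (UID 0) corresponds to the unprivileged host UID 100000, which is what makes files written by "root" through a shared directory harmless on the host.

```python
from typing import Optional

IdRange = tuple[int, int, int]  # (inside_start, outside_start, length)


def parse_id_map(text: str) -> list[IdRange]:
    """Parse the lines of a uid_map or gid_map file."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            ranges.append((inside, outside, length))
    return ranges


def to_host_id(container_id: int, ranges: list[IdRange]) -> Optional[int]:
    """Host ID that a namespace ID maps to, or None if it has no mapping."""
    for inside, outside, length in ranges:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None
```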

Today, Docker does not directly support user namespaces, but they
may still be utilized by Docker containers on supported kernels,
by directly using the `clone` syscall, or utilizing the `unshare`
utility. Using this, some users may find it possible to drop
more capabilities from their process, as user namespaces provide
an artificial capabilities set. Likewise, however, this artificial
capabilities set may require use of `capsh` to restrict the
user-namespace capabilities set when using `unshare`.

Eventually, it is expected that Docker will have direct, native support
for user namespaces, simplifying the process of hardening containers.

## Conclusions

Docker containers are, by default, quite secure, especially if you take
care of running your processes inside the containers as non-privileged
users (i.e., non-`root`).

You can add an extra layer of safety by enabling AppArmor, SELinux,
GRSEC, or your favorite hardening solution.

Last but not least, if you see interesting security features in other
containerization systems, these are simply kernel features that may
be implemented in Docker as well. We welcome users to submit issues and
pull requests, and to communicate via the mailing list.

References:
* [Docker Containers: How Secure Are They? (2013)](http://blog.docker.com/2013/08/containers-docker-how-secure-are-they/).
* [On the Security of Containers (2014)](https://medium.com/@ewindisch/on-the-security-of-containers-2c60ffe25a9e).